CS Summer Research Projects - 2021

The CS+, Data+, and Code+ undergraduate summer programs held an online Plus Summer 2021 Program Expo to showcase student projects leveraging big data, mobile app and web development, and computer science on August 6, 2021. Student teams — over 150 students in all — presented their projects. CS+ presentation videos and project descriptions are available below, or view all presentation videos in the YouTube playlist: CS+ Undergraduate Summer 2021 Research Projects.

Leads: Cynthia Rudin, Sudeepa Roy, Alex Volfovsky

Participating Students:

Sravan Parim
I am a first-year from Connecticut planning to pursue a double major in Computer Science and Economics. After my time at Duke, I intend to go to graduate school and pursue a career that involves researching Artificial Intelligence and shaping technology policy with the goal of reducing economic disparity. Outside of class, I am involved with the Duke International Relations Association and Duke Chamber Ensembles. In my free time I also enjoy reading, meditating, and running. Through CS+ I look forward to learning about machine learning and applying it to estimate causal effects.

Haoning Jiang
Hi! I'm Haoning, a rising junior majoring in Computer Science with a concentration in AI and Machine Learning. I'm very grateful for the opportunity to work on this project.

Description: The Almost Matching Exactly lab designs software tools for matching in causal inference. However, our code and interfaces are not perfect, and we would love to have users try out the code on applications. A self-motivated team of students can help us to improve, and at the same time, learn how to troubleshoot and apply this type of code to real problems.

Outcomes: This is up to the student, but we expect the students to be troubleshooting code, writing code, and designing cool applications.

Skills: Coding, communication, machine learning, and causal inference.

Lead: Debmalya Panigrahi

Participating Students:

Feng Cong
I'm Feng Cong, an international student from Singapore. I'm a rising junior majoring in Mathematics and Computer Science. I'm interested in algorithms, optimization, and problem solving, and love playing Minesweeper and Sudoku.

William He
I am from Houston majoring in mathematics and computer science. I enjoy studying many topics in math, especially algebra and algorithms. After graduating from Duke, I hope to pursue a Ph.D. in math or computer science.

Grace Tian
I'm from San Jose, CA, and I'm a rising junior studying Computer Science and Mathematics. I'm interested in Computer Science research and plan on applying to graduate school after Duke.

Annie Wang
I am from Cary, NC and majoring in computer science and math. After Duke, I am interested in pursuing graduate school.

Description: Recent progress in both combinatorial and algebraic techniques in graph algorithms has given us hope that some of the hardest, and longest standing, challenges in the field might finally be within our reach. This includes, for a graph on n vertices and m edges, the following problems:

  1. find all pairs min-cuts in an undirected graph faster than n-1 max-flows
  2. find a global min-cut of a directed graph in o(mn) time
  3. find the reliability of a graph in near-linear time
  4. find the min-cut in a hypergraph in near-linear time

All these problems have been open for at least two decades, some of them for more than five decades. But, for each of these problems, there is increasing evidence that we are close to finally solving them. For (1), Li and Panigrahi recently showed that the problem can be solved if one allows approximations. For (2), Panigrahi and others recently gave an o(mn) algorithm for vertex connectivity, which takes us halfway to directed connectivity. For (3), Karger recently showed that the problem is amenable to techniques from randomized connectivity algorithms, and near-linear time algorithms are already known for this category. For (4), Panigrahi and Zhang recently solved the problem for hypergraphs of fixed rank, thereby extending this beyond graphs for the first time. This project will explore one of these (or related) problems. The project will be theoretical in nature.
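For readers new to the area, the following is a minimal sketch of the kind of randomized connectivity technique mentioned above: Karger's random contraction algorithm for the global min-cut of a connected, undirected, unweighted graph. It is a textbook baseline, not a solution to any of the open problems listed, and the function name, trial count, and seed are illustrative assumptions.

```python
import random


def karger_min_cut(edges, num_vertices, trials=200, seed=0):
    """Estimate the global min-cut of a connected undirected graph.

    edges: list of (u, v) pairs with vertices numbered 0..num_vertices-1.
    Every trial returns the size of some valid cut, so the result is an
    upper bound that equals the min-cut with high probability when the
    number of trials is large enough.
    """
    rng = random.Random(seed)
    best = len(edges)
    for _ in range(trials):
        parent = list(range(num_vertices))

        def find(v):
            # Union-find lookup with path halving.
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        remaining = num_vertices
        order = edges[:]
        # Contracting edges in a uniformly random order (skipping edges
        # inside an already merged super-vertex) is equivalent to
        # repeatedly contracting a uniformly random remaining edge.
        rng.shuffle(order)
        for u, v in order:
            if remaining == 2:
                break
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                remaining -= 1
        # Edges whose endpoints lie in different super-vertices form the cut.
        cut = sum(1 for u, v in edges if find(u) != find(v))
        best = min(best, cut)
    return best


# Two triangles joined by a single bridge edge.
example_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
```

Calling `karger_min_cut(example_edges, 6)` on the example graph should find the bridge between the two triangles as the smallest cut.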

Outcomes: If the project leads to a new result, then it will be published in a research paper. Students will also learn about the process of theoretical research.

Skills: Design and analysis of algorithms, formal proofs, and reading theoretical computer science research papers.

Lead: Kristin Stephens-Martinez

Participating Students:

Brian Janger
My name is Brian Janger and I'm from Houston, Texas. I'm a rising sophomore here at Duke and I plan to complete the IDM in Computer Science and Statistics and potentially minor in Physics. I'm currently an executive for the newly founded Sports Analytics Club and I am interested in creating software and performing analytics for a sports organization or a Big Tech company.

Hao Xu
Belle Xu is a sophomore from Vancouver, Canada who is studying Computer Science and Statistical Science. Passionate about both software engineering and data science, Belle enjoys challenging herself by exploring new types of algorithms to solve nuanced data-processing and data standardization problems. After college, she is excited to work on more backend development projects with her like-minded peers.

Manith Luthria
I'm a sophomore from Houston, TX. I'm an ECE+CS major. After Duke, I'd like to continue my computer science career. I hope this experience with CS+ will help me learn more about CS research.

Sona Suryadevara
My name is Sona Suryadevara, and I am a rising senior with an Interdepartmental major in Computer Science and Neuroscience. I am excited to join the CS 101 Reviewer App project and use my interdisciplinary skills to further develop the app! After I graduate from Duke, I plan to either go to medical school or become a full-time software engineer.

Description: CS101 Reviewer App is a web application that provides an online quiz tool to students enrolled in CS101 at Duke University. It enables students to quiz themselves on CS101 topics with carefully designed questions that check for specific misunderstandings of the content. A recent feature includes an autogenerated quiz that chooses what topics to focus on for the student based on their past performance. This project has the potential to go in different directions. We have ideas to improve the app, such as adding different question types, improving the algorithm that generates the auto-generated quiz, or adding automated hints based on the student's wrong answers. We also have data analysis needs that will inform future features.
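One simple way an auto-generated quiz could choose topics from past performance is sketched below. This is a hypothetical illustration, not the app's actual algorithm: the `pick_topics` function, the history format, and the smoothing rule are all assumptions.

```python
def pick_topics(history, k=2):
    """Return the k topics with the lowest smoothed accuracy.

    history maps a topic name to a (correct, attempted) pair.
    """
    def smoothed_accuracy(item):
        _topic, (correct, attempted) = item
        # Add-one smoothing so that topics never attempted get 0.5
        # instead of causing a division by zero.
        return (correct + 1) / (attempted + 2)

    ranked = sorted(history.items(), key=smoothed_accuracy)
    return [topic for topic, _ in ranked[:k]]


example_history = {
    "loops": (2, 10),
    "lists": (9, 10),
    "dicts": (0, 0),
    "strings": (7, 10),
}
```

With `example_history`, the weakest topic ("loops") and the never-attempted topic ("dicts") are selected first.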

Outcomes: If students want to improve the app, they will modify the existing app codebase and produce new code. If they want to do data analysis, they will need to write code to clean and analyze the data and produce a small report describing what they did and explaining their results.

Skills: Web app development for a Python app OR data analysis in either Python or R.

Leads: Kamesh Munagala and Brandon Fain

Participating Students:

Xingyu Zhu
I am a rising senior from Beijing, China. I am double majoring in mathematics and computer science, and I am broadly interested in the theoretical aspects of computer science, in particular, theoretical machine learning and algorithms. After graduation, I plan to pursue a PhD in related fields, and hopefully stay in academia. Outside of academics, I am an amateur guitarist and I love riding road bikes.

Zeyu Shen
I'm a first-year student at Duke University, intending to major in mathematics and computer science and to minor in either economics or statistics. My main interest and expertise lie in applying mathematics and programming to problem solving. I enjoy brain teasers of any kind and seeking innovative solutions to them.

George Wang
I am a rising junior majoring in computer science and mathematics. I am broadly interested in algorithms, the theory and applications of blockchain technology, and machine learning.

Description: Fairness is an emerging concept within algorithm design. This project will consider settings where multiple participants use the solution to a certain optimization routine, and this solution provides them with different utility. The question we ask is how should these routines be designed so that different demographic groups obtain comparable utility. Though this question sounds abstract, we will work with well-defined notions of fairness, and well-defined optimization problems. The research will be largely theoretical and analytical in nature, but there will be opportunity to test out the resulting procedures on datasets.

Outcomes: Algorithmic insights and a research paper.

Skills: Background in design and analysis of algorithms. Please list algorithms and related courses you have taken.

Lead: Rong Ge

Participating Students:

Cindy Weng
Hello! I'm a junior from Florida studying computer science and statistics. I'm interested in the intersection of these areas and how we can use quantitative methods to understand and learn from data, as well as how our findings may connect to real-world phenomena.

Tony Wu
Hi! I'm Tony Wu, a rising sophomore at Duke. I intend to double major in Computer Science and Math. I'm very interested in Machine Learning and big data, and I hope to work as a Machine Learning Engineer in the future, combining the cutting-edge research from academia with industrial scenarios. I'm excited about working with Prof Ge this summer and exploring self-supervised learning.

Zeping Luo
Hi! I am Danny, a rising junior majoring in CS/Stats. I am from Shenyang, China. I am interested in the intersection of Computer Science and Statistics, and am excited to explore the theoretical aspects of ML and understand the "why" behind model performance during this summer research project.

Description: Recently self-supervised learning has become a popular way to do unsupervised learning (learning without labels). In self-supervised learning, the algorithm will hide some information from the input and try to predict the hidden information. Empirically, such learning algorithms are successful in many domains such as natural language processing and image understanding.
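The "hide and predict" idea can be made concrete with a minimal sketch that turns an unlabeled token sequence into training pairs by masking one position at a time. The function name and the `[MASK]` placeholder are illustrative assumptions; a real system would feed such pairs to a learned model.

```python
def masked_examples(tokens, mask_token="[MASK]"):
    """Build (masked_input, hidden_target) pairs from an unlabeled sequence.

    No human labels are needed: the target is simply the token that
    was hidden from the input.
    """
    examples = []
    for i, target in enumerate(tokens):
        masked = tokens[:i] + [mask_token] + tokens[i + 1:]
        examples.append((masked, target))
    return examples


pairs = masked_examples(["the", "cat", "sat"])
```

Each pair asks a model to recover the hidden token from its context, which is the supervision signal that replaces explicit labels.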

Traditionally, unsupervised learning problems are often solved using latent variable models. The goal of this project is to try to understand why self-supervised learning might have a better performance.

Outcomes: The project will start by experimenting with self-supervised learning ideas on data generated from some traditional latent variable models, such as HMMs (Hidden Markov Models) or topic models. Then the goal is to systematically change the setting to find scenarios where self-supervised learning can outperform traditional latent variable models, and understand why.

Skills: Math: probability, calculus, willingness to learn new things. Machine learning: experience with, or willingness to learn, standard deep learning packages (e.g. PyTorch).

Classifying Vaccine Misinformation in Text with students Dev Seth and Aakash Kothapally:

Recommending Interventions for Vaccine Misinformation with students Isa Mellody and James Liao:

Leads: Bhuwan Dhingra (lead), Ashwin Machanavajjhala, Jun Yang
Contributor: Lavanya Vasudevan

Participating Students:

Aakash Kothapally
Hi everyone! I am a rising sophomore from Cary, North Carolina, and I plan on majoring in computer science and statistics at Duke. I'm interested in applying machine learning methods to real-world problems, and I am excited to do that this summer.

Dev Seth
I'm a junior from Indore, India. I am double majoring in Computer Science and Philosophy, with a concentration in AI and Machine Learning. After I graduate, I plan to pursue a PhD and research the problem of intelligence--what it is, how it works, and how we can recreate it artificially.

Isa Mellody
Isa Mellody (class of 2024) plans to major in Computer Science with possible minors in Gender Sexuality and Feminist Studies and Theater Studies. She hails from New York City, but is in love with the Durham area. She hopes to use her computer science knowledge to think critically and solve the problems of inequity in the country.

Shuaichen Liao
My name is James and I am a first-year student studying cs, math and linguistics. I am originally from Toronto, Canada. I am interested in exploring the intersection between technology and language.

Description: Misinformation about vaccines has led to vaccination hesitancy, which is listed by the WHO as a top-10 threat to global health. This project aims at developing automated tools that assist health workers and the public in dispelling myths about vaccines. A key observation is that for interventions to be effective, they must address the individuals’ specific concerns and be perceived as credible, which means they must be highly contextualized and personalized.

To help pinpoint the specific concerns, we have created a taxonomy of common misconceptions about vaccines, collected a corpus of articles containing vaccine misinformation, and worked on developing techniques for labeling articles with specific misconceptions. We plan to curate a corpus of intervention articles that help dispel specific misconceptions and are also diverse enough to appeal to individuals with different backgrounds. Putting these techniques together, our ultimate goal is to develop tools/apps that can be used by health workers or deployed alongside web/social media platforms to combat vaccine misinformation.

One specific aim of this summer will be to develop NLP techniques for identifying different categories of vaccine misinformation from text. A secondary aim will be to apply these techniques to a corpus of articles and social media posts and develop visualizations which aid in understanding how vaccine misinformation evolves over time and with the introduction of new vaccines such as for COVID.

Outcomes: Given a small amount of labeled data and a corpus of articles containing vaccine misinformation, we expect students to train multiple machine learning models that classify sentences, paragraphs, or whole articles into our taxonomy of common misconceptions about vaccines. Specifically, the students will be expected to adapt existing large-scale language models such as BERT for the task. Using the best model, they will then develop visualizations and tools which help understand how the misconceptions change over time, both in our corpus and in a separate collection of social media data. Students will complete a research report on their experiments and also produce an extensible codebase which will aid further research after the summer. If appropriate, the report may also be submitted as a research paper to a conference.

Skills:

  • Ability to survey papers on machine learning and natural language processing
  • Running machine learning models in Python, both off-the-shelf implementations and with minor modifications
  • Data visualization and preparation in Python

Leads: Jun Yang, Sudeepa Roy, Kristin Stephens-Martinez

Participating Students:

James Lin

Allen Pan
My name is Allen and I will be a sophomore this coming Fall semester. I am studying computer science and hope to become a software developer after graduating. I love music, whether singing, playing the guitar, or performing for others! I also enjoy working out and playing basketball. I can't wait for CS+ this summer and the opportunity to create something amazing!

Zachary Zheng
I'm Zach and I'm a freshman from Chapel Hill, NC interested in majoring in computer science and economics. After Duke, I would be interested in pursuing a career in data science as well as software engineering or AI/ML. Other than that, I enjoy playing tennis, working out, practicing guitar, and baking in my free time.

Description: The goal of our project is to create an interactive debugger called I-Rex for SQL, which is a ubiquitous query language for accessing and modifying data stored in relational databases. SQL can quickly get complex in practice, and it is a challenge for novices to learn and debug. I-Rex allows users to interactively “trace” through highly complex SQL queries (e.g., those involving aggregation, nesting, and correlation), understand how they execute, and debug wrong queries.
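To give a feel for what tracing a query means, here is a minimal sketch that exposes the intermediate results of an aggregation query stage by stage, using Python's built-in `sqlite3` module. It is not I-Rex and does not reflect its design; the table, the query, and the `trace_group_by` helper are hypothetical.

```python
import sqlite3


def trace_group_by(rows):
    """Show intermediate results for:
    SELECT dept, COUNT(*) FROM student GROUP BY dept HAVING COUNT(*) > 1
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE student (name TEXT, dept TEXT)")
    conn.executemany("INSERT INTO student VALUES (?, ?)", rows)
    steps = {}
    # Stage 1: the rows produced by the FROM clause.
    steps["from"] = conn.execute(
        "SELECT name, dept FROM student ORDER BY name"
    ).fetchall()
    # Stage 2: one row per group, before HAVING filters anything.
    steps["group_by"] = conn.execute(
        "SELECT dept, COUNT(*) FROM student GROUP BY dept ORDER BY dept"
    ).fetchall()
    # Stage 3: only the groups that satisfy the HAVING condition.
    steps["having"] = conn.execute(
        "SELECT dept, COUNT(*) FROM student GROUP BY dept "
        "HAVING COUNT(*) > 1 ORDER BY dept"
    ).fetchall()
    conn.close()
    return steps


steps = trace_group_by([("Ann", "CS"), ("Bob", "CS"), ("Cy", "Math")])
```

Comparing the `group_by` and `having` stages shows exactly which group a wrong HAVING condition would drop, which is the kind of question a novice debugging a query needs answered.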

As the need for data manipulation and analysis becomes ever more important to more people, tools like I-Rex are sorely needed. We plan to deploy I-Rex in our courses (CompSci 216/316/516) in Fall 2021. We are looking for help to improve the backend so I-Rex supports all of SQL and to make it robust. We are also looking for help on the frontend to improve both usability and effectiveness. Finally, we are also interested in anyone who wants to help evaluate how well I-Rex helps novices learn relational querying.

Outcomes: The desired deliverables include a fully working I-Rex system and a clean codebase with proper documentation. If students make progress on related research problems, there are opportunities for writing research/demonstration papers.

Skills: Knowledge of or experience with at least one of the following areas; must be able to learn quickly as needed:

  • SQL (CompSci 316 or CompSci 516 would suffice) and Python/Java programming
  • Frontend design and implementation (e.g., JavaScript, Web frameworks like Flask, Apache)

Lead: Alberto Bartesaghi

Participating Students:

Flora Shi
Hi, I am Flora. I am from China and currently majoring in computer science and statistics. I want to go to grad school after Duke and imagine myself being a data scientist in the future.

Lucy Zhang
I'm a BME major from NJ and my interests lie primarily in regenerative medicine. After Duke, I hope to pursue a PhD in biomaterials or biomedical engineering and integrate what I learn in undergrad into my research and career.

Neelam Runton
My name's Neel Runton, and I'm an Electrical and Computer Engineering major from Cary, NC. I'm currently interested in machine learning, specifically computer vision, and after Duke I'd like to work in the overlap between machine learning and computer hardware.

Description: Cryogenic electron microscopes – or cryo-EM for short – allow researchers to peer at the microscopic shape of cellular proteins like never before. These machines blast proteins with a 300,000-volt beam of electrons so that highly sensitive detectors underneath can tease out their shapes based on the interaction that occurs. Being able to “see” proteins – life’s crucial building materials – can help determine how they work. Recognizing protein structure and function is essential for scientists trying to design better drugs to tackle some of the world’s most devastating diseases, including HIV, cancer, COVID-19 and Alzheimer’s disease. A 300,000-volt electron beam is, however, extremely damaging to the proteins it is trying to image. To help protect the samples in the machine, researchers cryogenically freeze them to help maintain their integrity and use very low electron doses to prevent structural damage, which results in extremely noisy images.

An emerging modality of cryo-EM called cryo-electron tomography (cryo-ET) uses computerized tomography principles to provide an accurate representation of the 3D molecular architecture of entire cells. The mining of the rich information contained in the native cellular environment is hindered by the crowded nature of cells populated by many different molecular species. The accurate detection of individual molecules in 3D is a critical step towards allowing the visualization of these molecular machines at high-resolution. Motivated by recent advances in deep neural network approaches for object detection in natural images and autonomous navigation, this project seeks to apply these methods to detect the position of macromolecules within 3D images of frozen hydrated cells with the ultimate goal of understanding cellular function and disease at the molecular level.

Outcomes: As part of this project, students will write computer code that will take as input 3D volumes of cells and automatically detect the location of multiple molecular species so they can later be extracted and used for high-resolution 3D visualization. Students will carry out the development in a dedicated high-performance computing (HPC) environment and at the end of the project will write a research paper to describe their approach and present results obtained on real datasets.

Skills: Knowledge of Python and background or interest in deep learning, image processing or computer vision.

Lead: Xiaowei Yang

Participating Students:

Vineel Vanam
My name is Vineel Vanam. I am a Sophomore CS student from Charlotte, NC. After Duke, I am interested in Grad School but I'd want to work for a while first. I'm interested in Software Engineering, specifically back-end design.

Dominic Ritchey
I’m a rising sophomore from the Chicagoland area with plans to major in computer science and biomedical engineering. I enjoy studying molecular computing and software engineering. After graduating, I plan on working for a tech company and then pursuing a PhD.

Description: As global Internet traffic grows, more and more content networks depend on IP anycast to serve their global requests from multiple content caches. Unlike DNS-based content load balancing, the anycast network distributes clients' requests at the mercy of the inter-domain routing protocol, Border Gateway Protocol (BGP)[3]. Previous work measured the real performance and benefits of the anycast network and observed highly skewed and sub-optimal load distribution[1,2].

In order to understand what causes the inefficiency of IP anycast, we propose to measure to what extent Network Providers optimize the anycast network in the wild. Unlike previous anycast measurement projects, which focus on application-level performance, we focus on mining the control plane, i.e., BGP prefix configuration parameters of various routers for different anycast service providers.

Outcomes: The ideal deliverables include a project write-up that can lead to a publication at a high-quality networking conference, including but not limited to Internet Measurement Conference, ACM SIGCOMM, and USENIX NSDI.

Skills Required:

  1. Familiarity with common Linux commands;
  2. Familiarity with Python programming and the requests package; and
  3. Familiarity with data processing in Python, such as regex and pandas.

Any knowledge of BGP and inter-AS routing is preferred; and any previous experience in network measurement is preferred. Students who took a previous offering of CS356 should have sufficient background knowledge to participate in this project.

Decentralized Finance: Blockchain and Cryptocurrencies with students Dylan Paul, Oum Lahade, Malika Rawal, and Rhys Banerjee:

ICy Demo, A Decentralized Lending Protocol for the Internet Computer with students Malika Rawal and Dylan Paul:

Co-Leads: Luyao Zhang (Assistant Professor of Economics at DKU), Kartik Nayak, Yulin Liu, and Fan Zhang

Participating Students:

Dylan Paul
Hi I am Dylan Paul and I am a sophomore from New York majoring in Computer Science with a concentration in AI/ML as well as minoring in Finance. I am very interested in Decentralized Finance as well as machine learning and I hope to work in one of these fields after Duke.

Malika Rawal
Hi! My name is Malika Rawal. I'm from Charlotte, NC, and I'm majoring in Economics with a concentration in Finance, minoring in Computer Science, and doing the Markets & Management Certificate. I am interested in fintech, investment strategy, and product management. For fun, I like to do anything outdoors, hang out with friends, and take my new puppy on walks.

Oum Lahade
I'm a freshman, from Morrisville, NC, studying Electrical & Computer Engineering and Mathematics. I'm primarily interested in blockchain and the intersection of finance & technology, and after Duke I'd like to work at the cross-section of these fields.

Urjit Banerjee
I live in Charlotte, North Carolina and I am currently pursuing a major in Computer Science with a concentration in AI and Machine Learning, as well as a minor in Math. After I graduate from Duke, I am interested in studying AI and Machine Learning at Graduate School. Outside of school, I enjoy art and watching movies!

Description: According to the World Bank’s Global Findex data, 1.7 billion adults remain unbanked. Financial exclusion is a global issue. Besides, financial services, such as loans, insurance, derivatives, and fundraising, are controlled by financial intermediaries. These financial service providers often lack transparency and charge high fees. Distributed Ledger Technology (DLT), often known as blockchain, empowers smart contracts to automate enforceable agreements, enable financial inclusion, and cut out middlemen. Financial services based on the DLT have been gaining momentum since 2015. The total value locked in Decentralized Finance (DeFi) smart contracts has quickly increased from a few million in 2017 to more than 10 billion in 2020. Decentralized platforms such as Ethereum and Polkadot are attracting more and more developers and users. However, all the existing decentralized public blockchains are constrained by long finality time, low scalability, and low throughput. By contrast, the Internet Computer, developed by DFINITY Foundation, provides a tamperproof, scalable, and efficient environment, where software is secured by default. The Internet Computer is a highly fault-tolerant decentralized network protocol that combines the computing power of independent data centers around the world.

Outcomes: In this project, student-teams are guided to build DeFi applications on the Internet Computer, which will then be used to design experiments for research on DeFi. Students will create a live web application on the Internet Computer, present the application design at SciEcon Accelerator Seminar, document products on a project website, and submit a proposal for further publishable research. Example applications include the following:

  • Algorithmic Stable Coin: Cryptocurrencies often experience sharp price fluctuations. To solve this problem, stable coins aim at pegging to sovereign currencies, such as USD. Ampleforth is one promising stable coin. In this project, students design an algorithmic stable coin protocol on the Internet Computer by referring to the design of Ampleforth.
  • Decentralized Exchange (DEX): DEX allows users to exchange cryptocurrencies directly with each other on the blockchain without trusting an intermediary. Uniswap is the most popular DEX with the most users and the largest trading volume. In this project, students design a DEX on the Internet Computer that inherits the ideal features of Uniswap.
  • Decentralized Bank: Decentralized Banks connect borrowers to lenders efficiently by utilizing smart contracts and allowing users to interact without permission. Moreover, the interest rates in decentralized banks are determined algorithmically following the law of supply and demand rather than being controlled by the central bank. Compound has the largest total value locked among all the existing decentralized banks. In this project, students design a decentralized bank on the Internet Computer following the protocol of Compound.
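As an illustration of what "determined algorithmically following the law of supply and demand" can mean for the decentralized bank example, the sketch below sets a borrowing rate from pool utilization: the larger the share of deposits that is lent out, the higher the rate. The linear form and the `base_rate` and `slope` values are assumptions chosen for illustration, not the parameters of Compound or of any deployed protocol.

```python
def utilization(borrowed, cash):
    """Fraction of the pool's funds that is currently lent out."""
    total = borrowed + cash
    return 0.0 if total == 0 else borrowed / total


def borrow_rate(borrowed, cash, base_rate=0.02, slope=0.20):
    """Annual borrowing rate that rises linearly with utilization.

    base_rate and slope are hypothetical parameters.
    """
    return base_rate + slope * utilization(borrowed, cash)


idle_pool_rate = borrow_rate(borrowed=0, cash=100)
half_used_rate = borrow_rate(borrowed=50, cash=50)
```

When demand for loans grows relative to supplied funds, utilization rises and so does the rate, which in turn attracts more lenders and discourages further borrowing.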

Skills: Students should have some background in design and analysis of algorithms and in programming/software development. In addition, interest and experience in financial tech is helpful.

Participating Students:

Neel Gajjar
Hi! I am a sophomore from Charlotte, NC studying computer science interested in pursuing graduate school or entering the industry (SWE/Data Science) after Duke. I am really excited to spend this summer with CS+!

Rui Xin
I'm a sophomore majoring in Mathematics and Computer Science. I'm interested in Computational Biology and Machine Learning.
