What is CS+?
CS+ is a ten-week summer program exclusively for Duke undergraduates to get involved in computer science research projects with faculty in a fast-paced but supportive community environment. Students participate in teams of 3-4 and are jointly mentored by a faculty project lead and a graduate student mentor. The experience is meant as a rich entry point into computer science research and applications beyond the classroom.
- Only students enrolled at Duke University are eligible to apply.
- The program this summer will run from Tuesday, May 30th, 2023 through Friday, August 4th, 2023.
- The program is held in-person, following Duke guidelines for summer programs. There is no virtual option available, and students must reside in Durham during the summer (on or off campus) to participate.
- Students participate in this program full-time (40 hours/week). You cannot take summer courses or do other internships/fellowships while doing CS+.
- Participants receive a stipend of $5,000 to cover expenses.
- Applications received by Friday, February 17th will receive full consideration (applications received afterward will be considered only if positions remain unfilled).
If you have questions about the program, please email firstname.lastname@example.org.
CS+ Project Offerings Summer 2023
Lead: Kristin Stephens-Martinez
Description: There are known best practices for running in-person or completely online classes, but how to run a hybrid class effectively is still largely unknown. In a hybrid class, students choose for themselves whether to attend class meetings in the classroom, over Zoom, or neither, while still engaging with the class synchronously (e.g., through peer instruction). It is unclear what factors (time of day, location, weather, social anxiety, class policies) influence students' choices and how these choices in turn shape their learning. It is also unclear whether pedagogical strategies like peer instruction are still carried out, and remain effective, in an online or hybrid setting. As we recover from the pandemic, a better and more holistic understanding of what happens in hybrid classes becomes increasingly important for deciding whether to keep our classes hybrid.
In this project, we plan to analyze class data collected from CS216 (Everything Data) across the Spring 2022, Fall 2022, and Spring 2023 semesters. The class is hybrid and similar in size in all three semesters, but its policies differ in the details, which allows comparisons across semesters. We will start by analyzing students' class participation footprints, collected via in-class peer instruction and Zoom meeting attendance logs. Depending on progress and interest, data on other aspects of the class (e.g., office hours, Ed discussion, Sakai quizzes) is also available.
Goals/Deliverables: The base goal of this project is to obtain actionable insights that can help inform and improve not only CS216 but also other classes that were, are, or may become hybrid in the future. Depending on students' interests, the findings can then be shaped into a research poster or paper for a CS education venue, a publicly available infographic webpage or tool demonstrating the findings, or both.
Student Background/Prerequisites: Having taken CS216 by the time of the project will help tremendously, because CS216 provides both the data skills and the class context the project needs. Having taken CS290 (Computing Education Research) or having UTA experience will also be helpful but is less essential. A statistical background is also a plus. None of these is a hard requirement.
Leads: Matthew Lentz, Danyang Zhuo
Description: Networking is the backbone for most modern applications. The networking stack is complex, spread across hardware, the operating system (OS), and applications themselves. The OS traditionally provides a general “socket” interface which is useful to many applications for supporting their higher-level protocols. However, as applications’ demands have grown, this choice of interface has caused some tension. For example, some modern applications even bypass the OS networking stack entirely and provide their own more-efficient, task-specific stack (potentially taking advantage of advances in network hardware). While this is fantastic for performance, it presents a number of challenges, such as: How can we efficiently enforce policies on application-level traffic (e.g., load balancing across different networked machines)? How can we enable upgrading of the network stack without needing to take down running applications?
Our insight for addressing these challenges is that we should provide managed OS services that operate at the level of application protocols. This involves decoupling the network stack from the application and instead executing it within the managed service, which lets us address the management challenges while still providing strong efficiency. As part of our prior work, we built mRPC, a managed service that supports the Remote Procedure Call (RPC) protocol for applications. We are looking to take the insights from our work on mRPC to support arbitrary application-level networking protocols and requirements. For this summer project, our goal is to design and implement a system that takes in a developer-specified protocol (e.g., RPC) and generates an application library that transparently interfaces with the managed service. To drive the development of this system, we will go beyond RPC protocols and also implement support for the collective communication protocols used in modern distributed ML training and serving applications.
Goals/Deliverables:
- Design and implementation of an application that builds on our current application-level networking managed service infrastructure (Phoenix). The current target for this application is a distributed ML model training system that leverages new advances in networking hardware, such as Remote Direct Memory Access (RDMA).
- Changes to the application interface and implementation of Phoenix to better support application developers (driven by observations as part of Goal #1).
- Presentation and poster on the work as part of the CS+ program.
- Contributions towards a future publication. The evaluation section of the Phoenix paper will be largely driven by analyzing the utility and efficiency of supporting a wide variety of applications that use application-level networking (see Goal #1).
Student Background/Prerequisites: Students should have already taken CS210 or CS250 and have experience programming in C, C++, or Rust.
Lead: Pardis Emami-Naeini
Description: During periods of political and social unrest, such as the Black Lives Matter (BLM) movement, the war in Ukraine, and the Iranian protests, people started discussing various security and privacy "best practices" on social media to help those involved in these movements better protect themselves.
The ongoing protests in Iran are an example of such unrest. After the death of Mahsa Amini in the custody of Iran's Morality Police, Iranian people began large-scale protests in the streets. At the same time, the Iranian government attempted to shut down the Internet, block social media platforms, and track Iranian protesters and activists. Immediately after these movements began, social media platforms, including Twitter, started filling up with security and privacy tips to help protesters protect themselves against the Iranian regime.
This project aims to understand the landscape of the security and privacy advice available on social media for protesters in various movements (e.g., BLM, the war in Ukraine). The project involves scraping social media to collect relevant posts and then conducting quantitative and qualitative analysis to carefully surface the themes. In addition, we would like to understand the differences and similarities between these movements in the security and privacy advice that is promoted and shared.
Goals/Deliverables: This project aims to expose students to societal research at the intersection of privacy and social sciences. The outcome of this research could then be presented as a poster or a research paper in a security, privacy, or human-computer interaction conference.
Student Background/Prerequisites: This project involves extracting information from social media and conducting statistical analysis as well as qualitative coding on the collected data. Students should have some coding and statistical background. In addition, coming from a country or background that has recently experienced social or political unrest could help students relate to this project, as could the ability to search social media using keywords in that country's main language.
Leads: Bhuwan Dhingra, Jun Yang
Description: Large Language Models (LLMs) such as GPT-3 and ChatGPT generate remarkably accurate, fluent, and coherent responses when prompted to perform tasks such as answering questions, summarizing information, and generating code. In this project we will explore their application to generating contextualized and effective interventions that address concerns about vaccines. There is a tremendous amount of misinformation online that fuels vaccine hesitancy, but corrective interventions remain few and far between. These interventions need to be scaled up so that they speak to specific concerns in a manner that appeals to audiences of different backgrounds.
Students will devise prompting techniques that take a vaccine-related concern and some metadata about the intended audience and generate a contextualized response addressing that concern. They will also devise techniques for evaluating the accuracy, appropriateness, and efficacy of the generated responses and compare them to existing interventions on websites such as the CDC's.
Goals/Deliverables:
- A report summarizing the results of the evaluation and the limitations of using LLMs to generate interventions
- A user facing interface where stakeholders can issue queries related to vaccine hesitancy and get responses from the LLM
Student Background/Prerequisites: Proficiency in Python and a high level understanding of deep learning and language modeling.
Lead: Anru Zhang
Description: Recent work has shown the great power of (deep) generative models in generating new data (images, audio, etc.) with the same or a similar distribution as the training dataset. This body of work has sparked great interest in both applications and theory development. In the medical world, synthetic samples generated by generative models are particularly appealing because they can offer better privacy protection and larger sample sizes. In this project, we aim to explore diffusion models, a popular recent framework for generative modeling, for generating medical records data. If time permits, we also aim to explore follow-up medical applications and the related theory.
Goals/Deliverables: We will develop a software package and write a paper at the end of the project.
Student Background/Prerequisites: Strong coding skills (Python, PyTorch), collaborative skills, and passion.
Lead: Xiaowei Yang
Goals/Deliverables: The ideal deliverables will be a project write-up that can be published at systems, networking, or security conferences such as USENIX NSDI, ACM SIGCOMM, USENIX Security, and ACM CCS.
Student Background/Prerequisites: Students interested in this project should be familiar with at least one programming language and with web development. Students will learn and apply knowledge of network protocols and network security in this project.
What is the difference between Code+, Data+, Climate+, and CS+? All four "plus" programs share the same model: students collaborate in teams on a tech or data project during the same 10 weeks of the summer and receive a stipend of the same amount. We also partner to provide some common events (talks, social events, a final poster fair, etc.) to create a larger ecosystem of students working in tech and data over the summer; over 100 students participated across the programs in 2019. Each program has its own application.
- CS+ focuses on projects in computer science research and applications and is run by the Department of Computer Science. Project leads are typically computer science faculty.
- Data+ focuses on interdisciplinary data science projects from all over the university, and is run by Rhodes I.I.D. in Gross Hall. Project leads are typically faculty from diverse areas of the university, with frequent additional participation from community and/or industry partners.
- Code+ focuses on projects in software and product development. It is run by Duke OIT and takes place at the American Tobacco Campus in downtown Durham. Project leads are professional IT developers, with an emphasis on gaining real-world development experience.
- Climate+ focuses on climate-related, data-driven interdisciplinary research projects on diverse topics like electricity consumption, wetland carbon emissions, climate change’s impacts on river and ocean ecosystems, and the use of remote sensing data to inform climate strategies. Project leads are data science experts, and also climate, environment, and energy researchers and practitioners with additional participation from other project teams.
Do I apply to the program, or can I pick the projects I want to be a part of? You can apply specifically to the projects and faculty of interest to you.
How much background do I need? CS+ is intended for students who have some computer science experience, but students do not need to be computer science majors or rising seniors in order to apply. We welcome and encourage applications from rising 2nd and 3rd year students who have completed the introductory course sequence in computer science and have skills and interests that make them a good fit for the projects they apply to. Feel free to reach out to individual project leaders to discuss the background needed for specific projects.