CS Summer Research Projects - 2023

CS+ 2023

Project descriptions for CS+ 2023 Summer Projects are available below.

Active Learning for Breast Mass Segmentation in 2D Digital Mammography - poster image

Leads: Cynthia Rudin, Alina Jade Barnett

Students: Sangwook Cheon, Giyoung Kim

Educational Data Analysis and Mining for Hybrid Classes

Lead: Kristin Stephens-Martinez

Description: There are known best practices for running in-person or completely online classes, but how to run a hybrid class effectively is still mostly unknown. In a hybrid class, students choose for themselves whether to attend the class meeting in the classroom, over Zoom, or neither, while still engaging with the class synchronously (e.g., through peer instruction). It is unclear what factors (time of day, location, weather, social anxiety, class policies) may influence students' choices and how these choices in turn shape their learning. It is also unclear whether pedagogical strategies like peer instruction remain effective, or are even carried out, in an online/hybrid setting. As we recover from the pandemic, a better and more holistic understanding of what happens in hybrid classes becomes increasingly important for deciding whether to keep our classes hybrid.

In this project, we plan to analyze class data collected from CS216 (Everything Data) spanning the Spring 2022, Fall 2022, and Spring 2023 semesters. The class was hybrid and similar in size in all three semesters, but the class policies differed in their details, which allows comparing and contrasting across semesters. We will start by analyzing students' class participation footprints, collected via in-class peer instruction, and logs of students' attendance at the Zoom meetings. Depending on progress and interest, data on other aspects of the class (e.g., Office Hours, Ed discussion, Sakai quizzes) is also available.

Goals/Deliverables: The base goal of this project is to obtain actionable insights that can help inform and improve not only CS216 but also other classes that were/are hybrid or may become hybrid in the future. Depending on students' interests, the findings can then be shaped into a research poster/paper in a CS education venue, or a publicly available infographics webpage/tool demonstrating the findings, or both.

Student Background/Prerequisites: Having taken CS216 by the time of the project will help tremendously, because CS216 provides both the data skills and the class context needed in the project. Having taken CS290 (Computing education research) or having UTA experience will also be helpful but is less essential. A statistics background is also a plus. None of this is a hard constraint.

Efficient Application-Level Networking - poster image

Leads: Matthew Lentz, Danyang Zhuo

Description: Networking is the backbone for most modern applications. The networking stack is complex, spread across hardware, the operating system (OS), and applications themselves. The OS traditionally provides a general “socket” interface which is useful to many applications for supporting their higher-level protocols. However, as applications’ demands have grown, this choice of interface has caused some tension. For example, some modern applications even bypass the OS networking stack entirely and provide their own more-efficient, task-specific stack (potentially taking advantage of advances in network hardware). While this is fantastic for performance, it presents a number of challenges, such as: How can we efficiently enforce policies on application-level traffic (e.g., load balancing across different networked machines)? How can we enable upgrading of the network stack without needing to take down running applications?

Our insight for addressing these challenges is that we should provide managed OS services that operate at the same level as application protocols. This involves decoupling the network stack from the application itself so that it executes within the managed service, which lets us address the management challenges while remaining efficient. As part of our prior work, we built mRPC, a managed service that supports the Remote Procedure Call (RPC) protocol for applications. We are looking to take the insights from our work on mRPC to support arbitrary application-level networking protocols and requirements. For this summer project, our goal is to design and implement a system that takes in a developer-specified protocol (e.g., RPC) and generates an application library to transparently interface with the managed service. To drive the development of this system, we will go beyond RPC protocols to also implement support for collective communication protocols used in modern distributed ML training/serving applications.

Goals/Deliverables:

  1. Design and implementation of an application that builds on our current application-level networking managed service infrastructure (Phoenix). The current target for this application is a distributed ML model training system that leverages new advances in networking hardware, such as Remote Direct Memory Access (RDMA).
  2. Changes to the application interface and implementation of Phoenix to better support application developers (driven by observations as part of Goal #1).
  3. Presentation and poster on the work as part of the CS+ program.
  4. Contributions towards a future publication. The evaluation section of the Phoenix paper will be largely driven by analyzing Phoenix's utility and efficiency in supporting a wide variety of networked applications (see Goal #1).

Student Background/Prerequisites: Students should have already taken CS210 or CS250 and have experience programming in C, C++, or Rust.

Exploring Generative Models for Medical Records - poster image

Lead: Anru Zhang

Description: Recent work has shown the power of (deep) generative models in generating new data (images, audio, etc.) with the same or a similar distribution as the training dataset. This body of work has sparked great recent interest in both applications and theory development. In the medical world, synthetic samples generated from generative models have been particularly appealing because they can offer better privacy protection and larger sample sizes. In this project, we aim to explore generative diffusion models, a currently popular framework for generative modeling, for generating medical records data. If time permits, we also aim to explore follow-up medical applications and the related theory.
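For readers unfamiliar with diffusion models, the following is a minimal sketch of the standard forward (noising) step, which gradually corrupts a data point with Gaussian noise; a model is then trained to reverse this process. It is not the project's method. The linear noise schedule, its default values, and the three-value "record" are all assumptions chosen for illustration, and real medical records would also need handling for categorical and missing fields.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return per-step noise variances (illustrative default values)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar[t] is the product of (1 - beta_s) for all s <= t."""
    alpha_bars = []
    running = 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def noise_record(x0, t, alpha_bars, rng):
    """Sample x_t given x_0 in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, eps ~ N(0, 1).
    """
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * value + sigma * rng.gauss(0.0, 1.0) for value in x0]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    record = [0.5, -1.2, 0.3]  # hypothetical standardized numeric features
    for t in (0, 499, 999):
        print(t, noise_record(record, t, alpha_bars, rng))
```

At small t the noised record stays close to the original; at the final step it is nearly pure noise, which is the starting point for generation.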

Goals/Deliverables: We will develop a software package and write a paper at the end of the project.

Student Background/Prerequisites: Strong coding skills (Python, PyTorch), collaborative skills, and passion.

Interpretable Binding Affinity Prediction with Persistent Homology - poster image

Leads: Bruce Donald, Graham Holt

Student: Yuxi Long

Privacy Attitudes and Concerns Towards Medical Tracking Apps - poster image

Lead: Pardis Emami-Naeini

Goals/Deliverables: This project aims to expose students to societal research at the intersection of privacy and social sciences. The outcome of this research could then be presented as a poster or a research paper in a security, privacy, or human-computer interaction conference.

Student Background/Prerequisites: This project involves extracting information from social media and conducting statistical analysis as well as qualitative coding on the collected data. It is important for students to have some coding and statistical background to carry out this project. In addition, being from a country/background that has been experiencing recent social/political unrest could help with relating to this project, as could being able to search social media using keywords in the main/first language of the country.

A UI to Communicate Interpretable AI Decisions to Radiologists - poster image

Leads: Alina Jade Barnett, Cynthia Rudin

Student: Anika Mitra

Combatting Vaccine Misinformation Using Large Language Models - poster image

Leads: Bhuwan Dhingra, Jun Yang

Description: Large Language Models (LLMs) such as GPT-3 and ChatGPT generate remarkably accurate, fluent, and coherent responses when prompted to perform tasks such as answering questions, summarizing information, and generating code. In this project we will explore their application to generating contextualized and effective interventions that address concerns about vaccines. There is a tremendous amount of misinformation online that fuels vaccine hesitancy, but corrective interventions remain few and far between. There is a need to scale up these interventions so that they speak to specific concerns in a manner that appeals to audiences of different backgrounds.

Students will devise prompting techniques that take a vaccine-related concern and some metadata about the intended audience and generate a contextualized response addressing that concern. They will also devise techniques for evaluating the accuracy, appropriateness, and efficacy of the generated responses and compare them to existing interventions on websites such as the CDC's.
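As a rough illustration of what such a prompting technique might look like, the sketch below assembles a prompt from a concern and audience metadata. Everything here is hypothetical: the Audience fields, the wording of the template, and the function name are assumptions made for illustration, and the sketch deliberately stops before calling any particular LLM API.

```python
from dataclasses import dataclass


@dataclass
class Audience:
    """Hypothetical audience metadata; the field names are illustrative only."""
    age_group: str
    reading_level: str
    information_source: str


def build_prompt(concern: str, audience: Audience) -> str:
    """Combine a vaccine-related concern and audience metadata into one prompt."""
    concern = concern.strip()
    if not concern:
        raise ValueError("concern must be a non-empty string")
    return (
        "You are a public-health communicator.\n"
        f"Audience age group: {audience.age_group}\n"
        f"Audience reading level: {audience.reading_level}\n"
        f"Where the audience usually gets information: {audience.information_source}\n"
        f"Concern: {concern}\n"
        "Write a short, respectful, factually accurate response that addresses "
        "this concern for this audience."
    )


if __name__ == "__main__":
    audience = Audience("18-25", "general", "social media")
    print(build_prompt("Were the vaccines developed too quickly?", audience))
```

Keeping prompt construction separate from the model call makes it easier to compare several templates against the same set of concerns during evaluation.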

Goals/Deliverables:

  • A report summarizing the results of the evaluation and the limitations of using LLMs for generating interventions
  • A user-facing interface where stakeholders can issue queries related to vaccine hesitancy and get responses from the LLM

Student Background/Prerequisites: Proficiency in Python and a high-level understanding of deep learning and language modeling.

Web Cookie Security and Privacy - poster image

Lead: Xiaowei Yang

Description: Current websites use cookies for diverse purposes, such as authentication, advertising, and behavior tracking. This creates not only potential vulnerabilities for websites but also possible violations of users’ privacy. In this project, we start by investigating whether web cookies comply with web privacy policies and laws (e.g., GDPR). We will measure the distribution of cookie categories on websites and the extent to which websites comply with their cookie policies. The challenge of this project is to automatically detect the different categories of cookies and evaluate compliance with the cookie policies. Beyond cookie policy compliance, we will investigate potential vulnerabilities or privacy leakage caused by illegal cookie usage.
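To make the detection task concrete, here is a minimal sketch that parses Set-Cookie header values with Python's standard http.cookies module, assigns each cookie a category, and lists cookies that are not categorized as necessary. The name-prefix rules and the category labels are assumptions for illustration only (a real study would need a vetted classifier), and the final check is a simplification rather than a legal compliance test.

```python
from http.cookies import SimpleCookie

# Hypothetical name-prefix rules, for illustration only.
CATEGORY_RULES = {
    "necessary": ("session", "sid", "csrf"),
    "analytics": ("_ga", "_gid"),
    "advertising": ("ad_", "ads"),
}


def parse_set_cookie(header_value):
    """Parse one Set-Cookie header value into a list of attribute dicts."""
    jar = SimpleCookie()
    jar.load(header_value)
    parsed = []
    for name, morsel in jar.items():
        parsed.append({
            "name": name,
            "secure": bool(morsel["secure"]),
            "httponly": bool(morsel["httponly"]),
            # A cookie with Max-Age or Expires outlives the browser session.
            "persistent": bool(morsel["max-age"] or morsel["expires"]),
        })
    return parsed


def categorize(name):
    """Return the first category whose prefix list matches the cookie name."""
    lowered = name.lower()
    for category, prefixes in CATEGORY_RULES.items():
        if lowered.startswith(prefixes):
            return category
    return "unknown"


def non_necessary_cookies(headers):
    """Names of cookies in the given headers not categorized as necessary."""
    flagged = []
    for header in headers:
        for cookie in parse_set_cookie(header):
            if categorize(cookie["name"]) != "necessary":
                flagged.append(cookie["name"])
    return flagged


if __name__ == "__main__":
    # Hypothetical headers observed before the user has responded to a banner.
    observed = [
        "sid=abc123; Path=/; Secure; HttpOnly",
        "_ga=GA1.2.123; Max-Age=63072000",
    ]
    print(non_necessary_cookies(observed))
```

Applied to headers captured before a user interacts with a consent banner, the flagged names would be candidates for closer inspection.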

Goals/Deliverables: The ideal deliverable is a project write-up that can be published at a systems, networking, or security conference such as USENIX NSDI, ACM SIGCOMM, USENIX Security, or ACM CCS.

Student Background/Prerequisites: Students who are interested in this project should be familiar with at least one programming language and with web development. Students will learn about network protocols and network security and apply that knowledge in this project.