By Glenn McDonald
Data is everywhere. In the modern digitized world, data powers virtually every endeavor and transaction — from car loans to medical diagnoses to interstellar imaging.
Data science, as such, is necessarily a deep and interdisciplinary field of study. Dr. Jian Pei, a professor and, as of 2023, chair of Duke's Department of Computer Science, has dedicated his professional life to advancing the science of data and leveraging its multidisciplinary nature. He’s also a fierce advocate of making sure the playing field is level for everyone. Pei works to bring equity, efficiency, and user understandability to the practical applications of data science.
In fact, Pei’s multiple appointments at Duke reflect this interdisciplinary approach. In addition to his appointment in Computer Science, Pei is also a Professor of Biostatistics and Bioinformatics, and Electrical and Computer Engineering.
Pei’s expertise in the field is broad, and his list of official specialties is a long one: data mining, database systems, artificial intelligence, bioinformatics, online analytic processing, social media and social networks, health and medical informatics …. The list, believe it or not, goes on.
Since joining Duke in July of 2022, Pei has been particularly enthused by the opportunity to conduct research in and among these various interests. “Amazingly, Duke is not only strong in one discipline, it is strong in so many different areas and particularly integrating them in an interdisciplinary way,” Pei says. “I love to connect data science to different applications.”
From there, the goal is to transform the data into actionable business opportunities: “Bioinformatics, healthcare, or new energy,” Pei says. “Applications for human good.”
A major focus of Pei’s work in recent years has been making data science more equitable, fair, and unbiased. In any project or system, he says, there can be inherent bias in the way data is collected or in how it’s interpreted.
“For example, when some companies collect audio or video data to train their machine learning models, they collect more data from white people,” Pei says. “This results in a practical model that may be more accurate for white people.”
Pei notes that there may not be a designed agenda behind these kinds of biases, personal or even institutional. Often, people may not even be aware of the issues. It has more to do with demographics, statistics, and hard numbers about racial or ethnic representation in the data sample.
“We need to find ways to detect this kind of unfairness, and develop ways to technically correct such issues,” Pei says.
Developing such technical or algorithmic methods, Pei says, starts with educating young data scientists. “At the essential level, I want to use my work to raise people's awareness of this issue,” Pei says. “If we can prevent the problem from happening at the very beginning, that's much better than finding some later techniques to correct a problem.”
This notion dovetails rather nicely with another of his professional passions: educating young scientists. Teaching is a mutually beneficial relationship, he says, one that works both ways.
“Sometimes I'm not sure whether I'm helping them or they're helping me,” he says with a chuckle. “I always find it amazing that whenever I want to learn something — a new direction or a new area — the best way is to talk with a student. The students are always better than me in learning to adapt to the new things.”
As for his plans outside of the classroom, Pei says he enjoys running and mountain biking, and has recently discovered the outdoor amenities in the area.
“Oh, yes, since my second visit, I have been running through the East Campus trail,” Pei says. “Just last weekend I ran through the Carolina North Forest [in Chapel Hill]. The Pumpkin Loop! So beautiful.”
While on campus in his more official capacities, Pei hopes to bring together his expertise in data science with his advocacy and zeal for teaching young scientists.
“In data science, the most important thing is really connecting people,” he says. “The culture is moving forward, generation by generation, and there is so much we need to learn. We cannot just sit in our own little groups or our own generations.”
Pei’s belief is both simple and powerful: The more kinds of minds that can be brought to data science — in terms of age and gender, background and experience — the better off we’ll all be.
“And this is not just that we want to encourage participation,” Pei says. “It is also the best way to enrich the diversity in thinking that, in the long run, will be very critical for our whole society.”