The NCDS Data Fellows Program provides funding to researchers at member institutions for work that addresses, in novel and innovative ways, data science research issues that are important to our members. Each Data Fellow receives funding for one academic year, beginning July 1 and running through June 30 of the following calendar year.
Fellows share their research with NCDS members through webinars, a final report, and a face-to-face meeting in early fall, at which outgoing Fellows report on their research results and new Fellows introduce their work. The program is supported by NCDS membership.
Data Fellows 2017-2018
Robert Chew, research data scientist, RTI International
Project Title: SMART: Smarter Manual Annotation for Resource-constrained collection of Training data
Project Summary: Over the past decade, breakthroughs in artificial intelligence have achieved human-level performance on tasks as diverse as object recognition, speech recognition, and gaming. Many of these achievements are due less to recent algorithmic innovation than to the availability of (1) powerful and increasingly inexpensive computing resources and (2) open labeled datasets. Though computational performance has historically grown exponentially, the rate at which humans can annotate data has not. In the research community and industry, it is often acknowledged that the main bottleneck in machine learning adoption is no longer engineering algorithms or hardware, but creating sufficiently large labeled datasets.
To address this concern, active machine learning offers a smarter way to get the most gain from data annotation efforts when labels are expensive to obtain but data collection is cheap. The underlying notion behind active machine learning is that not all observations in a training set are equally informative in helping a machine learning model generalize well to new cases. This project will develop an annotation software prototype that leverages elements of active machine learning, gamification, and UI/UX design to help data scientists and researchers reduce manual coding time and effort, making machine learning classification tasks more affordable and widely accessible.
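As a rough illustration of that idea, and not a description of the SMART prototype itself, one common active learning heuristic is uncertainty sampling: the annotator is shown the unlabeled items whose predicted class probabilities are closest to a coin flip, since labels for those items are expected to teach the model the most. The sketch below assumes a model has already produced class probabilities; the function names, document ids, and probability values are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_for_annotation(predictions, batch_size):
    """Return the ids of the `batch_size` most uncertain unlabeled items.

    `predictions` maps a hypothetical item id to the model's predicted
    class probabilities for that item.
    """
    ranked = sorted(
        predictions,
        key=lambda item_id: entropy(predictions[item_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled documents (binary task).
predictions = {
    "doc_a": [0.98, 0.02],  # model is nearly certain
    "doc_b": [0.55, 0.45],
    "doc_c": [0.80, 0.20],
    "doc_d": [0.50, 0.50],  # model cannot decide
}

to_label = select_for_annotation(predictions, batch_size=2)
print(to_label)  # ['doc_d', 'doc_b']
```

In a full annotation loop, the selected items would be labeled by a person, added to the training set, and the model retrained before the next batch is chosen.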
Biography: Rob Chew, MS, is a Research Data Scientist and Program Manager at RTI International, where he uses his expertise in machine learning, text mining, data visualization, and software development to collaborate with subject matter experts on their complex data problems. Mr. Chew’s research interests broadly lie at the intersection of data science and public health, with a recent focus on computational social science. Currently, Mr. Chew is developing machine learning models to classify user types, communities, and latent attributes on Twitter; using deep learning on satellite images to support survey sampling efforts in developing countries; developing dynamic data visualizations to help policymakers better understand the results of a Bayesian meta-regression for program evaluation; and creating a software application that allows police departments to quantify and assess local “near repeat” phenomena. As a program manager, Mr. Chew also mentors fellow data scientists to support their professional development. Part of the NCDS Fellowship funding will help Mr. Chew mentor Jason Nance, a data scientist in RTI’s Center for Data Science, in the shared development of the SMART application.
Chew holds an MS in Analytics from the Institute for Advanced Analytics at North Carolina State University and a BA in Economics and Environmental Studies from Oberlin College.
Samira Shaikh, assistant professor of cognitive science, department of computer science and department of psychology, University of North Carolina at Charlotte
Project Title: Modeling Persuasion and Group Behavior in Big Data
Project Summary: Online social media platforms provide millions of individuals with a means of expressing their views on a wide range of subjects that matter to their lives. This massive communicative effort in the online realm has been shown to affect the offline, real world in measurable ways. Such real-world consequences make it necessary to understand how real-world behaviors correspond to people's behaviors on online platforms, and how these behaviors can be understood and detected by automated methods. This project investigates the propagation of ideas in the online world and their effect in persuading groups of individuals to take action in the real world. The core of the proposed work relies on the study of language and draws on reliable research practices in psycholinguistics and sociolinguistics to investigate human behavior on online platforms, specifically persuasive behavior. The project will deliver an integrated model of persuasion and group behavior in online communication associated with rapid and broad information diffusion and influence on digital media.
Biography: Samira Shaikh, PhD, is an Assistant Professor of Cognitive Science in the Department of Computer Science and a faculty member of the Data Science Initiative at UNC Charlotte. Dr. Shaikh’s research expertise is in Computational Sociolinguistics, Data Science, Natural Language Processing, and Artificial Intelligence. Her work focuses on computational modeling of human behavior in big data, with strong theoretical underpinnings from the social sciences, including psychology, communication, and anthropology. Previously, Dr. Shaikh was a lead research scientist for the Research Foundation of the State University of New York, where she worked on several large-scale research projects funded by the U.S. Department of Defense. Dr. Shaikh received her PhD in Computer Science from the State University of New York at Albany.