Research Data Scientist, RTI International
SMART: Smarter Manual Annotation for Resource-constrained collection of Training data
Over the past decade, breakthroughs in artificial intelligence have achieved human-level performance on tasks as diverse as object recognition, speech recognition, and game playing. Many of these achievements owe less to recent algorithmic innovation than to the availability of (1) powerful and increasingly inexpensive computing resources and (2) open labeled datasets. Although computing performance has historically grown exponentially, the rate at which humans can annotate data has not. Researchers and industry practitioners often acknowledge that the main bottleneck in machine learning adoption is no longer engineering algorithms or hardware, but creating sufficiently large labeled datasets.
To address this concern, active machine learning offers a smarter way to get the most gain from data annotation efforts when labels are expensive to obtain but data collection is cheap. The underlying notion behind active machine learning is that not all observations in a training set are equally informative in helping a machine learning model generalize to new cases. This project will develop an annotation software prototype that leverages elements of active machine learning, gamification, and UI/UX design to help data scientists and researchers reduce manual coding time and effort, making machine learning classification tasks more affordable and widely accessible.
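The idea that some observations are more informative than others can be illustrated with uncertainty sampling, one common active learning strategy. The sketch below is only an illustration under stated assumptions: the project description does not say which selection strategy SMART uses, and the function names and the example probabilities are hypothetical. It ranks unlabeled items by the entropy of a model's predicted class probabilities and sends the most uncertain ones to annotators first.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions, batch_size):
    """Return the indices of the `batch_size` most uncertain items.

    `predictions` holds one list of class probabilities per unlabeled item.
    Higher entropy means the model is less sure, so a human label for that
    item is assumed to be more informative.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,
    )
    return ranked[:batch_size]


# Hypothetical model outputs for four unlabeled items (two classes each).
predictions = [
    [0.98, 0.02],  # model is confident
    [0.55, 0.45],  # model is unsure
    [0.70, 0.30],
    [0.50, 0.50],  # model is maximally unsure
]

queue = select_for_annotation(predictions, batch_size=2)
print(queue)
```

In a full annotation loop, the newly labeled items would be added to the training set, the model retrained, and the remaining unlabeled items re-ranked before the next batch is shown to annotators.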
Rob Chew, MS, is a Research Data Scientist and Program Manager at RTI International, where he uses his expertise in machine learning, text mining, data visualization, and software development to collaborate with subject matter experts on their complex data problems. Mr. Chew’s research interests broadly lie at the intersection of data science and public health, with a recent focus on computational social science. Currently, Mr. Chew is developing machine learning models to classify user types, communities, and latent attributes on Twitter; applying deep learning to satellite images to support survey sampling efforts in developing countries; building dynamic data visualizations to help policymakers better understand the results of a Bayesian meta-regression for program evaluation; and creating a software application that allows police departments to quantify and assess local “near repeat” phenomena. As a program manager, Mr. Chew also mentors fellow data scientists to support their professional development. Part of the NCDS Fellowship funding will help Mr. Chew mentor Jason Nance, a data scientist in RTI’s Center for Data Science, in the shared development of the SMART application.
Mr. Chew holds an MS in Analytics from the Institute for Advanced Analytics at North Carolina State University and a BA in Economics and Environmental Studies from Oberlin College.