DataBytes

The goal of DataBytes is to share developments and new perspectives in the field of data science. These events often highlight work happening in industry, providing a pathway to connect industry with academia. We ask that presenters refrain from offering explicit sales or collaboration pitches during or after the event. NCDS enjoys a close relationship with many higher education institutions and facilitates connections based on mutual interest. The NCDS also offers other opportunities to provide thought leadership across a variety of platforms; please reach out to Amanda C. Miller to discuss further, or click below to navigate to the interest form.

DataBytes Interest Form

Upcoming DataBytes Events

ROBOKOP (Reasoning Over Biomedical Objects linked in Knowledge Oriented Pathways) is a knowledge graph system that aims to accelerate and advance scientific discovery by enabling users to simultaneously explore dozens of integrated and harmonized sources of biomedical knowledge. ROBOKOP has been applied to suggest mechanistic insights into biomedical questions and hypotheses for subsequent testing across multiple domains, including hepatotoxicity, environmental determinants of disease, and drug repurposing and mechanism of action, among others.

During this NCDS webinar, Dr. Karamarie Fecho will provide a short presentation and live demonstration of ROBOKOP, followed by an opportunity for attendees to pose questions to ROBOKOP and explore answers.

After this webinar, attendees will:
● Gain a general understanding of knowledge graphs
● Be aware of the ROBOKOP knowledge graph system and know how to access it
● Have a basic understanding of how to query ROBOKOP and interpret results
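For readers curious what a knowledge graph query looks like in practice, here is a minimal, hypothetical sketch in Python of a one-hop question ("which chemicals are related to a given disease?") expressed as a TRAPI-style query graph, the JSON format used by Translator reasoners such as ROBOKOP. The identifier, category, and predicate names below are illustrative assumptions for this sketch, not documentation of ROBOKOP's actual API; the webinar will demonstrate the real interfaces.

```python
import json

def build_query(disease_curie):
    """Build a hypothetical one-hop TRAPI-style query graph asking which
    chemical entities are related to the given disease identifier."""
    return {
        "message": {
            "query_graph": {
                "nodes": {
                    # The pinned node: a specific disease of interest.
                    "n0": {"ids": [disease_curie],
                           "categories": ["biolink:Disease"]},
                    # The open node: any chemical entity in the graph.
                    "n1": {"categories": ["biolink:ChemicalEntity"]},
                },
                "edges": {
                    # Ask for edges connecting chemicals (n1) to the disease (n0).
                    "e01": {"subject": "n1",
                            "object": "n0",
                            "predicates": ["biolink:related_to"]},
                },
            }
        }
    }

# Illustrative disease identifier, chosen for this sketch only.
query = build_query("MONDO:0005359")
print(json.dumps(query, indent=2))
```

In a real session, a payload like this would be sent to a reasoner endpoint, and answers would come back as bindings of the open node to concrete chemicals, which the user then interprets.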

Funding
ROBOKOP is funded by NIEHS with joint support from the NIH Office of Data Science (#U24ES035214). Drs. Alexander Tropsha and Chris Bizon, of the University of North Carolina at Chapel Hill, serve as Principal Investigators.

About Dr. Karamarie Fecho
Dr. Fecho holds a PhD in Neurobiology from UNC’s School of Medicine. Her scientific background is broad, spanning the clinical and translational spectrum from basic science to clinical research to healthcare quality assurance/improvement and data science.

Kara currently serves as CEO of Copperline Professional Solutions, a small biomedical consulting company she founded in 2010. Through Copperline, she has engaged with numerous academic organizations, pharmaceutical companies, tech start-ups, and non-profits, providing a wide array of services, products, and expertise.

Kara is also a Research Affiliate at RENCI, where in recent years she has focused on the development and application of biomedical knowledge graphs. This webinar will focus on her work on the ROBOKOP knowledge graph system.

Registration coming soon!

Past DataBytes Events | 2025

Key Findings from the NIST AI 100-2 Report

In this webinar, Apostol Vassilev will examine the recently published 2025 edition of the NIST AI 100-2 report, a crucial resource for organizations seeking to secure AI systems. The talk will focus on the security of generative AI models, tailored to the needs of the AI practitioners and stakeholders in the audience. The report provides a comprehensive taxonomy of concepts and terminology in the field of adversarial machine learning (AML), organizing key AI technologies, attack life cycle stages, and attacker goals, objectives, capabilities, and knowledge into a conceptual hierarchy. By identifying current security challenges in the AI system life cycle and describing methods for mitigation and management, the report aims to establish a common language for the rapidly evolving AML landscape. This webinar will give attendees a deeper understanding of how to develop and deploy more trustworthy and responsible AI systems.

View a recording of the event here.

Large language models (LLMs) do not distinguish between fact and fiction. They will return an answer to almost any prompt, yet factually incorrect responses are commonplace; they are not, strictly speaking, designed to tell the truth. Our tendency to anthropomorphize machines and trust models as human-like truth tellers, consuming and spreading the bad information they produce in the process, is uniquely worrying.

Yet they are deployed in many sectors where truth and detail matter, such as education, science, health, the media, law, and finance. Our guest presenter Sandra Wachter coined the term “careless speech” to describe a new type of harm created by LLMs, one that poses cumulative, long-term risks to science, education, and shared social truth in democratic societies. These subtle mistruths are poised to cumulatively degrade and homogenize knowledge over time.

This raises the question: do large language models have a legal duty to tell the truth?

Join us as Sandra demonstrates the prevalence of hallucinations and assesses the existence of truth-related obligations in EU human rights law, the Artificial Intelligence Act, the Digital Services Act, the Product Liability Directive, and the Artificial Intelligence Liability Directive. We will close with proposals for reducing hallucinations in LLMs and a robust Q&A opportunity.

Past DataBytes Events | 2024

Organizations at varying levels of analytical maturity are increasingly seeking to optimize their data usage. In this presentation, two experienced data scientists will discuss how analytics can vary across different environments, drawing on their work with clients across the analytical maturity spectrum: enhancing analytically established organizations and developing capabilities in analytically emerging ones. They will examine a range of analytical methodologies and how these can either support or hinder organizational performance, and they will outline the specific opportunities and challenges organizations at different maturity levels face when engaging with analytics. The goal is to give attendees practical insight into how different organizations approach data analytics so that they are prepared to make analytical contributions at any organization.

View a recording of the event here.

At the Street Drug Analysis Lab, researchers analyze street drug samples from around the country and have detected roughly 300 unique chemical substances. Making sense of these chemicals is a notorious challenge, with long names, esoteric molecules, and overlapping pharmacological properties. The team therefore created a flexible ontology that can adapt over time and developed visualizations to bring order to the chaos. Using hierarchical edge bundling, a type of chord diagram (.js package), they visualized the co-occurrence of substances in the drug supply across 6,000 drug samples from 34 US states, showing connections between classes of molecules. Working with a local graphic designer, the team produced hand-drawn illustrations that highlight particularly dangerous combinations of substances and tell the story of where the samples came from.
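The co-occurrence analysis behind such a diagram can be sketched simply: count how often each pair of substance classes appears together in the same sample, yielding the weighted links the chord diagram renders. Here is a minimal Python illustration using made-up sample data (the class names and samples are invented for this sketch; the real project analyzed 6,000 lab samples and rendered the result in JavaScript):

```python
from itertools import combinations
from collections import Counter

# Made-up data: each set lists the substance classes detected in one
# street-drug sample (illustrative only, not real lab results).
samples = [
    {"opioid", "benzodiazepine"},
    {"opioid", "stimulant"},
    {"opioid", "benzodiazepine", "stimulant"},
    {"stimulant"},
]

# Count every unordered pair of classes seen together in a sample.
# Sorting makes ("a", "b") and ("b", "a") count as the same pair.
cooccurrence = Counter()
for sample in samples:
    for pair in combinations(sorted(sample), 2):
        cooccurrence[pair] += 1

for (a, b), n in cooccurrence.most_common():
    print(f"{a} + {b}: {n} samples")
```

Each pair's count becomes the weight of one chord in the diagram, so heavily co-occurring classes draw the thickest, most visually prominent links.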

Join team members Nabarun Dasgupta and Anuja Panthari at the intersection of chemistry, art, public health, and data science, as they describe how their project brings order to the unruly illicit drug supply. The key message: The drug supply is vast, but it is knowable.

View a recording of the event here.

Through a competitive awards-based program, STTR and SBIR federal grants enable small businesses to explore their technological potential and open new opportunities to profit from its commercialization. However, it can be difficult for first-time applicants to get their foot in the door: sorting through guidelines, requirements, and deadlines while trying to locate successful examples.

GrantScout is automating grant search, writing, and submission, using traditional deep learning methods and generative AI to unlock funding for everyone. Join GrantScout founders Felicia Chen and Jennifer Tang as they discuss the more than two million grants the government provides for small businesses, how their team fine-tunes its own models to create strong technical proposals, and the lessons from past experiences that helped them build the platform.

View a recording of the event here.

The National Institute of Standards and Technology (NIST) released an AI Risk Management Framework for the trustworthy and responsible use of AI and analytics. NIST offers a portfolio of measurements, standards, and legal metrology to provide recommendations that ensure traceability, enable quality assurance, and harmonize documentary standards and regulatory practices. The framework provides detailed recommendations across four functions: govern, map, measure, and manage. In this session, we’ll discuss incorporating these recommendations into the analytics lifecycle. Attendees will gain a greater understanding of trustworthy AI best practices, as well as user roles and expectations for building responsible analytics.

Join NCDS as Sophia Rowland, a Senior Product Manager focusing on ModelOps and MLOps at SAS, walks us through this important presentation.

View a recording of the event here.

Becoming a Data Detective: Holding AI Accountable with Hilke Schellmann

Bias and brittleness in artificial intelligence (AI) tools are a growing concern. Join Hilke Schellmann, Emmy award-winning investigative reporter, Wall Street Journal and Guardian contributor, and journalism professor at NYU, as she shares key takeaways from her book, The Algorithm: How AI Decides Who Gets Hired, Monitored, Promoted, and Fired and Why We Need to Fight Back Now.

AI is now being used to decide who has access to an education, who gets hired, who gets fired, and who receives a promotion. Algorithms are on the brink of dominating our lives and threaten our human future if we don’t fight back. During the webinar, Schellmann will share takeaways about the rise of AI in the world of work and show how she tested many of the available tools herself, without coding experience.

During our time together, Hilke will share a few key takeaways from the book and answer questions from the audience. You don't want to miss this.

View a recording of the event here.

Past DataBytes Events | 2023

The National Consortium for Data Science looks forward to welcoming back Christopher Lam, CEO of Epistamai, on December 5th for our next DataBytes event as he discusses Causal AI: The Key to High-Stakes Decision Making.

There has been tremendous attention to the generative AI wave and its enormous potential to transform industries. But a hidden wave is developing right behind it: causal AI. While generative AI is optimized for low-stakes decisions like chatbots and image generation, it is not designed to address concerns like ethics or trustworthiness that are essential for using AI in high-stakes decisions such as credit and hiring. This is where causal AI fits into the picture.

In this presentation, Christopher Lam will discuss how to use causal AI to build AI systems that society can trust for high-stakes decision making. Lam will show how causal AI can help bridge the gap between symbolic AI and machine learning, demonstrating the value of integrating human knowledge and reasoning about the world to improve how data is analyzed. He will demonstrate through a use case how this more human-centric approach to AI can be used to build fairer and more equitable AI systems that are aligned with society's democratic values. Finally, he will describe a new causal hierarchy, one that integrates machine learning with causal inference and system dynamics.

View a recording of the event here.

In a lawsuit challenging its surveillance activities, Clearview AI used the First Amendment as a defense. The facial recognition technology company argued that the creation and use of its surveillance product was First Amendment protected speech. Join Talya Whyte, third-year law student at New York University, as she presents a case study on the parties’ basic arguments, Clearview AI’s characterization of its activities as “speech,” and the implications of this argument. Attendees will understand how facial recognition technology works and the risks and harms inherent in its building and implementation, and gain the knowledge to make more informed legal, policy, and technical choices about the implementation of AI-based surveillance technology.

Talya Whyte is a third-year law student at New York University. Her research interests lie at the intersection of new technology, society, public trust, and digital rights. She is a 2023 Google Legal Scholar, a Student Fellow at the Engelberg Center on Innovation Law & Policy, and an NYU Cyber Scholar. Whyte hopes for a thoughtful and humanitarian integration of technology into existing legal and societal frameworks.

View a recording of the event here.

The National Consortium for Data Science looks forward to hosting Kimberly Robasky, Associate Director of Machine Learning/AI at Arrakis Therapeutics, on August 22 for our next DataBytes event as she discusses AI in Target Identification and Drug Discovery: Transforming the Future of Medicine.

Artificial intelligence (AI) is playing a transformative role in target identification and drug discovery. Today, AI algorithms can analyze vast, multi-modal datasets to identify drug targets, accelerate lead compound discovery, and optimize drug design. AI is being used by biotechnology companies around the world to compress timelines and improve clinical trial outcomes. Join us to uncover the data-driven revolution in personalized medicine enabled by AI-driven drug development.

View a recording of the event here.

A Theory of Fairness
To understand fairness, one must unify central ideas from the social sciences and humanities with those from mathematics and computer science. In this talk, Chris will show how to model a principal cause of algorithmic bias (the structure-versus-agency debate in sociology) and map it directly to the two fundamental laws of causal inference (counterfactuals/interventions versus conditional independence). He will also show how to bridge the field of causal inference to machine learning, providing a novel way to visualize the different ways a supervised machine learning model can discriminate. These causal models may help policymakers on both sides of the aisle modernize AI regulations so that they align with society’s values.

View a recording of the event here.

About the Speaker- Chris Lam
Chris is the founder and CEO of Epistamai, an AI research company based in the Research Triangle focused on understanding AI ethics through the lens of causality. The inspiration for his startup came from his work at the Federal Reserve, where he researched algorithmic bias in credit decisions. He is an evangelist for the emerging field of causal data science, which could help us solve intractable problems in data science today.


Data ethics is a growing concern in all industries, especially as issues such as algorithmic bias, informed consent, and privacy become more nuanced. Additionally, with artificial intelligence and machine learning tools gaining traction at a rapid pace, it is more imperative than ever that organizations establish strong ethical guidelines around the data collected from client projects, research endeavors, and business affairs. Anisha Nadkarni, Data Ethics Officer at Randstad Global, walks us through a day in the life of a data ethicist, closing with a Q&A session with Nadkarni.

Visualizations allow people to readily analyze and communicate data. However, many common visualization designs lead to engaging imagery but false conclusions. By understanding what people see when they look at a visualization, we can design visualizations that support more accurate data analysis and avoid unnecessary biases. UNC Computer Science Assistant Professor Danielle Szafir walks us through best practices in data visualization and analysis, closing with a Q&A session with Dr. Szafir.

View a recording of the event here.