Data Sets @ NCDS

The Statewide Digital Elevation Model

These data are a bare earth digital elevation model of the state of North Carolina. The model was commissioned by the North Carolina Floodplain Mapping Program (NCFMP), and was generated from processed LIDAR data. The model is stored in ESRI ASCII files, each containing the DEM information for a grid spanning 1 degree of latitude by 1 degree of longitude.
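For readers who have not worked with ESRI ASCII grids before, the following is a minimal sketch of how one such file could be parsed. It assumes the standard layout of the format: a few "keyword value" header lines (ncols, nrows, xllcorner, yllcorner, cellsize, and an optional NODATA_value) followed by rows of whitespace-separated elevations, northernmost row first. The function name `read_esri_ascii` and the sample grid are illustrative only and are not part of the data set.

```python
import io


def read_esri_ascii(stream):
    """Parse an ESRI ASCII grid from a text stream into (header, rows)."""
    header = {}
    tokens = []
    for line in stream:
        parts = line.split()
        if not parts:
            continue
        # Header lines are "keyword value" pairs; the keyword starts with a letter.
        if parts[0][0].isalpha():
            header[parts[0].lower()] = float(parts[1])
        else:
            tokens.extend(parts)

    ncols = int(header["ncols"])
    nrows = int(header["nrows"])
    if len(tokens) != ncols * nrows:
        raise ValueError("cell count does not match ncols * nrows")

    values = [float(t) for t in tokens]
    # Split the flat list of cells into rows, northernmost row first.
    rows = [values[r * ncols:(r + 1) * ncols] for r in range(nrows)]
    return header, rows


# Tiny made-up grid used only to exercise the parser (not real DEM data).
sample = """ncols 3
nrows 2
xllcorner -80.0
yllcorner 35.0
cellsize 0.5
NODATA_value -9999
1.0 2.0 3.0
4.0 -9999 6.0
"""

header, rows = read_esri_ascii(io.StringIO(sample))
```

This sketch holds the whole grid in memory, which is fine for a small sample; a full one-degree tile may be large enough that reading it row by row is a better choice.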

The North Carolina Forecast System

These data are output from ADCIRC, a storm surge, wind wave, and tide model, in NetCDF format. RENCI runs ADCIRC in forecast mode for North Carolina's coastal waters and provides the forecast water levels to decision makers in North Carolina and surrounding areas.

Million Songs

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.

The First Billion Digits of Π

These data originate from PI World of JA0HXV. The data are organized in a flat text file with the digits arranged in 10 lines of 10 digits.

The U.S. Census: 1 Billion RDF Triples

This is a well-documented data set based on the well-described 2000 U.S. Census data. In addition to being reorganized into RDF, the data are exposed via SPARQL.

Tiny Images Dataset

The Tiny Images dataset consists of 79,302,017 images, each a 32×32 color image. The images are stored in large binary files that can be accessed with a Matlab toolbox.
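To give a sense of what fixed-size records in a large binary file imply, here is a rough sketch of random access by image index. It assumes, without confirmation from the data set's documentation, that each image occupies exactly 32 × 32 × 3 = 3072 bytes (one byte per color channel), that images are stored back to back, and that the file has no header. The name `read_image_bytes` is hypothetical, and the sketch makes no claim about pixel or channel ordering within a record.

```python
import io

IMAGE_SIDE = 32
CHANNELS = 3
BYTES_PER_IMAGE = IMAGE_SIDE * IMAGE_SIDE * CHANNELS  # 3072 bytes (assumed)
IMAGE_COUNT = 79_302_017
TOTAL_BYTES = IMAGE_COUNT * BYTES_PER_IMAGE  # raw size under the assumptions above


def read_image_bytes(stream, index):
    """Return the raw bytes of image `index` from a seekable binary stream."""
    if index < 0:
        raise IndexError("image index must be non-negative")
    # With fixed-size records, the byte offset is just index * record size.
    stream.seek(index * BYTES_PER_IMAGE)
    data = stream.read(BYTES_PER_IMAGE)
    if len(data) != BYTES_PER_IMAGE:
        raise IndexError("image index is past the end of the stream")
    return data


# Two fake "images" (all zeros, then all ones) stand in for a real file.
fake_file = io.BytesIO(bytes([0]) * BYTES_PER_IMAGE + bytes([1]) * BYTES_PER_IMAGE)
second_image = read_image_bytes(fake_file, 1)
```

Under these assumptions the raw pixel data come to roughly 244 GB, which is why seeking to a single record is preferable to loading a whole file.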

Twitter Census – Developer Tools from Infochimps

Twitter data from over 24 million tweets scraped from March 2006 to November 2009.


A programmer’s guide to big data: 12 tools to know





SAS on Big Data

Tutorials and other How-Tos

AMP Camp – Big Data Minicourse

MapReduce Tutorial

IBM Tech Article


To learn more about the resources listed above or to submit your own data resources, please get in touch!