Data Sets @ NCDS

The Statewide Digital Elevation Model

http://observatory.data2discovery.org/dvn/dv/ncfmp

http://www.ncfloodmaps.com

These data are a bare-earth digital elevation model (DEM) of the state of North Carolina. The model was commissioned by the North Carolina Floodplain Mapping Program (NCFMP) and was generated from processed LIDAR data. It is stored in ESRI ASCII grid files, each containing the DEM information for a grid spanning 1 degree of latitude by 1 degree of longitude.
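For readers who have not worked with ESRI ASCII grids, the sketch below shows one way such a file could be parsed using only the Python standard library. It assumes the common layout of six header lines (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value) followed by whitespace-separated cell values. The function name read_esri_ascii and the sample grid are hypothetical and are not taken from the NCFMP files, whose headers may differ.

```python
import io


def read_esri_ascii(stream):
    """Parse an ESRI ASCII grid from a text stream into (header, rows).

    Assumes exactly six "key value" header lines, including NODATA_value.
    """
    header = {}
    for _ in range(6):
        key, value = stream.readline().split()
        header[key.lower()] = float(value)  # header keys are case-insensitive

    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    values = [float(token) for token in stream.read().split()]
    if len(values) != ncols * nrows:
        raise ValueError("cell count does not match ncols * nrows")

    nodata = header["nodata_value"]
    # Rows are stored north to south; missing cells become None.
    rows = [
        [None if v == nodata else v for v in values[r * ncols:(r + 1) * ncols]]
        for r in range(nrows)
    ]
    return header, rows


# Hypothetical 3 x 2 grid used only to exercise the parser.
sample = """ncols 3
nrows 2
xllcorner -80.0
yllcorner 35.0
cellsize 0.5
NODATA_value -9999
10.5 11.0 -9999
9.0 9.5 10.0
"""
header, rows = read_esri_ascii(io.StringIO(sample))
print(rows)
```

A real one-degree tile is far larger than this sample, so a production reader would more likely stream rows or use a geospatial library than hold every cell in Python lists.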

The North Carolina Forecast System

http://observatory.data2discovery.org/dvn/dv/ncfs

These data are output, in NetCDF format, from ADCIRC, a storm surge, wind wave, and tide model. RENCI runs ADCIRC in forecast mode for North Carolina coastal waters and provides the forecast water levels to decision makers in North Carolina and surrounding areas.

Million Songs

http://www.infochimps.com/collections/million-songs

http://labrosa.ee.columbia.edu/millionsong/

The Million Song Dataset is a freely available collection of audio features and metadata for a million contemporary popular music tracks.

The First Billion Digits of Π

http://www.infochimps.com/datasets/the-first-billion-digits-of-pi

These data originate from PI World of JA0HXV. The data are organized in a flat text file with the digits arranged in 10 lines of 10 digits.

The U.S. Census: 1 Billion RDF Triples

http://www.rdfabout.com/demo/census/

This is a well-documented data set based on the well-described 2000 US Census data. The data have been reorganized into RDF and are also exposed via SPARQL.

Tiny Images Dataset

http://horatio.cs.nyu.edu/mit/tiny/data/index.html

http://groups.csail.mit.edu/vision/TinyImages/

The Tiny Images dataset consists of 79,302,017 images, each a 32×32 color image. It is stored as large binary files that can be accessed with a Matlab toolbox.
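Outside Matlab, fixed-size records like these can in principle be read by seeking to a byte offset. The sketch below is a minimal illustration under stated assumptions: one byte per color channel (so 32 × 32 × 3 = 3072 bytes per image), images stored back to back, and no file header. None of this is confirmed by the description above, and the pixel ordering inside each record is left uninterpreted. The name read_tiny_image is hypothetical.

```python
import io

IMAGE_BYTES = 32 * 32 * 3   # assumed: one byte per channel, three channels
TOTAL_IMAGES = 79_302_017   # image count given in the dataset description


def read_tiny_image(stream, index):
    """Return the raw bytes of image `index` from a binary stream.

    Assumes fixed-size records with no header; the layout of pixels
    within a record is not decoded here.
    """
    if index < 0:
        raise IndexError("image index must be non-negative")
    stream.seek(index * IMAGE_BYTES)
    data = stream.read(IMAGE_BYTES)
    if len(data) != IMAGE_BYTES:
        raise IndexError("image index is past the end of the file")
    return data


# In-memory stand-in for the real file: two fake images of constant value.
fake_file = io.BytesIO(bytes([0]) * IMAGE_BYTES + bytes([7]) * IMAGE_BYTES)
second = read_tiny_image(fake_file, 1)

# Implied size of the full collection under the same assumptions.
total_bytes = TOTAL_IMAGES * IMAGE_BYTES
print(len(second), total_bytes)
```

Seeking by offset avoids loading the whole file, which matters for a collection of this size.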

Twitter Census – Developer Tools from Infochimps

http://www.infochimps.com/datasets/twitter-census-developer-tools-mapping-from-twitter-user-search-

Twitter data drawn from over 24 million tweets, scraped between March 2006 and November 2009.

Tools

A programmer’s guide to big data: 12 tools to know

Hadoop

http://hadoop.apache.org

IBM

http://www-01.ibm.com/software/data/bigdata/enterprise.html

MapReduce

http://research.google.com/archive/mapreduce.html

Oracle

http://www.oracle.com/us/technologies/big-data/index.html

SAS on Big Data

http://www.sas.com/big-data/

Tutorials and other How-Tos

AMP Camp – Big Data Minicourse

http://ampcamp.berkeley.edu

MapReduce Tutorial

http://hadoop.apache.org/docs/stable/mapred_tutorial.html

IBM Tech Article

http://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/

Contributions/Feedback

To learn more about the resources listed above or to submit your own data resources, please get in touch!