The Statewide Digital Elevation Model
http://observatory.data2discovery.org/dvn/dv/ncfmp
These data are a bare-earth digital elevation model (DEM) of the state of North Carolina. The model was commissioned by the North Carolina Floodplain Mapping Program (NCFMP) and generated from processed LIDAR data. It is stored in ESRI ASCII grid files, each holding the DEM for a tile spanning 1 degree of latitude by 1 degree of longitude.
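The ESRI ASCII grid format is simple enough to read without GIS software. It has a few "key value" header lines (ncols, nrows, the lower-left corner coordinates, cellsize and an optional NODATA_value), followed by nrows rows of ncols values written from north to south. Below is a minimal Python sketch. It assumes the tiles use xllcorner/yllcorner (not the xllcenter/yllcenter variants) and that a tile fits in memory. The helper names are hypothetical, not part of any NCFMP tooling.

```python
def read_esri_ascii(text):
    """Parse ESRI ASCII grid text into (header, grid).

    grid[0] is the northernmost row; NODATA cells become None.
    """
    tokens = text.split()
    header = {}
    i = 0
    # Header entries are "key value" pairs whose keys start with a letter;
    # the first token that does not is the first elevation value.
    while i < len(tokens) and tokens[i][0].isalpha():
        header[tokens[i].lower()] = float(tokens[i + 1])
        i += 2
    ncols, nrows = int(header["ncols"]), int(header["nrows"])
    nodata = header.get("nodata_value")
    values = [float(v) for v in tokens[i:]]
    if len(values) != ncols * nrows:
        raise ValueError(f"expected {ncols * nrows} values, found {len(values)}")
    grid = []
    for r in range(nrows):
        row = values[r * ncols:(r + 1) * ncols]
        grid.append([None if v == nodata else v for v in row])
    return header, grid


def cell_index(header, x, y):
    """Return (row, col) of the cell containing point (x, y).

    Assumes the header uses xllcorner/yllcorner; row 0 is the top (north) row.
    """
    size = header["cellsize"]
    nrows, ncols = int(header["nrows"]), int(header["ncols"])
    col = int((x - header["xllcorner"]) // size)
    row = nrows - 1 - int((y - header["yllcorner"]) // size)
    if not (0 <= row < nrows and 0 <= col < ncols):
        raise IndexError("point lies outside this tile")
    return row, col


if __name__ == "__main__":
    sample = (
        "ncols 3\nnrows 2\nxllcorner -80.0\nyllcorner 35.0\n"
        "cellsize 0.5\nNODATA_value -9999\n1 2 3\n4 -9999 6\n"
    )
    header, grid = read_esri_ascii(sample)
    row, col = cell_index(header, -79.9, 35.9)
    print(grid[row][col])
```

A full 1-degree tile at lidar resolution can be large, so for real work a streaming reader, `numpy.loadtxt` with the header rows skipped, or GDAL (whose AAIGrid driver reads this format) is likely a better fit than this pure-Python version.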
The North Carolina Forecast System
http://observatory.data2discovery.org/dvn/dv/ncfs
These data are NetCDF output from ADCIRC, a model of storm surge, wind waves, and tides. RENCI runs ADCIRC as a forecast system for North Carolina's coastal waters and provides the forecast water levels to decision makers in North Carolina and surrounding areas.
Million Songs
http://www.infochimps.com/collections/million-songs
http://labrosa.ee.columbia.edu/millionsong/
The Million Song Dataset is a freely-available collection of audio features and metadata for a million contemporary popular music tracks.
The First Billion Digits of π
http://www.infochimps.com/datasets/the-first-billion-digits-of-pi
These data originate from PI World of JA0HXV. They are organized as a flat text file, with the digits arranged in repeating blocks of 10 lines of 10 digits.
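Because the file is plain text, a reader that keeps only the characters 0 to 9 works regardless of how the digits are split into lines and blocks. The Python sketch below streams the file in chunks to count digit frequencies in constant memory. It also spot-checks the first few digits against Gibbons' unbounded spigot algorithm, which is a swapped-in verification technique and not how the data set was produced. The spigot is far too slow for a billion digits, so use it only for the first few thousand. All function names are illustrative.

```python
import io
from collections import Counter
from itertools import islice


def iter_digits(stream, chunk_size=1 << 20):
    """Yield the digits in a text stream as ints, skipping line breaks, spaces and the decimal point."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return
        for ch in chunk:
            if "0" <= ch <= "9":
                yield ord(ch) - ord("0")


def pi_digits():
    """Gibbons' unbounded spigot: yields 3, 1, 4, 1, 5, ... forever (slow; for spot checks only)."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            # Tuple assignment: every right-hand side uses the old q, r, n.
            q, r, n = 10 * q, 10 * (r - n * t), (10 * (3 * q + r)) // t - 10 * n
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)


def matches_pi(stream, n):
    """True if the first n digits in the stream agree with freshly computed digits of pi."""
    return list(islice(iter_digits(stream), n)) == list(islice(pi_digits(), n))


def digit_counts(stream):
    """Count how often each digit 0-9 occurs, in one pass."""
    return Counter(iter_digits(stream))


if __name__ == "__main__":
    sample = io.StringIO("3.1415926535\n8979323846\n")
    print(matches_pi(sample, 20))
```

On the real file, one would open it with `open(path, encoding="ascii")` and pass the file object to `digit_counts` or `matches_pi`.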
The U.S. Census: 1 Billion RDF Triples
http://www.rdfabout.com/demo/census/
This well-documented data set is based on the 2000 U.S. Census. In addition to reorganizing the census data as RDF, it exposes them for querying through SPARQL.
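SPARQL endpoints are usually queried over plain HTTP. The SPARQL 1.1 Protocol passes the query in a `query` parameter, and the W3C JSON results format returns rows under `results.bindings`. The Python sketch below uses only the standard library. The endpoint URL is a hypothetical placeholder, not the address of this data set's endpoint, and the sample query is a generic triple pattern rather than one using the census vocabulary.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Hypothetical placeholder: substitute the endpoint documented on the data set's page.
ENDPOINT = "http://example.org/census/sparql"

# Generic query: fetch any 10 triples (no census-specific vocabulary assumed).
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a query as a SPARQL 1.1 Protocol GET request (query=... parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(results_json):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    doc = json.loads(results_json)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Unbound variables are simply absent from a binding.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows


def run_query(endpoint, query):
    """Send the query over HTTP and ask for JSON results (requires network access)."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_bindings(response.read().decode("utf-8"))
```

Calling `run_query(ENDPOINT, QUERY)` against a live endpoint would return a list of dicts keyed by `s`, `p` and `o`. Keeping URL building and result parsing separate from the network call makes both easy to test offline.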
Tiny Images Dataset
http://horatio.cs.nyu.edu/mit/tiny/data/index.html
http://groups.csail.mit.edu/vision/TinyImages/
The Tiny Images dataset consists of 79,302,017 color images, each 32×32 pixels. The images are stored in large binary files that can be accessed with a Matlab toolbox.
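Because every image has the same size, a single image can be read from a large binary file by seeking straight to its offset, without loading the whole file. The Python sketch below assumes one unsigned byte per channel value (32 × 32 × 3 = 3,072 bytes per image) stored back to back. Under that assumption the raw pixels total roughly 244 GB. It also assumes a Matlab-style, column-major, one-channel-plane-after-another pixel order inside each record. Both are assumptions about the file layout to check against the toolbox documentation, and the function names are hypothetical.

```python
import io

SIDE = 32
CHANNELS = 3
# ASSUMPTION: one uint8 per channel value, so 3,072 bytes per image, no header.
RECORD_BYTES = SIDE * SIDE * CHANNELS
NUM_IMAGES = 79_302_017


def read_record(f, index):
    """Return the raw bytes of image `index` from a binary file opened in 'rb' mode."""
    if not 0 <= index < NUM_IMAGES:
        raise IndexError(index)
    f.seek(index * RECORD_BYTES)  # fixed-size records: offset is index * record size
    data = f.read(RECORD_BYTES)
    if len(data) != RECORD_BYTES:
        raise EOFError(f"file ends before image {index}")
    return data


def pixel(record, row, col, channel):
    """Read one channel value, ASSUMING column-major order within each channel plane."""
    return record[channel * SIDE * SIDE + col * SIDE + row]


if __name__ == "__main__":
    # Two fake images (all zeros, then all 255) stand in for the real file.
    fake = io.BytesIO(bytes(RECORD_BYTES) + bytes([255]) * RECORD_BYTES)
    print(pixel(read_record(fake, 1), 0, 0, 0))
```

Random access by offset keeps memory use at one image per read. This matters when the file is far larger than RAM.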
Twitter Census – Developer Tools from Infochimps
http://www.infochimps.com/datasets/twitter-census-developer-tools-mapping-from-twitter-user-search-
Data from more than 24 million tweets, scraped between March 2006 and November 2009.
Tools
A programmer’s guide to big data: 12 tools to know
Hadoop
IBM
http://www-01.ibm.com/software/data/bigdata/enterprise.html
MapReduce
http://research.google.com/archive/mapreduce.html
Oracle
http://www.oracle.com/us/technologies/big-data/index.html
SAS on Big Data
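The MapReduce programming model from Google's paper listed above splits a job into a user-written map function, which emits intermediate key/value pairs, and a reduce function, which combines all values that share a key. The framework handles the grouping ("shuffle") in between, along with distribution and fault tolerance. The single-process Python sketch below uses word count, the paper's running example, to show only the data flow. It is not Hadoop's API and does no distribution; all names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_fn(doc_id, text):
    """Map: emit (word, 1) for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all intermediate values by key (done by the framework in real systems)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_fn(word, counts):
    """Reduce: combine all values emitted for one key."""
    return word, sum(counts)


def run_word_count(documents):
    """Run map over every document, shuffle, then reduce each key group."""
    intermediate = chain.from_iterable(
        map_fn(doc_id, text) for doc_id, text in documents.items()
    )
    grouped = shuffle(intermediate)
    return dict(reduce_fn(word, counts) for word, counts in sorted(grouped.items()))


if __name__ == "__main__":
    docs = {"d1": "the quick fox", "d2": "The lazy dog"}
    print(run_word_count(docs))
```

Because `map_fn` sees one document at a time and `reduce_fn` sees one key at a time, neither depends on the others. That independence is what lets a real MapReduce system run them in parallel across machines.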
Tutorials and other How-Tos
AMP Camp – Big Data Minicourse
MapReduce Tutorial
http://hadoop.apache.org/docs/stable/mapred_tutorial.html
IBM Tech Article
http://www.ibm.com/developerworks/data/library/techarticle/dm-1209hadoopbigdata/
Contributions/Feedback
To learn more about the resources listed above or to submit your own data resources, please get in touch!