All Data sets

Digital Element IP Intelligence Demographics

A geolocation API for all your demographics needs. Search by IP address to return data about a geographical area, including number of households, gender, age groups and language. Looking for more dimensions of IP searchable data? Try the Geolocation API, returning up to 20 geo data points of custom query information per IP address. Or the Domains API that retrieves ...
API

Latitude Longitude and Zip Code Conversions

This API returns approximate latitude/longitude centroids for a given zip code, along with the associated city, state, and county. When provided a lat/long, we will try a spatial mapping to find the nearest zip centroid. Roughly, it draws a line from the coordinates you give it to the nearest set of coordinates in the data set. Coordinates in the dataset are considered to ...
Offsite
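
The nearest-centroid lookup described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the function names, the great-circle (haversine) distance measure, and the sample centroids (placeholder labels and made-up coordinates) are all assumptions.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points."""
    r = 6371.0088  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_zip(lat, lon, centroids):
    """Return the key of the centroid closest to (lat, lon).

    `centroids` maps a zip label to a (lat, lon) tuple. A linear scan is
    fine for a sketch; a real data set would likely use a spatial index.
    """
    return min(centroids, key=lambda z: haversine_km(lat, lon, *centroids[z]))

# Hypothetical centroids: placeholder labels, illustrative coordinates only.
SAMPLE_CENTROIDS = {
    "ZIP_A": (30.27, -97.74),
    "ZIP_B": (29.76, -95.37),
    "ZIP_C": (32.78, -96.80),
}

closest = nearest_zip(30.30, -97.70, SAMPLE_CENTROIDS)
```

Here `closest` is the label whose centroid lies the shortest great-circle distance from the query point.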

Zillow Neighborhoods

Zillow Neighborhoods retrieves geo data pertaining to neighborhoods within defined geometric parameters. Lat/long coordinates may return details for large areas like cities or towns, which can be dissected further into specific neighborhoods. Neighborhood Boundaries provides an important element of specificity for geo queries: Neighborhood distinctions are a critical ...
API

Geonames Places

The Geonames Places API locates all places within a specified area. Places are any geographic points that can be named. In other words, within a defined area, any geographic point that is “named” will appear as a match for your custom query. The Geonames API database includes over 10 million geographical names and 7.5 million unique features. The combined volume ...
API

Wikipedia Articles

Did you ever want to correlate Wikipedia articles with geographic locations? You know, so you can figure out whose castle that is on the hill you just drove past, know whether there’s a natural or supernatural phenomenon nearby, or find a tiny museum in your neighborhood? With the Wikipedia Articles API, you can swiftly sift through ~300k Wikipedia entries to find ...
API

Geocoding API

The Geocoding API is a powerful and useful tool that provides location information for any given address in the United States. Geocoding is a process that assigns geographic data (i.e., latitude and longitude) to an address. For example, the API would take the address “1214 W 6th St. Austin, TX” and return the latitude 30.272896 and the longitude -97.757443. The API ...
API

NCDC Weather

The NCDC Weather API provides detailed weather data based on your geographically defined query. Weather data points for your query may include dew point, precipitation, snow depth, temperature, visibility, and wind speed details. The API is versatile in its ability to retrieve data by lat/long coordinates, bounding box or quadkey. It provides a simple solution for ...
API
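
For readers unfamiliar with quadkeys, here is a minimal sketch of how a lat/long pair can be turned into one. It assumes the widely used Bing Maps tile system convention (Web Mercator tiles, one base-4 digit per zoom level); the listing does not say which quadkey scheme this API expects, and the function names are hypothetical.

```python
import math

def tile_to_quadkey(tile_x, tile_y, level):
    """Interleave the bits of tile X/Y into a base-4 quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)

def latlon_to_quadkey(lat, lon, level):
    """Convert a lat/long to the quadkey of its containing tile at `level`."""
    # Clamp to the latitude range the Web Mercator projection supports.
    lat = max(min(lat, 85.05112878), -85.05112878)
    lon = max(min(lon, 180.0), -180.0)
    x = (lon + 180.0) / 360.0
    sin_lat = math.sin(math.radians(lat))
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    n = 1 << level  # number of tiles along each axis at this level
    tile_x = min(max(int(x * n), 0), n - 1)
    tile_y = min(max(int(y * n), 0), n - 1)
    return tile_to_quadkey(tile_x, tile_y, level)

key = tile_to_quadkey(3, 5, 3)  # tile (3, 5) at level 3
```

Each extra digit subdivides the parent tile into four, so a longer quadkey names a smaller area.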

Business Places by Locationary

The Business Places by Locationary API delivers quality business information based on your geographically defined query. Business information for queries may include business name, address, business type, country, city, state, postal code, and geometry type. Locationary provides an impressive volume of business information that is also rich with precise detail and ...
API

Foursquare Places

The Foursquare Places API delivers uniquely rich information about venues worldwide. Where many geolocation providers describe venue categories in broad types (bars, restaurants, gyms, colleges, grocery stores, etc.), Foursquare data is unique in the depth of venue types provided: for example, bars are further classified as sports, gay, dive, wine, whiskey, ...
API

Digital Element IP Intelligence Geolocation

A geolocation API with 20 fields of search results, all customized to your IP query. Search by IP address to return data about a geographical area, including country, region, city, internet connection speed, global coordinates, postal and country codes, time zone, and even daylight savings observation status. Looking for more dimensions of IP searchable data? Try the ...
API

Digital Element IP Intelligence Domains

A reverse IP lookup API with 5 fields of search results, all customized to your IP query. Search by IP address to return data about the domain, company, ISP, NAICS industry code and proxy type for an IP address. Looking for more dimensions of IP searchable data? Try the Geolocation API, returning up to 20 geo data points of custom query information per IP address. Or ...
API

American Community Survey (Topline)

The 2009 American Community Survey (ACS) Topline API provides basic demographic data based on your geographically defined query. This geo to ACS data API searches by lat/long coordinates to retrieve ACS data about a geographical area, including education levels, household income, race statistics, household size, gender and age groups. For more information about the ...
API

EIA - Petroleum Data, Reports, Analysis, Surveys

Find statistics on crude oil, gasoline, diesel, propane, jet fuel, ethanol, and other liquid fuels, and information on petroleum prices, crude reserves and production, refining and processing, imports/exports, stocks, and consumption/sales.
Offsite

Open Notebook Science Challenge Solubility Dataset

A collection of non-aqueous solubility measurements, mainly aldehydes, carboxylic acids and amines. The data are linked to the laboratory notebook pages where the measurements were obtained. This is part of the Open Notebook Science Solubility Challenge. Sponsored by Submeta, Nature and Sigma-Aldrich.
Offsite

Richard Nixon - Presidential Recordings

Between February 16, 1971 and July 18, 1973 Richard Nixon secretly recorded roughly 3,700 hours of conversations and meetings in five different locations. With the exception of the manually-operated equipment in the Cabinet Room, Nixon’s recording system was sound-activated and recorded a wide range of conversations of varying audio and substantive quality. The ...
Offsite

Taxobox - Wikipedia Infoboxes with Taxonomic information on Animal Species

This dataset consists of a collection of Infoboxes from Wikipedia on the topic of Taxobox. Snippet: Antilles_pinktoe: name: Antilles Pink Toed Tarantula regnum: "[[Animal]]" classis: "[[Arachnid]]" phylum: "[[Arthropod]]" ordo: "[[Spider]]" imageWidth: 250px imageCaption: Female Avicularia versicolor binomial: Avicularia versicolor familia: ...
Free

Enron Email Dataset

From the CALO Project at Carnegie Mellon University, a massive dataset of emails recovered from discovery documents in the Enron trials. From the distribution page: This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into ...
Offsite

Crime Rates by State, 2004 and 2005, and by Type, 2005 (Cleaned up version)

Want Census data in a manageable format? Look no further – this data set of Crime Rates by State (2004 and 2005), and by Type (2005), has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with navigating the lengthy ...
Free

Crime Rates by State, 2004 and 2005, and by Type, 2005 (Cleaned up version)

The Statistical Abstract files are distributed by the US Census Bureau as Microsoft Excel files. These files have data mixed with notes and references, multiple tables per sheet, and, worst of all, the table headers are not easily matched to their rows and columns. A few files had extraneous characters in the title. These were corrected to be consistent. A few files ...
Free

Average Hours Worked Per Day by Employed Persons: 2005

Want Census data in a manageable format? Look no further – this data set of Average Hours Worked Per Day by Employed Persons: 2005 has been meticulously processed to deliver US Census data in a simplified, clean format. The US Census Bureau distributes Statistical Abstract files in Microsoft Excel format, presenting numerous complexities with navigating the lengthy ...
Free

The Whitburn Project: 120 Years of Music Chart History

For the last ten years, obsessive record collectors on Usenet have been working on the Whitburn Project, a huge undertaking to preserve and share high-quality recordings of every popular song since the 1890s. To assist their efforts, they’ve created a spreadsheet of 37,000 songs and 112 columns of raw data, including each song’s duration, beats-per-minute, ...
Offsite

Retrosheet: Ballpark Data by Major League Baseball Franchise

All ballparks used for Major League Baseball that have opened since 1903 and many before that. The list for each park contains significant “firsts” to occur there. Parks used for 1 or 2 games are not included. Primary research was done by Jim Herdman and David Vincent. Please notify us of any additions or changes.
Offsite

Word List - 100,000 + Official Crossword Words (Excel readable)

A word list with over 100,000 entries that are officially permitted in crossword games like Scrabble™. This word list is available in a simple, alphabetically ordered Excel format, making it convenient for reference, spell-checking, or, in more sophisticated applications, for developers looking to build a custom spelling dictionary. The entries include variants of ...
Free

Last.fm Music Tags

This is a set of artist and genre tag data collected from Last.fm using the Audioscrobbler web service during the spring of 2007. The data consists of the raw tag counts for the 100 most frequently occurring tags that Last.fm listeners have applied to over 20,000 artists. Included are artist tags and genre-related tags. An undocumented (and deprecated) option of the ...
Offsite

Retrosheet: Event Files (play-by-play) data for Major League Baseball Games

Retrosheet was founded in 1989 for the purpose of computerizing play-by-play accounts of as many pre-1984 major league games as possible. Play-by-play files (also called event files) — Data files containing literally every play in the included games. The files are designed to be processed further using your own computer. We provide some software to help and some ...
Offsite

Corpus of Erotica Stories

Excellent resource for working with natural language processing and machine learning. This corpus consists of 4771 raw text erotica stories collected from www.textfiles.com/sex/EROTICA. A logical flow from the encouragement of writing on BBSes, people have been writing some form of erotica or sexual narrative for others for quite some time. With the advent of Fidonet ...
Free