If you’ve spent any time skimming the front page of Reddit®, you’ve likely seen intriguing data visualization projects. Interesting ideas like plotting the top online search terms by state over time, charting the location of every made shot in a pro basketball player’s career or creating a heat map of where the tallest people live are often the domain of data professionals looking for low-risk—and oftentimes fun—ways to showcase their skills.
If you are interested in data and want to try your hand at data analysis and visualization projects like these, you might be running into a common problem: finding free, high-quality datasets to work with.
But don’t put your project plans on hold just yet—open-source data resources can help solve this problem. These databases are online, accessible for anyone to use, and best of all, they’re free. For would-be data professionals or anyone else looking to refine their data visualization and processing abilities, open-source resources are perfect for building your skills, experimenting and adding tangible project examples to your portfolio.
The benefits of finding the right data sources
Accessibility and affordability are important. But you also want data resources that are high quality. “With open-source data resources, you gain ready access to high-quality, accurate, reliable, secure and transparent data,” says Eric McGee, senior network engineer at TRG Datacenters. “The projects you build with this data will thus be more efficient and impactful.”
Reliability of the data should be a top concern, according to Jonathan Tian, co-founder of Mobitrix. “The results can be irrelevant with inaccurate information.”
Using a good resource from the start will also save you time. McGee points out that with good open-source data resources, qualified contributors have often already collected, sorted and analyzed the data, making your work less time-intensive.
12 open-source data resources available for free
Whether it’s health data, demographic information or polling results, these well-recommended data sources can provide a wealth of potential starting points for your next data project.
1. World Bank Open Data
A sophisticated site with massive repositories of data, this open-source resource is hard to beat. The search bar at the top allows you to find data in any area you are curious about. The site is also worth a visit for its “more resources” section alone. Some of these include data visualizations, which can be great inspiration!
2. World Health Organization (WHO)
Anything and everything health—this open-source data resource is easy to scroll around as you browse topics and datasets. If you want to work with data pertaining to any area of disease, public health, safety or health equity, you can hardly ask for a more reputable source.
3. Google® Public Data Explorer
Search through datasets from all sorts of industries and organizations. The data explorer will also allow you to upload your own datasets and create visualizations alongside the public data, which makes it a nice place to experiment with using and presenting your findings.
4. United Nations Office on Drugs and Crime
If you are interested in crime-related data, UNODC is a global authority in those areas of research. Find data on homicides, drugs and firearms as well as illicit financial flows and substance use disorders.
5. Registry of Open Data on AWS
This repository has data partners like Digital Earth Africa, Facebook® Data for Good, NASA Space Act Agreement and many others who contribute datasets. The site features navigable keywords and tags for the different types of data to help you narrow your search by the parameters you want to work with.
Features around COVID-19 datasets and the cancer genome project both offer usage examples for the data as well.
6. U.S. Census Bureau®
The Census Bureau offers free, accessible datasets under the topics of business and economy, health, employment, housing and population. While it may not be the easiest source to navigate, the Census Bureau site houses massive troves of public government data.
7. GroupLens
GroupLens is a repository of social computing research through the University of Minnesota. One of its projects, MovieLens (a site that helps people find movies to watch), has active opportunities for online field experiments and open-source data. The GroupLens site includes access to some of these datasets.
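As a first exercise with ratings data like this, you might compute the average rating each movie received. The sketch below assumes a ratings file with `userId`, `movieId`, `rating` and `timestamp` columns; check the README that ships with whichever dataset you download, since the layout can differ. The sample rows are invented placeholders, not real MovieLens data, and `mean_rating_by_movie` is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Invented placeholder ratings using the column layout assumed above.
SAMPLE_RATINGS = """userId,movieId,rating,timestamp
1,10,4.0,0
2,10,5.0,0
1,20,3.0,0
"""


def mean_rating_by_movie(csv_file):
    """Return a dict mapping each movieId (as a string) to its mean rating."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        movie_id = row["movieId"]
        totals[movie_id] += float(row["rating"])
        counts[movie_id] += 1
    return {movie_id: totals[movie_id] / counts[movie_id] for movie_id in totals}


# io.StringIO stands in for an opened file so the sketch runs without a download.
averages = mean_rating_by_movie(io.StringIO(SAMPLE_RATINGS))
print(averages)
```

With a real download, you could pass `open("ratings.csv", newline="")` in place of the `io.StringIO` object, assuming that is what your file is called.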
8. National Centers for Environmental Information
This resource from the National Oceanic and Atmospheric Administration (NOAA) exists to provide public access to “the Nation's treasure of geophysical data and information.” They offer data according to discipline areas, some of which include Geomagnetic Data & Models, Marine Geology & Geophysics, Natural Hazards and Space Weather.
All of these lead to more specific databases. In the tsunami database, for example, you can toggle many different parameters related to the scope of a tsunami, including the number of houses destroyed or the vertical height of the wave.
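Once you have exported records like these, the same kind of parameter filtering can be reproduced in a few lines of code. This is only a sketch: the `TsunamiEvent` fields and the `filter_events` helper are hypothetical names chosen for illustration, the events are invented placeholders, and a real export will have its own column names and units.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TsunamiEvent:
    # Hypothetical fields; None means the value was not recorded.
    year: int
    max_water_height_m: Optional[float]
    houses_destroyed: Optional[int]


def filter_events(
    events: List[TsunamiEvent],
    min_height_m: Optional[float] = None,
    min_houses_destroyed: Optional[int] = None,
) -> List[TsunamiEvent]:
    """Keep events that meet every threshold supplied.

    An event with a missing value fails any threshold set on that field.
    """
    selected = []
    for event in events:
        if min_height_m is not None:
            height = event.max_water_height_m
            if height is None or height < min_height_m:
                continue
        if min_houses_destroyed is not None:
            houses = event.houses_destroyed
            if houses is None or houses < min_houses_destroyed:
                continue
        selected.append(event)
    return selected


# Invented placeholder events, not taken from any real database.
events = [
    TsunamiEvent(1900, 2.5, None),
    TsunamiEvent(1950, 10.0, 300),
    TsunamiEvent(2000, 6.0, 20),
]

tall = filter_events(events, min_height_m=5.0)
tall_and_destructive = filter_events(
    events, min_height_m=5.0, min_houses_destroyed=100
)
print([e.year for e in tall])
print([e.year for e in tall_and_destructive])
```

Treating missing values as failing a threshold is a design choice made for this sketch. Older records often lack damage counts, so you may prefer to keep them and flag them instead.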
9. Kaggle®
Kaggle offers datasets that data scientists love for their ease of use. You can find data on many different industry areas in a variety of formats (SQLite, BigQuery, CSV, etc.) to fit what you need.
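Different file types mostly change how you load the data, not what you do with it afterward. As a rough sketch using only Python's standard library, the example below reads the same rows from CSV text and from a SQLite table. The city data, the `cities` table and the two helper functions are invented for illustration; an in-memory database stands in for a downloaded `.sqlite` file.

```python
import csv
import io
import sqlite3

# Placeholder rows invented for this sketch; a real download supplies the data.
SAMPLE_CSV = "city,population\nAvalon,1200\nBrindle,450\n"


def rows_from_csv(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_sqlite(conn, table):
    """Read every row of a table into a list of dicts."""
    conn.row_factory = sqlite3.Row
    # Table names cannot be bound as query parameters, so pass trusted names only.
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    return [dict(row) for row in cursor]


csv_rows = rows_from_csv(SAMPLE_CSV)

# Build an in-memory database standing in for a downloaded SQLite file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cities (city TEXT, population INTEGER)")
conn.executemany(
    "INSERT INTO cities VALUES (?, ?)",
    [(row["city"], int(row["population"])) for row in csv_rows],
)
sqlite_rows = rows_from_sqlite(conn, "cities")

print(csv_rows[0])     # CSV values arrive as strings
print(sqlite_rows[0])  # SQLite keeps the declared integer type
```

One difference worth noticing: CSV values always come back as text, so numeric columns need converting before you do any math on them.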
10. Pew Research Center®
This organization offers one of the largest open-source data repositories out there. Their data comes from high-caliber surveys and covers a huge variety of topics. You do need to create an account to access the datasets, but the account is free.
11. Yelp®
As you might expect from a business built on collecting customer reviews, Yelp has amassed a ton of interesting and useful data. Even better, they offer open access to a large portion of their user-created business reviews for anyone looking to learn data-related skills. If you want to explore data about businesses and customers, you can find a lot to work with here.
12. Google Trends®
If the data you are after pertains to internet search trends, check out this data resource. Marketers and businesses, along with the data analysts who want to work in those areas, can benefit big time from tracking trending terms. Look through the Year in Search features to get a snapshot of the largest search categories in various topics from year to year.
Get started on data analysis
Finding good data resources is extremely helpful, but like any raw material, data is only as valuable as what you make of it. If you are eager to dig deeper into data analysis, working on data projects is an ideal place to start.
If you’re thinking through potential projects and ways to highlight your abilities as a data professional, it might help to keep in mind the skills employers are seeking out. Our article “16 Data Analyst Skills Employers Love to See” can help provide some useful direction.
Yelp is a registered trademark of Yelp, Inc.
Pew Research Center is a registered trademark of the Pew Research Center non-profit corporation.
Facebook is a registered trademark of Facebook, Inc.
United States Census Bureau is a registered trademark of the U.S. Bureau of the Census, U.S. Department of Commerce.