In this project, you will formulate a research question and statistical hypotheses, develop a data production strategy, collect data, perform relevant analyses to answer the question, and produce a document that details your findings. More specific information about the project is below.

The first step is to find an appropriate, interesting data set. The data sets below cover a variety of sources:

United States Census Data: The Census Bureau publishes reams of demographic data at the state, city, and even zip code level. The data set is fantastic for creating geographic data visualizations and can be accessed on the Census Bureau website.

Alternatively, the data can be accessed via an API. One convenient way to use that API is through the choroplethr package. In general, this data is very clean and very comprehensive.

FBI Crime Data: The FBI crime data set is fascinating. You can also look at the data geographically.
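
As a rough illustration of working with the Census API mentioned above, the sketch below builds a query URL and turns a response into records, using only the Python standard library. It assumes the API answers with a JSON array whose first row is a header. The year, the data set path `dec/pl`, the variable name `P1_001N`, the helper names, and the sample payload (with placeholder numbers) are all assumptions for illustration, not details taken from this article. No network request is made.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint layout: base URL, then year and data set path.
BASE_URL = "https://api.census.gov/data"


def build_query_url(year, dataset, variables, geography):
    """Build a request URL for an assumed year, data set, and variable list."""
    params = {"get": ",".join(variables), "for": geography}
    # Keep ',', ':' and '*' readable instead of percent-encoding them.
    return f"{BASE_URL}/{year}/{dataset}?{urlencode(params, safe=',:*')}"


def rows_to_records(payload):
    """Convert a header row plus data rows (JSON array of arrays) into dicts."""
    header, *rows = json.loads(payload)
    return [dict(zip(header, row)) for row in rows]


# Made-up payload in the assumed response shape; the numbers are placeholders.
SAMPLE_PAYLOAD = (
    '[["NAME", "POP", "state"],'
    ' ["State A", "100", "01"],'
    ' ["State B", "250", "02"]]'
)

url = build_query_url(2020, "dec/pl", ["NAME", "P1_001N"], "state:*")
records = rows_to_records(SAMPLE_PAYLOAD)
# Values arrive as strings, so convert before doing arithmetic.
total_population = sum(int(record["POP"]) for record in records)
```

Separating URL construction from parsing keeps the parsing step testable on a saved response before any live request is sent.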

CDC Cause of Death: The Centers for Disease Control and Prevention maintains a database on cause of death. The data can be segmented in almost every way imaginable.

Bureau of Labor Statistics: Many important economic indicators for the United States, like unemployment and inflation, can be found on the Bureau of Labor Statistics website. Most of the data can be segmented both by time and by geography.

Bureau of Economic Analysis: The Bureau of Economic Analysis also has national and regional economic data, including gross domestic product and exchange rates.

Dow Jones Weekly Returns: Predicting stock prices is a major application of data analysis and machine learning.

After the collapse of Enron, a data set of emails with message text and metadata was released. The data set is now famous and provides an excellent testing ground for text-related analysis. You can also explore other research uses of this data set through the page. The resulting file is 2.

Reddit released a data set of every comment that has ever been made on the site.

Wikipedia provides instructions for downloading the text of English-language articles, in addition to other projects from the Wikimedia Foundation.

Lending Club provides data about loan applications it has rejected as well as the performance of loans that it issued. The data set lends itself both to categorization techniques (will a given loan default?) and to regressions (how much will be paid back on a given loan?).

Inside Airbnb offers different data sets related to Airbnb listings in dozens of cities around the world.
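
The two framings of the Lending Club loan data described above, categorization and regression, can be sketched on the same records. The example below is a minimal illustration only: the loan records are invented, the single feature (interest rate) is an assumption, and the threshold rule and least-squares line stand in for whatever models a real project would choose.

```python
from statistics import mean

# Hypothetical toy loans: (interest_rate_percent, fraction_repaid).
# These values are invented and are not Lending Club data.
LOANS = [
    (6.0, 1.00), (8.0, 1.00), (10.0, 0.95), (12.0, 1.00),
    (14.0, 0.60), (16.0, 0.80), (18.0, 0.40), (20.0, 0.30),
]

rates = [rate for rate, _ in LOANS]
repaid = [fraction for _, fraction in LOANS]

# Categorization target: did the loan default (repaid less than in full)?
defaulted = [fraction < 1.0 for fraction in repaid]


def best_threshold(xs, labels):
    """Pick the cutoff t that best separates labels with the rule x >= t."""
    def accuracy(t):
        return sum((x >= t) == y for x, y in zip(xs, labels)) / len(xs)
    return max(sorted(set(xs)), key=accuracy)


def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Categorization: a yes/no prediction of default.
threshold = best_threshold(rates, defaulted)

# Regression: a numeric prediction of how much is paid back.
slope, intercept = least_squares(rates, repaid)
```

The input records are identical in both cases. Only the target changes: a yes/no label for categorization, a continuous amount for regression.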

Yelp maintains a data set for personal, educational, and academic use. It includes 6 million reviews spanning businesses in 10 metropolitan areas.

This post was originally published October 13 and was last updated August 21. You can follow the author on Twitter (tjdegroat).

In applying statistics to, for example, a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied.

Some topics for statistics projects, as suggested by McGraw-Hill Higher Education, include examining the factors that affect the gas mileage of a car, the gender distribution of a grocery store's customers, the physical factors affecting performance in sports, and urban planning parameters across neighborhoods.
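
To show how one of these topics might turn into a statistical hypothesis, here is a small sketch for the grocery store example. It assumes a null hypothesis that half of the customers are women and uses a one-proportion z-test with the normal approximation. The counts (120 women among 200 observed customers) are invented for illustration, and a different test, such as a chi-square goodness-of-fit test, may suit a real sample better.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0=0.5):
    """Two-sided z-test of H0: true proportion equals p0 (normal approximation)."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample: 120 of 200 observed customers were women.
z, p_value = one_proportion_z_test(successes=120, n=200)
reject_at_5_percent = p_value < 0.05
```

A small p-value here would suggest that the customer mix differs from an even split, provided the sample was collected without systematic bias.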

What good statistics projects look like: the evaluation of your data should contribute to the argument. Even though there is a word limit set for all statistics projects, it is the quality of your project that matters most.

