The new open theme for the MTFC allows teams to define their own research topic, just as you would for a science fair project. If you haven’t yet, make sure to view the Actuarial Process Guide to learn more about the project structure! Every project is required to use real-world data to mathematically model the risks you are studying. We provide a number of recommended datasets that we think may be useful for an MTFC project; however, you are not limited to just these datasets. Other topics, and other valuable datasets for those topics, are welcome.
General Data Sources for Multiple Topics

The United States Census Bureau has a very easy to use and robust online query tool to help find any Census related data associated with a particular topic you are interested in. This is a simple text query tool that will bring forward various statistics, information, and data tables that can be downloaded and analyzed separately. This may be an extremely valuable tool early on in exploring your MTFC project topic and can help you understand what data is available in many areas from the Census. You can also sign up for weekly emails from the US Census Bureau for up-to-date information.

The United States Census Bureau includes a list of all surveys and programs. MTFC teams may find it valuable to explore the Census programs and topic areas. Each program includes publicly available datasets and tables that can be exported from the program website. Use this link to explore all the Census programs and data available and see if there is one on the topic area you are interested in.
USAFacts gathers data and statistics about many aspects of government spending, finances, public life, and more. On this site you can compare historical trends, dig deep into the numbers, and interact with visualizations designed to give you a better idea of government’s impact on the nation and its people. In addition to Coronavirus information mentioned in another tutorial, USAFacts includes information in areas such as government finances, security and safety, the economy, and people and society. We encourage teams to explore the datasets available on these pages to help see what is available for the topics you are interested in.

NASA Open Data initiative shares a wealth of data from its programs and missions that may be valuable for student teams. Particularly, NASA provides a lot of data about the Earth, our environment, and climate. There may also be interest from some teams in exploring some of NASA’s other data to identify risks and recommendations on projects with a more “out there” scope! The NASA Open Data site provides several links with information in actual numeric tables that may be valuable for mathematical models and characterizing risks; however, it also provides many great data visualizations that may be helpful in your team’s background research.
The Society of Actuaries provides a number of research studies on various topics that may be of interest to MTFC teams. These reports do not provide downloadable datasets for analysis, but may provide some valuable background research and information on some topics. Teams can review these research reports to see if there is high-level background information that may be useful in defining their project topics.

GitHub is a valuable resource for finding data. It is mainly used for storing computer programming code and data; however, people across the world use GitHub to host their own datasets and to link to others’ datasets as well. Try searching for “Open Data” or “Public Data” on the main GitHub site. This will bring up many collections. Many are only valuable if you are familiar enough with computer programming to use the available code to analyze datasets. However, some people have also created lists of existing public datasets. For example, you could look at the “Awesome Public Datasets” to find a list, sorted by topic area, of over a hundred different specific open source datasets. Not all of these will be valuable for an actuarial risk analysis project, but you may find some unique ones to explore depending on your project topic.

Agriculture

The Risk Management Agency (RMA) of the United States Department of Agriculture provides publicly accessible information about federal crop insurance policies including details on cause of loss, liability, subsidy, indemnity, and more. MTF Challenge teams can use this information to model how future agricultural losses could be affected by changes in the climate, disease, drought, or other factors. Data from the USDA RMA can be viewed in multiple ways each with their own unique benefits and challenges:
View Cause of Loss Files: this page provides annual files including the cause of loss. These files are very large because they include data on all crops from all counties in the whole country. You will have to view each year’s file on its own and gather the data that is important to your project into your own spreadsheet. It is critical that teams know this data shows insurance policies that HAD A LOSS ONLY. It does not include all insurance policies.
Cause of Loss Viewer: this link provides access to the cause of loss data in a visual format. It also allows you to download the data in a spreadsheet, but does not let you select multiple causes of loss at once. You will have to download each cause of loss separately, but it still may be a good way of accessing the data. This is the same data as is accessed in the Cause of Loss Files, so it shows policies that HAD A LOSS ONLY.
Report Generator: this page is a great way to access the insurance claims and it is very important for projects because it includes ALL CROP INSURANCE POLICIES, not just ones with a loss. The only issue with accessing data this way is that it doesn’t include monthly information, only annual summaries. So teams may want to combine this information with more detailed information from the cause of loss files.
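One way to combine the two views is sketched below in Python: policy counts from the all-policies summary supply the denominator, and the loss-only records supply the numerator. The column names (year, policies, cause, indemnity), the assumption that each loss row is one policy with a loss, and all of the numbers are made-up placeholders, not the actual RMA file layout. Check the headers of the real files before adapting this.

```python
import csv
import io

# Hypothetical stand-ins for downloaded files; the real RMA layouts differ.
ALL_POLICIES_CSV = """year,policies
2018,1000
2019,1200
"""

LOSS_ONLY_CSV = """year,cause,indemnity
2018,Drought,50000
2018,Hail,30000
2019,Drought,90000
"""

def loss_frequency_and_severity(all_policies_file, loss_file):
    """Return {year: (share of policies with a loss, average indemnity per loss)}."""
    # Denominator: every policy sold that year, with or without a loss.
    policies = {row["year"]: int(row["policies"])
                for row in csv.DictReader(all_policies_file)}
    # Numerator: count and total of the loss-only records (one row = one loss, by assumption).
    loss_count, loss_total = {}, {}
    for row in csv.DictReader(loss_file):
        year = row["year"]
        loss_count[year] = loss_count.get(year, 0) + 1
        loss_total[year] = loss_total.get(year, 0.0) + float(row["indemnity"])
    result = {}
    for year, n_policies in policies.items():
        count = loss_count.get(year, 0)
        frequency = count / n_policies
        severity = (loss_total[year] / count) if count else 0.0
        result[year] = (frequency, severity)
    return result

summary = loss_frequency_and_severity(io.StringIO(ALL_POLICIES_CSV),
                                      io.StringIO(LOSS_ONLY_CSV))
for year, (frequency, severity) in sorted(summary.items()):
    print(year, round(frequency, 4), round(severity, 2))
```

Using only the loss-only files would overstate how often losses occur, because policies without a loss never appear in them.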
Climate Change
Learn Climate Change Fundamentals: the Fourth National Climate Assessment linked here contains a great deal of information, so you will have to do a little research to find what is important for your project. Fortunately, it is organized into easy-to-navigate chapters, so you can go directly to the information you would like to learn more about.
View Historic Climate Data: the data here can be viewed by state, county, or other divisions. Depending on your project you can look at particular historic trends and see if this could be useful in your project. The data can be exported to a spreadsheet to be analyzed in more detail.
View Actuaries Climate Index: the data on this website is easily downloadable as an Excel spreadsheet. The “guided tour” of the site also gives good descriptions of what the data includes. The tour may be a good place to start if you are interested in exploring this site as a supporting dataset for your project.
Drought
The National Integrated Drought Information System, NIDIS, provides a wealth of data on all aspects of drought in the United States. It has a number of different products that may be useful to MTFC teams interested in researching risks of losses due to drought conditions. The data include maps, tables, graphics, and informational data points about drought.
NIDIS Data Maps and Tools page provides a great list of their data products that cover many different aspects of drought, from agricultural conditions and soil moisture to snow pack and water availability. Check out these data products to help evaluate what would be valuable to your MTFC project. As you define your Problem Statement you will be able to narrow in on particular types of drought related data (for example, using snow pack data to evaluate future water availability, or even seasonal loss potential for ski resorts). You may also find value in exploring the NIDIS Educational resources with additional background information about droughts and the data that is available.
The National Drought Mitigation Center, managed by the University of Nebraska, provides background information and educational resources to help students learn the basics about droughts. Information on the NDMC’s website provides students with some fundamental understanding of drought mitigation techniques and the effects that drought may have on various industries. It can be a valuable resource for students wanting to conduct an MTFC project on drought; however, it does not directly provide data sets on which teams can model their risks and recommendations. Students will have to use information from other sources in conjunction with the background information found here to complete a full mathematical analysis of their project.
The United States Drought Monitor, from the University of Nebraska, offers many informative ways to view actual data about drought severity throughout the United States. It provides tables of data about how much land and population was in various states of drought (D1 to D4) throughout the past two decades. It also provides time-series graphs of the state of droughts at county, state, regional, and national levels. Information from the Drought Monitor can be downloaded as CSV files, or viewed online. Students interested in pursuing MTFC projects related to risks from droughts will find this site extremely valuable; however, it does not include direct values for losses due to droughts. Teams will need to correlate information from the Drought Monitor with other datasets or find their own ways of mathematically modeling the actual losses from drought.
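As a rough illustration of correlating drought severity with a separate loss dataset, the Python sketch below computes a Pearson correlation coefficient between two yearly series. The values are invented for illustration and are not Drought Monitor data, and the variable names are hypothetical. A team would substitute figures exported from the Drought Monitor and from whichever loss dataset it chooses.

```python
import math

def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented illustration values, one entry per year (not real data).
percent_area_in_drought = [5.0, 20.0, 35.0, 10.0, 50.0]
crop_losses_millions = [1.0, 3.5, 6.0, 2.0, 9.5]

r = pearson_correlation(percent_area_in_drought, crop_losses_millions)
print(f"correlation = {r:.3f}")
```

A value near 1 suggests losses rise with drought extent, but a correlation alone does not show that drought caused the losses. Note that the function fails with a division by zero if either series is constant.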
Floods
Other Natural Disasters
FEMA Data Feeds: for students interested in disaster relief analysis projects, we recommend starting by exploring the FEMA Data Feeds page to see what kinds of data sets are available. Explore the “Datasets” dropdown to see a list of the major datasets that are available on all of FEMA’s programs. It provides multiple tables on prevalence of different disasters and their effects. The datasets are arranged by FEMA Program including: Individual Housing Assistance, Public assistance, Hazard Mitigation Assistance, National Flood Insurance Program, Community Emergency Response Data, Disaster Relief Fund, Emergency Management Performance Grants, and the National Household Survey.
Disaster Declaration Summaries: this dataset lists all official FEMA Disaster Declarations, beginning with the first disaster declaration in 1953, and features all three disaster declaration types: major disaster, emergency, and fire management assistance. For teams pursuing an MTFC project related to losses from disasters, it is important to have this information because other datasets from FEMA reference the “disaster number” rather than a specific location or date. Although these summaries do not provide financial losses, they can be valuable when used in conjunction with other FEMA datasets that do (such as the National Flood Insurance Program). The actual dataset can be found in the “Full Data” dropdown.
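Because the disaster number is the link between FEMA datasets, matching on it can be sketched in Python as below. The field names (disasterNumber, state, incidentType, amountPaid) and every value are placeholders chosen for illustration; the actual FEMA files should be checked for their real column names.

```python
# Hypothetical rows; field names and values are placeholders for illustration.
declarations = [
    {"disasterNumber": 1001, "state": "TX", "incidentType": "Flood", "year": 2015},
    {"disasterNumber": 1002, "state": "FL", "incidentType": "Hurricane", "year": 2016},
]
payments = [
    {"disasterNumber": 1001, "amountPaid": 250000.0},
    {"disasterNumber": 1001, "amountPaid": 100000.0},
    {"disasterNumber": 1002, "amountPaid": 400000.0},
    {"disasterNumber": 9999, "amountPaid": 5000.0},  # no matching declaration
]

def total_paid_by_disaster(declarations, payments):
    """Attach the summed payments to each declaration, matched on disasterNumber."""
    totals = {}
    for row in payments:
        key = row["disasterNumber"]
        totals[key] = totals.get(key, 0.0) + row["amountPaid"]
    # Declarations with no payments get a total of 0.0.
    return [{**decl, "totalPaid": totals.get(decl["disasterNumber"], 0.0)}
            for decl in declarations]

joined = total_paid_by_disaster(declarations, payments)
for row in joined:
    print(row["disasterNumber"], row["state"], row["incidentType"], row["totalPaid"])
```

Payments whose disaster number has no matching declaration are dropped here. A team may prefer to list those separately to check for data problems.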
Labor & Employment
BLS Data Tools: the Bureau of Labor Statistics (BLS) tracks data in many categories, which are all listed on its Data Tools page. There are several categories that may be relevant to students exploring MTFC projects involving labor, consumers, and employment.
Employment & Unemployment Databases: the BLS lists several databases about employment and job openings as well as how many people have filed for unemployment. Look through the section on Employment to find information most relevant to the sector and time periods you are interested in.
The BLS also provides many other databases that you may find interesting. One of the most valuable things about all of the BLS data is that they provide easy-to-use web-based interfaces to interact with the data. We recommend using their “One Screen Data Search” to help you identify and then export the data that is relevant to your project.
Poverty
The U.S. Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) program provides annual estimates of income and poverty statistics for all school districts, counties, and states. The main objective of this program is to provide estimates of income and poverty for the administration of federal programs and the allocation of federal funds to local jurisdictions. In addition to these federal programs, state and local programs use the income and poverty estimates for distributing funds and managing programs. SAIPE data can be accessed through several online tools, or can be downloaded to spreadsheets. The Census Bureau provides the following data access systems:
SAIPE Data Tool: this online data tool provides an interactive, visual system to search and analyze the SAIPE data. You can also narrow the dataset down to the portions that are best for your research project and then export those to a CSV spreadsheet.
SAIPE Datasets by Year: the data is provided as downloadable spreadsheets through this link. Identifying trends over time with this data may be more difficult because you have to download each year’s data separately.
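When each year has to be downloaded separately, the yearly files can be stacked into one time series per area before looking for trends. The Python sketch below shows the idea with in-memory text standing in for the downloaded files. The column names (county, poverty_rate), the county names, and the rates are hypothetical and do not reflect the real SAIPE file layout.

```python
import csv
import io

# Hypothetical yearly downloads, keyed by year; real SAIPE files have different columns.
YEARLY_FILES = {
    2016: "county,poverty_rate\nAdams,14.0\nBaker,9.5\n",
    2017: "county,poverty_rate\nAdams,13.2\nBaker,9.9\n",
    2018: "county,poverty_rate\nAdams,12.5\nBaker,10.4\n",
}

def build_time_series(yearly_files):
    """Return {county: [(year, rate), ...]} with each list in year order."""
    series = {}
    for year in sorted(yearly_files):
        for row in csv.DictReader(io.StringIO(yearly_files[year])):
            series.setdefault(row["county"], []).append(
                (year, float(row["poverty_rate"])))
    return series

series = build_time_series(YEARLY_FILES)
for county, points in series.items():
    change = points[-1][1] - points[0][1]
    print(county, points, f"change over period: {change:+.1f}")
```

With real downloads, each in-memory string would be replaced by opening the corresponding file for that year.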
Health
The U.S. Census Bureau’s Small Area Health Insurance Estimates (SAHIE) program produces the only source of data for single-year estimates of health insurance coverage status for all counties in the U.S. by selected economic and demographic characteristics. The data provides annual estimates from 2000 – 2018 for health insurance coverage at a state and county level.
SAHIE Data Tool: in addition to providing downloadable spreadsheets of their data, each of the U.S. Census Bureau programs provides a valuable web-based tool to access their data. MTFC teams can use this tool to analyze and identify specific subsets of this data that are relevant for their project.
The U.S. Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS) includes several valuable data resources that may be helpful to MTFC teams exploring projects related to health, injury, and healthcare topics. We highlight three primary web-based tools provided by the NCHS, but don’t be afraid to look elsewhere within this data-rich site!
Wonder Data Query Directory: the CDC WONDER database includes data for U.S. births, deaths, cancer diagnoses, tuberculosis cases, vaccinations, environmental exposures, and population estimates, among many other topics. These data collections are available as online interactive databases with summary statistics, maps, charts, and data extracts. Using the online WONDER database tools, MTFC teams can create tables, maps, charts, and data exports that include the most relevant data for their project. Each of the ~20 data collections in the WONDER database has a slightly different query structure, but they are all web-based, easy-to-use systems. Descriptions of each of the datasets available through WONDER can be found online here.
Web-based Injury Statistics Query and Reporting System: the CDC’s Web-based Injury Statistics Query and Reporting System (WISQARS) is an interactive, online database that provides fatal and nonfatal injury, violent death, and cost of injury data. The data found in WISQARS can be viewed graphically, or through downloadable spreadsheets. WISQARS also provides a robust filtering system to help teams find the most relevant data to their project.
Summary Health Statistics: Summary Health Statistics are descriptive statistics for various health measures including health status, conditions, health behaviors, activity limitations, health insurance coverage, and access and utilization of health care. These measures are available for the nation as a whole and for selected subgroups defined by characteristics such as sex, age, race, ethnicity, family income, and region of the United States. Health statistics are easily explored through the CDC’s online tool, and then downloaded as spreadsheets for further analysis.
The Centers for Medicare and Medicaid Services provide a wealth of knowledge on how and where healthcare costs are incurred. This data can be valuable for many types of MTFC research projects involving healthcare and expenses in the U.S. On the main CMS Research, Statistics, Data and Systems page there are many links that take you to various informational pages about CMS. For MTFC projects, you will probably find the most use out of the links in the “Statistics, Trends, and Reports” section, where they provide actual datasets that may be used in your models and analysis. Below are a few example areas that may be useful.
CMS Drug Spending: this provides cost information from Medicare and Medicaid drug spending for all drugs covered. You can view the data online, or download it into Excel files for analysis. The CMS Drug Spending Dashboards are interactive, web-based tools that provide spending information for drugs in the Medicare Part B and D programs as well as Medicaid. This data may be useful to MTFC teams interested in quantifying the expenses covered for treatment of particular diseases or conditions.
The National Emergency Medical Services Information System (NEMSIS) is the national database that is used to store EMS data from the U.S. States and Territories. NEMSIS is a universal standard for how patient care information resulting from an emergency 911 call for assistance is collected. The database helps local, State and national EMS stakeholders more accurately assess EMS needs and performance, as well as support better strategic planning for the EMS systems of tomorrow. Data from NEMSIS is also used to help benchmark performance, determine effectiveness of clinical interventions, and facilitate cost-benefit analyses. MTFC teams interested in evaluating risk management and mitigation involving emergency services will find this data valuable in characterizing the severity and frequency of their risks.
EMS for Educators: the most valuable way for student MTFC teams to access the NEMSIS EMS data is through their EMS for Educators page. On this site you will see three valuable ways to view EMS data online: Version 2 Dashboards (for data from 2014-2016), Version 3 Dashboards (for data from 2017-2019), and the Data Cube. The EMS Data Cube can be a very valuable way to analyze NEMSIS data and is one we recommend if you want to be able to export specific quantities and data for your own analysis; however, it can also be very difficult to navigate. We recommend reviewing the video tutorials on this before trying to dive into using the Data Cube.
Veterans
The National Center for Veterans Analysis and Statistics provides useful information on many areas of Veterans Affairs. MTFC teams interested in risks that veterans face, or that governments or other organizations face in supporting veterans will find this site valuable for their projects. When you first start exploring data on this site, it may be valuable to start with their “Quick Facts” page that has some high-level overviews of veteran populations, utilization of services, expenditures, and more. Reviewing the Quick Facts may help MTFC teams better understand what data is available and what can be useful for their projects. Once you have viewed the Quick Facts, you can dive further into the actual data on these pages:
Veteran Populations: find information on the total number of veterans in various demographic regions.
State Summaries: provides PDF summaries of veteran expenditures and use of VA services by state. These summaries may provide valuable numbers on expenses in specific states.
Expenditure Tables: these spreadsheets provide valuable data about the amounts of a multitude of VA expenses. These data tables go back to 1996 and are divided not only by the type of expense, but by state as well. MTFC teams can use this information to specifically quantify the severity of the costs for various veteran support programs.
Utilization Tables: the tables under this link provide additional detail on the amount of various VA services used and the costs associated with that use. Like the Expenditure tables, these may be valuable in calculating the severity of the “risks” (costs) to the VA services.
Retirement and Social Security
The Social Security Administration’s Research, Statistics and Policy Analysis page provides a wealth of data on Social Security expenditures and beneficiaries. This data may be valuable to MTFC teams examining social security costs and expenses. Teams may want to start by reviewing the “Fast Facts” information on this site to get a general understanding of social security data and information. Once you have familiarized yourself with the fast facts, the SSA provides many detailed datasets about SSA benefits and beneficiaries:
Annual Statistical Report on the Social Security Disability Insurance Program: this annual report (see earlier versions linked in the website for trend analysis) provides many spreadsheets about the annual expenses and beneficiaries of the Social Security Disability Insurance Program. You can find information about how many people receive benefits, how large the benefits are, etc.
Annual Statistical Supplement: this report provides spreadsheets on the expenditures and beneficiaries of all Social Security benefits. There are many different spreadsheets linked through this page, so each MTFC team will have to research which datasets will be most valuable for its work and select the specific ones that help identify the severity and frequency of the risks the team has defined.
Additional pages on the SSA website provide other pieces of background information and possible datasets that may be valuable.
The Pension Benefit Guaranty Corporation (PBGC) was established to ensure that participants in defined benefit pension plans receive their pensions if their plans terminate without sufficient assets to pay promised benefits. The PBGC administers separate insurance programs to protect participants in Single-Employer and Multiemployer plans. Data on the pension benefits paid, liabilities of the PBGC, and other information can be found from 2000 forward in spreadsheet tables. MTFC teams interested in retirement savings for people with pensions may find this information valuable.
Housing and Homelessness
The Department of Housing and Urban Development’s Office of Policy Development and Research (PD&R) provides a large number of datasets about housing needs, market conditions, and existing HUD programs. PD&R also conducts research on priority housing and community development issues. Multiple datasets are made available through the PD&R website. Additional MTFC project topics that may be supported by this data include homelessness, poverty and low-income economic concerns. Some of the datasets available are noted below, but it may be valuable for MTFC teams to explore the main PD&R site above first to get a full picture of all the data that is available.
Low-Income Housing Tax Credit (LIHTC): this credit is provided to low-income housing builders. It is the most important resource for creating affordable housing in the United States today. The LIHTC database, created by HUD and available to the public since 1997, contains information on 48,672 projects and 3.23 million housing units placed in service between 1987 and 2018. This may be valuable to MTFC teams exploring local, state-wide, or national levels of low-income housing. There is an online query tool to make it easier to identify the data useful to your project.
Picture of Subsidized Households: this data covers programs that provide subsidies to reduce rents for low-income tenants. Assistance provided under HUD programs falls into three categories: public housing, tenant-based, and privately owned, project-based. These datasets provide information on all of these housing programs, but it can also be broken down into subsets by program, region, or other information.
Transportation and Automobiles
The National Highway Traffic Safety Administration (NHTSA) provides a variety of informational reports and datasets on numerous aspects of transportation-related risks, accidents, fatalities, and more. NHTSA data is managed through their National Center for Statistics and Analysis (NCSA). The NCSA provides excellent online query tools to help you select the data that is most relevant to your project. The following links are examples of some of the NCSA datasets that may be valuable to MTFC projects; however, don’t forget to review the general NCSA website noted above to get a better understanding of all of the data available in this area.
Traffic Safety Facts: this may be a valuable place to start looking at data if you are pursuing a transportation, or auto-accident related MTFC project. This page provides valuable summaries and high-level information and trends about traffic safety and accidents.
Fatality and Injury Reporting System (FIRST): this system provides an interactive online query tool to allow you to customize specific reports on data involving automobile fatalities and injuries. The tool provides many variables to help you select specific data, and differentiate various types or scenarios of risk.
State Traffic Safety Information: this tool provides data about the number, type, and results of crashes by state and county levels. It also has a very easy to use online query tool to help identify specific data of value to your MTFC project.
FARS Data Tables: this is where the NCSA includes straight exportable spreadsheets of information about traffic accidents and injuries. It is the same data as is included in the FIRST query tool mentioned above, but might be nice just to look at the tables for some projects.
Epidemics and Pandemics
Johns Hopkins University Coronavirus Resource Center provides one of the most comprehensive tracking systems for COVID infections, hospitalizations, and deaths globally. The center provides data, statistics, and interactive mapping tools on a county, state, and national level. MTFC teams may find this resource valuable to explore the numbers infected, hospitalized, or deceased from COVID-19. While exporting the JHU data as spreadsheets is difficult, the online mapping tool is very comprehensive and easy to use to find specific data points that may be valuable for your analysis.
Rt Live provides data on the Rt value for COVID-19. This is a key measure of how fast the virus is growing: it is the average number of people who become infected by one infectious person. If Rt is above 1.0, each generation of infections is larger than the last and the virus spreads quickly. When Rt is below 1.0, each generation is smaller than the last and the virus eventually stops spreading. The site provides a downloadable CSV file of all the calculated Rt values for U.S. states since early March 2020. These values may be interesting to MTFC teams exploring pandemic-related projects.
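The effect of Rt being above or below 1.0 can be seen with a very simplified Python sketch. It assumes Rt stays constant and that infections happen in discrete generations, so new infections in generation n equal the starting cases multiplied by Rt raised to the power n. This is an illustration only and is not how Rt Live calculates its values; the function name and starting numbers are hypothetical.

```python
def new_infections_by_generation(initial_cases, rt, generations):
    """New infections in each generation if every case infects `rt` others on average."""
    return [initial_cases * rt ** n for n in range(generations + 1)]

growing = new_infections_by_generation(100, 1.5, 4)
shrinking = new_infections_by_generation(100, 0.5, 4)
print(growing)    # [100.0, 150.0, 225.0, 337.5, 506.25]
print(shrinking)  # [100.0, 50.0, 25.0, 12.5, 6.25]
```

With Rt at 1.5 the count of new infections grows every generation, while with Rt at 0.5 it halves each time and fades toward zero.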
USAFacts is a non-profit organization helping to gather statistics and data about critical issues within the United States. The organization hosts a data portal about the coronavirus that includes some interesting statistics such as how many hospital beds there are and how many are being used. While much of the data is in graphic images or maps, some information can be found in exportable spreadsheets for further analysis. You may find the information on their COVID-19 Map by County and State page particularly useful.