Data Sources

The new open theme for the MTFC allows teams to define their own research topic – just like you would a science fair project. If you haven’t yet, make sure to view the Actuarial Process Guide to learn more about the project structure! Every project is required to use real-world data to mathematically model the risks you are studying. We provide a number of recommended datasets that we think may be useful for an MTFC project; however, you are not limited to just these datasets. Other topics and other valuable datasets to use in these topics are welcome.

General Data Sources for Multiple Topics

The United States Census Bureau has a very easy to use and robust online query tool to help find any Census related data associated with a particular topic you are interested in. This is a simple text query tool that will bring forward various statistics, information, and data tables that can be downloaded and analyzed separately. This may be an extremely valuable tool early on in exploring your MTFC project topic and can help you understand what data is available in many areas from the Census.

The United States Census Bureau includes a list of all surveys and programs. MTFC teams may find it valuable to explore the Census programs and topic areas. Each program includes publicly available datasets and tables that can be exported from the program website. Use this link to explore all the Census programs and data available and see if there is one on the topic area you are interested in.

USAFacts gathers data and statistics about many aspects of government spending, finances, public life, and more. On this site you can compare historical trends, dig deep into the numbers, and interact with visualizations designed to give you a better idea of government’s impact on the nation and its people. In addition to Coronavirus information mentioned in another tutorial, USAFacts includes information in areas such as government finances, security and safety, the economy, and people and society. We encourage teams to explore the datasets available on these pages to help see what is available for the topics you are interested in.

The Society of Actuaries provides a number of research studies on various topics that may be of interest to MTFC teams. These reports do not provide downloadable datasets for analysis, but may provide some valuable background research and information on some topics. Teams can review these research reports to see if there is high-level background information that may be useful in defining their project topics.

Agriculture

The Risk Management Agency (RMA) of the United States Department of Agriculture provides publicly accessible information about federal crop insurance policies including details on cause of loss, liability, subsidy, indemnity, and more. MTF Challenge teams can use this information to model how future agricultural losses could be affected by changes in the climate, disease, drought, or other factors. Data from the USDA RMA can be viewed in multiple ways each with their own unique benefits and challenges:

View Cause of Loss Files: this page provides annual files including the cause of loss. These files are very large because they include data on all crops from all counties in the whole country, so you will have to view each year’s file on its own and gather the data that is important to your project into your own spreadsheet. It is critical that teams know this data shows insurance policies that HAD A LOSS ONLY. It does not include all insurance policies.

Cause of Loss Viewer: this link provides access to the cause of loss data in a visual format. It also allows you to download the data in a spreadsheet, but does not let you select multiple causes of loss at once. You will have to download each cause of loss separately, but it still may be a good way of accessing the data. This is the same data as is accessed in the Cause of Loss Files, so it shows policies that HAD A LOSS ONLY.

Report Generator: this page is a great way to access the insurance claims and it is very important for projects because it includes ALL CROP INSURANCE POLICIES, not just ones with a loss. The only issue with accessing data this way is that it doesn’t include monthly information, only annual summaries. So teams may want to combine this information with more detailed information from the cause of loss files.
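The year-by-year gathering described for the Cause of Loss files can also be done with a short script instead of copying rows by hand. The following is a minimal sketch in Python, not the official file layout: the column names (year, state, county, crop, cause_of_loss, indemnity) and the sample rows are hypothetical placeholders, so check the record layout published with the real files before adapting it.

```python
import csv
import io

# Hypothetical stand-ins for two downloaded annual files. The real files have
# their own layout; these column names and values are assumptions.
FILE_2019 = """year,state,county,crop,cause_of_loss,indemnity
2019,IA,Story,CORN,Drought,120000
2019,IA,Story,SOYBEANS,Hail,30000
2019,NE,Hall,CORN,Drought,80000
"""
FILE_2020 = """year,state,county,crop,cause_of_loss,indemnity
2020,IA,Story,CORN,Drought,200000
2020,IA,Story,CORN,Wind,450000
2020,NE,Hall,CORN,Hail,60000
"""


def collect_rows(files, crop, state):
    """Keep only the rows for one crop and one state from each annual file."""
    kept = []
    for text in files:
        for row in csv.DictReader(io.StringIO(text)):
            if row["crop"] == crop and row["state"] == state:
                kept.append(row)
    return kept


def total_indemnity_by_year(rows):
    """Sum the indemnity payments for each year."""
    totals = {}
    for row in rows:
        year = int(row["year"])
        totals[year] = totals.get(year, 0.0) + float(row["indemnity"])
    return totals


iowa_corn = collect_rows([FILE_2019, FILE_2020], crop="CORN", state="IA")
totals = total_indemnity_by_year(iowa_corn)
print(totals)
```

Because these files only cover policies that had a loss, totals like these describe losses paid, not the share of all policies that had a claim. That share needs the all-policy counts from the Report Generator.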

The National Agricultural Statistics Service (NASS) of the United States Department of Agriculture (USDA) provides information on all crop, livestock, and aquaculture production values across the country. Because there is so much data here, sometimes you need to do a little searching to find what data is actually useful. Because this data set does not provide direct values of loss, it may be most useful in conjunction with other data sets like the insurance loss data from the USDA RMA. You may also want to explore other data sets that can be used with this to help define the severity and frequency of loss.

Quick Stats data access portal: the data here can be useful in gathering information about the number of acres planted, yield, production, price received, and many other items related to crops, livestock, or aquaculture production. You can use this portal to search through data about each crop and often filter down to the county level, though it may take some time exploring the data to find which values are available at that level.

Climate Change

This organization from the federal government provides great resources to help teams understand what changes are expected in the climate. Check out these links to the Global Change program to learn about specific ways climate scientists are expecting things to change, but don’t feel limited to just these resources:

Learn Climate Change Fundamentals: The Fourth National Climate Assessment linked here has a ton of information in it, so you will have to do a little research to find what is important for your project. The good thing is that it has easy-to-navigate chapters, so you can go directly to the information you would like to learn more about.

Learn more about Climate Change Forecast Scenarios: this is another link to the information in the Fourth National Climate Assessment with easy-to-navigate chapters so you can see information about specific ways the climate is changing. For example, things that may be relevant for this MTF Challenge theme could be: Chapter 6 Temperature Change, Chapter 7 Precipitation Change, Chapter 8 Droughts, Floods, and Wildfire, and Chapter 9 Extreme Storms. However, teams should not feel limited to just this information. It is important to find what matters for your project!

The National Oceanic and Atmospheric Administration (NOAA) provides a great way to access historic climate data including precipitation, temperature, drought indexes, and more. You may find this data beneficial in modeling relationships between historic trends and changes in other data with actual monetary values – such as crop loss from the USDA RMA data set, flood losses from the National Flood Insurance Program, or any number of other data sets with loss values. This data may also be valuable in conjunction with general climate change projections found in the US Global Change program.

View Historic Climate Data: the data here can be viewed by state, county, or other divisions. Depending on your project you can look at particular historic trends and see if this could be useful in your project. The data can be exported to a spreadsheet to be analyzed in more detail.
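One simple way to start modeling a relationship between a climate measure and a loss value is a correlation coefficient and a least-squares line. The sketch below is only an illustration under stated assumptions: the two lists are invented placeholder numbers (not NOAA or RMA observations), and the variable names are hypothetical. A real project would pair each year's exported climate value with that same year's loss.

```python
from math import sqrt

# Placeholder values for illustration only; these are NOT real observations.
precip_deficit_inches = [1.0, 2.0, 3.0, 4.0, 5.0]
loss_millions = [2.0, 4.0, 5.0, 4.0, 5.0]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Return (slope, intercept, Pearson r) for paired observations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


slope, intercept, r = fit_line(precip_deficit_inches, loss_millions)
print(f"loss = {intercept:.2f} + {slope:.2f} * deficit, r = {r:.3f}")
```

A correlation on a handful of years is weak evidence on its own, so treat a fit like this as a first look rather than a finished model.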

Actuarial organizations in the US and Canada created this index to help monitor climate trends and educate people on the potential impacts of a changing climate. The index could be used in MTF Challenge projects to help identify relationships with past trends and the risks of changing climate factors.

View Actuaries Climate Index: the data on this website is easily downloadable as an Excel spreadsheet. There are also good descriptions of what data is included in the “guided tour” of the site. This may be a good place to start if you are interested in exploring this site as a supporting data set for your project.

Drought

The National Integrated Drought Information System, NIDIS, provides a wealth of data on all aspects of drought in the United States. It has a number of different products that may be useful to MTFC teams interested in researching risks of losses due to drought conditions. The data include maps, tables, graphics, and informational data points about drought.

NIDIS Data Maps and Tools page provides a great list of their data products that cover many different aspects of drought, from agricultural conditions and soil moisture to snow pack and water availability. Check out these data products to help evaluate what would be valuable to your MTFC project. As you define your Problem Statement you will be able to narrow in on particular types of drought related data (for example, using snow pack data to evaluate future water availability, or even seasonal loss potential for ski resorts). You may also find value in exploring the NIDIS Educational resources with additional background information about droughts and the data that is available.

The National Drought Mitigation Center, managed by the University of Nebraska, provides background information and educational resources to help students learn the basics about droughts. Information on the NDMC’s website provides students with some fundamental understanding of drought mitigation techniques and the effects that drought may have on various industries. It can be a valuable resource for students wanting to conduct an MTFC project on drought; however, it does not directly provide data sets on which teams can model their risks and recommendations. Students will have to use information from other sources in conjunction with the background information found here to complete a full mathematical analysis of their project.

The United States Drought Monitor, from the University of Nebraska, offers many informative ways to view actual data about drought severity throughout the United States. It provides tables of data about how much land and population was in various states of drought (D1 to D4) throughout the past two decades. It also provides time-series graphs of the state of droughts at county, state, regional, and national levels. Information from the Drought Monitor can be downloaded as CSV files, or viewed online. Students interested in pursuing MTFC projects related to risks from droughts will find this site extremely valuable; however, one thing not included here is direct values for losses due to the droughts. Teams will need to correlate information from the Drought Monitor with other data sets or find their own ways of mathematically modeling the actual losses from drought.
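Before Drought Monitor data can be lined up with an annual loss data set, the drought values usually need to be summarized to one number per year. Here is a minimal sketch of that step, assuming the export has one row per weekly map; the column names (date, pct_area_d2_or_worse) and the values are hypothetical, so match them to the headers in the CSV you actually download.

```python
import csv
import io
from collections import defaultdict

# Hypothetical weekly export; column names and values are placeholders.
WEEKLY = """date,pct_area_d2_or_worse
2012-07-03,40.0
2012-07-10,50.0
2012-07-17,60.0
2013-07-02,10.0
2013-07-09,20.0
"""


def annual_average(text, column):
    """Average a weekly column within each calendar year."""
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        year = int(row["date"][:4])  # dates assumed to be YYYY-MM-DD
        by_year[year].append(float(row[column]))
    return {year: sum(vals) / len(vals) for year, vals in by_year.items()}


annual = annual_average(WEEKLY, "pct_area_d2_or_worse")
print(annual)
```

An annual average is only one possible summary. Depending on the project, the peak weekly value or the number of weeks above a threshold may track losses better.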

Floods

The Federal Emergency Management Agency (FEMA) provides many extensive datasets on losses due to natural disasters and other emergencies, including floods. There are several places on the FEMA website where these datasets can be accessed. Unfortunately, FEMA does not provide an easy-to-use web platform for searching for specific data within their datasets. Students will need to be able to use a database program such as Microsoft Access, computer programming, or an API to easily access the information they are looking for. The data can be downloaded as CSV files for use; however, the files are generally too large for MS Excel to handle and are not easily accessed this way.

OpenFEMA National Flood Insurance Program Claims: this data set from FEMA provides historic claims on the National Flood Insurance Program (over 20,000,000 items). It is very valuable for students examining MTFC projects on losses due to flooding because it provides critical severity of loss information. The dataset is in one giant CSV file (over 200 MB) and is only easily accessed with a database program like MS Access or a computer API / programming interface. The data here provides excellent information on all of the insurance claims to the NFIP separated by the location, type of structure, and other variables. This can be very valuable for students trying to define the severity of losses due to flooding.

OpenFEMA National Flood Insurance Program Policies: similar to the NFIP Claims dataset above, this data provides national information on ALL NFIP policies (over 50,000,000 items) for the last several decades. It is a huge dataset and not easily accessed through MS Excel. Students will need to be familiar with a computer programming interface (API) or a database program like MS Access to be able to easily access this information. There is no easy web interface to search for or download just parts of the dataset. However, the data is very valuable and, used in conjunction with the NFIP claims, can provide not only the severity of losses due to flooding, but also the frequency of losses.
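For files too large for a spreadsheet, one option is to read the CSV one row at a time and keep only running totals. The sketch below shows that pattern and then combines claim totals with policy counts to estimate frequency and severity. It is an illustration under assumptions: the column names (state, year, amount_paid), the sample rows, and the policy counts are all hypothetical and do not reflect the actual OpenFEMA field names.

```python
import csv
import io

# Hypothetical sample standing in for the claims file; real columns differ.
CLAIMS_SAMPLE = """state,year,amount_paid
FL,2017,20000
FL,2017,40000
TX,2017,90000
FL,2018,10000
"""

# Hypothetical policy counts, as might be tallied from the policies dataset.
POLICY_COUNTS = {("FL", 2017): 100, ("TX", 2017): 50, ("FL", 2018): 100}


def summarize_claims(handle):
    """Read one row at a time so the full file never has to fit in memory."""
    claim_counts, total_paid = {}, {}
    for row in csv.DictReader(handle):
        key = (row["state"], int(row["year"]))
        claim_counts[key] = claim_counts.get(key, 0) + 1
        total_paid[key] = total_paid.get(key, 0.0) + float(row["amount_paid"])
    return claim_counts, total_paid


def frequency_and_severity(claim_counts, total_paid, policy_counts):
    """Frequency = claims per policy; severity = average paid per claim."""
    results = {}
    for key, n_claims in claim_counts.items():
        frequency = n_claims / policy_counts[key]
        severity = total_paid[key] / n_claims
        results[key] = (frequency, severity, frequency * severity)
    return results


# With a real download, replace io.StringIO(...) with open(path, newline="").
counts, paid = summarize_claims(io.StringIO(CLAIMS_SAMPLE))
results = frequency_and_severity(counts, paid, POLICY_COUNTS)
for key in sorted(results):
    print(key, results[key])
```

The third value in each result, frequency times severity, is the average loss per policy. It is a common starting point for an expected-loss estimate.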

Other Natural Disasters

The Federal Emergency Management Agency (FEMA) is a wealth of knowledge and data on disaster management in the United States. FEMA handles the United States’ national disaster relief programs that provide support for everything from fires and floods to hurricanes, tornadoes, and everything else that has officially been declared a “National Disaster.” Unfortunately, FEMA data does not have an easy-to-use web interface. Some of the data files are very large and a simple spreadsheet program like Microsoft Excel will not be able to handle them. For these larger files, students will need to be familiar with a database program such as MS Access or a computer language that will allow them to parse the data directly.

FEMA Data Feeds: for students interested in disaster relief analysis projects, we recommend starting by exploring the FEMA Data Feeds page to see what kinds of data sets are available. Explore the “Datasets” dropdown to see a list of the major datasets that are available on all of FEMA’s programs. It provides multiple tables on prevalence of different disasters and their effects. The datasets are arranged by FEMA Program including: Individual Housing Assistance, Public Assistance, Hazard Mitigation Assistance, National Flood Insurance Program, Community Emergency Response Data, Disaster Relief Fund, Emergency Management Performance Grants, and the National Household Survey.

Disaster Declaration Summaries: This dataset lists all official FEMA Disaster Declarations, beginning with the first disaster declaration in 1953 and features all three disaster declaration types: major disaster, emergency, and fire management assistance. For teams pursuing an MTFC project related to losses from disasters, it is important to have this information because other datasets from FEMA reference the “disaster number” rather than a specific location or date. Although these summaries do not provide financial losses, they can be valuable when used in conjunction with other FEMA datasets that do (such as the National Flood Insurance Program). The actual dataset can be found in the “Full Data” dropdown.
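Because other FEMA datasets identify an event only by its disaster number, a lookup keyed on that number is a natural way to attach the declaration details to each record. The following sketch assumes two small tables that share a disasterNumber column. The other column names, the disaster numbers, and the amounts are made-up placeholders rather than real FEMA records.

```python
import csv
import io

# Made-up declaration rows; numbers, states, and types are placeholders.
DECLARATIONS = """disasterNumber,state,incidentType
9001,FL,Flood
9002,TX,Fire
"""

# Made-up rows from a second dataset that references only the disaster number.
ASSISTANCE = """disasterNumber,amount
9001,1000
9001,2500
9002,400
9003,50
"""


def index_declarations(text):
    """Build a lookup table: disaster number -> declaration row."""
    return {row["disasterNumber"]: row
            for row in csv.DictReader(io.StringIO(text))}


def total_by_incident_type(declarations, assistance_text):
    """Sum amounts per incident type; report numbers with no declaration."""
    totals, unmatched = {}, []
    for row in csv.DictReader(io.StringIO(assistance_text)):
        declaration = declarations.get(row["disasterNumber"])
        if declaration is None:
            unmatched.append(row["disasterNumber"])
            continue
        kind = declaration["incidentType"]
        totals[kind] = totals.get(kind, 0.0) + float(row["amount"])
    return totals, unmatched


lookup = index_declarations(DECLARATIONS)
totals, unmatched = total_by_incident_type(lookup, ASSISTANCE)
print(totals, unmatched)
```

Keeping a list of unmatched numbers is a cheap check that the two files cover the same period before any totals are trusted.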

Labor & Employment

The United States Bureau of Labor Statistics (BLS) provides many datasets related to employment, productivity, pay, benefits, workplace injuries, and other labor related information in the United States. On their data portal they also provide several ways of interacting with their data including some web-based interfaces that make it easier to search for and find data that is relevant to your project.

BLS Data Tools: The BLS tracks data in many categories which are all listed on their Data Tools page. There are several categories that may be relevant to students exploring MTFC projects involving labor, consumers, and employment.

Employment & Unemployment Databases: the BLS lists several databases about employment and job openings as well as how many people have filed for unemployment. Look through the section on Employment to find information most relevant to the sector and time periods you are interested in.

Pay and Benefits Databases: the BLS includes lots of information on wages and other benefits in different sectors. If you are considering a project that analyzes the potential for lost jobs, lost wages, etc. you may want to explore this data.

Workplace Injuries: these datasets provide data on occupational injuries by industry. If you are considering an MTFC project on the workplace risks of various jobs or industries you may want to explore these datasets.

The BLS also provides many other databases that you may find interesting. One of the most valuable things about all of the BLS data is that they provide easy-to-use web-based interfaces to interact with the data. We recommend using their “One Screen Data Search” to help you identify and then export the data that is relevant to your project.

Poverty

The U.S. Census Bureau’s Small Area Income and Poverty Estimates (SAIPE) program provides annual estimates of income and poverty statistics for all school districts, counties, and states. The main objective of this program is to provide estimates of income and poverty for the administration of federal programs and the allocation of federal funds to local jurisdictions. In addition to these federal programs, state and local programs use the income and poverty estimates for distributing funds and managing programs. SAIPE data can be accessed through several online tools, or can be downloaded to spreadsheets. The Census Bureau provides the following data access systems:

SAIPE Data Tool: this online data tool provides an interactive, visual system to search and analyze the SAIPE data. You can also refine the portions of the dataset that are best for your research project and then export those to a CSV spreadsheet.

SAIPE Datasets by Year: the data is provided as downloadable spreadsheets through this link. Identifying trends over time with this data may be more difficult because you have to download each year’s data separately.

Health

The U.S. Census Bureau’s Small Area Health Insurance Estimates (SAHIE) program produces the only source of data for single-year estimates of health insurance coverage status for all counties in the U.S. by selected economic and demographic characteristics. The data provides annual estimates from 2000 – 2018 for health insurance coverage at a state and county level.

SAHIE Data Tool: In addition to providing downloadable spreadsheets of their data, each of the U.S. Census Bureau programs provides a valuable web-based tool to access their data. MTFC teams can use this tool to analyze and identify specific subsets of this data that are relevant for their project.

You can also learn more about the SAHIE data on the Census Bureau’s SAHIE information site here.

The U.S. Centers for Disease Control and Prevention’s National Center for Health Statistics (NCHS) includes several valuable data resources that may be helpful to MTFC teams exploring projects related to health, injury, and healthcare topics. We highlight three primary web-based tools provided by the NCHS, but don’t be afraid to look elsewhere within this data-rich site!

Wonder Data Query Directory: the CDC WONDER database includes data for U.S. births, deaths, cancer diagnoses, tuberculosis cases, vaccinations, environmental exposures, and population estimates, among many other topics. These data collections are available as online interactive databases with summary statistics, maps, charts, and data extracts. Using the online WONDER database tools, MTFC teams can create tables, maps, charts, and data exports that include the most relevant data for their project. Each of the ~20 data collections in the WONDER database has a slightly different query structure, but they are all web-based, easy-to-use systems. Descriptions of each of the datasets available through WONDER can be found online here.

Web-based Injury Statistics Query and Reporting System: the CDC’s Web-based Injury Statistics Query and Reporting System (WISQARS) is an interactive, online database that provides fatal and nonfatal injury, violent death, and cost of injury data. The data found in WISQARS can be viewed graphically, or through downloadable spreadsheets. WISQARS also provides a robust filtering system to help teams find the most relevant data to their project.

Summary Health Statistics: Summary Health Statistics are descriptive statistics for various health measures including health status, conditions, health behaviors, activity limitations, health insurance coverage, and access and utilization of health care. These measures are available for the nation as a whole and for selected subgroups defined by characteristics such as sex, age, race, ethnicity, family income, and region of the United States. Health statistics are easily explored through the CDC’s online tool, and then downloaded as spreadsheets for further analysis.

The Centers for Medicare and Medicaid Services provide a wealth of knowledge on how and where healthcare costs are incurred. This data can be valuable for many types of MTFC research projects involving healthcare and expenses in the U.S. On the main CMS Research, Statistics, Data and Systems page there are many links that take you to various informational pages about CMS. For MTFC projects, you will probably find the most use out of the links in the “Statistics, Trends, and Reports” section, where they provide actual datasets that may be used in your models and analysis. Below are a few example areas that may be useful.

CMS Drug Spending: This provides cost information from Medicare and Medicaid drug spending for all drugs covered. You can view the data online, or download it into Excel files for analysis. The CMS Drug Spending Dashboards are interactive, web-based tools that provide spending information for drugs in the Medicare Part B and D programs as well as Medicaid. This data may be useful to MTFC teams interested in quantifying the expenses covered for treatment of particular diseases or conditions.

CMS Chronic Conditions: Information on prevalence, utilization, and spending for specific chronic conditions and multiple chronic conditions. This page provides information on 21 chronic conditions including how much was spent treating each condition, the number of patients with the conditions, co-morbidities, and more. This data can be valuable for any MTFC project evaluating risk reduction mechanisms to address these conditions.

In addition to these two sources, CMS provides many other datasets about their programs and expenditures. MTFC teams interested in healthcare related projects may find it valuable to explore other CMS links in the Statistics, Trends, and Reports section of their Research, Statistics, Data and Systems page mentioned above.

The National Emergency Medical Services Information System (NEMSIS) is the national database that is used to store EMS data from the U.S. States and Territories. NEMSIS is a universal standard for how patient care information resulting from an emergency 911 call for assistance is collected. The database helps local, State and national EMS stakeholders more accurately assess EMS needs and performance, as well as support better strategic planning for the EMS systems of tomorrow. Data from NEMSIS is also used to help benchmark performance, determine effectiveness of clinical interventions, and facilitate cost-benefit analyses. MTFC teams interested in evaluating risk management and mitigation involving emergency services will find this data valuable in characterizing the severity and frequency of their risks.

EMS for Educators: the most valuable way for student MTFC teams to access the NEMSIS EMS data is through their EMS for Educators page. On this site you will see three valuable ways to view EMS data online: Version 2 Dashboards (for data from 2014-2016), Version 3 Dashboards (for data from 2017-2019), and the Data Cube. The EMS Data Cube can be a very valuable way to analyze NEMSIS data and is one we recommend if you want to be able to export specific quantities and data for your own analysis; however, it can also be very difficult to navigate. We recommend reviewing the video tutorials on this before trying to dive into using the Data Cube.

Veterans

The National Center for Veterans Analysis and Statistics provides useful information on many areas of Veterans Affairs. MTFC teams interested in risks that veterans face, or that governments or other organizations face in supporting veterans will find this site valuable for their projects. When you first start exploring data on this site, it may be valuable to start with their “Quick Facts” page that has some high-level overviews of veteran populations, utilization of services, expenditures, and more. Reviewing the Quick Facts may help MTFC teams better understand what data is available and what can be useful for their projects. Once you have viewed the Quick Facts, you can dive further into the actual data on these pages:

Veteran Populations: find information on the total number of veterans in various demographic regions.

State Summaries: provides PDF summaries of veteran expenditures and use of VA services by state. These summaries may provide valuable numbers on expenses in specific states.

Expenditure Tables: these spreadsheets provide valuable data about the quantity of expenses for a multitude of VA expenses. These data tables go back to 1996 and are divided not only by the type of expenses, but by state as well. MTFC teams can use this information to specifically quantify the severity of the costs for various veteran support programs.

Utilization Tables: the tables under this link provide additional detail on the amount of various VA services used and the costs associated with that use. Like the Expenditure tables, these may be valuable in calculating the severity of the “risks” (costs) to the VA services.

Retirement and Social Security

The Social Security Administration’s Research, Statistics and Policy Analysis page provides a wealth of data on Social Security expenditures and beneficiaries. This data may be valuable to MTFC teams examining social security costs and expenses. Teams may want to start by reviewing the “Fast Facts” information on this site to get a general understanding of social security data and information. Once you have familiarized yourself with the fast facts, the SSA provides many detailed datasets about SSA benefits and beneficiaries:

Annual Statistical Report on the Social Security Disability Insurance Program: this annual report (see earlier versions linked in the website for trend analysis) provides many spreadsheets about the annual expenses and beneficiaries of the Social Security Disability Insurance Program. You can find information about how many people receive benefits, how large the benefits are, etc.

Annual Statistical Supplement: This report provides spreadsheets on the expenditures and beneficiaries of all Social Security benefits. There are many different spreadsheets linked through this page, so each MTFC team will have to research which data sets will be most valuable for its work and select the specific datasets that help identify the severity and frequency of the risks the team has defined.

Additional pages on the SSA website provide other pieces of background information and possible datasets that may be valuable.

The Pension Benefit Guaranty Corporation (PBGC) was established to ensure that participants in defined benefit pension plans receive their pensions if their plans terminate without sufficient assets to pay promised benefits. The PBGC administers separate insurance programs to protect participants in Single-Employer and Multiemployer plans. Data on the pension benefits paid, liabilities of the PBGC, and other information can be found from 2000 forward in spreadsheet tables. MTFC teams interested in retirement savings for people with pensions may find this information valuable.

Housing and Homelessness

The Department of Housing and Urban Development’s Office of Policy Development and Research (PD&R) provides a large number of datasets about housing needs, market conditions, and existing HUD programs. PD&R also conducts research on priority housing and community development issues. Multiple datasets are made available through the PD&R website. Additional MTFC project topics that may be supported by this data include homelessness, poverty and low-income economic concerns. Some of the datasets available are noted below, but it may be valuable for MTFC teams to explore the main PD&R site above first to get a full picture of all the data that is available.

Low-Income Housing Tax Credit (LIHTC): this credit is provided to low-income housing builders. It is the most important resource for creating affordable housing in the United States today. The LIHTC database, created by HUD and available to the public since 1997, contains information on 48,672 projects and 3.23 million housing units placed in service between 1987 and 2018. This may be valuable to MTFC teams exploring local, state-wide, or national levels of low-income housing. There is an online query tool to make it easier to identify the data useful to your project.

Picture of Subsidized Households: this data covers programs that provide subsidies to reduce rents for low-income tenants. Assistance provided under HUD programs falls into three categories: public housing, tenant-based, and privately owned, project-based. These datasets provide information on all of these housing programs, but it can also be broken down into subsets by program, region, or other information.

There is a wealth of additional data from PD&R on their website that may also be valuable to MTFC teams. It will be helpful to review and refine your project statement as you learn what data is available and then revise what data sets you will use.

Transportation and Automobiles

The National Highway Traffic Safety Administration (NHTSA) provides a variety of informational reports and datasets on numerous aspects of transportation related risks, accidents, fatalities, and more. NHTSA data is managed through their National Center for Statistics and Analysis (NCSA). The NCSA provides excellent online query tools to help you select the data that is most relevant to your project. The following links are examples of some of the NCSA datasets that may be valuable to MTFC projects; however, don’t forget to review the general NCSA website noted above to get a better understanding of all of the data available in this area.

Traffic Safety Facts: this may be a valuable place to start looking at data if you are pursuing a transportation, or auto-accident related MTFC project. This page provides valuable summaries and high-level information and trends about traffic safety and accidents.

Fatality and Injury Reporting System Tool (FIRST): This system provides an interactive online query tool to allow you to customize specific reports on data involving automobile fatalities and injuries. The tool provides many variables to help you select specific data, and differentiate various types or scenarios of risk.

State Traffic Safety Information: this tool provides data about the number, type, and results of crashes by state and county levels. It also has a very easy to use online query tool to help identify specific data of value to your MTFC project.

FARS Data Tables: this is where the NCSA includes straight exportable spreadsheets of information about traffic accidents and injuries. It is the same data as is included in the FIRST query tool mentioned above, but might be nice just to look at the tables for some projects.

The NCSA from the NHTSA also provides other datasets, tables, and information about transportation and traffic related safety and accidents. Spend some time exploring all of their data to refine your project statement and then narrow in on the specific data most valuable to your project.

Epidemics and Pandemics

The COVID Tracking Project is an informal tracking service gathering data from multiple sources about COVID-19 cases, hospitalizations, and fatalities. Every day volunteers compile the latest numbers on tests, cases, hospitalizations, and patient outcomes from every US state and territory. The site provides data in a number of formats including an API that can be used by computer savvy teams. Others can find links to downloadable CSV files for states and counties. Interestingly, this site also provides a “data quality grade” about how good they believe the data is that they have gathered for each state.

Johns Hopkins University Coronavirus Resource Center provides one of the most comprehensive tracking systems for COVID infections, hospitalizations, and deaths globally. The center provides data, statistics, and interactive mapping tools on a county, state, and national level. MTFC teams may find this resource valuable to explore the numbers infected, hospitalized, or deceased from COVID-19. While exporting the JHU data as spreadsheets is difficult, the online mapping tool is very comprehensive and easy to use to find specific data points that may be valuable for your analysis.

Rt Live provides data on the Rt value for COVID-19, a key measure of how fast the virus is spreading. Rt is the average number of people who become infected by one infectious person. If Rt is above 1.0, the number of cases grows; if Rt is below 1.0, the number of new cases shrinks and the outbreak eventually dies out. The site provides a downloadable CSV file of all the calculated Rt values for U.S. states since early March 2020. These values may be interesting to MTFC teams exploring pandemic related projects.
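The effect of Rt being above or below 1.0 is easy to see with a little arithmetic. The sketch below is a deliberate simplification, not an epidemiological model: it assumes Rt stays constant and that every infected person passes the virus on to exactly Rt others in each generation of infection. The function name and starting case count are hypothetical.

```python
def project_cases(initial_cases, rt, generations):
    """Multiply the case count by Rt once per generation of infection."""
    cases = [float(initial_cases)]
    for _ in range(generations):
        cases.append(cases[-1] * rt)
    return cases


growing = project_cases(100, 1.5, 4)    # Rt above 1.0: each generation is larger
shrinking = project_cases(100, 0.5, 4)  # Rt below 1.0: each generation is smaller
print(growing)    # [100.0, 150.0, 225.0, 337.5, 506.25]
print(shrinking)  # [100.0, 50.0, 25.0, 12.5, 6.25]
```

In practice Rt changes over time, which is why the site publishes a separate estimate for each state and date.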

USAFacts is a non-profit organization helping to gather statistics and data about critical issues within the United States. The organization hosts a data portal about the coronavirus that includes some interesting statistics such as how many hospital beds there are and how many are being used. While much of the data is in graphic images or maps, some information can be found in exportable spreadsheets for further analysis. You may find the information on their COVID-19 Map by County and State page particularly useful.
