University
of Wisconsin-Madison
Social Science Data Extractors

This page describes a selection of web sites that allow the user to quickly extract social science data. Raw data, summary data, or both can be extracted. The systems support a range of downloading options (data format, type of Internet access protocol, supporting documentation, etc.). Extraction systems can be interactive web sites, software systems that must be downloaded to the PC and installed there, or both. In some cases, interactive graphing and mapping are available as well. Some sites allow downloading of the full data sets that underlie the extraction system.

Contents:
Decennial Census Summary Numbers
International Population, Vital Statistics, and Socio-Economic Estimates and Projections
Microdata and Longitudinal Surveys
Multiple Study Sites
Vital Statistics and Health

Decennial Census Summary Numbers

(http://factfinder.census.gov/) Census Bureau's American Factfinder allows the user to query for data from the 2000 Census as it is released. At present, this includes data from the PL 94-171 Summary File, SF1 files, Supplementary Survey Summary Tables, and Demographic profiles. Users choose variables and geographies. Interactive mapping is also available, as are download options. Note that Factfinder also contains extractable data from the 1990 Census STF1 and STF3 files. Click on "What's in Factfinder," and then "Data Sets." Then scroll to Census 1990.

(http://www.census.gov/main/www/cen1990.html) Census Bureau's American Factfinder allows the user to query for data from STF1 (100% count of basic demographic variables) and STF3 (sample count of all socioeconomic and demographic variables). Users can pick both geographies and variables. Download options are available.

(http://Sunsite.berkeley.edu/GovData/info/) This system contains data from SSTF1, SSTF2 (Ancestry of the Population of the US), SSTF3 (Persons of Hispanic Origin in the United States), and SSTF5 (Characteristics of Asian and Pacific Islander Population of the US) at this time. Users can pick both geographies and variables. The Government Data Library has recently made all of the 22 Subject Summary Tape Files (except the ones available for extraction at this site) available for downloading via FTP from this site. Downloaded FTP files use the "Go" extraction system.

(http://mcdc2.missouri.edu/websas/xtabs3v2.html) This system allows users to extract basic STF3 information by geographic variable. Users can pick only geographic variables. Since profiles can be generated for only one geographic area at a time, this extraction system is most powerful as a ready reference data source.

(http://censtats.census.gov/pl94/pl94.shtml) The Census Bureau has released to the Subcommittee on the Census, Committee on Government Reform and Oversight, a copy of the Public Law 94-171 data adjusted to reflect the measured net undercounts based on the 1990 Census Post-Enumeration Survey. Users can pick various geographies from state down to block level.

(http://fisher.lib.virginia.edu/census/) This system provides selected state and county level variables from decennial censuses from 1790-1970 at present. Users can pick geographies and variables.

(http://plue.sedac.ciesin.org/plue/ddviewer/) A tutorial for DDViewer (Java version) can be found at: http://plue.sedac.ciesin.org/plue/ddviewer/ddvJava30/instructions.html

(http://wonder.cdc.gov/population.html) The Centers for Disease Control's Wonder extractor allows the user to pick geographies (down to the county level), race, gender, age, and time for the estimates database. Geographies, times, and demographics can be picked for the projection database. Users can create a two-dimensional table using any of five variables in each database. Download options are available. Note that users must log in at the main site before going to the population estimates (Census) and projection sites. Populations are provided by the Census Bureau.

Decennial Census Public Use Microdata Samples (PUMS)

(http://fisher.lib.virginia.edu/pums) The University of Virginia Geospatial and Statistical Data Center provides separate extraction systems for a customized subset of data, descriptive statistics, and cross tabulations from the 1990 1% PUMS files for all states. Users pick geographies, variables, type of subsampling, and table dimensions (for cross tabs). Data files are returned as both comma delimited ASCII and SPSS portable files, along with variable lists and an electronic codebook. Several descriptive statistics are available.

(http://www.ipums.umn.edu/) IPUMS contains "high precision" samples drawn from the 1850-1990 censuses. It assigns uniform codes across the samples. Users can pick geographies, variables, sample sizes, and cases. Output can be accessed in raw or compressed form, with a customized codebook and SPSS data definition statements. Note that free registration is required to use the extraction system.

International Population, Vital Statistics, and Socio-Economic Estimates and Projections

(http://www.census.gov/ipc/www/idbnew.html) IDB allows the user to pick basic demographic and socio-economic variables for any or all of 227 countries around the world. Summary or detailed data is available from as early as 1950 to projections as late as 2050. In addition, static or "active" population pyramids are available.
Users can aggregate selected countries into chosen regions. Countries can be ranked by population for any year from 1950-2050. Download options are available. IDB can also be downloaded and used locally on the PC.

(http://www.un.org/Pubs/CyberSchoolBus/infonation/e_infonation.htm) This extraction system allows the user to pick up to seven countries, and up to four variables. It covers population, economy, social indicators, and geography. Data is available for the latest year only (snapshot).

(http://apps.fao.org/cgi-bin/nph-db.pl) The United Nations Food and Agriculture Organization provides annual population data from 1961-present by country. Users can pick geographies, variables, and years. Basic descriptive statistics are available. Download options are available. FAO also provides a long term quinquennial population series for total and rural/urban population (http://apps.fao.org/lim500/nph-wrap.pl?Population.LTS&Domain=SUA).

(http://www.unicef-icdc.org/resources/transmonee.html) TransMONEE contains 120 economic and social indicators (in nine subject areas: population, reproductive behavior, family stability, mortality, morbidity, education, crime & juvenile justice, employment & income, and macro indicators) for 27 transition countries in Central Europe and the former Soviet Union. The extraction system must be downloaded and installed on the user's PC. Users can pick countries, regions, variables, and years. Annual time series data is available as far back as 1989. Data can be converted to indices or annual percent change and charts can be created. Download options are available. There is also an online version of the database.

Microdata and Longitudinal Surveys

(http://www.unicon.com/cpsonweb.html) An offshoot of Unicon's CPS Utilities, this extraction system allows the user to choose from any of nearly 1,100 variables from the Census Bureau's March Current Population Survey Supplement from 1962-1998.
Users can choose variables and years, and create custom variables. Download options are available. NOTE!! THIS EXTRACTION SYSTEM RUNS ONLY ON MICROSOFT INTERNET EXPLORER 4.0 OR ABOVE. CPS on Web is presently freely available during the "development" stage.

(http://www.measuredhs.com/statcompiler/start.cfm/) STATcompiler Express: (http://www.measuredhs.com/statcompiler/canned.cfm) HIV-STATcompiler: (http://www.measuredhs.com/hivdata/data/start.cfm) Demographic and Health Surveys, provided by Macro International, "collect information on fertility and family planning, maternal and child health, child survival, AIDS/STIs, and other reproductive health topics. Surveys are implemented by host-country institutions, usually government statistical offices. On average, 4,000 to 8,000 women of childbearing age are interviewed in a standard survey. Many countries also survey men on family planning and health issues." This system allows users to pick basic summary statistics from over 100 surveys in over 60 countries. Statistics are available on fertility, child mortality, contraception, maternity care, and child health. Download options are available.

(http://hrsweb.isr.umich.edu/concord/index.html) Provided by University of Michigan's Institute for Social Research, "the Health and Retirement Study (HRS) and Asset and Health Dynamics Among the Oldest Old (AHEAD) are nationally representative longitudinal data collections that examine retirement and the aging of society." This extraction system is a useful "tool for cross-referencing questions across time." Users pick the waves of the studies they are interested in, subject sections from those waves, and how the output is sorted. Question text can be searched. The concordance can also be downloaded. Note that this is a "preliminary concordance at this time" and that all matches should be confirmed with the codebook and/or questionnaire.

(http://www.gesis.org/en/data_service/search/qbase/search_en.htm) "The ISSP is a continuing annual programme of cross-national collaboration on surveys covering topics important for social science research. It brings together pre-existing social science projects and coordinates research goals, thereby adding a cross-national, cross-cultural perspective to the individual national studies. Thirty-one countries are members of the ISSP." The codebook retrieval system allows users to query for variables from the survey. Retrieval is available in PDF format.

(http://www.lisproject.org/lestechdoc.htm) The Luxembourg Employment Study, a project associated with the Luxembourg Income Study (see below), began in 1994. Its aim is to "construct a databank containing Labour Force Surveys from the early nineties from countries with quite different labour market structures. These surveys provide detailed information on areas like job search, employment characteristics, comparable occupations, investment in education, migration, etc. The LES team has harmonised and standardised the micro data from the labour force surveys in order to facilitate comparative research." After registering, users may submit statistical program jobs to the LES in order to analyze data (http://www.lisproject.org/dataccess.htm). The "User Information" section explains this process. The "Using the Database" section provides links to available electronic documentation needed to set up program statements. LES can process SAS, SPSS, or STATA jobs via email. Note that the service is freely available only to researchers in LES member countries.

(http://www.lisproject.org/techdoc.htm) The Luxembourg Income Study is an international research project that maintains a harmonized database with microdata on household income surveys for most OECD countries. The main goal is to provide researchers free access to these data and to compile internationally comparable poverty and inequality indicators.
The LIS is a non-profit organization that started in 1983 and is funded by the national science and social science research foundations of its member countries and by the government of Luxembourg. Data are directly taken from household surveys or administrative records in the countries involved. Microdata are standardized and become part of the database. Researchers in member countries have access to this data, after registration. LIS can process SAS, SPSS, or STATA jobs via email. Available datasets and documentation can be found at the site. For more information on LIS, see the Database Access page (http://www.lisproject.org/dataccess.htm).

(http://webapp.icpsr.umich.edu/cocoon/SAMHDA-SERIES/00035.xml#das) Monitoring the Future, provided by the Institute for Social Research at the University of Michigan, has "surveyed a nationwide sample of high school seniors every year since 1975. Since 1991, the project has also included nationwide samples of eighth and tenth grade students." The SAMHDA system allows extraction from MTF 12th grade surveys back to 1995. Users can pick variables, cases, and get raw data or descriptive statistics. Download options are available.

(http://www.cpc.unc.edu/projects/addhealth/data/using/online) Carolina Population Center's "Add Health is a school-based study of the health-related behaviors of adolescents in grades 7-12. It has been designed to explore the causes of these behaviors, with an emphasis on the influence of social context." This extraction system allows users to access summary data using on-the-fly SAS procedures. Frequencies, frequency distributions, contingency tables, measures of central tendency, and descriptive statistics are available. Users can pick variables, type of central tendency, and type of statistics. Note that users must agree to the Add Health Data Use Agreement before using the system, and that only Wave I public use data is supported at this time.

(http://simba.isr.umich.edu/) The Panel Study of Income Dynamics is "a longitudinal survey of a representative sample of US individuals and the families in which they reside. It has been ongoing since 1968. The data are collected annually, and the data files contain the full span of information collected over the course of the study. PSID data can be used for cross-sectional, longitudinal, and intergenerational analysis and for studying both individuals and families." The PSID subsetting system allows the user to pick years (final or early release from 1968 on), variables (with or without conditions), and type of output. Multiple data definition file statement types are supported. Download options are available.

(http://wonder.cdc.gov/) Among the useful vital and health related statistical data sets for which Wonder provides extraction are:
AIDS Cases Reported by State and Local Health Departments (users can pick demographics, case-definitions, dates of diagnosis, dates of report, HIV exposure group, and mortality);
Microfiche AIDS (same as above except users can also pick more detailed geographies);
SEER (Cancer Surveillance, Epidemiology and End Results) (users can pick geographies, demographics, time periods and disease codes);
ICD9 Finder (disease by classification number) (users can search by keyword);
State Injury Mortality Data (users can pick geographies and injury type);
Mortality (users can pick geographies, demographics, and time periods);
Natality (users can pick geographies, demographics, and natality variables, and create two-dimensional tables by any of nine variables);
Sexually Transmitted Disease Morbidity (users can pick geographies, times, genders, and diseases); and
Tuberculosis Surveillance (users can pick geographies, times, demographics, and disease case characteristics).
Note that time periods covered vary by database. Download options are available. Wonder also hosts many bibliographic databases.
Users must log in at the main site before accessing data.

(http://www.seer.cancer.gov/query/) NCI's Cancer Query System on the Web (CANQUES, available only to browsers that support Java 1.1) "allows the user to access over 10 million pre-calculated cancer statistics." Statistics are available from SEER Cancer Statistics Reviews, 1973-2000. Users can retrieve data related to: SEER Incidence Rates and Trends; US Mortality Rates and Trends; Individual State Mortality Rates; SEER Mortality Rates; Median Age at Diagnosis and Death; NHL and Kaposi's Sarcoma in San Francisco; and Relative Survival Rates by SEER Registry and Historic Stage. Users can pick demographics, geographies (when available), types of cancers, and time periods. Download options are available.

(http://dataferrett.census.gov/) FERRET provides access to the 1993 National Health Interview Survey (NHIS) and the National Health and Nutrition Examination Survey III (NHANES 1988-1994). Users can pick variables and selected values. Selected data (raw or SAS data sets) or descriptive statistics can be downloaded. Note that users must register before accessing data.

(http://hcup.ahrq.gov/HCUPnet.asp) HCUPnet provides interactive access to national statistics about hospital stays. Users can choose: diagnoses and procedures; outcomes and measures; patient characteristics; and hospital characteristics. Data are gathered from the latest Nationwide Inpatient Sample.

(http://www.census.gov/ftp/pub/ipc/www/hivaidsn.html) HIV/AIDS Surveillance Database "is a compilation of information from widely scattered small scale surveys on the AIDS pandemic and HIV seroprevalence in developing countries. Currently the database contains around 40,000 individual data records from over 4,000 publications and presentations. The database also includes information from incidence studies." This extraction system requires downloading and installing the database on a local PC.
Users can pick geographies from over 160 countries, population subgroup, age and sex. Summary tables and maps are also available.

(http://www.icpsr.umich.edu/SAMHDA/das.html) Health related surveys covered in the SAMHDA DAS include the National Household Survey on Drug Abuse, Treatment Episode Data Set, and Washington D.C. Metropolitan Area Drug Study (DC*MADS). Users can pick variables, cases, and get raw data or descriptive statistics. Download options are available.

(http://sda.berkeley.edu:7502/archive.htm) DAS provides access to the following health related surveys: National Health Interview Survey (NHIS) 1991 (Person File only), and Health Studies From Brazil (in Portuguese). Users can pick variables, cases, and get raw data or descriptive statistics. Download options are available.

(http://209.217.72.34/aging/) Trends in Health and Aging "contains information on trends in health-related behaviors, health status, health care utilization, and cost of care for the older population in the United States." At present over 20 national tables are available. NCHS will provide state based tables and "estimates and official data from other sources" in the future. Users can download the Beyond 20/20 extraction system or can browse tables directly (Microsoft IE 4.01 or Netscape 4.51 or higher required for the second option). Download options are available.

Provided by the National Center for Health Statistics, these tables provide information about health at the state level. Tables can be viewed, manipulated, printed, or downloaded in the Beyond 20/20 format at this time. This format requires the user to download special software from the site. At present, the site contains mostly mortality tables.

(http://www.who.int/whosis/en/) This page contains WHO and other organization extractors for country time series data regarding core health, mortality, population, and disease indicators.

(http://nces.ed.gov/das/) This extraction system requires downloading and installing the DAS for Windows extraction system. DAS allows the user to create tables from data in various NCES surveys. In addition to table variables (usually in percentage format), standard errors are provided. DAS does not allow for extracting raw data from the surveys at this time. Surveys covered include: National Postsecondary Student Aid Study (NPSAS); National Study of Postsecondary Faculty (NSOPF); Beginning Postsecondary Students (BPS, 90, 92); Baccalaureate and Beyond (B&B); National Longitudinal Study of 1972 (NLS); High School and Beyond (HS & B); and National Educational Longitudinal Study of 1988 (NELS). Years and/or follow-ups of the studies vary. Download options are available.

(http://nces.ed.gov/quicktables/) This search tool lets you locate all tables/figures/charts published in the inventory of NCES' Education Statistics Quarterly; the NEDRC (National Education Data Resource Center) Postsecondary Table Library; the Condition of Education; the Digest of Education Statistics; and many other NCES publications. New tables are constantly being added to this database (close to 4,000 recently published tables, graphs & figures are now available).

Global Education Database--US Agency for International Development (http://qesdb.cdie.org/ged/index.html) In July 2003, USAID's Office of Education released its fourth PC-based database of international education statistics, GED 2003. Users in the Agency and their development partners worldwide were able to access the data by downloading the GED program to their desktop computers from a CD-ROM or from the web. In an effort to make current and future education data even more accessible, EGAT/ED has developed this web-based version of the GED. All of the data can now be accessed online from this site without downloading a program.
There are 224 indicators compiled from the UNESCO Institute of Statistics and 71 indicators compiled from the Demographic and Health Surveys (DHS), a USAID program that has conducted full-scale nationally representative household surveys in over 60 developing countries since 1984. USAID plans to update this online database as new data become available.

(http://www.icpsr.umich.edu/NACJD/SDA/das.html) Users of this interactive DAS can run frequencies or cross tabs, comparisons of means or correlations, or download customized sets of variables/cases for the following data sets: Uniform Crime Reporting Data: Supplementary Homicide Reports; National Crime Victimization Survey; National Corrections Reporting Program; Survey of Inmates in State Correctional Facilities; and Census of State and Federal Adult Correctional Facilities. Years of data vary by survey. Users can pick variables, cases, and format of output. Download options are available.

(http://fjsrc.urban.org/) The FJSRC "maintains the Bureau of Justice Statistics (BJS) Federal Justice Statistics Program (FJSP) database, which contains information about suspects and defendants processed in the Federal criminal justice system. Using data obtained from Federal agencies, the FJSP compiles comprehensive information describing defendants from each stage of Federal criminal case processing." Users can download compressed ASCII versions of Standard Analysis File (SAF) data sets (after registration), or use a query system to download selected summary statistics (frequencies and cross tabulations) from them (Defendants in Criminal Cases Filed and Terminated in U.S. District Court, Offenders Entering and Exiting Federal Prisons, Population of Offenders in Federal Prisons, and Defendants Sentenced) from 1994 to the latest year available. Download options are available.

Poverty, Welfare, Income and Employment

(http://www.urban.org/center/anf/nsaf.cfm) This system allows the user to access information at the state level on income security, health, child well-being, demographics, fiscal and political conditions, and social services. Users can pick variables and years, and 50 state tables (HTML format only) are generated for selected recent years. The database can also be downloaded for installation and use on the PC. Download options are available in the PC version.

(http://www.census.gov/hhes/www/saipe/index.html) This system allows the user to pick selected years, geographies, and selected poverty variables, as well as median income. An HTML table is returned.

(http://socds.huduser.org/index.html) Provided by HUD and derived from US Census Bureau and Bureau of Labor Statistics data, SOCDS provides access to census data (1970, 1980, 1990, and 2000), current labor force data (1990-2001), and County Business Patterns data (1991-1997) for central cities and metropolitan areas in the US. Users can pick geographies, variables, and/or time periods depending on the database. As data can be accessed for only one municipal unit at a time, and as there are no download options, this database can be most effectively used as a ready reference resource.

(http://www.bls.gov/data/) The Bureau of Labor Statistics offers three extraction systems for access to its thousands of time series covering employment & unemployment, prices & living conditions, compensation & working conditions, and productivity & technology. Monthly, quarterly, and annual data is available (depending on the database) from as early as 1913 to the present. Most Requested Series retrieves data for commonly requested BLS time series. Selective Access retrieves data for all available time series. Series Report retrieves data for all available time series by series identifier number. Users pick variables, geographies, seasonality, and output type.
Raw data can also be downloaded via FTP.

(http://inforumweb.umd.edu/Econdata.html) EconData contains hundreds of thousands of regional, national, and international macroeconomic time series from various US government agencies such as the Bureau of Labor Statistics, Bureau of Economic Analysis, Census Bureau, and Federal Reserve Board. Monthly, quarterly, and annual time series are available going back as far as 1929. This extraction system requires downloading and installing on the PC. Users can pick variables and time. Download options are available, as well as options for graphics and more complex econometric analysis.

(http://www.EconoMagic.com/) While not exactly an extraction system in the traditional sense, Economics Professor Ted Bos has set up a web site that offers access to so many macroeconomic time series (over 75,000) that it, in essence, acts as one. Monthly, quarterly, and annual time series are available from the Bureau of Labor Statistics, Bureau of Economic Analysis, Census Bureau, and Federal Reserve Board. Time periods vary. Data is available at national, state, county, and municipal level. Users select the time series they are interested in. Download options are available, as are charts of the data.

(http://pwt.econ.upenn.edu/) The Penn World Tables contain a set of international economic comparisons for 152 countries in 29 macroeconomic topics from as early as 1950 to 2000. PWT attempts to provide a set of macroeconomic variables that is standardized across time and countries to facilitate ease of international comparisons. The best information about the construction of the tables can be found in Summers and Heston's article in the May 1991 Quarterly Journal of Economics: "The Penn World Table (Mark 5): An Expanded Set of International Comparisons, 1950-1988". Both extraction sites allow users to pick countries, years, and variables. NBER allows multiple picks.
The advantage to the CHASS site is that it allows multiple download options, including plots.

Miscellaneous Multi-Subject Sites

(http://factfinder.census.gov/) American FactFinder provides interactive access to information about "community, economy, and society" from some of the largest Census databases. It provides "quick reports," detailed tables, and/or maps from the US Census, American Community Survey, and Economic Census at this time. Data can be extracted by geographies and variables. Selected summary statistics can be calculated from raw data.

(http://censtats.census.gov/) Census Bureau's CenStats is an extraction system that provides access to several popular Census Bureau databases, including the Building Permits, Census Tract Street Locator, Consolidated Federal Funds Report, County and Zip Business Patterns, USA Counties, Detailed Occupation by Race and Sex, 1990 Public Law 94-171, and International Trade Data. Data can be extracted by geographies, variables, and time, depending upon the database. CenStats is a quick and powerful system for extracting ready reference information.

(http://fisher.lib.virginia.edu/) In addition to PUMS data (see above), the University of Virginia Library offers interactive access to: the 1988 and 1994 City and County Databooks (Census Bureau); Uniform Crime Reporting Data (Bureau of Justice Statistics); and national, regional, state, county and municipal macroeconomic data (National Income and Product Accounts, Regional Economic Information System, NBER Productivity, US Imports and Exports, the 1987 Standard Industrial Classification Manual, Regional Economic Projections, State Personal Income, and County Business Patterns, provided by the Bureau of Economic Analysis, National Bureau of Economic Research, and Census Bureau), among other data. Time series vary by data set. Users can pick geographies, variables, and times. Download options are available.
Note: users should click on "Interactive Data" in the left hand frame of the page to access the extraction system.

(http://mcdc2.missouri.edu/applications/uexplore.shtml) MSCDC's UEXPLORE provides access to data in the areas of agriculture, compendia, economic indicators, education, employment, health, geography/GIS, and population. In all, over 100 data sets are available, including: various decennial census summary tape and public use microdata files; Current Population Surveys; population estimates; agricultural and economic censuses; USA Counties; County Business Patterns; City and County Databooks (Census Bureau); Bureau of Economic Analysis employment and income data; and Integrated Postsecondary Education Data System (IPEDS) and Common Core of Data (CCD) files from the National Center for Education Statistics, among others. Geographies and time periods vary by data set, with particularly strong coverage, as might be expected, for Missouri. However, many data sets have complete state or even finer level geographic coverage. Users pick geographies and variables. Record selection criteria options are available, as is sorting. Download options (including SAS data sets) are available. Note that this is an extremely powerful, but not intuitive, extraction system, and users should read the online tutorial before using it.

This extraction system is a small subset of the World Development Indicators Database. It contains 54 macroeconomic and demographic time series for 207 countries and 18 regions. Data is available for the latest five years only. Download options are available.

(http://www.prb.org/datafind/datafinder7.htm) This interactive query returns data for 72 demographic variables for the world, 221 countries, 28 regions and subregions, and the US and the 50 states. It contains data drawn from several Population Reference Bureau sources, including: World Population Data Sheet; Women of our World; Breastfeeding Patterns in the Developing World; and the United States Population Data Sheet. Users can query multiple geographies and variables. Data for only the latest year are returned.

Last updated 08/22/2005 by Jack Solock jsolock@ssc.wisc.edu
© 2000 University of Wisconsin Center for Demography and Ecology