Autism Data Science Initiative Data Resources
Navigation #navigation
- Example Table Information
- Table 1: National Institute of Mental Health Data Archive
- Table 2: NICHD Data and Specimen Hub
- Table 3: Database of Genotypes and Phenotypes
- Table 4: National Metabolomics Data Repository
- Table 5: Human Health Exposure Analysis Resource Data Center
- Table 6: NHGRI Analysis Visualization and Informatics Lab-space
- Table 7: National Longitudinal Transition Study-2
- Table 8: National Survey of Children’s Health
- Table 9: Medical Expenditure Panel Survey
- Table 10: Medicaid and the Children’s Health Insurance Program
- Table 11: Kaiser Permanente Research Bank
- Table 12: Autism Speaks MSSNG Database
- Table 13: Simons Foundation Autism Research Initiative (SFARI) Base
- Table 14: National COVID Cohort Collaborative (N3C) Data Enclave
- Table 15: All of Us Research Hub
- Table 16: PEDSnet: A pediatric learning health system
- Table 17: UK Biobank
Example Table Information #extable
Field Name | Field Definition |
Repository Full Name | The full proper name of the repository, with any acronym spelled out |
Repository Short Name | Acronym or short name for the repository, if one exists |
Brief Description | Description of the repository and the purpose it serves |
Web Address | Homepage URL for the repository |
Help Email | Email link for the general public to contact repository staff |
Affiliation | The organization that hosts and maintains the database and associated software |
Organism | The types of organisms from which data are shared in the repository |
Research Areas | The research domain(s) for which the repository shares data |
Data Types | Keywords for types of data associated with the repository. Data types are sourced directly from the repository website without further interpretation; different repositories may use different terms to describe the same type of data. |
Controlled Access | Whether the repository offers a controlled-access option for datasets: "Yes", the repository includes a controlled-access option; "No", the repository does not include a controlled-access option; "Unclear", it is unclear whether the repository includes a controlled-access option, or the repository website does not specify this information. |
Data Access Control Description | The access options the repository offers for hosted datasets: "Open access", no access restrictions or registration required; "Registration required", open to all, but users must be signed in or registered with the resource; "Controlled access", a review process or committee verifies the requestor's identity and the appropriateness of the proposed research use before granting access to protected data; "Enclave", controlled access in which data cannot be downloaded or removed from a specific environment. |
Data Access Control Links | URL to information about data access controls |
Fairsharing Link | URL to the FAIRsharing.org listing, where one exists |
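The "Controlled Access" and "Data Access Control Description" fields use small, fixed vocabularies, and the definitions imply that they should agree with each other, because an enclave is a form of controlled access. The following is a minimal sketch of how a catalog entry could be represented and checked in code. The class and function names (`RepositoryEntry`, `is_consistent`, and so on) are hypothetical. The sketch does not model entries that list more than one "Controlled Access" value, such as "No; Yes".

```python
from dataclasses import dataclass, field
from enum import Enum


class ControlledAccess(Enum):
    # Allowed values of the "Controlled Access" field
    YES = "Yes"
    NO = "No"
    UNCLEAR = "Unclear"


class AccessOption(Enum):
    # Allowed entries in the "Data Access Control Description" field
    OPEN = "Open access"
    REGISTRATION = "Registration required"
    CONTROLLED = "Controlled access"
    ENCLAVE = "Enclave"


# Per the field definitions, an enclave is a form of controlled access
RESTRICTED = {AccessOption.CONTROLLED, AccessOption.ENCLAVE}


@dataclass
class RepositoryEntry:
    # Hypothetical record holding a subset of the catalog fields
    full_name: str
    short_name: str
    web_address: str
    controlled_access: ControlledAccess
    access_options: set = field(default_factory=set)

    def is_consistent(self) -> bool:
        """Return True if the two access fields agree with each other."""
        has_restricted = bool(self.access_options & RESTRICTED)
        if self.controlled_access is ControlledAccess.YES:
            return has_restricted
        if self.controlled_access is ControlledAccess.NO:
            return not has_restricted
        return True  # "Unclear" places no constraint on the options


# Usage: the NIMH NDA entry from Table 1
nda = RepositoryEntry(
    full_name="The National Institute of Mental Health Data Archive",
    short_name="NIMH NDA",
    web_address="https://nda.nih.gov/",
    controlled_access=ControlledAccess.YES,
    access_options={AccessOption.OPEN, AccessOption.REGISTRATION, AccessOption.CONTROLLED},
)
print(nda.is_consistent())  # True
```

Under this sketch, an entry whose only option is "Registration required" but whose "Controlled Access" value is "Yes" would be flagged as inconsistent. That kind of check can help when maintaining a catalog like this one.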
Table 1: National Institute of Mental Health Data Archive #table1
Field Name | Field Definition |
Repository Full Name | The National Institute of Mental Health Data Archive |
Repository Short Name | NIMH NDA |
Brief Description | The NIMH NDA is a large-scale repository that houses a wide variety of data related to autism, including behavioral, clinical, genetic, and neuroimaging data. The NDA encompasses data from the National Database for Autism Research (NDAR), the National Database for Clinical Trials related to Mental Illness (NDCT), the Research Domain Criteria Database (RDoCdb), the NIH Pediatric MRI Data Repository, the Adolescent Brain Cognitive Development℠ (ABCD) Study, the Connectome Coordination Facility (CCF), the Osteoarthritis Initiative (OAI), the National Institute on Alcohol Abuse and Alcoholism Data Archive, the Helping to End Addiction Long-term Initiative (NIH HEAL), the NeuroBioBank Data Repository, and the PsychENCODE Consortium. Researchers can access de-identified data with appropriate approvals. |
Web Address | https://nda.nih.gov/ |
Help Email | [email protected] |
Affiliation | National Institute of Mental Health (NIMH) |
Organism | Human subjects |
Research Areas | Clinical studies, Medicine, Autism, Mental Illness, Cognitive Development, Neurology, Osteoarthritis, Alcohol Abuse and Alcoholism, Triplet Repeat Disease |
Data Types | phenotypic data, imaging and other neurosignal recordings data, and genomic/pedigree data related to mental health on human subjects |
Controlled Access | Yes |
Data Access Control Description | Open access; Registration required; and Controlled access |
Data Access Control Links | https://nda.nih.gov/nda/access-data-info |
Fairsharing Link | https://fairsharing.org/3209 |
Table 2: NICHD Data and Specimen Hub #table2
Field Name | Field Definition |
Repository Full Name | NICHD Data and Specimen Hub |
Repository Short Name | NICHD DASH |
Brief Description | The NICHD Data and Specimen Hub (DASH) allows researchers to share and access de-identified data from clinical and population health studies funded by NICHD and relevant to the NICHD mission, including the National Children's Study and the Environmental influences on Child Health Outcomes (ECHO)-wide Cohort study. DASH also serves as a portal for requesting biospecimens from selected DASH studies. |
Web Address | https://dash.nichd.nih.gov/ |
Help Email | [email protected] |
Affiliation | Eunice Kennedy Shriver National Institute of Child Health and Human Development |
Organism | Human subjects |
Research Areas | Life science, Critical Care Medicine, Pediatrics, Biomedical Science, Clinical Studies, Demographics, Gynecology, Obstetrics, Pharmacology, Social Science, Medicine, Musculoskeletal Medicine, Reproductive Health, Behavior, Sleep, Safety |
Data Types | Research data and biospecimens |
Controlled Access | Yes |
Data Access Control Description | Controlled access |
Data Access Control Links | |
Fairsharing Link | https://fairsharing.org/FAIRsharing.dYSI4O |
Table 3: Database of Genotypes and Phenotypes #table3
Field Name | Field Definition |
Repository Full Name | Database of Genotypes and Phenotypes |
Repository Short Name | dbGaP |
Brief Description | dbGaP archives and distributes the data and results from studies that have investigated the interaction of genotype and phenotype in humans. It includes genomic data from the NIH-funded Autism Sequencing Consortium and other relevant studies. |
Web Address | https://www.ncbi.nlm.nih.gov/gap/ |
Help Email | [email protected] |
Affiliation | National Center for Biotechnology Information, National Library of Medicine |
Organism | Human subjects |
Research Areas | Biomedical Science, Genetics, Epigenetics, Expression Data, Genetic Polymorphism, Phenotype, Genotype |
Data Types | phenotype data, association (GWAS) data, summary level analysis data, SRA (Sequence Read Archive) data, reference alignment (BAM) data, VCF (Variant Call Format) data, expression data, imputed genotype data, image data, etc. |
Controlled Access | Yes |
Data Access Control Description | Controlled access |
Data Access Control Links | |
Fairsharing Link | https://fairsharing.org/FAIRsharing.88v2k0 |
Table 4: National Metabolomics Data Repository #table4
Field Name | Field Definition |
Repository Full Name | National Metabolomics Data Repository |
Repository Short Name | NMDR, Metabolomics Workbench |
Brief Description | Repository for metabolomics data and a resource for analytic tools and protocols. |
Web Address | https://www.metabolomicsworkbench.org/ |
Help Email | [email protected] |
Affiliation | UC San Diego, National Institutes of Health Common Fund |
Organism | Human subjects |
Research Areas | Metabolomics for small and large studies on cells, tissues and organisms |
Data Types | Processed data (measurements), which may be in the form of quantitated metabolite concentrations, MS peak height/area values, LC retention times, NMR binned areas, etc. Raw data in the form of MS and NMR binary files and associated parameter files may also be uploaded. |
Controlled Access | No |
Data Access Control Description | Open access |
Data Access Control Links | N/A |
Fairsharing Link | https://fairsharing.org/FAIRsharing.xfrgsf |
Table 5: Human Health Exposure Analysis Resource Data Center #table5
Field Name | Field Definition |
Repository Full Name | Human Health Exposure Analysis Resource Data Center |
Repository Short Name | HHEAR Data Center |
Brief Description | A large, de-identified data repository of epidemiologic and environmental exposure biomarker data including studies with relevant autism and neurodevelopmental outcomes. |
Web Address | https://hheardatacenter.mssm.edu/ |
Help Email | [email protected] |
Affiliation | Icahn School of Medicine at Mount Sinai; National Institute of Environmental Health Sciences |
Organism | Human subjects |
Research Areas | Clinical Studies, Public Health, Epidemiology, Exposure, Environmental Health |
Data Types | Biomarker measurements |
Controlled Access | Yes |
Data Access Control Description | Registration required |
Data Access Control Links | |
Fairsharing Link | https://fairsharing.org/FAIRsharing.88v2k0 |
Table 6: NHGRI Analysis Visualization and Informatics Lab-space #table6
Field Name | Field Definition |
Repository Full Name | NHGRI Analysis Visualization and Informatics Lab-space |
Repository Short Name | AnVIL |
Brief Description | A unified computing environment for the storage, management, and analysis of genomics and related data. It enables population-scale analysis and facilitates collaboration through the sharing of data, code, and analysis results. The core data management and analysis components of the AnVIL currently consist of Terra, Gen3, Galaxy, RStudio/Bioconductor, Dockstore, and Jupyter. |
Web Address | https://anvilproject.org/ |
Help Email | [email protected] |
Affiliation | National Human Genome Research Institute |
Organism | Human subjects |
Research Areas | Genomics |
Data Types | Biomarker measurements |
Controlled Access | Yes |
Data Access Control Description | Registration required |
Data Access Control Links | https://anvilproject.org/faq/data-security |
Fairsharing Link | https://fairsharing.org/FAIRsharing.88v2k0 |
Table 7: National Longitudinal Transition Study-2 #table7
Field Name | Field Definition |
Repository Full Name | National Longitudinal Transition Study-2 |
Repository Short Name | NLTS2 |
Brief Description | The National Longitudinal Transition Study-2 was commissioned by the US Department of Education and documented the experiences of students aged 13-16 as they moved from secondary school into adult roles. The NLTS2 includes data on the secondary school experiences of youth in special education, including their schools, school programs, related services, and extracurricular activities. It also measures outcomes in the education, employment, social, and residential domains, including factors that contribute to more positive outcomes. |
Web Address | https://nlts2.sri.com/ |
Help Email | [email protected] |
Affiliation | Department of Education |
Organism | Human subjects |
Research Areas | Special Education |
Data Types | Parent/youth interview, school survey, student assessment data, demographic data, household characteristics |
Controlled Access | No; Yes |
Data Access Control Description | Open Access; Controlled Access |
Data Access Control Links | https://nces.ed.gov/statprog/rudman/ |
Fairsharing Link | N/A |
Table 8: National Survey of Children’s Health #table8
Field Name | Field Definition |
Repository Full Name | National Survey of Children’s Health |
Repository Short Name | NSCH |
Brief Description | The NSCH is funded by the Health Resources & Services Administration (HRSA) and supports national efforts to improve the health and development of children. National and state level data are released annually and focus on key measures of child health and well-being to understand the health status and health services needs of children across the nation. Data on children with special health care needs (CSHCN) are also included and explore the extent to which these children have medical homes, adequate health insurance, access to needed services, and adequate care coordination. Other topics include functional difficulties, transition services, shared decision-making, and satisfaction with care. |
Web Address | https://mchb.hrsa.gov/data-research/national-survey-childrens-health |
Help Email | [email protected] |
Affiliation | Health Resources & Services Administration (HRSA) |
Organism | Human subjects |
Research Areas | Physical and emotional health of children, access to and use of health care, family interactions, parental health, school and after-school experiences, neighborhood characteristics |
Data Types | |
Controlled Access | No |
Data Access Control Description | Open; Registration Required |
Data Access Control Links | https://www.census.gov/programs-surveys/nsch/data/datasets.html |
Fairsharing Link | N/A |
Table 9: Medical Expenditure Panel Survey #table9
Field Name | Field Definition |
Repository Full Name | Medical Expenditure Panel Survey |
Repository Short Name | MEPS |
Brief Description | Funded by the Agency for Healthcare Research and Quality, the MEPS is a set of large-scale surveys of families and individuals, their medical providers, and employers across the United States. MEPS is the most complete source of data on the cost and use of health care and health insurance coverage. The Household Component provides data from individual households and their members, which is supplemented by data from their medical providers. The Insurance Component is a separate survey of employers that provides data on employer-based health insurance. |
Web Address | https://meps.ahrq.gov/mepsweb/ |
Help Email | [email protected] |
Affiliation | Agency for Healthcare Research and Quality |
Organism | Human subjects |
Research Areas | Access to health care, Children’s Health, Men’s Health, Women’s Health, Elderly Health, Insurance, Disability, Minority Health, Employment, Health Care Disparities, Home Health Care, Injuries, Mental Health, Obesity, Opioids, Pharmacy & Prescription Drugs, Preventive Care, Arthritis, Asthma, Cancer, Diabetes, Emphysema and Bronchitis, Heart Conditions, High Blood Pressure, High Cholesterol, Strokes, Quality of Health Care, Veterans’ Health, Vision Impairment, Health Expenditures |
Data Types | |
Controlled Access | No; Yes |
Data Access Control Description | Open access; Enclave |
Data Access Control Links | https://meps.ahrq.gov/mepsweb/data_stats/onsite_datacenter.jsp |
Fairsharing Link | N/A |
Table 10: Medicaid and the Children’s Health Insurance Program #table10
Field Name | Field Definition |
Repository Full Name | Medicaid and the Children’s Health Insurance Program Open Data |
Repository Short Name | Medicaid & CHIP Open Data |
Brief Description | Data.Medicaid.gov is a public platform offering open access to a diverse range of datasets related to Medicaid and the Children’s Health Insurance Program (CHIP). It is tailored to support policymakers, researchers, and the general public by providing critical data for research, reporting, and analysis. The platform covers various topics, including state Medicaid and CHIP programs, enrollment statistics, spending trends, and quality metrics. |
Web Address | https://data.medicaid.gov |
Help Email | [email protected] |
Affiliation | U.S. Centers for Medicare & Medicaid Services |
Organism | Human subjects |
Research Areas | Drug utilization, drug pricing and payment, enrollment, reimbursements, behavioral health care, demographics, maternal health, mental health, disability, dental health, telehealth, substance use disorder, managed care |
Data Types | |
Controlled Access | No |
Data Access Control Description | Open |
Data Access Control Links | N/A |
Fairsharing Link | N/A |
Table 11: Kaiser Permanente Research Bank #table11
Field Name | Field Definition |
Repository Full Name | Kaiser Permanente Research Bank |
Repository Short Name | KP Research Bank |
Brief Description | The KP Research Bank includes robust data and specimen collections from members of a real-world health system, including genomic data resources. The available retrospective, longitudinal medical records cover more than 440,000 participants recruited through multiple outreach efforts since 2008 and extend back more than 20 years for the majority of the cohort. Researchers can apply for access tailored to their specific study design. |
Web Address | https://researchbank.kaiserpermanente.org/for-researchers/ |
Help Email | https://researchbank-econsent.kaiserpermanente.org/ContactUs/Index?ref=noreferrer&lang=en |
Affiliation | Kaiser Permanente |
Organism | Human subjects |
Research Areas | General health, cancer, pregnancy |
Data Types | biospecimens, genomic data, self-reported health survey data, and KP clinical data |
Controlled Access | Yes |
Data Access Control Description | Controlled access |
Data Access Control Links | https://researchbank.kaiserpermanente.org/for-researchers/apply-for-access/ |
Fairsharing Link | N/A |
Table 12: Autism Speaks MSSNG Database #table12
Field Name | Field Definition |
Repository Full Name | Autism Speaks MSSNG Database |
Repository Short Name | MSSNG |
Brief Description | The MSSNG project aims to create a whole genome sequencing database on autism with deep phenotyping, with a focus on identifying subtypes of autism to inform diagnostics and personalized treatments. Data are available upon request. |
Web Address | https://research.mss.ng/ |
Help Email | [email protected] |
Affiliation | Autism Speaks, Verily, DNAstack, Hospital for Sick Children (SickKids) |
Organism | Human subjects |
Research Areas | Autism |
Data Types | Genomic data, phenotypic data |
Controlled Access | Yes |
Data Access Control Description | Controlled access |
Data Access Control Links | https://autismspeaks.fluxx.io/; https://research.mss.ng/assets/documents/db7/genomics-application-process_2.5.2025.docx |
Fairsharing Link | N/A |
Table 13: Simons Foundation Autism Research Initiative (SFARI) Base #table13
Field Name | Field Definition |
Repository Full Name | Simons Foundation Autism Research Initiative Base |
Repository Short Name | SFARI Base |
Brief Description | SFARI Base is a clearinghouse for autism and autism-related research data and biospecimens supported by the Simons Foundation Autism Research Initiative (SFARI). It includes the Simons Simplex Collection, a permanent repository of genetic samples from 2,700 simplex families; Simons Foundation Powering Autism Research (SPARK), a collection of medical and behavioral information for over 100,000 people with autism; and the Autism Inpatient Collection (AIC), which includes phenotypic and genetic data from 1,555 children with a clinical diagnosis of autism who were admitted to one of six specialized inpatient child psychiatry units in the United States. Researchers can request access to phenotypic, genetic, or imaging data and order biospecimens. |
Web Address | https://www.sfari.org/resource/sfari-base/ |
Help Email | [email protected] (application process) |
Affiliation | Simons Foundation Autism Research Initiative |
Organism | Human subjects |
Research Areas | Autism |
Data Types | Research data and biospecimens |
Controlled Access | Yes |
Data Access Control Description | Controlled Access |
Data Access Control Links | https://base.sfari.org/ |
Fairsharing Link | N/A |
Table 14: National COVID Cohort Collaborative (N3C) Data Enclave #table14
Field Name | Field Definition |
Repository Full Name | National COVID Cohort Collaborative |
Repository Short Name | N3C |
Brief Description | The N3C Data Enclave is a secure platform that stores harmonized clinical data provided by N3C's contributing members. The Enclave includes demographic and clinical characteristics of patients who have been tested for or diagnosed with COVID-19, as well as information about the strategies and outcomes of treatments for those suspected or confirmed to have the virus. Additional data from publicly available datasets, claims data, and mortality data are also available to support studies. For more information on the inclusion and exclusion criteria, see the N3C Phenotype. |
Web Address | https://covid.cd2h.org/ |
Help Email | https://covid.cd2h.org/support/ |
Affiliation | National Center for Advancing Translational Sciences (NCATS), National Institutes of Health |
Organism | Human subjects; SARS-CoV-2 |
Research Areas | Clinical Studies, Medical Virology, Public Health, Patient Care, Cardiovascular Disease, Diabetes & Obesity, Environmental Health, Immunocompromised or Compromised (ISC), Oncology, Rural Health, Social Drivers of Health |
Data Types | Clinical data |
Controlled Access | Yes |
Data Access Control Description | Enclave |
Data Access Control Links | https://covid.cd2h.org/account-instructions/ |
Fairsharing Link | https://fairsharing.org/FAIRsharing.bbbffe |
Table 15: All of Us Research Hub #table15
Field Name | Field Definition |
Repository Full Name | All of Us Research Hub |
Repository Short Name | N/A |
Brief Description | The All of Us Research Hub houses a large and comprehensive dataset where users can explore aggregate data including genomic variants, survey responses, physical measurements, electronic health record information, and wearables data. Registered users can use the Researcher Workbench to analyze Registered and Controlled tier data with a variety of cloud-based analysis tools. |
Web Address | https://www.researchallofus.org/ |
Help Email | [email protected] |
Affiliation | National Institutes of Health |
Organism | Human subjects |
Research Areas | general health, social factors, health care access and utilization, drug exposures, chronic disease, health behavior, genomics |
Data Types | Research data, survey data, genomics data, Electronic Health Records (EHR) data, self-reported physical measurements, digital health data |
Controlled Access | Yes |
Data Access Control Description | Open; Registration Required; Controlled Access; Enclave. There are multiple access tiers with access controls that accord with the risk of the data within a given tier. |
Data Access Control Links | https://support.researchallofus.org/hc/en-us/categories/8951135815700-Access-DURA-Support |
Fairsharing Link | N/A |
Table 16: PEDSnet: A pediatric learning health system #table16
Field Name | Field Definition |
Repository Full Name | PEDSnet: A pediatric learning health system |
Repository Short Name | PEDSnet |
Brief Description | PEDSnet contains demographic and clinical data from over 15,000,000 pediatric patients across the United States. The system aligns information from outpatient, inpatient, and emergency department visits to a common data model and makes them available to authorized users through a secure data portal. |
Web Address | https://pedsnet.org/database/ |
Help Email | [email protected] |
Affiliation | PEDSnet (a Clinical Research Network within PCORnet) |
Organism | Human subjects |
Research Areas | Demographics, Diagnoses, Medications, Lab Measurements, Procedures, Providers, Visits |
Data Types | EHR, research data |
Controlled Access | Yes |
Data Access Control Description | Controlled Access |
Data Access Control Links | https://pedsnet.org/database/access-to-data/ |
Fairsharing Link | N/A |
Table 17: UK Biobank #table17
Field Name | Field Definition |
Repository Full Name | UK Biobank |
Repository Short Name | UK Biobank |
Brief Description | UK Biobank is a large-scale biomedical database and research resource containing in-depth, de-identified genetic and health information from half a million UK participants, who were aged 40-69 when recruited. The database, which is regularly augmented with additional data, is globally accessible to approved researchers and scientists undertaking vital research into the most common and life-threatening diseases. |
Web Address | https://www.ukbiobank.ac.uk/ |
Help Email | [email protected] |
Affiliation | Wellcome Trust, Medical Research Council, Department of Health, Scottish Government, and the Northwest Regional Development Agency |
Organism | Human subjects |
Research Areas | Research areas involving human health and disease |
Data Types | Electronic Health Records, Surveys and Questionnaires, Research visit, Wearable Fitness Device, Genomic, Registry, Imaging, Genetics, Health linkages, Biomarkers, Baseline assessments |
Controlled Access | Yes |
Data Access Control Description | Registration required |
Data Access Control Links | |
Fairsharing Link | N/A |