Autism Data Science Initiative Funded Research

Funded Projects

Click on the link for the project title below to see the full award abstract.

Contact PI/Project Leader	Project Number	Awardee Organization	Title	Areas of Research & ADSI Task Areas
Chung, Wendy K	1OT2OD040407	Boston Children’s Hospital	Integrating Genetic and Environmental Data to Predict Autism Susceptibility and Heterogeneity in the SPARK Cohort	Etiology/Causation; Task II: Data Generation; Task III: Data Analysis
Cochran, Amy Louise	1OT2OD040368	University of Wisconsin-Madison	Understanding causes of autism, its heterogeneity, and rising prevalence with emerging methods in causal inference	Prevalence; Task I: Dataset Aggregation; Task III: Data Analysis
Geschwind, Daniel H	1OT2OD040572	University of California Los Angeles	Data science approaches to autism: environmental modulators of the transcriptome and gene-x-environment interactions	Etiology/Causation; Task II: Data Generation; Task III: Data Analysis
Klinger, Laura G	1OT2OD040528	University of North Carolina Chapel Hill	From Data to Action: Capturing Meaningful Outcomes in Autism Through Harmonized Data	Clinical Services/Treatment; Task I: Dataset Aggregation; Task II: Data Generation; Task III: Data Analysis
Liu, Zhandong	1OT2OD040565	Baylor College of Medicine	Validate ASD: Independent Multimodal Replication and Validation of Autism Data-Science Models	Task IV: Model Validation/Replication
Lyall, Kristen	1OT2OD040445	Drexel University	Diet and the Exposome in Autism: Discovering Complex Interactions with Diet in Autism Etiology And Variability	Etiology/Causation; Task I: Dataset Aggregation; Task III: Data Analysis
Miller, Judith Susanne	1OT2OD040373	Children's Hospital of Philadelphia	Genomic and Exposomic Factors in the Cause and Rise of Autism	Prevalence; Task I: Dataset Aggregation; Task II: Data Generation; Task III: Data Analysis
Sebat, Jonathan	1OT2OD040415	University of California, San Diego	Elucidating the Interplay of Genes and Environment in Autism Using Genomic and Exposure Data from Large Populations	Etiology/Causation; Task I: Dataset Aggregation; Task III: Data Analysis
Stein, Jason Louis	1OT2OD040436	University of North Carolina Chapel Hill	Modeling autism-associated gene-by-environment interactions in human brain organoids	Etiology/Causation; Task I: Dataset Aggregation; Task II: Data Generation; Task III: Data Analysis
Walker, Douglas Ian	1OT2OD040409	Emory University	Mapping Internal Exposome-Metabolome Dynamics with Advanced Data Science to Identify Environmental Determinants of Autism.	Etiology/Causation; Task II: Data Generation; Task III: Data Analysis
Wang, Xiaobin	1OT2OD040418	Johns Hopkins University	Boston Birth Cohort - Autism Data Science Initiative (BBC-ADSI)	Etiology/Causation; Task II: Data Generation; Task III: Data Analysis
Zhong, Hua Judy	1OT2OD040511	Weill Medical College of Cornell Univ	AR² - Autism Replication, Validation, and Reproducibility Center	Task IV: Model Replication/Validation
Zuckerman, Katharine Elizabeth	1OT2OD040512	Oregon Health & Science University	Advancing Success & Developmental Outcomes in Autism Spectrum Disorder through Analysis of Secondary Data (ASD3 Outcomes Project)	Clinical Services/Treatment; Task I: Dataset Aggregation; Task II: Data Generation; Task III: Data Analysis

Funded Project Abstracts

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Chung, Wendy K	1OT2OD040407	Boston Children’s Hospital	Etiology/Causation; Task II: Data Generation; Task III: Data Analysis
Integrating Genetic and Environmental Data to Predict Autism Susceptibility and Heterogeneity in the SPARK Cohort
Although autism is known to be highly heritable, the lack of 100% concordance in monozygotic twins and the stronger concordance between dizygotic twins than between full siblings offer strong evidence for environmental susceptibility factors. There is accumulating evidence for several non-genetic factors, including air pollution, pesticides, heavy metals, and pregnancy-related complications. However, studies of environmental susceptibility have been limited by modest sample sizes, indirect measurement and incomplete characterization of exposures, and the lack of a multi-dimensional approach that accounts for the interplay between genetic and environmental factors that impact the probability of autism as well as its heterogeneous trajectories. The infrastructure already in place in SPARK, a large, US-based cohort of >150,000 recontactable people with autism, offers a unique opportunity to implement a large-scale integrative study design to identify prenatal and early life exposures that impact autism and how exposures interact with genetic risk. Our long-term goal is to identify exposures that influence autism and understand the molecular mechanisms by which these exposures impact the genome, epigenome, and metabolome. The objective of this proposal is to characterize prenatal and early life exposures in >20,000 children with autism and identify exposures that impact probability of autism, developmental trajectories in autism, and response to behavioral intervention. We will also investigate the interaction between environmental exposures and genomic risk.

In our first aim, we will use multivariate regression and machine learning methods to identify geospatial exposures associated with social communication abilities and response to educational/behavioral intervention in thousands of individuals with autism. In the second aim, we will evaluate gene-environment interactions using the geospatial exposome and the entire distribution of autism genetic risk variants, including common and rare variants, in thousands of individuals with autism in SPARK. In the third aim, we will directly measure exogenous exposures and endogenous biological responses in the perinatal period by performing untargeted high-resolution exposomics on residual newborn blood spots from a subset of the SPARK cohort. We will also perform long-read DNA sequencing to assess DNA methylation epigenetic signatures associated with environmental exposures. Ultimately, our goal is to build a model that can help clinicians and families assess genetic and nongenetic probability of autism and predict response to treatment to maximize the potential of individuals with autism. The findings from this study will be significant because this will be the first large-scale investigation that comprehensively evaluates the interplay between exposomic and genomic risk factors in autism and will yield novel insights into the mechanisms that drive risk and resilience in this complex condition.

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Cochran, Amy Louise	1OT2OD040368	University of Wisconsin-Madison	Prevalence; Task I: Dataset Aggregation; Task III: Data Analysis
Understanding causes of autism, its heterogeneity, and rising prevalence with emerging methods in causal inference
Over the past five decades, autism prevalence in the United States has risen from fewer than 1 in 2,000 children to approximately 1 in 31. Over the same period, early-life exposures (such as parental age, maternal diabetes, and prenatal complications) have changed dramatically. These parallel trends raise urgent questions: Are these early-life factors contributing to the increase in autism diagnoses? And how might these factors shape outcomes that are meaningful to autistic individuals and families?

This project addresses these questions by applying modern causal inference methods to large, population-based datasets. Unlike traditional regression models, which estimate associations, causal models allow researchers to simulate what would happen if a single exposure were changed while holding all else constant. This enables valid estimation of both how much a risk factor contributes to autism diagnosis and related outcomes (such as communication or emotional regulation) and how much of the increase in autism prevalence can be attributed to historical shifts in these factors. By combining these models with flexible, data-driven approaches that capture the heterogeneity in autism, the project may also identify distinct pathways by which early-life factors lead to different autism phenotypes.

The study has four specific aims. First, researchers will prepare two complementary datasets for causal analysis: (1) the Study to Explore Early Development, a multisite case-control study led by the CDC that includes extensive behavioral, developmental, perinatal, and biological data; and (2) electronic health records from a large Midwestern health system linked to birth certificates. Second, causal models will be used to estimate the effect of early-life exposures on autism occurrence. Third, exposures will be jointly modeled with autism phenotypes to uncover causal pathways that may be obscured by traditional analytic approaches. Fourth, these models will be applied to longitudinal data to estimate the portion of the increase in autism prevalence attributable to historical shifts in early-life exposures.

Findings will clarify which early-life factors have the strongest causal effects, which developmental outcomes they influence, and the extent to which they may have contributed to rising autism prevalence. Results will inform strategies for intervention and early identification and provide a foundation for prioritizing biological mechanisms and modifiable exposures. Code and models will be publicly released in a modular, template-based format to enable replication across datasets such as CHARGE, SFARI, All of Us, and TriNetX. The research team brings expertise in autism epidemiology, causal inference, clinical informatics, and participatory methods and includes individuals with lived experience of autism, ensuring that the work is rigorous, actionable, and grounded in the perspectives of the autism community.
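
Illustrative note (not part of the award abstract): the idea of changing one exposure while holding all else constant can be sketched with standardization, a simple form of g-computation. This is only a minimal sketch, not the project's method. The counts, the single binary confounder `z`, and the function names are all hypothetical, and the causal reading assumes no unmeasured confounding. The last function also assumes that the confounder distribution and the stratum-specific risks stay fixed while exposure prevalence shifts.

```python
# Hypothetical counts invented for illustration only (not study data).
# Key: (confounder z, exposure a) -> (number of children, number diagnosed)
counts = {
    (0, 0): (800, 8),
    (0, 1): (200, 4),
    (1, 0): (200, 4),
    (1, 1): (800, 32),
}


def naive_risk(a):
    """Crude risk among children with exposure level a, ignoring z."""
    n = sum(counts[(z, a)][0] for z in (0, 1))
    cases = sum(counts[(z, a)][1] for z in (0, 1))
    return cases / n


def standardized_risk(a):
    """Risk if every child had exposure a, with the distribution of z held fixed."""
    total = sum(n for n, _ in counts.values())
    risk = 0.0
    for z in (0, 1):
        n_z = counts[(z, 0)][0] + counts[(z, 1)][0]  # children in stratum z
        n_za, cases_za = counts[(z, a)]
        risk += (cases_za / n_za) * (n_z / total)
    return risk


# Association (confounded by z) versus the standardized contrast.
naive_diff = naive_risk(1) - naive_risk(0)
causal_diff = standardized_risk(1) - standardized_risk(0)


def prevalence_change_from_shift(p_then, p_now):
    """Change in overall risk when exposure prevalence moves from p_then to p_now,
    under the simplifying assumptions stated in the lead-in."""
    return (p_now - p_then) * causal_diff


print(f"naive risk difference:        {naive_diff:.4f}")
print(f"standardized risk difference: {causal_diff:.4f}")
print(f"risk change for a 10% -> 30% exposure shift: "
      f"{prevalence_change_from_shift(0.10, 0.30):.4f}")
```

In this toy table the crude difference is larger than the standardized one because the exposure is more common in the higher-risk stratum, which is the kind of distortion an association-only analysis would not separate out.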

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Geschwind, Daniel H	1OT2OD040572	University of California Los Angeles	Etiology/Causation; Task II: Data Generation; Task III: Data Analysis
Data science approaches to autism: environmental modulators of the transcriptome and gene-x-environment interactions
The prevalence of autism spectrum disorder (ASD) has dramatically increased in recent years to a rate of 1 in 31 U.S. children. ASD risk is multifactorial, with genetic and environmental elements each playing a role. While major progress has been made in identifying causal genetic variants, environmental exposures, particularly prenatal ones, are increasingly recognized as critical contributors to neurodevelopmental risk. Over 200 high-production volume chemicals are routinely detected in American adults, including pregnant women. The ways in which environmental factors contribute to ASD risk (and how they interact with genetic risk) remain unclear, however. Understanding how these environmental chemicals influence neurodevelopment at the cellular and molecular level is crucial for development of mitigation strategies. Progress has been hampered by lack of large-scale data assessing the impact of chemicals at a cellular and molecular level in human neural cells that faithfully model early human brain development. A major barrier to progress has been the absence of scalable experimental platforms capable of evaluating compound exposures in ASD-relevant cell types and with comprehensive molecular readouts. Most existing studies rely on immortalized cell lines, which do not faithfully model human brain development.

Here, in this project: “Data science approaches to autism: environmental modulators of the transcriptome and gene-x-environment interactions,” we address this substantial gap in the field by capitalizing on recent advances in genomics, stem cell technology, and data science. We will assess the activity of thousands of chemicals in the primary cell types implicated by ASD genetic risk: human neural progenitor cells and neurons. We leverage human stem cell-based model systems, which have been shown to accurately and reliably model key aspects of human brain development. We couple this with high-throughput culturing and robotic systems, which provide a unique opportunity to efficiently screen the entire ToxCast II/III library of ~4,700 chemical compounds to which humans are exposed, in both human primary neural progenitors (phNPCs) and induced neurons (iNeurons). We use an unbiased, genome-wide measurement at the cellular level, single-nucleus RNAseq, to comprehensively understand the effects that chemical exposures have on human neural development and to study the impacted gene regulatory networks and pathways. We will integrate this with public exposure data and molecular profiling in the human brain to find overlapping pathways. We will then evaluate how genetic background modulates cellular responses by using cell villages to model gene-environment (GxE) interactions across a cohort of 150 donors, including 110 individuals with ASD and 40 neurotypical controls. Using this framework, we will map exposure-responsive expression quantitative trait loci (eQTLs) for more than 40 chemicals with a novel method developed by one of the PIs, enabling the identification of GxE interactions at scale. We will integrate these data with publicly available data sources (e.g., PsychENCODE and ECHO). We will model the biophysical interactions between relevant chemicals and proteins to provide molecular insights into the underlying mechanisms of action, namely, the means by which compounds induce systemic gene expression via perturbations in gene regulatory networks. We will build a public resource that will enable further mechanistic investigation. Our project will: 1) provide insight into mechanisms of action underlying known autism risk factors, 2) identify novel chemical risk factors that impact essential pathways in early brain development, and 3) query how genetic factors modulate susceptibility and resilience to chemical exposures, which will be comprehensively integrated with existing databases and resources.

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Klinger, Laura G	1OT2OD040528	University of North Carolina Chapel Hill	Clinical Services/Treatment; Task I: Dataset Aggregation; Task II: Data Generation; Task III: Data Analysis
From Data to Action: Capturing Meaningful Outcomes in Autism Through Harmonized Data
There are more autistic adults in the service system than ever before, and this number is estimated to increase by as much as 300% by 2030. Outcomes for autistic adults are often characterized by low rates of employment and community participation, and high rates of anxiety, depression, and suicidality. Our community partners, including autistic adults and caregivers, have identified the need for research across the adult lifespan that identifies intervention and service targets to promote a high quality of life characterized by good mental health and community participation. We believe that generating a better understanding of autistic adults, including those in mid- and later life, is a pressing public health problem where rapid advancements are needed. To date, however, the field of adult autism research has primarily relied on small, non-representative samples limited to young adulthood. This has hindered our ability to understand outcomes and identify service targets across the adult lifespan and the full autism spectrum (i.e., inclusive of those with and without intellectual disability and a variety of levels of support needs).

In response to ROA-OTA-25-006, we leverage an unprecedented collaboration between senior researchers across 4 NIH Autism Center of Excellence (ACE) sites focused on autistic adults. We employ a data-driven approach to characterize service needs and targets in cross-sectional and longitudinal samples of autistic adults representative of the full adult lifespan and the full autism spectrum. We will deploy two complementary datasets that are optimally poised to make rapid advancements in service access: the NIMH Data Archive (NDA) and Simons Powering Autism Research (SPARK). The NDA data collected across ACE projects provides one of the largest samples of well-characterized autistic adults (n=1,400). The SPARK dataset represents one of the largest longitudinal samples (n=400) of autistic adults.

This proposal has three tasks:

(I) Dataset Aggregation from the NDA to harmonize the parallel assessment protocol developed by the study team and utilized across 4 ACE sites to identify modifiable adult outcomes.

(II) Data Generation to 1) generate geocodes to examine the influence of environmental and neighborhood factors (e.g., socioeconomic factors, provider density, environmental exposures) on service use and needs and trajectories of change in adult outcomes and 2) collect a third time point of data in a longitudinal cohort of 400 autistic adults from SPARK to examine trajectories of change in these modifiable outcomes across adulthood. This additional data collection will complement our aggregated NDA data by advancing our ability to answer questions about drivers of service needs measured over time across the entirety of the dynamic adult years.

(III) Data Analyses using both machine learning and (longitudinal) latent transition analyses. We will use machine learning to identify subgroups in the NDA dataset based on their differing profiles of service use and needs and predict subgroup membership based on service targets and sociodemographic indicators. This will allow us to examine specific hypotheses focused on identifying the greatest service needs in this population and the relation between service needs and service targets to improve adult outcomes (mental health, community participation, quality of life). We will use latent transition analysis in the SPARK dataset to identify drivers of service use and needs based on trajectories of change across time. This will allow us to identify longitudinal trajectories of adult outcomes and their relation to service needs.

Our work will be conducted in partnership with a 20-member Community Advisory Board (CAB) that includes autistic adults, family members/caregivers, researchers, clinicians and service providers for autistic adults, and state services representatives. The CAB will identify community-driven priorities for analysis. It will also provide interpretation of findings that will inform innovative, implementation-ready approaches to enhance meaningful outcomes for autistic adults. The intensive exchange of knowledge between the CAB and the data science team will produce a team of autistic community research partners experienced in collaboration with potential for major contributions to public service well beyond this grant.

This OTA directly addresses the Autism Data Science Initiative goals by using multiple data sources to characterize service utilization patterns and pinpoint potential service targets that will lead to effective and scalable interventions across the lifespan. This work will advance our understanding of adulthood and aging in autistic people and accelerate community-informed, efficient, and effective policy and program priorities that address service access to improve quality of life, increase community participation, and reduce co-occurring mental health challenges.

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Liu, Zhandong	1OT2OD040565	Baylor College of Medicine	Task IV: Model Validation/Replication
Validate ASD: Independent Multimodal Replication and Validation of Autism Data-Science Models
This project will develop a robust and transparent framework for the validation and replication of autism data science models through a multi-modal, cross-institutional approach. The initiative will employ comprehensive datasets from various sources, including clinical, genomic, environmental, and physiological data from Texas Children’s Hospital (TCH) and other prominent research repositories. By leveraging advanced AI/ML methods, we will rigorously assess model performance across different pediatric populations, ensuring generalizability and fairness in clinical settings. We will implement a two-pronged validation strategy, including intact model testing and code-blinded replication. This methodology will rigorously evaluate the accuracy, reproducibility, and population-specific calibration needs of predictive models. Key to this effort will be a large-scale integration of structured clinical data from TCH's Research Data Warehouse, combined with genomic data from the SPARK (Simons Powering Autism Research for Knowledge) genetic cohort and metabolomics data from the BaBS (Bacteria and Birth Study). Additionally, environmental exposure models will be validated with placental biomarkers linked to neurodevelopmental outcomes.

The project will provide essential insights into the limitations and strengths of autism-related AI models in real-world applications, guiding their future clinical deployment. Through a commitment to transparency, reproducibility, and stakeholder engagement, the project will deliver high-impact validation reports, including community-accessible tools that ensure models are utilized effectively across a broad range of patient populations. The outcomes of this work will enhance autism care and set a new standard for multimodal, multi-institutional model validation in pediatric health research.
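
Illustrative note (not part of the award abstract): one way to picture a check of accuracy and population-specific calibration is to score the same frozen predictions against outcomes from two populations. This is a minimal sketch under stated assumptions, not the project's validation framework. The predicted probabilities, outcomes, site names, and function names are hypothetical, and it assumes that testing a model intact means applying it unchanged to new data.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def observed_to_expected(probs, outcomes):
    """Observed event count divided by the sum of predicted probabilities.
    Values far from 1 suggest the model may need recalibration for that population."""
    return sum(outcomes) / sum(probs)


# Hypothetical predictions from one frozen model, scored at two hypothetical sites.
frozen_model_probs = [0.1, 0.2, 0.7, 0.9]
site_a_outcomes = [0, 0, 1, 1]
site_b_outcomes = [0, 1, 1, 1]

for name, outcomes in (("site A", site_a_outcomes), ("site B", site_b_outcomes)):
    print(
        f"{name}: Brier = {brier_score(frozen_model_probs, outcomes):.4f}, "
        f"O/E = {observed_to_expected(frozen_model_probs, outcomes):.3f}"
    )
```

In this toy case the model under-predicts events at the second site (an observed-to-expected ratio well above 1), the kind of signal that could point to a need for site-specific recalibration.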

#Table6

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Lyall, Kristen	1OT2OD040445	Drexel University	Etiology/Causation Task I: Dataset Aggregation Task III: Data Analysis
Diet and the Exposome in Autism: Discovering Complex Interactions with Diet in Autism Etiology And Variability
Multiple factors contribute to the development of autism spectrum disorder (ASD), a developmental condition with early onset and heterogeneous phenotype. Prenatal diet is one of the few factors that has been associated with reductions in risk of ASD, and emerging evidence also suggests certain nutrients may mitigate the effects of other environmental exposures. At the same time, packaged and highly processed foods represent a major source of exposure to classes of chemicals linked with adverse neurodevelopment. The balance of risks and benefits in the diet, and effects on ASD, particularly within the context of other common risk factors, has not been well studied. The overarching goal of this project is to determine the role of diet and its complex interactions with other exposures in the development and presentation of ASD. This project will use existing data from up to 10,000 mother-child participants from the Environmental influences on Child Health Outcomes (ECHO) consortium, a large US program initiated with over 60 cohorts following a common protocol to study child health from gestation to adulthood. We address key gaps in the field under three related aims. 1) Examine complex interactions of prenatal diet and exposure to common chemicals on ASD. We focus on chemical exposures with diet as a major source and in common use, including pesticides, phthalates, and phenols, to better understand risks within the diet. We will first examine independent effects of under-studied but highly consumed “high burden foods” on ASD and related traits, and next use advanced mixture models to capture interactions between nutrients and measured levels of chemicals, and address potential mitigation of chemical exposure effects across sources by healthy components of the diet.
We will utilize existing metabolomics data from prenatal measurements, which capture the biological effects of exposures, to determine pathways that may link these factors to ASD and bolster evidence for mechanisms that underlie associations. Next, we expand the consideration of the role of diet to: 2) Examine multi-exposure effects on ASD to determine key players in the prenatal exposome. Building from our prior work in ECHO, we will examine the ability of prenatal dietary factors to modify the effects of another common exposure rising in prevalence in the US in parallel with ASD: maternal metabolic conditions during pregnancy (which include obesity, gestational diabetes, and gestational hypertension). We will then use advanced data science approaches to determine how the wider set of prenatal risk factors across environmental, medical, and lifestyle domains contributes to the gestational “exposome” (the entirety of exposures) to influence ASD risk, and to uncover key factors. As in Aim 1, these analyses will be followed by mechanistic work using metabolomics data to determine key pathways underlying identified signals. As secondary outcomes across Aims 1 and 2, we will examine risk of not just ASD itself, but also its highly co-occurring conditions and ASD-related traits, to advance understanding of contributors to variability in ASD. Finally, we will make use of the large numbers in ECHO with information on childhood exposures to: 3) Determine dietary risks and deficiencies in children with ASD and examine how these contribute to phenotypic variability. These analyses will determine if autistic children experience worse nutrition than children without ASD, including higher intake of high burden foods, and assess whether these dietary differences contribute to symptom severity or risk of comorbidities, accounting for prenatal maternal exposures.
Completion of these aims will aid discovery of novel interactions and exposures whose effects may be lessened or made worse by different aspects of diet. Ultimately, findings from this project will advance understanding of the role of diet in ASD and the broader ASD exposome, and present opportunities for interventions with the potential to reduce risks and improve the lives of autistic individuals.

#Table7

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Miller, Judith Susanne	1OT2OD040373	Children's Hospital Of Philadelphia	Prevalence Task I: Dataset Aggregation Task II: Data Generation Task III: Data Analysis
Genomic and Exposomic Factors in the Cause and Rise of Autism
This collaboration between the Children’s Hospital of Philadelphia (CHOP) and the University of Pennsylvania (Penn) focuses exactly on the key priorities set out for the Autism Data Science Initiative: 1) to identify genetic and environmental factors that lead to an autism diagnosis and 2) to determine how these factors have either contributed to or are reflected in the rising prevalence. Most of the existing research on environmental contributions to autism spectrum disorder (ASD) focuses on isolated exposures, neglecting the systemic nature of real-world environmental interactions. This gap limits the development of actionable risk models for early diagnosis and personalized intervention. By integrating multi-level exposome data (geocoded structural factors, individual social determinants, and perinatal exposures) with genomic and omic profiles, this study will advance understanding of how environmental contexts interact with biological susceptibility to shape ASD phenotypes, offering a foundation for tailored clinical management and novel preventive strategies. While publicly available research datasets are critical to this effort, an integrated database on a well-characterized, large clinical population is essential. CHOP successfully implemented universal early autism screening in its primary care network in 2011, and has studied early screening, diagnostic outcomes, and prevalence through electronic health record (EHR) data since then. This cohort now includes 104,405 children born between 2008 and 2017 who were screened for autism at an 18- or 24-month well-child visit and had at least one additional visit within the CHOP network at 4+ years of age. Thus, this is a well-characterized and research-ready dataset of ~4,000 children with autism and ~100,000 without. Our proposed three-year project is to aggregate clinical and genomic data from CHOP’s EHR and its clinical and research biorepositories with Penn’s EHR data on pregnancy, maternal, and birth outcomes, and with research databases (ROA Task 1).
We will also generate geocoded exposomic data including daily air and water components, greenspace, built environment, and the Childhood Opportunity Index during pregnancy, birth, and childhood (ROA Task 2). We will then use advanced data science methods to generate hypotheses about relationships among and between variables to predict autism diagnosis and explain the increased prevalence over time (ROA Task 3). With this resource and the predictive power of our machine learning analytic plan, this project will have the statistical power to identify potential causes of autism in specific populations (e.g., starting with genetic and phenotypic subgroups). We will develop a prediction model that incorporates child-, family-, neighborhood-, and community-level clinical, genomic, and exposomic data to predict autism in our cohort of 104,405 children who were all screened for autism as toddlers and who are now aged 8-17 years. We will also identify genomic, exposomic, and clinical-practice factors, and their interactions, that contribute to the increase in autism prevalence and heterogeneity of the autism phenotype, including increases in medical and genetic risks; changes in environmental exposures; and critical changes in diagnostic practice, criteria, and service availability. Our multi-disciplinary team represents the best clinical, informatics, genetics, data science, and community engagement expertise for developing machine learning methods and tools for large-scale, high-resolution, meaningful, and actionable autism research. If successfully implemented, this study will create an unprecedented data resource for researchers to derive specific, testable causal hypotheses and definitively answer questions regarding causes of autism in a way that was not previously possible. Furthermore, the results will parse the variance across broad domains of genomic and exposomic factors that may cause autism, which is critical for setting future research priorities.

#Table8

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Sebat, Jonathan	1OT2OD040415	University of California, San Diego	Etiology/Causation Task I: Dataset Aggregation Task III: Data Analysis
Elucidating the Interplay of Genes and Environment in Autism Using Genomic and Exposure Data from Large Populations
Autism spectrum disorder (ASD) is a multifactorial neurodevelopmental condition influenced by both genetic and environmental factors. While studies of rare and common genetic variants have identified over 100 ASD-associated genes and copy number variants (CNVs), the mechanisms through which environmental exposures modify genetic susceptibility remain poorly understood. A major barrier to progress has been insufficient sample size to rigorously investigate gene-environment interactions (G×E). To address this gap, we have assembled a combined dataset of 2.7 million individuals with paired genetic, environmental, and clinical data across multiple cohorts. This study will apply innovative causal inference frameworks to disentangle direct environmental effects from gene-environment correlation, identify genetic modifiers of environmental exposures, and develop predictive models integrating genetic, environmental, and clinical data. In Aim 1, we will investigate prenatal exposures and early-life environmental factors in two deeply phenotyped cohorts (ABCD, HBCD) with genome-wide genetic data, neurodevelopmental assessments, neuroimaging, and EEG. In Aim 2, we will examine G×E effects in neonatal intensive care using whole-genome sequencing and perinatal electronic health record (EHR) data from the BeginNGS consortium. In Aim 3, we will conduct large-scale meta-analyses of these studies with large biobank and health system cohorts and the SPARK ASD family cohort to identify G×E interactions and to elucidate causal pathways through which environmental exposures contribute to ASD. The proposed study will generate novel insights into modifiable environmental risk factors and their interaction with genetic susceptibility to ASD, providing a foundation for targeted prevention and personalized intervention strategies.

#Table9

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Stein, Jason Louis	1OT2OD040436-01	University of North Carolina Chapel Hill	Etiology/Causation Task I: Dataset Aggregation Task II: Data Generation Task III: Data Analysis
Modeling autism-associated gene-by-environment interactions in human brain organoids
Genetic studies have been highly successful in identifying causal factors associated with autism spectrum disorders (ASD). However, current genetic associations do not completely explain liability for this complex disorder. Several environmental exposures, such as air pollution, pesticides, and heavy metals, are associated with ASD through epidemiological studies. However, unlike genetic studies, these environmental epidemiological associations have inherent problems: environmental factors are difficult to quantify, and confounding variables are prevalent. In addition, while epidemiological datasets can highlight risk factors for autism, developing new treatments requires an understanding of the cellular and molecular consequences of genetics and environmental exposures on brain cells. While both genetic and environmental risk factors for ASD have been identified, how genetic background accentuates or blunts response to environmental risk factors, termed gene x environment interactions (GxE), has not been well studied. Our group has pioneered the “GxE in a dish” approach to identify genetic variants modulating response to environmental toxicants using human neural cells, including brain organoids, where exposures can be tightly controlled. To identify how the combination of genetic variation and environmental exposures leads to risk for ASD, to identify cellular and molecular mechanisms mediating these effects, and to suggest treatment targets to reverse these, we will conduct “GxE in a dish” studies using cortical organoids to understand the cellular and molecular consequences of autism-associated environmental toxicants modulated by genetic variation. We will expose cortical organoids derived from 115 unique participants, acutely and chronically, to environmentally relevant concentrations of 6 toxicants and vehicle, modeling exposures during pregnancy, and we will perform single-cell RNA-seq (scRNA-seq) on each organoid.
We will identify common genetic variants associated with differences in gene regulation in response to environmental exposures within each identified cell type. Then, we will determine the impact of exposure-sensitive genetic variants on ASD diagnosis, intellectual ability, and brain development. These analyses will highlight whether exposure-sensitive alleles accentuate ASD risk allele effects within specific cell types of the developing brain. Overall, our proposal will identify genetic variants that modulate responses to environmental exposures in a cell-type-specific manner through a novel “GxE in a dish” design. Successful completion of these aims may uncover unexplained risk for ASD that cannot be accounted for by genetics or exposures in isolation.

#Table10

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Walker, Douglas Ian	1OT2OD040409	Emory University	Etiology/Causation Task II: Data Generation Task III: Data Analysis
Mapping Internal Exposome-Metabolome Dynamics with Advanced Data Science to Identify Environmental Determinants of Autism.
Neurodevelopmental disabilities among autistic people are an increasing public health concern in the U.S. Current prevalence estimates indicate that 1 in 31 school-aged children have autism, and the increase in recent decades strongly supports environmental factors as key contributors. However, there have been no systematic studies of complex environmental exposures contributing to the likelihood of developing autism. Leveraging a powerful untargeted high-resolution mass spectrometry (HRMS) approach and two large, multi-site autism studies from the U.S. that already include extensive genetic, omics, questionnaire, targeted exposure, and phenotype data, we will create the largest exposome database for autism and an Autism Exposome Atlas well-powered for foundational discovery analyses of non-genetic factors driving autism outcomes, including the role of critical developmental periods and familial studies supporting differentiation between shared environmental and genetic influences. The exposome represents cumulative life-long environmental exposures that produce biological response signatures influencing health; exposome characterization is widely recognized as the greatest unmet challenge in children’s environmental health. Our team is at the forefront in developing critical advances in HRMS methodologies and algorithms for chemical detection, high-dimensional approaches for biomarker selection, and advanced mixtures statistics that address the complexity of the real-life environment. We are thus poised to conduct cutting-edge exposomic research on environmental drivers of autism-associated health outcomes. In support of the Autism Data Science Initiative (ADSI) Tasks II and III, we will apply these approaches to establish dynamic exposome-metabolome signatures of autism outcomes. 
We will leverage child and parent biospecimens collected from participants enrolled in the Early Autism Risk Longitudinal Investigation (EARLI) and the Study to Explore Early Development (SEED) cohorts to 1) develop a comprehensive database of environmental, dietary, and chemical biomarkers that influence autism development; 2) assemble a unified Autism Exposome Atlas through comprehensive, high-throughput chemical exposome profiling of 7,812 blood samples (including autism cases and their parents) to measure environmental, dietary, pharmaceutical, and endogenous metabolite biomarkers; 3) identify exposome biomarker profiles of autism development; and 4) integrate exposure and biological response pathways to uncover mechanisms underlying autism across different windows of susceptibility. Our results will identify critical exposome biomarkers for autism and determine how exposure and biological response contribute to neurodevelopment and symptom heterogeneity. We will develop a transformative Autism Exposome Atlas providing a centralized and organized resource for evaluating familial and cross-sectional exposome signatures and corresponding functional relationships with underlying biological response signatures. Assembly of the Autism Exposome Atlas will accelerate identification of key environmental predictors of autism and provide the evidence needed to prioritize public health interventions to support child neurodevelopment and improve health and well-being among autistic people.

#Table11

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Wang, Xiaobin	1OT2OD040418	Johns Hopkins University	Etiology/Causation Task II: Data Generation Task III: Data Analysis
Boston Birth Cohort - Autism Data Science Initiative (BBC-ADSI)
The overarching goal of this proposed project is to address two strategic tasks outlined by the NIH OTA25-006 - Autism Data Science Initiative (ADSI):

Task II – Targeted data generation to complement existing datasets to fill critical data gaps.
Task III – Advanced data analysis using state-of-the-art statistical methods, artificial intelligence (AI), and machine learning (ML) for both hypothesis testing and hypothesis generation.

Aim 1 will leverage the Boston Birth Cohort (BBC), a large, long-term, prospective, and deeply phenotyped U.S. birth cohort, to advance the ADSI mission. We propose to expand the BBC’s multi-omics resources by generating new data from archived biospecimens using cutting-edge, unbiased biotechnologies across the following informative groups: children diagnosed with autism; children with elevated autistic quantitative traits without a diagnosis; children with other developmental disabilities; and neurotypical children. The omics data will include the genome, epigenome, metabolome, proteome, and IgG antibody reactome, all derived from blood samples collected at birth and at 1–2 years of age, which are critical developmental windows for gaining insight into the biological mechanisms underlying autism onset and trajectory.

Aim 2 will conduct innovative analyses by integrating multi-omics data with exposome measures and detailed autism phenotypes to address a fundamental question: What causes autism? Informed by literature and our own work, we will test hypotheses focused on understudied, yet potentially high-impact environmental exposures and promising biological pathways. Our team is uniquely positioned to contribute scientific and methodological advances to the ADSI, including but not limited to:

Defining autism’s complex phenotypes by leveraging rich data resources: quantitative measures of core autistic traits (e.g., SCQ and SRS), clinical evaluations and diagnoses, longitudinal electronic medical records, and special educational services.
Delineating the individual and combined effects of early-life factors on autism. The BBC has amassed extensive early-life exposure data, many rarely studied in an integrated fashion, including maternal nutrition, dietary patterns, psychosocial stress, toxic metals, per- and polyfluoroalkyl substances (PFAS), prenatal and perinatal clinical interventions, medications, adverse birth outcomes, neighborhood characteristics, and in utero and early-life infections, inflammation, antibiotic use, and immune responses.
Integrating multi-omics and early-life exposome data to gain crucial insights into gene–environment (G×E) interactions and the biological pathways underlying autism development and progression.

This proposal builds on our longstanding effort to generate a multi-dimensional, prospective birth cohort for autism research. It will be carried out by a transdisciplinary team with expertise in pediatrics, autism, environmental and genetic epidemiology, biotechnology, immunology, multi-omics, statistical genetics, computational genomics, AI, and ML. Successful execution of this project will produce an unprecedented multi-omics × exposome dataset, support novel analytic approaches, and catalyze future research, including replication studies and meta-analyses within the ADSI network. The project’s impact will be further amplified through a robust community engagement and dissemination strategy throughout the study period.

#Table12

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Zhong, Hua Judy	1OT2OD040511	Weill Medical College Of Cornell Univ	Task IV: Model Replication/Validation
AR² - Autism Replication, Validation, and Reproducibility Center
Research on Autism Spectrum Disorder (ASD), particularly in light of the condition's increasing prevalence and societal impact, requires rigorous replication and validation (R&V) to ensure that scientific findings are reproducible, generalizable, and applicable across diverse populations and settings. We propose an independent Autism Replication, Validation, and Reproducibility (AR²) Center in response to the NIH Autism Data Science Initiative (ADSI) Task IV, “Model Validation or Method Replication”. The primary goal of AR² is to ensure that every ADSI-generated resource is paired with an AR²-certified, complete, standalone package, fully aligned with FAIR (Findable, Accessible, Interoperable, and Reusable) principles, that enables independent reproduction of data generation, aggregation, or modeling processes in external environments and supports transparent, verifiable downstream analyses by the broader autism research community. AR² builds upon the nationally recognized Cornell Center for Social Sciences (CCSS) Data and Reproduction Archive, a CoreTrustSeal-certified infrastructure with a data archive and results replication (R2) pipeline that has archived over 2,150 studies and replicated more than 130 published datasets since 1982. Expanding on this proven infrastructure, AR² will establish a standardized, co-ownership pipeline with ADSI teams to collaboratively define R&V scope, success criteria, and deliverables; execute internal and external validation where applicable; and deliver certified R&V packages, including annotated code, test datasets, metrics, and compliance documentation (Aim 1). AR² will draw upon large-scale, racially and geographically diverse autism-relevant data sources (including the INSIGHT Clinical Research Network, PEDSnet, PCORnet, Inovalon, and Medicaid claims datasets) to support robust validation and generalizability evaluation.

Throughout the ADSI funding period, AR² will promote R&V best practices across the autism research community through targeted training and workshops and will coordinate closely with ADSI program staff, project teams, and a Community Advisory Board to ensure alignment with evolving scientific and community priorities. Final deliverables will be disseminated through accessible, trusted repositories to maximize transparency and impact (Aim 2).

By applying standardized, FAIR-aligned workflows, leveraging a nationally recognized R&V infrastructure, and utilizing diverse datasets spanning racial/ethnic, geographic, and socioeconomic groups, AR² will rigorously replicate, validate, and document the generalizability of individual ADSI projects. Through these efforts, AR² will foster a culture of open, reproducible autism research and accelerate the translation of autism data science into clinical practice and policy.

#Table13

Contact PI/Project Leader	Project Number	Awardee Organization	Areas of Research & ADSI Task Areas
Zuckerman, Katharine Elizabeth	1OT2OD040512	Oregon Health & Science University	Clinical Services/Treatment Task I: Dataset Aggregation Task II: Data Generation Task III: Data Analysis
Advancing Success & Developmental Outcomes in Autism Spectrum Disorder through Analysis of Secondary Data (ASD3 Outcomes Project)
Autistic children experience some of the lowest health care quality and highest unmet needs of any pediatric chronic condition. Additionally, disparities persist in service use and life course outcomes among autistic people. These problems exist because (1) we have not adequately assessed which outcomes are most essential for autistic children and their caregivers, and (2) few large-scale studies have assessed which individual and family factors, service use factors, and local/state environmental features are associated with optimal and suboptimal health outcomes. The proposed project, Advancing Success and Developmental Outcomes in Autism Spectrum Disorder through the Analysis of Secondary Data (ASD3 Outcomes), will fill these evidence gaps by linking multiple large, population-based data sets providing rich information on various factors driving health outcomes for autistic children. By the end of this project, we will have generated actionable evidence to guide national and state-level strategies for improving health outcomes for autistic children ages 1–17 years. Leveraging cutting-edge data science methodologies, including multi-source data harmonization and deep learning, this initiative will identify the most salient predictors of optimal and suboptimal outcomes among children and youth, examine geographic and demographic disparities in these outcomes, inform system-level interventions, and advance evidence-based policy change to improve health for autistic individuals.

First, we will assemble a community advisory panel of autistic youth and adults, parents and caregivers, and health and educational providers, and a technical advisory panel of autism and data science researchers. We will use the panelists’ expertise to identify key health and educational outcomes that can be measured in Medicaid claims data in all 50 states, the National Survey of Children’s Health (NSCH) in all 50 states, and/or Early Intervention state data (HI, IN, MN, OR, VT).

Next, we will use the NSCH to create state-level, age-specific Autism Quality Indices that measure factors driving health outcomes for autistic children at different developmental stages. We will apply these indices, along with child-level demographic markers, neighborhood measures (using the Child Opportunity Index 3.0), and other state health and education systems variables, to model key outcomes in Medicaid claims and state Early Intervention data through interpretable machine learning and deep learning models.

Finally, we will translate evidence into action by harnessing the collective expertise of our community and technical advisory panels to develop recommendations based on the community-engaged and data-driven findings generated. These efforts will produce actionable insights to guide policy, programs, and practice that optimize health outcomes for autistic children and their families nationwide. We will disseminate this work broadly.
