Introduction

Autism spectrum disorder (ASD) is a neurodevelopmental disorder characterized by social communication deficits and the presence of restricted and repetitive behaviors. Although clinical signs of ASD are detectable at an age of as early as 12 months, the global average age at the time of diagnosis is 60.48 months (range 30.90–234.57 months), according to a recent meta-analysis (van’t Hof et al., 2021). The Centers for Disease Control and Prevention has estimated the median age of ASD diagnosis to be 51 months (range 38–57 months) in the US (Baio et al., 2018). The global shortage of well-qualified professionals who specialize in ASD contributes to the delay in diagnosis (Huang et al., 2020), and the current clinical diagnosis process is also hindered by subjectivity and reporter-dependency (Xiao et al., 2017). Identification and implementation of early biomarkers in clinical practice may help enhance early detection and reduce misdiagnoses of ASD, which would ultimately lead to better treatment outcomes and fairer prognosis (Akshoomoff et al., 2004). The use of machine learning-based methods has been increasing applied to increase the diagnostic accuracy and speed, and most studies have used T1-weighted (T1w) magnetic resonance imaging (MRI) or functional MRI (fMRI) features as input features. A recent meta-analysis of machine learning-based studies that used T1w MRI data reported a sensitivity of 0.83 and specificity of 0.84; comparatively, a subgroup meta-analysis of fMRI data showed that the sensitivity and specificity ranged from 0.69 to 0.66 (Moon et al., 2019). However, the studies that were included in this meta-analysis were limited by high heterogeneity, risk of overfitting, and use of imaging data acquired with a single modality.

Studies on neurological underpinnings of ASD have suggested that the brain of patients with ASD is characterized by widespread abnormalities such as altered cortical anatomy, abnormal white-matter integrity, and altered brain function and connectivity (Libero et al., 2015). Because of the complexity and multilayered properties of the brain of patients with ASD, unimodal MRI studies may not be sufficient to provide a comprehensive profile of ASD neurobiology. Although multimodal classifiers can help explore multivariate dimensions and provide a rich multiparametric marker with increased accuracy (Ingalhalikar et al., 2014), such studies have been strikingly limited in number (Zhuang et al., 2019). To our knowledge, only one study has combined T1w MRI and diffusion tensor imaging (DTI) data for classification. In that study, Libero et al. (2015) used T1w MRI, DTI, and magnetic resonance spectroscopy (MRS) data and found that the highest accuracy of 91.9% was achieved with fractional anisotropy (FA), radial diffusivity (RD), and cortical thickness as key predictors. However, this study was limited by its small sample size (19 patients with ASD and 18 controls) and by targeting of only adults with high-functioning ASD (HFA), which affected the generalizability of its results to younger patients or those with low-functioning ASDs (LFAs) (Libero et al., 2015).

The young childhood age (age 3–6 years) group and the LFA group are understudied populations in ASD neuroimaging studies. This period is critical as it is close to the time of their clinical diagnosis and is a time of rapid and dynamic brain growth (Shen et al., 2016). As it precedes the age when behavioral treatments or medications influence the consolidation and adaptation of neural networks, the neural connectivity at this age may more closely reflect the emerging diagnostic features of ASD (Shen et al., 2016). The intelligence quotient (IQ) is known to be a major prognostic predictive factor, as verbal IQ was shown to be a significant predictor of autism trajectory in a longitudinal study from the age of 2 to 15 years (Gotham et al., 2012), and pretreatment IQ levels were found to show less improvement following treatment (Ameis & Catani, 2015). ASD with an intellectual disability accounts for 31% of all ASD cases and an additional 23% function in the borderline range (Alloway, 2010). Only 11% of the participants listed in the ABIDE database, which is the most widely studied ASD open dataset, were found to have an IQ of ≤ 85 (Di Martino et al., 2014). Despite the clinical significance of this subpopulation, the LFA preschooler group has been neglected in neuroimaging studies owing to difficulties associated with cooperation during long MRI scans. Only 5 of 59 ASD-related DTI studies highlighted in a review by Ameis and Catani included ASD samples with a mean age of < 5 years (Andrews et al., 2019). Only five voxel-based morphometry (VBM) studies considered each individual’s cognitive function by separating the whole sample into low-functioning autism and high-functioning autism groups (Cai et al., 2018).

The purpose of this study was twofold: (1) to classify LFA preschoolers and age- and sex-matched controls by applying machine learning methods to MRI data and (2) to examine whether multimodal data from both T1w MRI and DTI data is more effective and accurate than unimodal MRI data. We hypothesized that classification accuracy of LFA preschoolers could be achieved using MRI data, and the accuracy would be higher when data from both modalities are used than when those from a single modality.

Methods

Participants

A total of 58 individuals with ASD and 48 typically developing controls (TDCs) aged between 3 and 6 years were enrolled between October 2015 and September 2019. Patients with ASD visited the Child and Adolescent Psychiatry outpatient clinic at the Seoul National University Hospital (SNUH). For The TDC group, participants were recruited from the Seoul National University Hospital clinic and Hanyang University Medical Center (HYUMC) Pediatrics Department. From the SNUH clinic, 72 patients with ASD and 24 TDCs were recruited, among whom six patients with ASD and one TDC were excluded owing to poor image quality, and eight patients with ASD were excluded owing to missing IQ data. Among the 29 TDCs from HYUMC, four were excluded owing to poor image quality. The diagnosis of ASD was confirmed according to the criteria of the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition, by board-certified child and adolescent psychiatrists, and presence of comorbidities was verified using the Kiddie Schedule for Affective Disorders and Schizophrenia—Present and Lifetime version (K-SADS-PL Kaufman et al., 1997; Kim et al., 2004)). We also applied the Autism Diagnostic Observation Schedule (ADOS) to confirm ASD diagnosis. We only included participants with an IQ of < 70 for the analyses, as this was a definition of LFA used in several previous studies (Cai et al., 2018; Erbetta et al., 2015; Reiter et al., 2018). We also defined a subgroup of LFA with ADOS total scores of 14–23 for the sensitivity analyses, as a previous study identified a LFA group of ADOS score of this range (Lord et al., 2000). The exclusion criteria for ASD were as follows: a hereditary genetic disorder; current or past history of brain trauma, organic brain disorder, seizure, or any neurological disorder; schizophrenia or any other childhood-onset psychotic disorder; major depressive disorder or bipolar disorder; Tourette’s syndrome or a chronic motor/vocal tic disorder; obsessive–compulsive disorder; and/or a history of antipsychotic or antidepressant treatment lasting for > 1 year or within the 4-week period before the initiation of the study. The TDC group included typically developing children with no signs of developmental delay.

Written informed consent was obtained from all parents/guardians, and the children provided verbal assent to participate after receiving an explanation of the study prior to enrollment. All study protocols were approved by the Institutional Review Board of SNUH and HYUMC.

Clinical Assessment

The ADOS is a semi-structured assessment of communication, social interaction, play and stereotyped behaviors, and restricted interests. It is currently considered the gold standard for diagnosing ASD (Lord et al., 2000). The examiner administered one of four modules according to the participant’s language level and age (ranging from nonverbal young children through verbally fluent adults). The IQ of participants was assessed using the Korean Educational Developmental Institute’s Wechsler Intelligence Scale for Children (Park et al., 1996) or the Korean Leiter International Performance scale—Revised (for nonverbal participants) (Shin & Cho, 2010).

MRI Acquisition

Neuroimaging data for psychiatric disorders were collected from two sites because of the limited capacity of a single site. We acquired both T1w MRI and DTI data using two whole-body 3 T magnetic resonance systems (Siemens Magnetom Trio Tim Syngo, Germany; Philips Achieve, Best, Netherlands). Due to the difficulty in obtaining MRI data in young ASD patients, we sedated all participants during the MRI scans, with the approval of the IRB. To reduce discomfort from loud noises, earplugs, headphones, or both were applied. The sedation and details of the MRI procedure were explained to the parents, and parents were informed to skip nap-time on the day of the procedure. Sedation was administered after an appropriate interval of fasting before sedation. Chloral hydrate was selected (dosage: 50 mg/kg; maximum dose: 1 g) due to its low complication rates and high efficacy when used in accordance with the guidelines for children’s sedation as published by the American Academy of Pediatrics (Committee on Drugs. American Academy of, 2002; Vade et al., 1995). All children were carefully monitored using pulse oximetry during the scans. The imaging session was interrupted if the child moved or woke up during scanning. After the imaging procedure was completed, discharge was permitted only when the participants fully recovered from the sedation effects.

At the SNUH, a three-dimensional (3D) T1w magnetization-prepared rapid acquisition gradient-echo (MPRAGE) scan was performed using a 3 T Siemens system with an 8-channel head coil. At the HYUMC, the T1w MRI was acquired using a 3 T Philips system equipped with a 16-channel head coil. The image parameters were as follows: 1) for the Siemens system, repetition time (TR) = 1900 ms, time to echo (TE) = 3.07 ms, inversion time (TI) = 900 ms, flip angle = 9°, voxel size = 0.9 × 0.9 × 0.9 mm3, field of view (FOV) = 230 mm2, slice thickness = 0.9 mm, and slice numbers = 176; 2); for the Philips system, TR = 8.3 ms, TE = 4.6 ms, TI = 1 ms, flip angle = 8°, voxel size = 0.9 × 0.9 mm2, FOV = 224 mm2, slice thickness = 1, and slice numbers = 150. Subsequently, 3D-DTI images with a single-shot echo-planar imaging (EPI) sequence were acquired using the follows parameters: 1) for the Siemens system, TR = 10,000 ms, TE = 88 ms, flip angle = 0°, voxel size = 1.9 mm2, field of view (FOV) = 240 mm2, slice thickness = 3.5 mm, slice numbers = 50, b-values = 0 and 700 s/mm2, and direction = 30; 2); for the Philips system, TR = 8192 ms, TE = 76 ms, flip angle = 90°, voxel size = 2.0 mm2, FOV = 224 mm2, slice thickness = 2 mm, slice numbers = 74, b-values = 0 and 800 s/mm2, and direction = 30.

Multiple Feature Extraction with Multimodal MRI Data

We extracted multiple features from both T1w MRI and DTI data to fully capture the structural characteristics of the brain of patients with ASD. The features are summarized in Table S1.

Cortical Features Based on Surface Modeling

Cortical features derived from structural T1w MRI have been adopted to distinguish individuals with ASD from normal controls (Li et al., 2017; Pagnozzi et al., 2018). Based on 3D cortical surface modeling, we estimated a variety of cortical features including cortical thickness, surface area, cortical volume, mean curvature, and gyrification index. T1w MRI was processed in several steps for bias field correction (Sled et al., 1998), brain extraction (Smith, 2002), spatial normalization (Collins, Neelin, Peters, & Evans, 1994), brain tissue segmentation (Zijdenbos et al., 2002) and surface construction for each hemisphere (MacDonald et al., 2000; Robbins et al., 2004). Two cortical surfaces, the inner and outer surface, are the boundary parameterized polygonal 3D meshes of gray/white matter and cerebrospinal fluid/gray matter tissues. After surface reconstruction, the cortical thickness was calculated as the Euclidean distance between the inner and outer cortical surfaces of linked vertices. The surface area was computed by the Voronoi area assigned to each vertex, and the area making up the surface model, which is considered as the overall degree of folding, was summed. The cortical volume was measured as the amount of gray matter volume multiplied by the cortical thickness and surface area at the vertex of the surface. The mean curvature has been considered to reflect the local complexity of cortical folding and measured at the vertices that lay within the sulcal regions on the smoothed surface. We adopted multiple mean values within given region of interests (ROIs) from 78 pre-defined cortical regions, defined by the well-received automated anatomical labeling (AAL) atlas template (Tzourio-Mazoyer et al., 2002), to obtain representative values that was can be used as input variables in the ASD classifier. The gyrification index originally proposed by Zilles et al. (1988) was calculated as the 3D surface-area ratio between the outer hull surface and the cortical outer surface for each hemisphere, which means that a higher gyrification index shows a greater cortical folding. These procedures were performed using the CIVET software version 2.1.0 (Montreal Neurological Institute, McGill University; https://github.com/aces/CIVET), and technical details are described in a previous study (Im et al., 2006, 2008; Yang et al., 2013).

Multiple Brain Structures Based on Volume

Previous studies have attempted to examine the variations in the brain anatomy of patients with ASD, relative to that of normal controls, using a variety of brain structures (Li et al., 2017; Pagnozzi et al., 2018). We focused on gray matter, white matter, and cerebrospinal brain volume, cerebellar volume, corpus callosum area, and subcortical gray matter volumes showing that these structural changes have been reported in many previous studies. Gray matter, white matter, and cerebrospinal fluid volume were calculated by measuring the volume of voxels within a segment of brain volume. The sum of these volumes represents the brain size as an intracranial volume (Zijdenbos et al., 2002). Cerebellar volume was calculated by measuring the segmented cerebellar volume, which was carried out using an automated procedure that sequentially combines cerebellar tissue classification, a template-based approach, and morphological operations (Lee et al., 2015). The area of the corpus callosum was calculated by segmentation of its structure in the midsagittal plane based on a Bayesian inference using sparse representation and multi-atlas voting (Park et al., 2018). Fifteen volumes of subcortical gray matter structures were automatically segmented using the FIRST (FMRIB's Integrated Registration and Segmentation Tool; http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FIRST) software version 6.0.1, which is a part of the FSL (FMRIB Software Library, Oxford Centre for Functional magnetic resonance imaging (MRI) of the Brain, Oxford, UK; http://www.fmrib.ox.ac.uk/fsl). It involves the accumbens, amygdala, caudate, hippocampus, pallidum, putamen, and thalamus for both hemispheres and brain stem (Patenaude et al., 2011).

Multi-regional Diffusion Measures

We estimated diffusion parameters such as FA and mean diffusivity (MD) from the white-matter ROIs using the Johns Hopkins University ICBM-DTI-81 White-Matter Labels Atlas (Oishi et al., 2008), which was provided in the FSL toolbox (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases). DTI data were preprocessed by motion correction, eddy current distortion, re-alignment of all scans to the b0 images, and estimation of diffusion tensor matrices by fitting a tensor model using the diffusion toolbox (FDT; https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FDT) of the FSL package. The preprocessed volumes were aligned onto the standard template space using affine and nonlinear registration (https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FLIRT; https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/FNIRT) implemented in the FSL package and then averaged within an ROI from the white-matter atlas template. Ninety-six FA and MD values of the averaged white-matter ROIs were obtained in the processing steps.

Structural Network Features Based on Graph Theory

Graph theory analysis was conducted by generating a weighted and undirected structural brain network, which was composed of nodes and edges, to evaluate abnormalities in the brain organization of patients with ASD, relative to that of controls, using the Brain Connectivity Toolbox (https://sites.google.com/site/bctnet/) (Rubinov & Sporns, 2010) for network parameters. It can be represented by a symmetric 90 × 90 matrix, the columns or rows of the matrix indicating nodes and the elements of the matrix indicating edges connected between a pair of nodes. This approach summarizes complex brain topology down to a single number (referred as a global network parameter) or a node unit with assigned edge properties (referred as a local network parameter) and is thus attractive because of its simplicity. We calculated global network parameters including the mean degree, mean strength, clustering coefficient, characteristic path length, global efficiency, normalized clustering coefficient, normalized characteristic path length, and smallworldness, which were commonly used in previous studies for brain network analysis (Bullmore & Sporns, 2009; Li et al., 2018; Qin et al., 2018). Normalized parameters were scaled against the mean value of graphical parameters obtained from 100 matched random graphs that preserve the same number of nodes, edges, and degree sequence (Maslov & Sneppen, 2002). Additionally, three types of local network parameters were estimated to reflect particular connections relevant to ASD, which is presumed from short-range overconnectivity or long-range underconnectivity, including shortest path length, nodal efficiency (Bullmore & Sporns, 2009; Qin et al., 2018), and maximum flow (Yoo et al., 2015). We applied the procedures described in a previous study before analyzing network parameters (Qin et al., 2018). A brief summary follows. The nodes were defined according to the AAL atlas template (Tzourio-Mazoyer et al., 2002) mentioned above, which is consisted of 90 brain regions including 78 of parcellated cortical regions and 12 subcortical regions. Network edges were assigned by mean FA averaging on structurally connected fiber tracts between a pair of nodes using whole-brain fiber tracts. Whole-brain fiber tracts were constructed by selecting all white-matter voxels as a seeding region in diffusion native space using the deterministic fiber tracking algorithm (Yeh et al., 2013) with DSI Studio (http://dsi-studio.labsolver.org).

Harmonization of Multimodal Imaging Features

Harmonization was applied to combine MRI data obtained using different scanners and acquisition protocols from the SNUH and HYUMC using ComBat harmonization (https://github.com/Jfortin1/ComBatHarmonization). The ComBat harmonization is a well-established technique, which uses a multivariate linear mixed effects regression with empirical Bayes, and has been shown to improve the estimation of the model parameters in studies involving small samples, and to remove unwanted inter-site scanner variation while preserving inter-site biological variability (Fortin et al., 2017, 2018).

Based on the literature, we collected multiple features according to their homogeneity characteristics derived from cortical and volume parameters, diffusion parameters, and network parameters as a feature map and presented them in a matrix, in which rows and columns represented the imaging features and subjects, respectively. As we presumed that the magnitude of effects of the site is not constant across the different feature metrics, three metrics were identified to examine the degree to which parameters were affected by scanner-to-scanner variation and how ComBat harmonization was performed in each setting. Finally, three of the harmonized matrices were concatenated and used as input to the classifier.

Machine Learning Classification

We performed classification of the ASD group and the TDC group using five different machine learning techniques, including support vector machine (SVM), logistic regression (LR), random forest (RF), AdaBoost, and multi-layer perceptron (MLP), which are machine learning classifiers that were frequently applied in many previous studies (Jiao et al., 2010; Libero et al., 2015; Xiao et al., 2017; Zhou et al., 2014). All of these classifiers were implemented using Python’s scikit learning library (https://github.com/scikit-learn/scikit-learn) (Abraham et al., 2014; Pedregosa et al., 2011), to compare the performance of the classifiers based on the features acquired from T1w MRI and DTI data. Five-fold cross-validation was used to avoid overfitting on the test set and to improve generalization. In other words, the training dataset was randomly divided into five equal-sized partitions, and then four partitions were used as the training set, and the holdout partition as the test set. This procedure was repeated five times. We optimized the model parameters for each classifier based on five-fold cross-validation. For each fold, while removing features one-by-one, the highest performance points for each classifier were measured and compared. The classification performance was evaluated by comparing accuracy, sensitivity, specificity, positive predictive value and negative predictive value. We repeated this analysis after confining the LFA group to those with ADOS scores between 14 and 23 (n = 41).

The recursive feature elimination (RFE) method was adopted to remove unnecessary features and to evaluate the performance of the classifiers simultaneously (Qureshi et al., 2016) because the number of observations in neuroimaging studies tends to be far lesser than the number of features leading to unreliable classifier performance (Chu et al., 2012). The RFE calculates the importance of features and removes them from non-critical features one-by-one. It records the classifier performance at every moment and shows the features with the best performance. Our strategy involved identification of important features through RFE, evaluation of classification performance with them, and comparison of the high rank features with statistically significant ones.

To calculate the importance of features identified using RFE, we used the RF method for training, as it was the classifier with the highest performance. A list of selected features at the peak performance point was also recorded. In addition, we set the rank with the features selected from each fold. Specifically, a high rank set was counted for features that appeared repeatedly in multiple folds.

An ablation study was performed by mutually removing the features of the T1w MRI and DTI, and then we compared classification performance of single modalities to that of combining features from both T1w and DTI. The three models were validated using the classification performance of the RF classifier, such as accuracy, sensitivity, and specificity, based on five-fold cross-validation.

Statistical Analysis

The characteristics of the ASD and TDC groups were compared using two-sample t-test (continuous variables) and chi-squared test (categorical variables).

We evaluated the scanner effects between the SNUH and HYUMC sites before and after applying harmonization in each feature collection of cortical and volume measures, diffusion measures, and network parameters using two-sample t-tests. We assumed that harmonized data would show no significant differences between the TDC groups of different sites if they were well corrected.

We also compared the extracted features of the T1w MRI and DTI images between the ASD and TDC groups using two-sample t-test. We compared the features that were found to be statistically different between the two groups with the high-ranking RF classification features. The level of statistical significance was set to false discovery rate (FDR)-corrected P-value level of 0.05.

Results

The characteristics of the participants are presented in Table 1. There were no significant differences in age or sex between the ASD and TDC groups. The mean ages of participants in the ASD and TDC group were 4.1 ± 1.0 years and 4.3 ± 1.1 years, respectively. The proportion of boys was 74% in the ASD group and 67% in the TDC group. The IQ ranges in the ASD and TDC groups were 30–86 and 72–142, respectively. None were taking psychotropic medication at the time of enrollment, and most patients underwent module 1 of ADOS, with the exception of 4 ASD children who were administered module 2. The ADOS total scores ranged from 9 to 22.

Table 1 Demographic and clinical characteristics of the ASD and TDC groups

Data Harmonization

To evaluate the performance of data harmonization, we compared the data distribution of three matrices for multiple features between the TDC groups of SNUH and HYUMC before and after performing harmonization, as shown in Figure S1-S3. The statistical results indicate that the differences that existed between the TDC groups of two different sites before harmonization disappeared after the harmonization.

Classifier Performance

We compared the performance results of five classifiers in Table 2: RF, SVM, LR, AdaBoost, and MLP. The RF classifier showed the best performance with an accuracy of 88.8%; its sensitivity, specificity, positive predictive value, negative predictive value, and area under curve (AUC) were 93.0%, 83.8%, 87.9%, 89.1%, and 0.86, respectively (Fig. S4). On the contrary, the LR classifier had the lowest accuracy (75.0%) and sensitivity (71.7%). The SVM classifier had the lowest specificity (78.7%). The AdaBoost and MLP showed fair results overall, with accuracies, sensitivities, and specificities exceeding 80%.

Table 2 Comparison of the classification performance

Table 3 shows the ranks of the features after the application of RFE and RF classification. Among these, the top 10 regions with the strongest feature importance were the thickness of the right inferior occipital gyrus, followed by the MD of the middle cerebellar peduncle and the nodal efficient values of the left posterior cingulate gyrus, which was followed by the cortical thickness of the right supplementary motor area and right temporal pole of the superior temporal gyrus, volume of the left cuneus, mean curvature of the left inferior temporal gyrus, MD of the right fornix, shortest path length, and maximum flow of the left posterior cingulate gyrus.

Table 3 Rank of feature importance determined by the random forest classifier

As a sensitivity analysis, we confined the LFA group to those with high ADOS scores of 14 -23 (n = 41). As a result, although there was a decrease in all parameters (possibly due to reduced sample size) when using the RF classifier, performance was fair with an accuracy of 78.7%, sensitivity of 73.3%, specificity of 83.8%, positive predictive value of 80.8%, and a negative predictive value of 78.9%. Among the most predictive features, the cortical thickness of the left superior temporal gyurs, right middle temporal gyrus, left middle temporal gyrus, and the left inferior temporal gyrus were found to be predictive in the main analyses and also the sensitivity analyses.

Univariate Comparison Between ASD and TDC

Only cortical thickness and cortical volume of group comparison survived FDR correction for multiple comparisons in the cuneus, temporal, and occipital ROIs, which were included in the list of the top-ranking features identified by the RF classifier. The ASD group showed larger values of cortical thickness, cortical volume, and surface area in multiple regions and smaller values of cortical thickness of the supplementary motor area and paracentral gyrus than did the TDC group (Table S2). The mean curvature and subcortical volume showed mixed results, as some regions showed larger values in the ASD group while others did in the TDC group.

The ASD group showed higher FA values and MD values in multiple regions than did the TDC group (Table S3). In the structural network parameters, the ASD group exhibited an increase in the shortest path length and a decrease in the maximum flow, while nodal efficiency showed increases and decreases in several regions.

We compared the important features obtained by the RF classifier and the significant features obtained by the independent t-test. The cortical thickness of the right inferior occipital gyrus area was significantly higher in the ASD group than in the TDC group. The MD of the cerebellar peduncle and nodal efficiency of the posterior cingulate gyrus decreased significantly in the ASD group.

Ablation Study

Our data consisted of features extracted from both T1w MRI and DTI. After removing the features calculated using data obtained from the other modalities, we evaluated whether the collective use of T1w MRI and DTI features could improve the classification performance. Table S4 shows the classification results of the RF classifier when T1w MRI features alone, DTI features alone, and a combination of T1w MRI and DTI features were used. The results obtained using T1w MRI features alone and DTI features alone were similar. The performance was enhanced when both T1w MRI and DTI features were used (accuracy, 88.8%). On the contrary, the results obtained using T1w MRI features alone showed the lowest accuracy (78.0%), this was slightly lower than when DTI features alone were used (78.7%).

Discussion

To our knowledge, this is the first study to apply machine learning methods to differentiate preschoolers with LFA and age-matched TDCs using both T1w and DTI parameters. As hypothesized, classification accuracy was enhanced when T1w MRI and DTI data were combined and the most prominent classification features included cortical thickness, MD, and nodal efficiency. These findings emphasize the potential of multimodal imaging for the identification of LFA preschoolers.

The top three features included cortical thickness of the right inferior occipital gyrus, MD of the middle cerebellar peduncle, and nodal efficiency of the left posterior cingulate gyrus. The inferior occipital gyrus is a part of the face-processing network, along with the fusiform gyrus and amygdala (Domes et al., 2013), and numerous studies have reported decreased activity in these areas in patients with ASD (Pierce et al., 2001, 2004). With regard to the middle cerebellar peduncle, previous studies have reported abnormal white-matter integrity in this area, as Shukla et al. revealed reduced connectivity (Shukla et al., 2010), while Sivaswamy et al. reported overconnectivity in the right middle cerebellar peduncle (Sivaswamy et al., 2010). The middle cerebellar peduncle consists of incoming fibers from various cortical areas related to not only sensory-motor control but also language, social cognition, and emotion (Crippa et al., 2016). The mean curvature of the left inferior temporal gyrus was found to be among the top-ranking classification features in our main analyses and the cortical thickness of this area was found to be predictive in the sensitivity analyses. These results are in line with a study that compared children with LFA and those with HFA using whole-brain VBM-DARTEL (Cai et al., 2018). An increase in gray matter volume of the left inferior temporal gyrus was found in both the LFA and HFA groups and remained significant even after including IQ as a covariate, which suggests that this area is linked to autism neurobiology regardless of IQ levels.

Previous studies have reported findings that suggest that characteristics of the brain structure and function differ between patients with LFA and HFA. Cai et al. (2018) found that while an increase in the gray matter volume of the left inferior temporal gyrus was found in both subjects with LFA and HFA, an increase in the gray matter volume of the left middle temporal gyrus was found only in the patients with LFA and not in those with HFA. Patients with LFA have demonstrated more distinct patterns of atypical connectivity than have those with HFA in both adults and children (Gabrielsen et al., 2018; Reiter et al., 2019). Most studies have included patients with HFA alone or both patients with LFA and those with HFA but have not investigated the effect of IQ; therefore, future studies focusing on patients with LFA or stratifying the population according to IQ level are necessary to discriminate the effect of ASD depending on the effect of low IQ on brain morphology.

A strength of this study is the narrow age range of participants, which is associated with a very minor effect of brain development among patients. Classification studies using T1w MRI and DTI have been conducted in the past, but only two studies using T1w MRI data targeted the preschool age (Calderoni et al., 2012; Gori et al., 2015), while all the studies that used DTI data targeted populations over the age of 8 years (Deshpande et al., 2013; Ingalhalikar et al., 2011; Payabvash et al., 2019). Many studies suggest that brain morphological features of ASD change across age, for example, a longitudinal study on patients with ASD aged 3–35 years found that the whole-brain volume increased in young children and subsequently decreased during adolescence (Lange et al., 2015). A recent study reported that cerebral growth trajectory from the age of 2 to 13 years differed according to sex, as cerebral growth was higher in TDC boys than in those with ASD throughout childhood, whereas cerebral volumes were similar between TDC girls and those with ASD but with showed slower trajectories (Lee et al., 2020). As the brain development is dynamic in patients with ASD, studies targeting different age groups may inevitably yield different results. Future longitudinal studies are warranted to capture the developmental trajectories that are critical for understanding within-person and between-person variation in development (Lange et al., 2015).

As mentioned previously, only one study combined parameters from both T1w MRI and DTI. (Libero et al., 2015) This study combined T1w MRI, DTI, and MRS data and achieved the highest accuracy of 91.9% by including the RD for the right forceps minor, FA for the left forceps minor, and cortical thickness for the pars opercularis aspect of the inferior frontal gyrus as the best predicting features (Libero et al., 2015). This study was limited by possible leakage as the data points included were the significant resulting values of the statistical analyses of separate neuroimaging modalities (Mateos-Perez et al., 2018). Moreover, the sample size was very small (19 patients with ASD and 18 TDCs) and targeted only adults with HFA. Although we implemented different features and classifiers, our study achieved a similar accuracy of 88.8%, and both studies clearly showed that a higher accuracy can be achieved by combining data from both T1w MRI and DTI rather than using data from a single modality. From the feature combining point of view, strengths of T1w MRI include volume and segmented regions, while those of DTI include neural tracks. It is difficult to combine them because each modality provides different characteristics. Once combined and analyzed, areas that are not covered by one modality can be supplemented by the other. From an implementation point of view, we used T1w MRI and an extraction method to extract significant features from DTI. In another study, dimension reduction was performed using principal component analysis (PCA) or independent component analysis (ICA) method using data on millions of voxels (Mateos-Perez et al., 2018). While voxel data with many sections with similar intensities can obtain effective results through PCA, our method has little effect because different features were extracted. Instead, by using the RFE method, only features that distinguish ASD from normal controls were obtained effectively. From a comparative point of view, however, as the two studies target different age ranges and IQ ranges, direct comparison is difficult. We suggest that future studies using these modalities with wider age ranges, larger sample sizes, and ASDs with different characteristics are warranted.

There were some notable limitations to our study. First, there was a significant difference in IQ between the ASD and TDC groups. Further studies should include an IQ-matched control group without an ASD diagnosis. Second, as we included only preschoolers and child participants, adolescent and adult individuals with ASD were excluded from the results. The narrow age range is advantageous as we were able to reduce the effect of age development, but generalizability to other age groups of ASD is limited. Although we used an age-matched control, rapid neuroanatomical changes occur during the 3- 6 year old period (Gilmore et al., 2018), and age-related variances could have affected our results. Adjusting variables of interest for within-group variance in age may be warranted in the future (Turesky et al., 2021). In addition, the majority of our patients were boys, but the proportion of girls was larger than that in many studies. Moreover, the sample size was relatively small to achieve sufficient statistical power, so further studies with larger sample sizes and a larger proportion of girls are needed. We used two different tools to measure IQ, and the IQ score in ASD preschoolers may not be stable in older ages. Several studies have found cognitive score gains in ASD preschoolers, as high as 23-points, suggesting that LFA preschoolers may not meet LFA criteria at different ages (Dietz et al., 2007; Flanagan et al., 2015; Turner et al., 2006). We only used the IQ score as an indicator of LFA, but IQ may be an imprecise proxy for functional abilities in ASD (Tamm, Day & Duncan et al., 2021). Other tools like the Vineland Adaptive Behavior Scale could be implemented in the future. Data acquisition in preschoolers with ASD can be very challenging due to head motioned caused by difficulty in staying still in an unfamiliar and noisy environment (Tziraki et al., 2021). Most multicenter MRI studies in ASD, like the EU-AIMS Longitudinal European Autism Project (LEAP) and Autism Brain Imaging Exchange (ABIDE) (Di Martino et al., 2014; Isaksson et al., 2018), usually target older age populations, with the exception of Enhancing NeuroImaging Genetics Through Meta-Analysis (ENIGMA), and the possibility of conducting large-scale multimodal MRI study in this population in the near future is limited. However, recent studies have focused on developing protocols for acquiring MRI scans in ASD children (Gabrielsen et al., 2018; Nordahl et al., 2016; Tziraki et al., 2021), and establishment of a well-designed protocol could increase the chances for a confirmation study of our results. Also, the MRI data were obtained from two sites using different brain imaging protocols. Although we applied robust harmonization methods, scanner effects may have biased our results. We used the AAL template and the Johns Hopkins University ICBM-DTI-81 White-Matter Labels Atlas, which were derived from adult brain MRI, to identify the parcellated brain region. Many previous studies have used the brain template, but it could induce registration errors that make it difficult to guarantee the exact match of the anatomical boundaries of regions between the brains of children and adults. Finally, our study was performed in a controlled hospital setting, with a high percentage of ASD patients (54%), therefore results may not be generalizable to a community sample due to the imbalance in classification (less than 2% of ASD). External validation of the study results in a large-scale real-world setting are warranted to increase clinical utility (Ho et al., 2020).

This was the first study to generate classification models for LFA preschoolers using multiple machine learning techniques. Implementation of both T1w MRI and DTI data yielded better results than did the use a single imaging modality. Further multimodal classification studies with larger sample sizes, wider age ranges, and patients with various levels of functioning are needed to expand these results.