Genomics, Intellectual Disability, and Autism

Intellectual disability, which is characterized by significant limitations in both intellectual functioning and adaptive behavior that begin before the age of 18 years,1 affects 1.5 to 2% of the population in Western countries.2 A diagnosis of intellectual disability is usually made when IQ testing reveals an IQ of less than 70, which means that often the diagnosis is not made until late childhood or early adulthood. However, most persons with intellectual disability are identified early in childhood on the basis of concern about developmental delays, which may include motor, cognitive, and speech delays. A genetic underpinning of this disorder has long been recognized in a subset of cases, with trisomy 21 (Down's syndrome) detectable by chromosomal studies since 1959.3 Trisomy 21 remains the most important chromosomal cause of intellectual disability. Single-gene causes have also been identified for a number of intellectual disability syndromes and include both autosomal and X-linked genes, with the fragile X syndrome being the most common of inherited syndromes caused by a single-gene defect leading to this phenotype in male patients.
Autism spectrum disorders have been estimated to affect as many as 1 in 100 to 1 in 150 children.4,5 Disorders on the autism spectrum share features of impaired social relationships, impaired language and communication, and repetitive behaviors or a narrow range of interests. Many children with autism spectrum disorders also have intellectual disability, and approximately 75% have lifelong disability requiring substantial social and educational support. Thus, autism and intellectual disability together represent an important health burden in the population and are frequent reasons for referral to genetics and developmental pediatrics clinics for a diagnostic workup.
During the past decade, advances in genetic research have enabled genomewide discovery of chromosomal copy-number changes and single-nucleotide changes in patients with intellectual disability and autism as well as in those with other disorders. These technological advances — which include array comparative genomic hybridization (CGH), single-nucleotide-polymorphism (SNP) genotyping arrays, and massively parallel sequencing — have transformed the approach to the identification of etiologic genes and genomic rearrangements in the research laboratory and are now being applied in the clinical diagnostic arena. Here we review these techniques and how they have enabled the rapid discovery of chromosomal and single-gene causes of intellectual disability and autism.
COPY-NUMBER CHANGES
Deletions and Duplications
A copy-number change is defined as a deletion or duplication of a stretch of DNA as compared with the reference human genome. Copy-number changes may range in size from a kilobase (kb) to several megabases (Mb) or even an entire chromosome (trisomies and monosomies) and can involve one or more genes. Deletions may be heterozygous, in which one of the usual two copies is missing; homozygous, in which both copies are missing; or hemizygous (e.g., X-chromosome deletions in a male patient).
Duplications often result in three copies, as compared with the usual two copies, although some regions of the genome are present in more than three copies and the range of observed copy numbers is much greater. Multiple studies of large control cohorts have shown that some regions of the genome are tolerant of copy-number changes and that every person carries many copy-number changes that are, for the most part, benign.6-10 Two individual genomes may differ by several megabases of DNA content because of copy-number changes. In this article, we focus on copy-number changes that underlie intellectual disability and autism and are generally not found in control cohorts.
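The definitions above can be restated as a small lookup. The following is only an illustrative sketch: the function name, its arguments, and the convention that the expected copy number is 2 for autosomes and 1 for the X chromosome in a male patient are assumptions made for the example, not part of any clinical software.

```python
def classify_copy_number(observed: int, expected: int = 2) -> str:
    """Label a region's copy-number state relative to the expected count.

    `expected` is assumed to be 2 for autosomal regions and 1 for
    X-chromosome regions in a male patient (hypothetical convention).
    """
    if observed < 0 or expected not in (1, 2):
        raise ValueError("observed must be >= 0 and expected must be 1 or 2")
    if observed == expected:
        return "normal"
    if observed > expected:
        return "duplication"
    if expected == 1:
        # The single expected copy is missing (e.g., X deletion in a male).
        return "hemizygous deletion"
    # expected == 2: one copy missing versus both copies missing.
    return "heterozygous deletion" if observed == 1 else "homozygous deletion"


if __name__ == "__main__":
    for copies in (0, 1, 2, 3):
        print(copies, classify_copy_number(copies))
```

The sketch deliberately ignores regions that are normally present in more than two copies, for which a fixed expected value would not apply.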
Changes in chromosomal copy number were first recognized as a cause of intellectual disability in 1959, when it was discovered that an extra copy of chromosome 21 is the cause of Down's syndrome.3 Steady advances in chromosome-banding techniques (see the Glossary) facilitated the detection of unbalanced rearrangements, including translocations, large deletions or duplications, and supernumerary marker chromosomes. The minimum size of disrupted chromosome that can be detected by chromosome banding is approximately 5 to 10 Mb, and such cytogenetically visible rearrangements are responsible for 10 to 15% of cases of intellectual disability.11 It was soon recognized that some patients with syndromic forms of intellectual disability also had deletions in the same chromosomal region, a finding that resolved the molecular cause of microdeletion syndromes, including the Prader–Willi and Angelman syndromes (deletion of 15q11-q13),12 the Williams–Beuren syndrome (deletion of 7q11.23),13 and the Smith–Magenis syndrome (deletion of 17p11.2).14 It was also noted that 1 to 3% of patients with autism had a maternally inherited duplication involving 15q11-q13.15
Fluorescence in situ hybridization (FISH), which was developed in the 1980s, represented an important advance in the reliable detection of smaller chromosome rearrangements and allowed physicians to rapidly confirm the diagnosis of a suspected microdeletion or microduplication syndrome in a patient. Another assay that FISH permitted was the investigation of subtelomeric deletions and duplications, which were found to cause 2.5 to 5% of previously unexplained intellectual disability.16-18
The more recent introduction of genomewide techniques to identify submicroscopic copy-number changes has revolutionized both the approach used in the laboratory to identify chromosome abnormalities that are responsible for intellectual disability and the diagnostic approach used in the clinic for patients with developmental delays or intellectual disability. The two techniques that are routinely used for discovery of copy-number changes are array CGH and SNP genotyping arrays, collectively referred to as chromosome microarrays (see Chromosome Microarrays). Since their introduction, these techniques have been applied to large case series of patients with intellectual disability or developmental delays.19-24 Numerous studies have also investigated the role of rare copy-number changes in autism.25-30 Identification of specific copy-number changes in affected patients as compared with control subjects has led to a rapid increase in the discovery of novel microdeletion and microduplication syndromes associated with intellectual disability and autism.31 Many of these syndromes are listed in Table 1 (Novel Recurrent Copy-Number Changes Associated with Intellectual Disability and Related Disorders), and several are discussed below.
Role in Intellectual Disability Syndromes
Several novel microdeletions have been identified in patients who have a similar clinical picture. Heterozygous deletions of 17q21.31, which were described by three groups simultaneously,20,23,24 are associated with moderate-to-severe intellectual disability, hypotonia, facial dysmorphic features, occasional cardiac and renal abnormalities, and seizures. The deletion is 500 to 650 kb in size and is not detectable by routine karyotyping. All 17q21.31 deletions that have been identified are de novo, and the deletion has never been seen in healthy control subjects. Its prevalence is estimated to be approximately 1 in 16,000 persons.75 Deletions of 15q24 are much rarer, but patients with 15q24 microdeletions also have an intellectual disability syndrome with recognizable features.55-57,76,77 Common features include developmental delay and intellectual disability that is usually moderate to severe; prolonged speech delay or the absence of speech; dysmorphic features, including a high anterior hairline, prominent forehead, and downslanting palpebral fissures; joint laxity; and hypotonia. Many patients also have some features of autism spectrum disorders. The 15q24 deletions that have been described vary with respect to breakpoints and size, but most include the 1.1-Mb region that is thought to be critical for the phenotype.
Variable Phenotypes
In contrast to the syndromic microdeletions described above, several recurrent microdeletions and duplications have been associated with a wide range of phenotypic features and severity. Deletions of 1q21.1 have been associated with variable degrees of intellectual disability, and some patients have one or more congenital anomalies, including cataracts and congenital heart disease.32,33,78 The deletion is quite often inherited from one of the patient's parents, who may be only mildly affected or unaffected. Deletions of this region have also been associated with schizophrenia.34,35 Duplications in the same region are also associated with mild-to-moderate intellectual disability and autistic features in some patients.32,33 Although dysmorphic features have been reported in many patients, there is no characteristic constellation of features in the majority of patients. A study involving patients with congenital heart disease suggests an increased frequency of the 1q21.1 duplication in this population as well.36
Another example of a copy-number change with highly variable outcomes is the 16p11.2 deletion. Deletions of 16p11.2 were first identified in patients with autism29,79 and are present in up to 1% of those with autism spectrum disorders, but it is now clear that such deletions are also associated with intellectual disability without autistic features.59-62,80 Deletions of the same region are also associated with early-onset obesity in subjects with and those without developmental delays.63,64 The 16p11.2 deletion is associated with dysmorphic features, but like the 1q21.1 rearrangement, it is not associated with a recognizable constellation of clinical features.
Diagnostic Yield and Recommendations
Several large studies have addressed the overall importance of copy-number changes in the diagnostic workup for intellectual disability, autism, and developmental delays,21,22,81,82 and it is clear that the use of CGH has a higher diagnostic yield than the standard karyotype. The International Standards for Cytogenomic Arrays consortium81 reviewed 33 published studies involving 21,698 patients with developmental delays, congenital anomalies, or autism who were tested for copy-number variants with the use of a chromosome microarray. The diagnostic yield (i.e., the rate of a positive genetic diagnosis) was approximately 12% across the studies. Recently, Cooper and colleagues82 looked at data from 15,767 patients who had undergone array CGH analysis as part of the diagnostic workup. Overall, the authors concluded that about 14% of cases of developmental delay can be explained by a detectable copy-number variation; their study provides a genetic morbidity map of developmental delays resulting from copy-number variations. The current recommendation is to perform chromosome microarray analysis instead of standard karyotype analysis early in the diagnostic workup of children with developmental delays, congenital anomalies, intellectual disability, or autism (Figure 1, A Diagnostic Algorithm for the Evaluation of a Patient with Intellectual Disability of Unknown Cause).81,83
THE GENETICS OF RELATED DISORDERS
Array CGH studies have also been applied to other disorders, many of which are related to and often coexist with intellectual disability and autism. Copy-number changes have been identified that are risk factors for schizophrenia,34,35 epilepsy,43,49,69,84 and attention deficit–hyperactivity disorder (ADHD).70,85,86 There is substantial overlap among the copy-number variations that have been identified in each of these disorders and in cases of intellectual disability and autism. For example, microdeletions of 15q13.3 have been associated with intellectual disability,50,51 autism,52-54 and schizophrenia34,35 and occur with increased frequency in patients with generalized epilepsy43,49,84,87 (Table 1).
Similarly, microdeletions of 1q21.1 are associated with autism, schizophrenia, and epilepsy and, most commonly, with intellectual disability. Deletions of 16p13.11 were first described in patients with autism and intellectual disability,44,71,88 but studies of epilepsy have shown that the frequency of this deletion is also significantly increased in patients with both generalized and focal forms of epilepsy.43,69,84 Duplications of 16p13.11 have also been associated with an increased risk of a range of neuropsychiatric disorders, including intellectual disability, autism, ADHD, and perhaps schizophrenia.44,71,72,86,89 The range of conditions that have been associated with these and other copy-number changes highlights the fact that these disorders are related and that common genetic factors have a causal role. Therefore, it is likely that etiologic sequence changes will be identified in some of the genes and gene networks that have been implicated in these disorders as well.
SINGLE-GENE CAUSES OF INTELLECTUAL DISABILITY
The advent of family-based genetic linkage studies and DNA sequencing in the 1990s led to the identification of increasing numbers of single genes causing intellectual disability. Many of these studies have been focused on identifying genes on the X chromosome, in part because X-linked forms of intellectual disability can be transmitted through unaffected females in families, allowing pedigree analysis. The best-known example is the fragile X syndrome, which is caused by dynamic triplet-repeat-expansion mutations in the gene FMR1 and is the most common inherited cause of intellectual disability. Clinical trials are under way to test new therapies for the fragile X syndrome on the basis of the known function of FMR1. Another important X-linked cause of syndromic intellectual disability is mutation in MECP2, encoding methyl-CpG–binding protein 2, in Rett's syndrome (affecting girls). In a recent study, Tarpey and colleagues90 sequenced the exons of 718 genes on the X chromosome in 208 families and identified 9 genes associated with X-linked intellectual disability. Their study, which used standard sequencing methods, foreshadowed the type of data that are now being generated with higher-throughput methods.
Mutations in more than 90 X-linked genes are now known to cause intellectual disability and account for about 10% of cases.91 Autosomal genes have been more difficult to identify, because there are few familial forms of intellectual disability. Many genetic syndromes for which the causative genes are known are characterized by variable intellectual disability. Some examples include neurofibromatosis, myotonic dystrophy, Duchenne's muscular dystrophy, Noonan-spectrum disorders, and tuberous sclerosis. Many autosomal recessive metabolic disorders are also associated with poor developmental outcomes. However, it is thought that the majority of cases of moderate-to-severe intellectual disability are due to de novo mutations, which cannot be detected by means of linkage mapping. Similarly, single-gene causes of autism have been identified. Most notably, mutations in PTEN are associated with autism and macrocephaly in some patients,92 and mutations in SHANK3 have also been identified.93 As described below, new sequencing approaches are facilitating gene discovery in this previously intractable form of inheritance.
MASSIVELY PARALLEL SEQUENCING
Use in Gene Discovery
Sanger sequencing was introduced in the 1970s94 and has been the mainstay of gene sequence analysis for nearly three decades. The technology is robust and reliable but subject to relatively low throughput. It was used to produce the first complete human genome sequence. In the past several years, the development of next-generation sequencing has revolutionized the field and is likely to deliver the so-called $1,000 genome (on the basis of the anticipated cost). The emerging techniques that are enabling whole-genome sequencing have been reviewed in the Journal 95 and elsewhere.96 Briefly, the method that is now widely used is referred to as massively parallel sequencing, which involves highly parallelized sequence analysis of millions of short DNA fragments from the genome.
Whereas sequence analysis of the first human genome required $3 billion and took more than 10 years, whole-genome sequencing with the use of massively parallel sequencing can be completed in a matter of weeks at a cost of $50,000 or less, and the cost is rapidly decreasing. However, sequencing an entire genome with the use of massively parallel sequencing remains a relatively expensive and time-consuming task, both for humans and for computers. A more tractable approach that is making rapid inroads into the practice of medicine is sequencing of the protein-coding parts of the genome, called exome sequencing. The exome refers to the exons, or coding units, of genes, which comprise approximately 30 million base pairs, or 1% of the entire genome. Exome sequencing is accomplished by selectively capturing the exons with the use of one of several array-based or solution-based methods that are now commercially available. The captured DNA is then sequenced by massively parallel sequencing, and SNPs are identified by comparison with the reference genome.
This approach is attractive for several reasons. First, the majority of disease-causing sequence mutations that have been identified occur in exons. Therefore, it is likely that sequence analysis of the exome will continue to be a successful approach to identifying novel disease genes. Second, it is easier to assign functional and therefore clinical significance to changes in coding sequences (exons) than to changes in noncoding DNA, the function of which is largely unknown. In addition, the human and computer requirements for sequencing and analyzing a patient's exome are currently much more tractable than those for an entire genome, with a cost of approximately $1,000. It must be acknowledged that noncoding mutations (i.e., those that occur in promoters, introns, or other nonexonic sequences) will certainly be found to be important for some disorders, and these mutations will not be detected by exome sequencing.
Several experimental approaches have been successfully used for disease identification by means of exome sequencing (Table 2, Studies Using Massively Parallel Sequencing to Identify Genes Associated with Intellectual Disability and Autism, and Figure 2, Three Strategies for Exome Sequencing in Gene Discovery). The first approach involves sequencing in several unrelated affected subjects with the same phenotype. The sequence data are then analyzed to identify genes in which all or most affected subjects have a potentially deleterious sequence variant. This approach assumes that the phenotype in all (or most) of the subjects being analyzed is a result of mutations in the same gene. Therefore, this approach has been most successful in subjects with recognizable or fairly homogeneous disorders. The first proof-of-principle experiment was successful on the basis of studies in only four subjects with the Freeman–Sheldon syndrome (also known as the whistling-face syndrome and already known to be caused by mutations in MYH3).103 Subsequently, this strategy has been used to identify the causative gene for the Kabuki syndrome (intellectual disability, facial dysmorphisms, and congenital heart disease caused by de novo mutations in MLL2)97 and the Schinzel–Giedion syndrome (severe intellectual disability, facial dysmorphisms, and multiple congenital anomalies caused by de novo mutations in SETBP1).98
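The overlap strategy described above, in which one looks for genes that carry a potentially deleterious variant in all or most unrelated affected subjects, can be sketched in a few lines. This is a toy illustration rather than the pipeline used in any of the cited studies: the function name, the `min_fraction` parameter, and the placeholder genes (`GENE_A` and so on) are invented for the example, and real analyses would first filter variants for quality and predicted effect.

```python
from collections import Counter


def shared_candidate_genes(genes_by_subject, min_fraction=1.0):
    """Return genes with a candidate variant in at least `min_fraction`
    of the subjects (1.0 means every subject)."""
    n_subjects = len(genes_by_subject)
    if n_subjects == 0:
        return []
    counts = Counter()
    for genes in genes_by_subject.values():
        counts.update(set(genes))  # count each gene once per subject
    threshold = min_fraction * n_subjects
    return sorted(g for g, c in counts.items() if c >= threshold)


if __name__ == "__main__":
    # Hypothetical gene lists for four unrelated subjects with one phenotype.
    toy = {
        "subject1": ["MYH3", "GENE_A", "GENE_B"],
        "subject2": ["MYH3", "GENE_C"],
        "subject3": ["MYH3", "GENE_A"],
        "subject4": ["MYH3", "GENE_D"],
    }
    print(shared_candidate_genes(toy))
```

Lowering `min_fraction` relaxes the assumption that every subject has a mutation in the same gene, at the cost of more candidate genes to review.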
In both the Kabuki and Schinzel–Giedion syndromes, the mutation in the child was not seen in either of the parents, and the de novo occurrence of mutations in clinically similar children is strong evidence of causality. The analysis of trios (i.e., genes from the affected patient and his or her parents) has been a particularly successful approach in interpreting the large volumes of exome sequencing data (Figure 2B). This strategy is used when the patient is expected to have a de novo mutation that is unlikely to be found in either parent's exome. It is predicted that the average newborn will harbor no more than one de novo sequence change that alters an amino acid.104 Therefore, the sequencing of the exomes of an affected child and his or her unaffected parents seems to be an efficient method for identifying de novo disease-causing mutations.
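At its core, the trio strategy is a set difference: variants called in the child that are absent from both parents. The sketch below illustrates only that step under simplifying assumptions. The variant representation (a tuple of chromosome, position, reference allele, and alternate allele) and the function name are hypothetical, and the example ignores sequencing coverage and genotype quality, which in practice determine whether an apparent de novo call is genuine.

```python
def de_novo_candidates(child, mother, father):
    """Return variants seen in the child but in neither parent.

    Each variant is a (chrom, pos, ref, alt) tuple; inputs are iterables
    of such tuples. Quality and coverage checks are deliberately omitted.
    """
    parental = set(mother) | set(father)
    return sorted(set(child) - parental)


if __name__ == "__main__":
    # Toy calls; coordinates are invented for illustration.
    child = [("chr2", 100, "A", "G"), ("chr7", 200, "C", "T")]
    mother = [("chr2", 100, "A", "G")]
    father = []
    print(de_novo_candidates(child, mother, father))
```

Because a site that was poorly covered in a parent looks identical to a site where the parent lacks the variant, candidates from such a filter would normally be confirmed by an independent method.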
Trio analysis is proving to be an effective means of identifying underlying genetic causes in nonsyndromic intellectual disability as well. Vissers and colleagues99 applied this strategy to 10 cases of nonsyndromic intellectual disability without a family history in order to identify de novo changes. In 6 cases, they identified 9 true de novo variants (in 9 different genes). Two patients each had a de novo mutation in a gene with a known association with intellectual disability. In 4 other cases, patients had a de novo variant in a plausible candidate gene. Although each of the candidate genes that were identified in this study requires further study to confirm its role in intellectual disability, the results indicate that trio analysis is an efficient method of detecting de novo mutations and novel candidate genes. O'Roak and colleagues102 used the trio approach to analyze the exome sequence in 20 children with autism and their unaffected parents. In 4 of the 20 children, the authors found arguably compelling de novo mutations in genes that are known to be involved in brain development (FOXP1, 105 GRIN2B, 106 SCN1A, 107 and LAMC3 108).
Exome sequencing has also been used to identify genes associated with recessive diseases (Figure 2C). The first examples were the diagnosis of congenital chloride diarrhea in a child suspected of having another disorder109 and the identification of the gene causing the Miller syndrome, a craniofacial disorder.110 Several studies have used massively parallel sequencing to investigate autosomal recessive intellectual disability. In a large consanguineous family with multiple affected children, Calişkan and colleagues101 sequenced the exomes of the parents to look for heterozygous deleterious mutations within a 2-Mb linkage region. They identified a mutation in TECR that was homozygous in all affected children. Recently, Najmabadi and colleagues100 investigated autosomal recessive intellectual disability in 136 consanguineous families. Because they had linkage data for the families that narrowed the genomic regions of interest, they captured the subset of exons within linkage regions for each family instead of sequencing the entire exome. They found mutations in 23 known intellectual-disability genes in 26 families, providing a definitive diagnosis. In the remaining families, they identified 50 novel candidate genes, each with a homozygous mutation in a single family. Clearly, these candidate genes need to be validated in additional samples, but the study provides a framework for evaluation of recessive forms of intellectual disability.
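The recessive strategy in a consanguineous family combines three conditions: the variant lies within the linkage region, both parents are heterozygous, and every affected child is homozygous. A minimal sketch of that filter follows. The genotype encoding (number of alternate alleles: 0, 1, or 2), the sample names, and the coordinates are all assumptions made for illustration and do not correspond to the data in the cited studies.

```python
def recessive_candidates(genotypes, affected, parents, region):
    """Return variants inside `region` that are homozygous (2 alternate
    alleles) in every affected child and heterozygous (1) in every parent.

    genotypes: {(chrom, pos, ref, alt): {sample_name: alt_allele_count}}
    region:    (chrom, start, end), inclusive coordinates
    """
    chrom, start, end = region
    hits = []
    for variant, calls in genotypes.items():
        v_chrom, pos, _ref, _alt = variant
        if v_chrom != chrom or not (start <= pos <= end):
            continue  # outside the linkage region
        children_hom = all(calls.get(s) == 2 for s in affected)
        parents_het = all(calls.get(p) == 1 for p in parents)
        if children_hom and parents_het:
            hits.append(variant)
    return sorted(hits)


if __name__ == "__main__":
    toy = {
        ("chr1", 1_200_000, "C", "T"): {"mother": 1, "father": 1, "child1": 2, "child2": 2},
        ("chr1", 1_500_000, "G", "A"): {"mother": 1, "father": 0, "child1": 1, "child2": 2},
        ("chr3", 500, "A", "G"): {"mother": 1, "father": 1, "child1": 2, "child2": 2},
    }
    region = ("chr1", 1_000_000, 3_000_000)  # hypothetical 2-Mb interval
    print(recessive_candidates(toy, ["child1", "child2"], ["mother", "father"], region))
```

A missing genotype is treated as a failure to match, so a variant is reported only when every listed family member has a call consistent with recessive inheritance.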
The value of exome sequencing in the identification of novel gene mutations has been endorsed by the National Institutes of Health, which announced in December 2011 that it will provide $48 million during the next 4 years to three centers for the sequencing of exomes and genomes of persons who have rare disorders with causes that are still unknown (http://mendelian.org).
Use in Clinical Diagnostics
Next-generation sequencing has already moved into clinical diagnostic laboratories. Several laboratories now offer gene panels in which a set of known disease genes (rather than the whole exome) is captured and subjected to massively parallel sequencing. This approach provides simultaneous evaluation of multiple genes rather than the current gene-by-gene analysis that is often required in the clinic. For example, it is now possible to order an X-linked intellectual-disability panel that includes 30, 60, or 90 genes. Exome sequencing is moving very quickly into the clinical arena and is now offered by at least two clinical laboratories at a cost of approximately $10,000 for data generation and interpretation of results.
Although clinical exomes are likely to yield answers in some cases, it will be important to proceed cautiously with careful selection of patients. The studies described above and listed in Table 2 represent the success stories. However, there are challenges in interpreting exome data, and in the studies published to date, not every case has been solved. Each individual exome harbors approximately 20,000 sequence variants as compared with the human reference genome, including some 5000 variants that will affect protein sequence and could be considered potentially deleterious. The variants can be further filtered to exclude those reported in SNP databases or in control exome studies. Once these criteria are applied, each person generally carries 100 to 200 heterozygous private sequence variants that are potentially deleterious, as well as several genes that have potentially damaging recessive mutations. Careful follow-up of individuals and families and studies in additional patients will be necessary to interpret the clinical significance of many of the variants identified by exome sequencing.
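The filtering sequence described above (keep variants that affect protein sequence, then exclude those already reported in SNP databases or control exomes) can be sketched as follows. The `Variant` class, the effect labels, and the function name are hypothetical choices for this example. Real annotation tools use their own vocabularies, and the set of known variants would come from a database rather than a hand-written set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    key: tuple   # (chrom, pos, ref, alt)
    effect: str  # assumed labels, e.g. "missense" or "synonymous"


# Effect labels treated as protein-altering in this sketch (an assumption).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


def private_protein_altering(variants, known_keys):
    """Keep protein-altering variants that are absent from `known_keys`,
    a set of variant keys reported in SNP databases or control exomes."""
    return [
        v for v in variants
        if v.effect in PROTEIN_ALTERING and v.key not in known_keys
    ]


if __name__ == "__main__":
    calls = [
        Variant(("chr1", 10, "A", "G"), "synonymous"),
        Variant(("chr1", 20, "C", "T"), "missense"),
        Variant(("chr2", 30, "G", "A"), "nonsense"),
    ]
    known = {("chr1", 20, "C", "T")}  # pretend this one is a common SNP
    for v in private_protein_altering(calls, known):
        print(v.key, v.effect)
```

As the text notes, the variants that survive such filters still number in the hundreds per person, so the output is a list for follow-up rather than a diagnosis.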
SUMMARY
Chromosome microarrays and next-generation sequencing have revolutionized gene discovery in intellectual disability, autism, and other disorders. Chromosome microarray analysis, which is recommended as a first-line test in the genetic workup of children with intellectual disability, developmental delays, autism, or congenital anomalies, provides a molecular diagnosis in 15 to 20% of cases. Exome sequencing has proved to be successful in the research laboratory and is moving rapidly into the diagnostic laboratory. As the data continue to accumulate, our understanding of genes, pathways, and molecular mechanisms will continue to evolve and translate into better diagnosis, prognosis, and therapies for these severe disorders.
CHROMOSOME MICROARRAYS
Array comparative genomic hybridization (CGH): Array CGH is a comparative assay in which DNA from the patient is fluorescently labeled with one fluorescent dye and DNA from a healthy control subject (reference DNA) is labeled with a second fluorescent dye. The samples are cohybridized to an array containing known DNA sequences called probes. The fluorescence intensity of each dye at each spot is measured. Differences in relative fluorescence intensities at a given spot on the array reflect differences in copy number between the genome of the patient and that of the reference DNA. The size of the copy-number change that can be identified by this method varies according to the number and spacing of probes on the array.
Single-nucleotide-polymorphism (SNP) genotyping array: A SNP is a site in the genome at which two different alleles are present in the general population, often referred to as the A allele and the B allele. SNP genotyping arrays are fluorescence-based assays in which the A allele is tagged with one fluorescent dye and the B allele is tagged with another. Analysis of SNP array data includes measurement of the total fluorescence intensity for a site and calculation of the ratio of the fluorescence intensities for the two dyes. At each site, most subjects will have one of three genotypes, or combinations of alleles: AA, AB, or BB. If there is a deletion, the total fluorescence intensity will be lower and the subject will have only one allele (e.g., A−) at all SNP sites within the deleted region. Duplications are represented by an increased total fluorescence intensity and altered ratio of alleles: AAA, AAB, ABB, or BBB. Because SNP arrays provide genotype information, they can also be used to identify large stretches of homozygosity in the genome, which can represent consanguinity or uniparental disomy, neither of which is detectable by means of array CGH.
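The two readouts described in this box can be illustrated numerically. The sketch below assumes idealized, noise-free intensities that scale linearly with copy number. Expressing the array CGH comparison as a base-2 logarithm of the patient-to-reference ratio, and the SNP allele ratio as the fraction of signal from the B allele, are conventions adopted for the example, and the cutoff values of -0.3 and 0.3 are arbitrary illustrative thresholds, not validated ones.

```python
import math


def log2_ratio(patient_intensity, reference_intensity):
    """Array CGH: log2 of patient versus reference dye intensity at a probe."""
    if patient_intensity <= 0 or reference_intensity <= 0:
        raise ValueError("intensities must be positive")
    return math.log2(patient_intensity / reference_intensity)


def call_cgh_probe(ratio, loss_cutoff=-0.3, gain_cutoff=0.3):
    """Label one probe as loss, gain, or normal using illustrative cutoffs."""
    if ratio <= loss_cutoff:
        return "loss"
    if ratio >= gain_cutoff:
        return "gain"
    return "normal"


def b_allele_fraction(a_intensity, b_intensity):
    """SNP array: share of the total signal contributed by the B allele."""
    total = a_intensity + b_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return b_intensity / total


if __name__ == "__main__":
    # Idealized intensities proportional to copy number (reference = 2 copies).
    for copies in (1, 2, 3):
        r = log2_ratio(copies, 2)
        print(copies, "copies:", round(r, 3), call_cgh_probe(r))
    # Genotypes given as (A copies, B copies): AB, AAB, ABB, and A- (deletion).
    for a, b in ((1, 1), (2, 1), (1, 2), (1, 0)):
        print((a, b), "total:", a + b, "B fraction:", round(b_allele_fraction(a, b), 3))
```

In this idealized setting, a heterozygous deletion gives a log2 ratio of -1 and a three-copy duplication about 0.585, while the B-allele fraction shifts from 0.5 (AB) toward one third or two thirds in a duplication. Real data are noisy, which is why calls are based on runs of adjacent probes rather than a single spot.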
Disclosure forms provided by the authors are available with the full text of this article at NEJM.org.