Publications

Publication type:
  • Peer-reviewed publications
  • Book Chapters
  • Opinions
  • Abstracts
  • Posters
  • Popular Science

  • Peer-reviewed publications

    2017

    39.

    Elhaik, E. 2017. Selected advances in genetics-cream of the crop. European Journal of Human Genetics.

    Abstract The lives of scientists can be almost as complicated as those of the organisms they study. One particular challenge relates to the old generalists versus specialists argument. The choice between the two depends on the scientist's personality and skills, and there is no right answer. Specialists may forever disapprove of the way generalists misunderstand science at the micro-level, whereas a generalist may criticize the way a specialist's work lacks adequate frame of reference, is too niche, and lacks applicability to other fields. For this reason, a series like The Annual Reviews provides a rare neutral platform where both generalists and specialists can find useful coverage of the knowledge accumulated in their fields.
    Authors Elhaik E.
    Keywords Genetics, Genomics, Patents


    Ancient Rock art at Mt. Karkom, Deep Desert Israel-Day Tours (2017)

    2016

    38.

    Marshall, S., Das, R., Pirooznia, M., and Elhaik, E. 2016. Reconstructing Druze population history. Scientific Reports.
    Altmetric score: top 2% most-read paper of all papers of similar age

    Abstract The Druze are an aggregate of communities in the Levant and Near East living almost exclusively in the mountains of Syria, Lebanon and Israel whose ~1000 year old religion formally opposes mixed marriages and conversions. Despite increasing interest in genetics of the population structure of the Druze, their population history remains unknown. We investigated the genetic relationships between Israeli Druze and both modern and ancient populations. We evaluated our findings in light of three hypotheses purporting to explain Druze history that posit Arabian, Persian or mixed Near Eastern-Levantine roots. The biogeographical analysis localised proto-Druze to the mountainous regions of southeastern Turkey, northern Iraq and southeast Syria and their descendants clustered along a trajectory between these two regions. The mixed Near Eastern-Middle Eastern localisation of the Druze, shown using both modern and ancient DNA data, is distinct from that of neighbouring Syrians, Palestinians and most of the Lebanese, who exhibit a high affinity to the Levant. Druze biogeographic affinity, migration patterns, time of emergence and genetic similarity to Near Eastern populations are highly suggestive of Armenian-Turkish ancestries for the proto-Druze.
    Authors Marshall, S., Das, R., Pirooznia, M., and Elhaik, E.
    Keywords Druze, genetic isolate, Geographic population structure (GPS), Archaeogenetics, Mountains, Ararat, Lake Van
    37.

    Elhaik, E. 2016. A "wear and tear" hypothesis to explain sudden infant death syndrome. Frontiers in Neurology.
    Altmetric score: #1 most read in the journal since publication

    Abstract Sudden infant death syndrome (SIDS) is the leading cause of death among USA infants under 1 year of age accounting for ~2,700 deaths per year. Although formally SIDS dates back at least 2,000 years and was even mentioned in the Hebrew Bible (Kings 3:19), its etiology remains unexplained prompting the CDC to initiate a sudden unexpected infant death case registry in 2010. Due to their total dependence, the ability of the infant to allostatically regulate stressors and stress responses shaped by genetic and environmental factors is severely constrained. We propose that SIDS is the result of cumulative painful, stressful, or traumatic exposures that begin in utero and tax neonatal regulatory systems incompatible with allostasis. We also identify several putative biochemical mechanisms involved in SIDS. We argue that the important characteristics of SIDS, namely male predominance (60:40), the significantly different SIDS rate among USA Hispanics (80% lower) compared to whites, 50% of cases occurring between 7.6 and 17.6 weeks after birth with only 10% after 24.7 weeks, and seasonal variation with most cases occurring during winter, are all associated with common environmental stressors, such as neonatal circumcision and seasonal illnesses. We predict that neonatal circumcision is associated with hypersensitivity to pain and decreased heart rate variability, which increase the risk for SIDS. We also predict that neonatal male circumcision will account for the SIDS gender bias and that groups that practice high male circumcision rates, such as USA whites, will have higher SIDS rates compared to groups with lower circumcision rates. SIDS rates will also be higher in USA states where Medicaid covers circumcision and lower among people that do not practice neonatal circumcision and/or cannot afford to pay for circumcision. 
    We last predict that winter-born premature infants who are circumcised will be at higher risk of SIDS compared to infants who experienced fewer nociceptive exposures. All these predictions are testable experimentally using animal models or cohort studies in humans. Our hypothesis provides new insights into novel risk factors for SIDS that can reduce its risk by modifying current infant care practices to reduce nociceptive exposures.
    Authors Elhaik, E.
    Keywords sudden infant death syndrome (SIDS), allostatic load, neonatal circumcision, trauma, pain, stress
    36.

    Elhaik, E. 2016. In search of the jüdische Typus: a proposed benchmark to test the genetic basis of Jewishness challenges notions of "Jewish biomarkers". Frontiers in Genetics.
    Altmetric score: #2 most read in the journal since publication

    Abstract The debate as to whether Jewishness is a biological trait inherent from an "authentic" "Jewish type" (jüdische Typus) ancestor or a system of beliefs has been raging for over two centuries. While the accumulated biological and anthropological evidence support the latter argument, recent genetic findings, bolstered by the direct-to-consumer genetic industry, purport to identify Jews or quantify one's Jewishness from genomic data. To test the merit of claims that Jews and non-Jews are genetically distinguishable, we propose a benchmark where genomic data of Jews and non-Jews are hybridized over few generations and the observed and predicted Jewishness of the terminal offspring according to either the Orthodox religious law (Halacha) or the Israeli Law of Return are compared. Members of academia, the public, and 23andMe were invited to use the benchmark to test claims that Jews are genetically distinct from non-Jews. Here, we report the findings from these trials. We also compare the genomic similarity of ~300 individuals from nearly thirty Afro-Eurasian Jewish communities to a simulated jüdische Typus population. The results are discussed in light of modern trends in the genetics of Jews and related fields and provide a tentative answer to the ageless question "who is a Jew?"
    Authors Elhaik, E.
    Keywords Urjüdischer Typus, Jewish Urtypus, Jewishness, ancestry, Jews
    35.

    Morozova, I., Flegontov, P., Mikheyev, A.S., Asgharian, H., Ponomarenko, P., Klyuchnikov, V., ArunKumar, G., Bruskin, S., Prokhorchouk, E., Gankin, Y., Rogaev, E., Nikolsky, Y., Baranova, A., Elhaik, E., and Tatarinova, T.V. 2016. Toward high-resolution population genomics using archaeological samples. DNA Research.
    #1 most-read in DNA Research

    Abstract The term 'ancient DNA' (aDNA) is coming of age, with over 1,200 hits in the PubMed database, beginning in the early 1980s with the studies of 'molecular paleontology'. Rooted in cloning and limited sequencing of DNA from ancient remains during the pre-PCR era, the field has made incredible progress since the introduction of PCR and next-generation sequencing. Over the last decade, aDNA analysis ushered in a new era in genomics and became the method of choice for reconstructing the history of organisms, their biogeography, and migration routes, with applications in evolutionary biology, population genetics, archaeogenetics, paleoepidemiology, and many other areas. This change was brought by development of new strategies for coping with the challenges in studying aDNA due to damage and fragmentation, scarce samples, significant historical gaps, and limited applicability of population genetics methods. In this review, we describe the state-of-the-art achievements in aDNA studies, with particular focus on human evolution and demographic history. We present the current experimental and theoretical procedures for handling and analysing highly degraded aDNA. We also review the challenges in the rapidly growing field of ancient epigenomics. Advancement of aDNA tools and methods signifies a new era in population genetics and evolutionary medicine research.
    Authors Morozova, I., Flegontov, P., Mikheyev, A.S., Asgharian, H., Ponomarenko, P., Klyuchnikov, V., ArunKumar, G., Bruskin, S., Prokhorchouk, E., Gankin, Y., Rogaev, E., Nikolsky, Y., Baranova, A., Elhaik, E., and Tatarinova, T.V.
    Keywords ancient DNA, bioinformatics, epigenetics, population genetics, next-generation sequencing
    34.

    The MetaSUB International Consortium. 2016. The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium inaugural meeting report. Microbiome.
    Altmetric score: #3 most read in the journal since publication

    Abstract The Metagenomics and Metadesign of the Subways and Urban Biomes (MetaSUB) International Consortium is a novel, interdisciplinary initiative comprised of experts across many fields, including genomics, data analysis, engineering, public health, and architecture. The ultimate goal of the MetaSUB Consortium is to improve city utilization and planning through the detection, measurement, and design of metagenomics within urban environments. Although continual measures occur for temperature, air pressure, weather, and human activity, including longitudinal, cross-kingdom ecosystem dynamics can alter and improve the design of cities. The MetaSUB Consortium is aiding these efforts by developing and testing metagenomic methods and standards, including optimized methods for sample collection, DNA/RNA isolation, taxa characterization, and data visualization. The data produced by the consortium can aid city planners, public health officials, and architectural designers. In addition, the study will continue to lead to the discovery of new species, global maps of antimicrobial resistance (AMR) markers, and novel biosynthetic gene clusters (BGCs). Finally, we note that engineered metagenomic ecosystems can help enable more responsive, safer, and quantified cities.
    Authors Consortium Lead: Christopher E. Mason
    Executive Directors: Ebrahim Afshinnekoo and Sofia Ahsanuddin
    External Advisory Board (EAB): Elodie Ghedin, Timothy Read, Claire Fraser, Joel Dudley, Mark Hernandez, and Christopher Bowler
    MetaSUB City Principal Investigators: Ariel Chernomoretz and Gustavo Stolovitzky (Buenos Aires, Argentina), Pawel P Labaj & Alexandra B. Graf (Vienna, Austria), Aaron Darling and Catherine Burke (Sydney, Australia), Houtan Noushmehr (Ribeirao Preto, Brazil), Emmanuel Dias-Neto (Sao Paulo, Brazil), Yongli Guo (Beijing, China), Zhi Xie (Guangzhou, China), Patrick Lee (Hong Kong, China), Leming Shi (Shanghai, China), Carlos A. Ruiz-Perez and Maria Mercedes Zambrano (Bogota, Colombia), Rania Siam and Amged Ouf (Cairo, Egypt), Hugues Richard and Ingrid Lafontaine (Paris, France), Lothar H. Wieler and Torsten Semmler (Berlin, Germany), Niyaz Ahmed, Bharath Prithiviraj, and Narasimha Nedunuri (Hyderabad, India), Shaadi Mehr and Kambiz Banihashemi (Tehran, Iran), Florigio Lista and Anna Anselmo (Rome, Italy), Haruo Suzuki, Makoto Kuroda, Riu Yamashita, Yukoto Sato, Eli Kaminuma (Tokyo and Sendai, Japan), Celia M. Alpuche Aranda and Jesus Martinez (Mexico City, Mexico), Christopher Dada (Auckland, New Zealand), Marius Dybwad (Oslo, Norway), Manuela Oliveira (Lisbon, Portugal and Porto, Portugal), Stephan Schuster (Singapore, Singapore), Geoffrey H. Siwo (Johannesburg, South Africa), Soojin Jang, Sung Chul Seo, and Sung Ho Hwang (Seoul, South Korea), Stephan Ossowski and Daniela Bezdan (Barcelona, Spain), Salama Chaker and Aspassia D. Chatziefthimiou (Doha, Qatar), Klas Udekwu and Per Liungdahl (Stockholm, Sweden), Ugur Sezerman and Cem Meydan (Izmir, Turkey), Eran Elhaik (Sheffield, UK), Gaston Gonnet (Montevideo, Uruguay), Lynn M. Schriml and Emmanuel Mongodin (Baltimore, USA and Washington D.C., USA), Curtis Huttenhower (Boston, USA), Jack Gilbert (Chicago, USA), Christopher E. Mason (New York City, USA), Jonathan Eisen (Sacramento and San Francisco, USA), David Hirschberg (Seattle, USA), Mark Hernandez (Denver, USA)
    Inaugural MetaSUB International Meeting Speakers: Jack Gilbert*, Curtis Huttenhower*, Andrew Kasarskis*, Patrick Lee, Christopher E. Mason, Julia Maritz, Ellen Jorgensen, Scott Tighe, Russel Neches, Tom Livelli, Leming Shi, Houtan Noushmehr, Haruo Suzuki, Jesus Martinez Barnetche, Catherine Burke, Aaron Darling, Hugues Richard, Zhi Xie, Stephan Ossowski, Edoardo Pasolli, Nick Greenfield, Nur Hasan, Ebrahim Afshinnekoo, Mohamed Donia, John Brownstein, Linda Nozick, Harold Michels, Lynn Schriml, Catherine Brownstein, Jeanne Garbarino, Abby Lyons, and Jeff Zhu
    Keywords Microbiome, Biosynthetic gene clusters, Built environment, Next-generation sequencing, Antimicrobial resistance markers
    33.

    Das, R., Wexler, P., Pirooznia, M., and Elhaik, E. 2016. Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz. Genome Biology and Evolution.
    Altmetric score (99th percentile of all articles, 1st in GBE)

    Abstract The Yiddish language is over one thousand years old and incorporates German, Slavic, and Hebrew elements. The prevalent view claims Yiddish has a German origin, whereas the opposing view posits a Slavic origin with strong Iranian and weak Turkic substrata. One of the major difficulties in deciding between these hypotheses is the unknown geographical origin of Yiddish speaking Ashkenazic Jews (AJs). An analysis of 393 Ashkenazic, Iranian, and mountain Jews and over 600 non-Jewish genomes demonstrated that Greeks, Romans, Iranians, and Turks exhibit the highest genetic similarity with AJs. The Geographic Population Structure (GPS) analysis localized most AJs along major primeval trade routes in northeastern Turkey adjacent to primeval villages with names that may be derived from "Ashkenaz." Iranian and mountain Jews were localized along trade routes on the Turkey's eastern border. Loss of maternal haplogroups was evident in non-Yiddish speaking AJs. Our results suggest that AJs originated from a Slavo-Iranian confederation, which the Jews call "Ashkenazic" (i.e., "Scythian"), though these Jews probably spoke Persian and/or Ossete. This is compatible with linguistic evidence suggesting that Yiddish is a Slavic language created by Irano-Turko-Slavic Jewish merchants along the Silk Roads as a cryptic trade language, spoken only by its originators to gain an advantage in trade. Later, in the 9th century, Yiddish underwent relexification by adopting a new vocabulary that consists of a minority of German and Hebrew and a majority of newly coined Germanoid and Hebroid elements that replaced most of the original Eastern Slavic and Sorbian vocabularies, while keeping the original grammars intact.
    Authors Das, R., Wexler, P., Pirooznia, M., and Elhaik, E.
    Keywords Archaeogenetics; Yiddish; Ashkenazic Jews; Ashkenaz; Geographic population structure (GPS); Rhineland hypothesis; Citizen Science


    Departure of the Winged Ship, Vladimir Kush (2000)

    2015

    32.

    Elhaik, E. and Zandi P. 2015. Dysregulation of the NF-kB pathway as a potential inducer of bipolar disorder. Journal of Psychiatric Research.

    Abstract A century of investigations enhanced our understanding of bipolar disorder although it remains a complex multifactorial disorder with a mostly unknown pathophysiology and etiology. The role of the immune system in this disorder is one of the most controversial notions in genetic psychiatry. Though inflammation has been consistently reported in bipolar patients, it remains unclear how the immunologic process influence the disorder. One of the core components of the immune system is the NF-kB, a major transcription factor that plays an essential role in the development of innate and adaptive immunity. Remarkably, the NF-kB pathway received only little attention in bipolar studies, in contrast to studies of related psychiatric disorders where dysregulation has been proposed to explain the neurodegeneration in patient conditions. If this is also true for bipolar disorder, it will underscore the role of the immune system in the chronicity and pathophysiology of the disorder and may promote personalized therapeutic strategies. This is the first review to summarize the current knowledge of the pathophysiological functions of NF-kB in bipolar disorder.
    Authors Elhaik E. and Zandi P.
    Keywords Bipolar disorder; NF-kB; inflammation; cytokines; psychiatric disorders; autoimmunity
    31.

    Guo, H., Chamberlain, S., Elhaik, E., Jalli, E., Lynes, A.R., Marczak, L., Sabath, N., Vargas, A., Wieski, K., Zelig, E.M., and Pennings, S.C. 2015. Geographic variation in plant community structure of salt marshes: species, functional and phylogenetic perspectives. PLoS ONE.

    Abstract In general, community similarity is thought to decay with distance; however, this view may be complicated by the relative roles of different ecological processes at different geographical scales, and by the compositional perspective (e.g. species, functional group and phylogenetic lineage) used. Coastal salt marshes are widely distributed worldwide, but no studies have explicitly examined variation in salt marsh plant community composition across geographical scales, and from species, functional and phylogenetic perspectives. Based on studies in other ecosystems, we hypothesized that, in coastal salt marshes, community turnover would be more rapid at local versus larger geographical scales; and that community turnover patterns would diverge among compositional perspectives, with a greater distance decay at the species level than at the functional or phylogenetic levels. We tested these hypotheses in salt marshes of two regions: The southern Atlantic and Gulf Coasts of the United States. We examined the characteristics of plant community composition at each salt marsh site, how community similarity decayed with distance within individual salt marshes versus among sites in each region, and how community similarity differed among regions, using species, functional and phylogenetic perspectives. We found that results from the three compositional perspectives generally showed similar patterns: there was strong variation in community composition within individual salt marsh sites across elevation; in contrast, community similarity decayed with distance four to five orders of magnitude more slowly across sites within each region. Overall, community dissimilarity of salt marshes was lowest on the southern Atlantic Coast, intermediate on the Gulf Coast, and highest between the two regions. Our results indicated that local gradients are relatively more important than regional processes in structuring coastal salt marsh communities. 
    Our results also suggested that in ecosystems with low species diversity, functional and phylogenetic approaches may not provide additional insight over a species-based approach.
    Authors Hongyu Guo, Scott A. Chamberlain, Eran Elhaik, Inder Jalli, Alana-Rose Lynes, Laurie Marczak, Niv Sabath, Amy Vargas, Kazimierz Wieski, Emily M. Zelig, and Steven C. Pennings
    Keywords Biodiversity, community structure, distance-decay of community similarity, functional traits, biogeographic variation, phylogenetic relationship, salt marsh
    30.

    Elhaik, E., Tatarinova, T., Klyosov, A., and Graur, D. 2015. Reply to Mendez et al.: The 'extremely ancient' chromosome that still isn't. European Journal of Human Genetics.

    Abstract Earlier this year, we discovered that an extreme age estimate for a Y chromosomal haplotype (237,000-581,000 years ago) by Mendez et al was based on analytical choices that consistently inflated its value.
    Authors Eran Elhaik, Tatiana Tatarinova, Anatole Klyosov, Dan Graur.
    Keywords A00 haplotype, Y chromosome, Albert Perry, Y-chromosomal Adam, TMRCA


    Water Lily Pond, Claude Monet (1899)

    2014

    29.

    Elhaik, E. and Graur, D. 2014. A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes. PLoS Computational Biology.

    Abstract For the past four decades the compositional organization of the mammalian genome posed a formidable challenge to molecular evolutionists attempting to explain it from an evolutionary perspective. Unfortunately, most of the explanations adhered to the "isochore theory," which has long been rebutted. Recently, an alternative compositional domain model was proposed depicting the human and cow genomes as composed mostly of short compositionally homogeneous and nonhomogeneous domains and a few long ones. We test the validity of this model through a rigorous sequence-based analysis of eleven completely sequenced mammalian and avian genomes. Seven attributes of compositional domains are used in the analyses: (1) the number of compositional domains, (2) compositional domain-length distribution, (3) density of compositional domains, (4) genome coverage by the different domain types, (5) degree of fit to a power-law distribution, (6) compositional domain GC content, and (7) the joint distribution of GC content and length of the different domain types. We discuss the evolution of these attributes in light of two competing phylogenetic hypotheses that differ from each other in the validity of clade Euarchontoglires. If valid, the murid genome compositional organization would be a derived state and exhibit a high similarity to that of other mammals. If invalid, the murid genome compositional organization would be closer to an ancestral state. We demonstrate that the compositional organization of the murid genome differs from those of primates and laurasiatherians, a phenomenon previously termed the "murid shift," and in many ways resembles the genome of opossum. We find no support to the "isochore theory." Instead, our findings depict the mammalian genome as a tapestry of mostly short homogeneous and nonhomogeneous domains and few long ones thus providing strong evidence in favor of the compositional domain model and seem to invalidate clade Euarchontoglires.
    Authors Eran Elhaik, Dan Graur.
    Keywords Isochores, murid shift, compositional domains, isochore theory, power-law distribution, Euarchontoglires, opossum
    28.

    Elhaik, E., Tatarinova, T., Chebotarev, D., Piras, I.S., Calò, C.M., Montis, A., Atzori, M., Marini, M., Tofanelli, S., Francalacci, P., Pagani, L., Tyler-Smith, C., Xue, Y., Cucca, F., Schurr, T.G., Gaieski, J.B., Melendez, C., Vilar, M.G., Owings, A.C., Gómez, R., Fujita, R., Santos, F.R., Comas, D., Balanovsky, O., Balanovska, E., Zalloua, P., Soodyall, H., Pitchappan, R., GaneshPrasad, A., Hammer, M., Matisoo-Smith, L., Wells, S.R., and The Genographic Consortium. 2014. Geographic population structure analysis of worldwide human populations infers their biogeographical origins. Nature Communications.

    Altmetric score (99th percentile of all articles, ranked 8th in Nature Communications).
    Science highlight: Genetic 'App' Tells You Where You're From.
    Nature Communications highlight: Genes give clues to where in the World you came from.
    Nature Middle East highlight: Genes can reveal where we come from.
    Nature India highlight: Software tools solve mysteries of human origins, crimes.
    Nature Asia Genetic: whether your ancestors came from somewhere.

    Abstract The search for a method that utilizes biological information to predict humans' place of origin has occupied scientists for millennia. Over the past four decades, scientists have employed genetic data in an effort to achieve this goal but with limited success. While biogeographical algorithms using next-generation sequencing data have achieved an accuracy of 700 km in Europe, they were inaccurate elsewhere. Here we describe the Geographic Population Structure (GPS) algorithm and demonstrate its accuracy with three data sets using 40,000-130,000 SNPs. GPS placed 83% of worldwide individuals in their country of origin. Applied to over 200 Sardinian villagers, GPS placed a quarter of them in their villages and most of the rest within 50 km of their villages. GPS's accuracy and power to infer the biogeography of worldwide individuals down to their country or, in some cases, village, of origin, underscores the promise of admixture-based methods for biogeography and has ramifications for genetic ancestry testing.
    Authors Eran Elhaik, Tatiana Tatarinova, Dmitri Chebotarev, Ignazio S. Piras, Carla Maria Calò, Antonella De Montis, Manuela Atzori, Monica Marini, Sergio Tofanelli, Paolo Francalacci, Luca Pagani, Chris Tyler-Smith, Yali Xue, Francesco Cucca, Theodore G. Schurr, Jill B. Gaieski, Carlalynne Melendez, Miguel G. Vilar, Amanda C. Owings, Rocío Gómez, Ricardo Fujita, Fabrício R. Santos, David Comas, Oleg Balanovsky, Elena Balanovska, Pierre Zalloua, Himla Soodyall, Ramasamy Pitchappan, ArunKumar GaneshPrasad, Michael Hammer, Lisa Matisoo-Smith, Spencer R. Wells & The Genographic Consortium.
    Keywords GPS, Biogeography, village, island, SPA, PCA, Genochip
    27.

    Elsik, C.G., Worley, K.C., Bennett, A.K., Beye, M., Camara, F., Childers, C.P., Graaf, D., Debyser, G., Deng, J., Devreese, B., Elhaik, E., et al. 2014. Finding the missing honey bee genes: lessons learned from a genome upgrade. BMC Genomics.
    Altmetric score (95th percentile of all articles, 3rd in BMC Genomics)

    Abstract Background The first generation of genome sequence assemblies and annotations have had a significant impact upon our understanding of the biology of the sequenced species, the phylogenetic relationships among species, the study of populations within and across species, and have informed the biology of humans. As only a few Metazoan genomes are approaching finished quality (human, mouse, fly and worm), there is room for improvement of most genome assemblies. The honey bee (Apis mellifera) genome, published in 2006, was noted for its bimodal GC content distribution that affected the quality of the assembly in some regions and for fewer genes in the initial gene set (OGSv1.0) compared to what would be expected based on other sequenced insect genomes. Results Here, we report an improved honey bee genome assembly (Amel_4.5) with a new gene annotation set (OGSv3.2), and show that the honey bee genome contains a number of genes similar to that of other insect genomes, contrary to what was suggested in OGSv1.0. The new genome assembly is more contiguous and complete and the new gene set includes ~5000 more protein-coding genes, 50% more than previously reported. About 1/6 of the additional genes were due to improvements to the assembly, and the remaining were inferred based on new RNAseq and protein data. Conclusions Lessons learned from this genome upgrade have important implications for future genome sequencing projects. Furthermore, the improvements significantly enhance genomic resources for the honey bee, a key model for social behavior and essential to global ecology through pollination.
    Authors Christine G Elsik, Kim C Worley, Anna K Bennett, Martin Beye, Francisco Camara, Christopher P Childers, Dirk C de Graaf, Griet Debyser, Jixin Deng, Bart Devreese, Eran Elhaik, Jay D Evans, Leonard J Foster, Dan Graur, Roderic Guigo, Katharina Jasmin Hoff, Michael E Holder, Matthew E Hudson, Greg J Hunt, Huaiyang Jiang, Vandita Joshi, Radhika S Khetani, Peter Kosarev, Christie L Kovar, Jian Ma, Ryszard Maleszka, Robin F Moritz, Monica C Munoz-Torres, Terence D Murphy, Donna M Muzny et al.
    Keywords Bee, isochores, compositional domains, build 5
    26.

    Elhaik, E., Tatarinova, T., Klyosov, A., and Graur D. 2014. The 'extremely ancient' chromosome that isn't: a forensic bioinformatic investigation of Albert Perry's X-degenerate portion of the Y chromosome. European Journal of Human Genetics.
    Altmetric score (99th percentile of all articles of similar age, 1st in EJHG)

    Abstract Mendez and colleagues reported the identification of a Y chromosome haplotype (the A00 lineage) that lies at the basal position of the Y chromosome phylogenetic tree. Incorporating this haplotype, the authors estimated the time to the most recent common ancestor (TMRCA) for the Y tree to be 338 000 years ago (95% CI:237,000-581,000). Such an extraordinarily early estimate contradicts all previous estimates in the literature and is over a 100 000 years older than the earliest fossils of anatomically modern humans. This estimate raises two astonishing possibilities, either the novel Y chromosome was inherited after ancestral humans interbred with another species, or anatomically modern Homo sapiens emerged earlier than previously estimated and quickly became subdivided into genetically differentiated subpopulations. We demonstrate that the TMRCA estimate was reached through inadequate statistical and analytical methods, each of which contributed to its inflation. We show that the authors ignored previously inferred Y-specific rates of substitution, incorrectly derived the Y-specific substitution rate from autosomal mutation rates, and compared unequal lengths of the novel Y chromosome with the previously recognized basal lineage. Our analysis indicates that the A00 lineage was derived from all the other lineages 208 300 (95% CI:163,900-260,200) years ago.
    Authors Eran Elhaik, Tatiana Tatarinova, Anatole Klyosov, Dan Graur.
    Keywords A00 haplotype, Y chromosome, Albert Perry, Y-chromosomal Adam, TMRCA, interbreeding
    25.

    Elhaik, E., Pellegrini, M., and Tatarinova, T. 2014. Gene expression and nucleotide composition are associated with genic methylation level in Oryza sativa. BMC Bioinformatics. Data.

    Abstract Background. The methylation of cytosines at CpG dinucleotides, which plays an important role in gene expression regulation, is one of the most studied epigenetic modifications. Thus far, the detection of DNA methylation has been determined mostly by experimental methods, which are not only prone to bench effects and artifacts but are also time-consuming, expensive, and cannot be easily scaled up to many samples. It is therefore useful to develop computational prediction methods for DNA methylation. Our previous studies highlighted the existence of correlations between the GC content of the third codon position (GC3), methylation, and gene expression. We thus designed a model to predict methylation in Oryza sativa based on genomic sequence features and gene expression data.
    Results. We first derive equations to describe the relationship between gene methylation levels, GC3, expression, length, and other gene compositional features. We next assess gene compositional features involving sixmers and their association with methylation levels and other gene level properties. By applying our sixmer-based approach on rice gene expression data we show that it can accurately predict methylation (Pearson's correlation coefficient r = 0.79) for the majority (79%) of the genes. Matlab code with our model is included.
    Conclusions. Gene expression variation can be used as predictors of gene methylation levels.
    Authors Elhaik, Pellegrini, and Tatarinova.
    Keywords DNA methylation, gene expression, GC3, rice, Oryza sativa


    The Three Skulls, Paul Cézanne (1900)

    2013

    24.

    Elhaik, E., Tatarinova, T., and Pellegrini, M. 2013. Cross-species analysis of genic GC3 content and DNA methylation patterns. Genome Biology and Evolution.

    Abstract Background. The GC-content in the third codon position (GC3) exhibits a unimodal distribution in many plant and animal genomes. Interestingly, grasses and homeotherm vertebrates exhibit a unique bimodal distribution. High GC3 was previously found to be associated with variable expression, higher frequency of upstream TATA boxes, and an increase of GC3 from 5' to 3'. Moreover, GC3-rich genes are predominant in certain gene classes and are enriched in CpG dinucleotides that are potential targets for methylation. Based on the GC3 bimodal distribution, we hypothesize that GC3 has a regulatory role involving methylation and gene expression. To test that hypothesis, we selected diverse taxa (rice, thale cress, bee, and human) that varied in the modality of their GC3 distribution and tested the association between GC3, DNA methylation and gene expression.
    Results. We examine the relationship between cytosine methylation levels and GC3, gene expression, genome signature, gene length, and other gene compositional features. We find a strong negative correlation (Pearson's correlation coefficient r = -0.67, p-value < 0.0001) between GC3 and genic CpG methylation. The comparison between 5'-3' gradients of CG3-skew and genic methylation for the taxa in the study suggests an interplay between gene-body methylation and transcription-coupled cytosine deamination effect.
    Conclusions. Compositional features are correlated with methylation levels of genes in rice, thale cress, human, bee and fruit fly (which acts as an unmethylated control). These patterns allow us to generate evolutionary hypotheses about the relationship between GC3 and methylation and how these affect expression patterns. Specifically, we propose that the opposite effects of methylation and compositional gradients along coding regions of GC3-poor and GC3-rich genes are the products of several competing processes.
    Authors Elhaik, Tatarinova, and Pellegrini.
    Keywords DNA methylation, gene expression, GC3, grasses, homeotherms, Oryza sativa, Apis mellifera, Homo sapiens, Arabidopsis thaliana
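    The GC3 statistic that the abstract above relies on can be illustrated with a minimal Python sketch. It assumes an in-frame coding sequence given as a plain string; the function name gc3 is hypothetical and is not taken from the paper or its code.

```python
def gc3(cds: str) -> float:
    """Return the fraction of codons whose third position is G or C.

    Assumes `cds` is an in-frame coding sequence; a trailing partial
    codon, if any, is ignored.
    """
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3   # drop a trailing partial codon
    third_positions = seq[2:usable:3]  # bases at codon position 3
    if not third_positions:
        raise ValueError("sequence is shorter than one codon")
    gc_count = sum(base in "GC" for base in third_positions)
    return gc_count / len(third_positions)


# Codons ATG, GCC, AAA have third bases G, C, A, so GC3 is 2/3.
example = gc3("ATGGCCAAA")
```

    Computing this value per gene and correlating it with per-gene methylation levels gives the kind of GC3-methylation association the abstract reports; the correlation step itself is omitted here.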
    23.

    Simola, D.F., Wissler, L., Donahue, G., Waterhouse, R.M., Helmkampf, M., Roux, J., Nygaard, S., Glastad, K., Hagen, D.E., Viljakainen, L., Reese, J.T., Hunt, B.G., Graur, D., Elhaik, E., Kriventseva, E., Wen, J., Parker, B.J., Cash, E., Privman, E., Childers, C.P., Muñoz-Torres, M.C., Boomsma, J.J., Bornberg-Bauer, E., Currie, C., Elsik, C.G., Suen, G., Goodisman, M.A., Keller, L., Liebig, J., Rawls, A., Reinberg, D., Smith, C.D., Smith, C.R., Tsutsui, N., Wurm, Y., Zdobnov, E.M., Berger, S.L., and Gadau, J. 2013. Social insect genomes exhibit dramatic evolution in gene composition and regulation while preserving regulatory features linked to sociality. Genome Research.

    Abstract Genomes of eusocial insects code for dramatic examples of phenotypic plasticity and social organization. We compared the genomes of seven ants, the honeybee, and various solitary insects to examine whether eusocial lineages share distinct features of genomic organization. Each ant lineage contains ~4,000 novel genes, but only 64 of these genes are conserved among all seven ants. Many gene families have been expanded in ants, notably those involved in chemical communication (e.g., desaturases and odorant receptors). Alignment of the ant genomes revealed reduced purifying selection compared to Drosophila without significantly reduced synteny. Correspondingly, ant genomes exhibit dramatic divergence of non-coding regulatory elements; however, extant conserved regions are enriched for novel non-coding RNAs and transcription factor binding sites. Comparison of orthologous gene promoters between eusocial and solitary species revealed significant regulatory evolution in both cis (e.g., CREB) and trans (e.g., Forkhead) for nearly 2,000 genes, many of which exhibit phenotypic plasticity. Our results emphasize that genomic changes can occur remarkably fast in ants, as two recently diverged leaf-cutter ant species exhibit faster accumulation of species-specific genes and greater divergence in regulatory elements compared to other ants or Drosophila. Thus, while the "sociogenomes" of ants and the honeybee are broadly characterized by a pervasive pattern of divergence in gene composition and regulation, they preserve lineage-specific regulatory features linked to eusociality. We propose that changes in gene regulation played a key role in the origins of insect eusociality, whereas changes in gene composition were more relevant for lineage-specific eusocial adaptations.
    Authors Simola, Wissler, Donahue, Waterhouse, Helmkampf, Roux, Nygaard, Glastad, Hagen, Viljakainen, Reese, Hunt, Graur, Elhaik, Kriventseva, Wen, Parker, Cash, Privman, Childers, Muñoz-Torres, Boomsma, Bornberg-Bauer, Currie, Elsik, Suen, Goodisman, Keller, Liebig, Rawls, Reinberg, Smith, Smith, Tsutsui, Wurm, Zdobnov, Berger, and Gadau.
    Keywords Ants, social, Isochores, compositional domains, IsoPlotter, compositional maps, insects, Djs
    22.

    Elhaik, E. and Graur, D. 2013. IsoPlotter+: A Tool for Studying the Compositional Architecture of Genomes. ISRN Bioinformatics.

    Abstract Eukaryotic genomes, particularly animal genomes, have a complex, nonuniform, and nonrandom internal compositional organization. The compositional organization of animal genomes can be described as a mosaic of discrete genomic regions, called “compositional domains,” each with a distinct GC content that significantly differs from those of its upstream and downstream neighboring domains. A typical animal genome consists of a mixture of compositionally homogeneous and nonhomogeneous domains of varying lengths and nucleotide compositions that are interspersed with one another. We have devised IsoPlotter, an unbiased segmentation algorithm for inferring the compositional organization of genomes. IsoPlotter has become an indispensable tool for describing genomic composition and has been used in the analysis of more than a dozen genomes. Applications include describing new genomes, correlating domain composition with gene composition and density, studying the evolution of genomes, testing phylogenomic hypotheses, and detecting regions of potential interbreeding between humans and extinct hominins. To extend the use of IsoPlotter, we designed a completely automated pipeline, called IsoPlotter+, to carry out all segmentation analyses, including graphical display, and built a repository for compositional domain maps of all fully sequenced vertebrate and invertebrate genomes. The IsoPlotter+ pipeline and repository offer a comprehensive solution to the study of genome compositional architecture. Here, we demonstrate IsoPlotter+ by applying it to human and insect genomes. The computational tools and data repository are available online.
    Authors Elhaik E. and Graur D.
    Keywords IsoPlotter+, Isochores, compositional domains, human genome, Bernardi, IsoFinder, IsoPlotter, compositional maps, ants, insect genome, Djs
    21.

    Graur D., Zheng Y., Price N., Azevedo R.B.R., Zufall R.A., and Elhaik, E. 2013. On the immortality of television sets: "function" in the human genome according to the evolution-free gospel of ENCODE. Genome Biology and Evolution.
    Altmetric score (99th percentile, ranked #1 in GBE).

    Abstract A recent slew of ENCODE Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is under 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 - 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these "functional" regions, or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly (1) by employing the seldom used "causal role" definition of biological function and then applying it inconsistently to different biochemical properties, (2) by committing a logical fallacy known as "affirming the consequent," (3) by failing to appreciate the crucial difference between "junk DNA" and "garbage DNA," (4) by using analytical methods that yield biased errors and inflate estimates of functionality, (5) by favoring statistical sensitivity over specificity, and (6) by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree, many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.
    Authors Graur D., Zheng Y., Price N., Azevedo R.B.R., Zufall R.A., and Elhaik E.
    Keywords ENCODE, critique, 80%, functionality, function
    20.

    Elhaik, E., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Greenspan B., Wells R.S., and the Genographic Consortium. 2013. The GenoChip: A New Tool for Genetic Anthropology. Genome Biology and Evolution.
    Altmetric score (90th percentile, ranked #1 in GBE).

    Abstract The Genographic Project is an international effort aimed at charting human migratory history. The project is non-profit and non-medical, and, through its Legacy Fund, supports locally led efforts to preserve indigenous and traditional cultures. While the first phase of the project was focused on uniparentally-inherited markers on the Y-chromosome and mitochondrial DNA, the current phase focuses on markers from across the entire genome to obtain a more complete understanding of human genetic variation. Although many commercial arrays exist for genomewide SNP genotyping, they were designed for medical genetic studies and contain medically related markers that are inappropriate for global population genetic studies. GenoChip, the Genographic Project's new genotyping array, was designed to resolve these issues and enable higher-resolution research into outstanding questions in genetic anthropology. The GenoChip includes ancestry informative markers obtained for over 450 human populations, an ancient human (Saqqaq), and two archaic hominins (Neanderthal and Denisovan) and was designed to identify all known Y-chromosome and mtDNA haplogroups. The chip was carefully vetted to avoid inclusion of medically relevant markers. To demonstrate its capabilities, we compared the FST distributions of GenoChip SNPs to those of two commercial arrays. While all arrays yielded similarly shaped (inverse J) FST distributions, the GenoChip autosomal and X-chromosomal distributions had the highest mean FST, attesting to its ability to discern subpopulations. The chip performances are illustrated in a principal component analysis for 14 worldwide populations. In summary, the GenoChip is a dedicated genotyping platform for genetic anthropology. 
    With an unprecedented number of ~12,000 Y-chromosomal and ~3,300 mtDNA SNPs and over 130,000 autosomal and X-chromosomal SNPs without any known health, medical, or phenotypic relevance, the GenoChip is a useful tool for genetic anthropology and population genetics.
    Authors Elhaik E., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Greenspan B., Wells R.S., and the Genographic Consortium.
    Keywords Population structure, National Geographic, Genographic, GenoChip, Array, Genetic anthropology, AimsFinder, IsoPlotter
    Supplementary Materials Supplementary files
    19.

    Elhaik, E. 2013. The Missing Link of Jewish European Ancestry: Contrasting the Rhineland and the Khazarian Hypotheses. Genome Biology and Evolution. 5:61-74.
    Altmetric score (99th percentile, ranked #2 in GBE).
    Corrections.
    Venton, D. 2013. Highlight: Out of Khazaria - Evidence for "Jewish Genome" Lacking. Genome Biology and Evolution. 5:75-76.

    Abstract The question of Jewish ancestry has been the subject of controversy for over two centuries and has yet to be resolved. The "Rhineland Hypothesis" depicts Eastern European Jews as a "population isolate" that emerged from a small group of German Jews who migrated eastward and expanded rapidly. Alternatively, the "Khazarian Hypothesis" suggests that Eastern European Jews descended from the Khazars, an amalgam of Turkic clans that settled the Caucasus in the early centuries CE and converted to Judaism in the 8th century. Mesopotamian and Greco-Roman Jews continuously reinforced the Judaized Empire until the 13th century. Following the collapse of their empire, the Judeo-Khazars fled to Eastern Europe. The rise of European Jewry is therefore explained by the contribution of the Judeo-Khazars. Thus far, however, the Khazars' contribution has been estimated only empirically, as the absence of genome-wide data from Caucasus populations precluded testing the Khazarian Hypothesis. Recent sequencing of modern Caucasus populations prompted us to revisit the Khazarian Hypothesis and compare it with the Rhineland Hypothesis. We applied a wide range of population genetic analyses to compare these two hypotheses. Our findings support the Khazarian Hypothesis and portray the European Jewish genome as a mosaic of Caucasus, European, and Semitic ancestries, thereby consolidating previous contradictory reports of Jewish ancestry. We further describe major differences among Caucasus populations explained by the early presence of Judeans in the Southern and Central Caucasus. Our results have important implications for the demographic forces that shaped the genetic diversity in the Caucasus and for medical studies.
    Authors Elhaik E.
    Keywords Population structure, Jewish genome, Jews, Khazars, Khazaria, Population genetics, Ashkenazi Jews, population isolate, Judeans
    Supplementary Materials Supplementary files


    L'église d'Auvers-sur-Oise (The Church at Auvers-sur-Oise), Dr Paul Gachet (1890)

    2012

    18.

    Elhaik, E. 2012. Empirical Distributions of FST from Large-Scale Human Polymorphism Data. PLoS ONE. 7:e49837.

    Abstract Studies of the apportionment of human genetic variation have long established that most human variation is within population groups and that the additional variation between population groups is small but greatest when comparing different continental populations. These studies often used Wright's FST that apportions the standardized variance in allele frequencies within and between population groups. Because local adaptations increase population differentiation, high-FST may be found at closely linked loci under selection and used to identify genes undergoing directional or heterotic selection. We re-examined these processes using HapMap data. We analyzed 3 million SNPs on 602 samples from eight worldwide populations and a consensus subset of 1 million SNPs found in all populations. We identified four major features of the data: First, a hierarchical FST analysis showed that only a small fraction (12%) of the total genetic variation is distributed between continental populations and an even smaller fraction (1%) is found between intra-continental populations. Second, the global FST distribution closely follows an exponential distribution. Third, although the overall FST distribution is similarly shaped (inverse J), FST distributions vary markedly by allele frequency when divided into non-overlapping groups by allele frequency range. Because the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. Finally, the change in mean-FST of these groups is linear in allele frequency. These results suggest that investigating the extremes of the FST distribution for each allele frequency group is more efficient for detecting selection. Consequently, we demonstrate that such extreme SNPs are more clustered along the chromosomes than expected from linkage disequilibrium for each allele frequency group. These genomic regions are therefore likely candidates for natural selection.
    Authors Elhaik E.
    Keywords Genetic variation, Population genetics, Population statistics, Population structure, Wright Fst, F-statistics, Hierarchical Fst, Mean Fst, Selection, HapMap, Neutral theory, Fst distribution, Geography, Humans
    Supplementary Materials Available at Supplementary materials
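    As a rough illustration of the FST statistic discussed in the abstract above, the following Python sketch computes the textbook heterozygosity-based form, FST = (HT - HS) / HT, for a single biallelic SNP. Equal weighting of populations and the function name wright_fst are assumptions made for illustration; the paper's own estimator and its hierarchical analysis may differ.

```python
def wright_fst(allele_freqs):
    """Textbook FST for one biallelic SNP.

    `allele_freqs` holds the frequency of one allele in each
    subpopulation. Populations are weighted equally, which is a
    simplifying assumption (no sample-size correction).
    """
    if len(allele_freqs) < 2:
        raise ValueError("need at least two subpopulations")
    p_bar = sum(allele_freqs) / len(allele_freqs)
    h_total = 2 * p_bar * (1 - p_bar)  # expected heterozygosity, pooled
    if h_total == 0:
        return 0.0                     # monomorphic site: no variation to apportion
    # Mean expected heterozygosity within subpopulations
    h_within = sum(2 * p * (1 - p) for p in allele_freqs) / len(allele_freqs)
    return (h_total - h_within) / h_total


fixed_difference = wright_fst([0.0, 1.0])  # populations fixed for different alleles
no_difference = wright_fst([0.3, 0.3])     # identical frequencies
```

    Values near 0 indicate that almost all variation lies within populations, and values near 1 indicate strong differentiation between them.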
    17.

    Elhaik, E., Chanda P., and Bader J.S. 2012. HAPZIPPER: sharing HapMap populations just got easier. Nucl. Acids Res. gks709.

    Abstract The rapidly growing amount of genomic sequence data being generated and made publicly available necessitates the development of new data storage and archiving methods. The vast amount of data being shared and manipulated also creates new challenges for network resources. Thus, developing advanced data compression techniques is becoming an integral part of data production and analysis. The HapMap project is one of the largest public resources of human single-nucleotide polymorphisms (SNPs), characterizing over 3 million SNPs genotyped in over 1000 individuals. The standard format and biological properties of HapMap data suggest that a dedicated genetic compression method can outperform generic compression tools. We propose a compression methodology for genetic data by introducing HAPZIPPER, a lossless compression tool tailored to compress HapMap data beyond benchmarks defined by generic tools such as GZIP, BZIP2 and LZMA. We demonstrate the usefulness of HAPZIPPER by compressing HapMap 3 populations to <5% of their original sizes. HAPZIPPER is freely downloadable from https://bitbucket.org/pchanda/hapzipper/downloads/HapZipper.tar.bz2.
    Authors Chanda P., Elhaik E., and Bader J.S.
    Keywords Compression algorithm, HapZipper, HapMap.
    Supplementary Materials Supplementary file


    Le Moulin de la Galette, Pierre-Auguste Renoir (1876)

    2011

    16.

    Goes F.S., Rongione M., Chen Y.C., Karchin R., Elhaik, E., and Potash J.B. 2011. Exonic DNA Sequencing of ERBB4 in Bipolar Disorder. PLoS ONE. 6:e20242.

    Abstract The Neuregulin-ErbB4 pathway plays a crucial role in brain development and constitutes one of the most biologically plausible signaling pathways implicated in schizophrenia and, to a lesser extent, in bipolar disorder (BP). However, recent genome-wide association analyses have not provided evidence for common variation in NRG1 or ERBB4 influencing schizophrenia or bipolar disorder susceptibility. In this study, we investigate the role of rare coding variants in ERBB4 in BP cases with mood-incongruent psychotic features, a form of BP with arguably the greatest phenotypic overlap with schizophrenia. We performed Sanger sequencing of all 28 exons in ERBB4, as well as part of the promoter and part of the 3'UTR sequence, hypothesizing that rare deleterious variants would be found in 188 cases with mood-incongruent psychosis from the GAIN BP study. We found 42 variants, of which 16 were novel, although none were non-synonymous or clearly deleterious. One of the novel variants, present in 11.2% of cases, is located next to an alternative stop codon, which is associated with a shortened transcript of ERBB4 that is not translated. We genotyped this variant in the GAIN BP case-control samples and found a marginally significant association with mood-incongruent psychotic BP compared with controls (additive model: OR = 1.64, P-value = 0.055; dominant model: OR = 1.73, P-value = 0.039). In conclusion, we found no rare variants of clear deleterious effect, but did uncover a modestly associated novel variant that could affect alternative splicing of ERBB4. However, the modest sample size in this study cannot definitively rule out a role for rare variants in bipolar disorder, and studies with larger sample sizes are needed to confirm the observed association.
    Authors Goes F.S., Rongione M., Chen Y.C., Karchin R., Elhaik, E., and Potash J.B.
    Keywords Bipolar Disorder, ERBB4, Exome.
    15.

    Suen, G., Teiling, C., Li, L., Holt, C., Abouheif, E., Bornberg-Bauer, E., Bouffard, P., Caldera, E.J., Cash, E., Cavanaugh, A., Denas, O., Elhaik, E. et al. 2011. The Genome Sequence of the Leaf-Cutter Ant Atta cephalotes Reveals Insights into Its Obligate Symbiotic Lifestyle. PLoS Genetics. 7:e1002007.

    Abstract Leaf-cutter ants are one of the most important herbivorous insects in the Neotropics, harvesting vast quantities of fresh leaf material. The ants use leaves to cultivate a fungus that serves as the colony's primary food source. This obligate ant-fungus mutualism is one of the few occurrences of farming by non-humans and likely facilitated the formation of their massive colonies. Mature leaf-cutter ant colonies contain millions of workers ranging in size from small garden tenders to large soldiers, resulting in one of the most complex polymorphic caste systems within ants. To begin uncovering the genomic underpinnings of this system, we sequenced the genome of Atta cephalotes using 454 pyrosequencing. One prediction from this ant's lifestyle is that it has undergone genetic modifications that reflect its obligate dependence on the fungus for nutrients. Analysis of this genome sequence is consistent with this hypothesis, as we find evidence for reductions in genes related to nutrient acquisition. These include extensive reductions in serine proteases (which are likely unnecessary because proteolysis is not a primary mechanism used to process nutrients obtained from the fungus), a loss of genes involved in arginine biosynthesis (suggesting that this amino acid is obtained from the fungus), and the absence of a hexamerin (which sequesters amino acids during larval development in other insects). Following recent reports of genome sequences from other insects that engage in symbioses with beneficial microbes, the A. cephalotes genome provides new insights into the symbiotic lifestyle of this ant and advances our understanding of host-microbe symbioses.
    Authors Suen, G., Teiling, C., Li, L., Holt, C., Abouheif, E., Bornberg-Bauer, E., Bouffard, P., Caldera, E.J., Cash, E., Cavanaugh, A., Denas, O., Elhaik, E., Fave, M., Gadau, J., Gibson, J.D., Graur, D., Grubbs, K.J., Hagen, D.E., Harkins, T.T., Helmkampf, M., Hu, H., Johnson, B.R., Kim, J., Marsh, S.E., Moeller, J.A., Muñoz-Torres, M.C., Murphy, M.C., Naughton, M.C., Nigam, S., Overson, R., Rajakumar, R., Reese, J.T., Scott, J.J., Smith, C.R., Tao, S., Tsutsui, N.D., Viljakainen, L., Wissler, L., Yandell, M.D., Zimmer, F., Taylor, J., Slater, S.C., Clifton, S.W., Warren, W.C., Elsik, C.G., Smith, C.D., Weinstock, G.M., Gerardo, N.M., and Currie, C.R.
    Keywords IsoPlotter, Isochores, GC content, Leaf-cutter ant, Atta cephalotes, Genome composition, Genome organization.
    14.

    Smith, C. D., Zimin, A., Holt, C., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E. et al. 2011. Draft genome of the globally widespread and invasive Argentine ant (Linepithema humile). PNAS. Early publication.

    Authors Smith, C. D., Zimin, A., Holt, C., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E., Elsik, C. G., Fave, M. J., Fernandes, V., Gadau, J., Gibson, J. D., Graur, D., Grubbs, K. J., Hagen, D. E., Helmkampf, M., Holley, J. A., Hu, H., Viniegra, A. S., Johnson, B. R., Johnson, R. M., Khila, A., Kim, J. W., Laird, J., Mathis, K. A., Moeller, J. A., Munoz-Torres, M. C., Murphy, M. C., Nakamura, R., Nigam, S., Overson, R. P., Placek, J. E., Rajakumar, R., Reese, J. T., Robertson, H. M., Smith, C. R., Suarez, A. V., Suen, G., Suhr, E. L., Tao, S., Torres, C. W., van Wilgenburg, E., Viljakainen, L., Walden, K. K., Wild, A. L., Yandell, M., Yorke, J. A., and Tsutsui, N. D.
    Keywords IsoPlotter, Isochores, GC content, Argentine ant, Linepithema humile, Genome composition, Genome organization.
    13.

    Smith, C. R., Smith, C. D., Robertson, H. M., Helmkampf, M., Zimin, A., Yandell, M., Holt, C., Hu, H., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E. et al. 2011. Draft genome of the red harvester ant Pogonomyrmex barbatus. PNAS. Early publication.

    Abstract We report the draft genome sequence of the red harvester ant, Pogonomyrmex barbatus. The genome was sequenced using 454 pyrosequencing, and the current assembly and annotation were completed in less than 1 y. Analyses of conserved gene groups (more than 1,200 manually annotated genes to date) suggest a high-quality assembly and annotation comparable to recently sequenced insect genomes using Sanger sequencing. The red harvester ant is a model for studying reproductive division of labor, phenotypic plasticity, and sociogenomics. Although the genome of P. barbatus is similar to other sequenced hymenopterans (Apis mellifera and Nasonia vitripennis) in GC content and compositional organization, and possesses a complete CpG methylation toolkit, its predicted genomic CpG content differs markedly from the other hymenopterans. Gene networks involved in generating key differences between the queen and worker castes (e.g., wings and ovaries) show signatures of increased methylation and suggest that ants and bees may have independently co-opted the same gene regulatory mechanisms for reproductive division of labor. Gene family expansions (e.g., 344 functional odorant receptors) and pseudogene accumulation in chemoreception and P450 genes compared with A. mellifera and N. vitripennis are consistent with major life-history changes during the adaptive radiation of Pogonomyrmex spp., perhaps in parallel with the development of the North American deserts.
    Authors Smith, C. R., Smith, C. D., Robertson, H. M., Helmkampf, M., Zimin, A., Yandell, M., Holt, C., Hu, H., Abouheif, E., Benton, R., Cash, E., Croset, V., Currie, C. R., Elhaik, E., Elsik, C. G., Fave, M. J., Fernandes, V., Gibson, J. D., Graur, D., Gronenberg, W., Grubbs, K. J., Hagen, D. E., Viniegra, A. S., Johnson, B. R., Johnson, R. M., Khila, A., Kim, J. W., Mathis, K. A., Munoz-Torres, M. C., Murphy, M. C., Mustard, J. A., Nakamura, R., Niehuis, O., Nigam, S., Overson, R. P., Placek, J. E., Rajakumar, R., Reese, J. T., Suen, G., Tao, S., Torres, C. W., Tsutsui, N. D., Viljakainen, L., Wolschin, F., and Gadau, J.
    Keywords IsoPlotter, Isochores, GC content, red harvester ant, Pogonomyrmex barbatus, Genome composition, Genome organization.


    The Elder Sister, William Bouguereau (1869)

    2010

    12.

    Kirkness, E. F., Haas, B. J., Sun, W., Braig, H. R., Perotti, M. A., Clark, J. M., Lee, S. H., Robertson, H. M., Kennedy, R. C., Elhaik, E. et al. 2010. Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. PNAS. 107: 12168-12173.

    Abstract As an obligatory parasite of humans, the body louse (Pediculus humanus humanus) is an important vector for human diseases, including epidemic typhus, relapsing fever, and trench fever. Here, we present genome sequences of the body louse and its primary bacterial endosymbiont Candidatus Riesia pediculicola. The body louse has the smallest known insect genome, spanning 108 Mb. Despite its status as an obligate parasite, it retains a remarkably complete basal insect repertoire of 10,773 protein-coding genes and 57 microRNAs. Representing hemimetabolous insects, the genome of the body louse thus provides a reference for studies of holometabolous insects. Compared with other insect genomes, the body louse genome contains significantly fewer genes associated with environmental sensing and response, including odorant and gustatory receptors and detoxifying enzymes. The unique architecture of the 18 minicircular mitochondrial chromosomes of the body louse may be linked to the loss of the gene encoding the mitochondrial single-stranded DNA binding protein. The genome of the obligatory louse endosymbiont Candidatus Riesia pediculicola encodes less than 600 genes on a short, linear chromosome and a circular plasmid. The plasmid harbors a unique arrangement of genes required for the synthesis of pantothenate, an essential vitamin deficient in the louse diet. The human body louse, its primary endosymbiont, and the bacterial pathogens that it vectors all possess genomes reduced in size compared with their free-living close relatives. Thus, the body louse genome project offers unique information and tools to use in advancing understanding of coevolution among vectors, symbionts, and pathogens.
    Authors Kirkness, E. F., Haas, B. J., Sun, W., Braig, H. R., Perotti, M. A., Clark, J. M., Lee, S. H., Robertson, H. M., Kennedy, R. C., Elhaik, E., Gerlach, D., Kriventseva, E. V., Elsik, C. G., Graur, D., Hill, C. A., Veenstra, J. A., Walenz, B., Tubio, J. M., Ribeiro, J. M., Rozas, J., Johnston, J. S., Reese, J. T., Popadic, A., Tojo, M., Raoult, D., Reed, D. L., Tomoyasu, Y., Krause, E., Mittapalli, O., Margam, V. M., Li, H. M., Meyer, J. M., Johnson, R. M., Romero-Severson, J., Vanzee, J. P., Alvarez-Ponce, D., Vieira, F. G., Aguade, M., Guirao-Rico, S., Anzola, J. M., Yoon, K. S., Strycharz, J. P., Unger, M. F., Christley, S., Lobo, N. F., Seufferheld, M. J., Wang, N., Dasch, G. A., Struchiner, C. J., Madey, G., Hannick, L. I., Bidwell, S., Joardar, V., Caler, E., Shao, R., Barker, S. C., Cameron, S., Bruggner, R. V., Regier, A., Johnson, J., Viswanathan, L., Utterback, T. R., Sutton, G. G., Lawson, D., Waterhouse, R. M., Venter, J. C., Strausberg, R. L., Berenbaum, M. R., Collins, F. H., Zdobnov, E. M., and Pittendrigh, B. R.
    Keywords IsoPlotter, Isochores, GC content, Human body louse, Genome composition, Genome organization.
    11.

    Elhaik, E., Graur, D., Josic, K., and Landan, G. 2010. Identifying compositionally homogeneous domains within the human genome using a novel segmentation algorithm. Nucleic Acids Res., e158.

    Abstract It has been suggested that the mammalian genome is composed mainly of long compositionally homogeneous domains. Such domains are frequently identified using recursive segmentation algorithms based on the Jensen-Shannon divergence. However, a common difficulty with such methods is deciding when to halt the recursive partitioning and what criteria to use in deciding whether a detected boundary between two segments is real or not. We demonstrate that commonly used halting criteria are intrinsically biased, and propose IsoPlotter, a parameter-free segmentation algorithm that overcomes such biases by using a simple dynamic halting criterion and tests the homogeneity of the inferred domains. IsoPlotter was compared with an alternative segmentation algorithm, DJS, using two sets of simulated genomic sequences. Our results show that IsoPlotter was able to infer both long and short compositionally homogeneous domains with low GC content dispersion, whereas DJS failed to identify short compositionally homogeneous domains and sequences with low compositional dispersion. By segmenting the human genome with IsoPlotter, we found that one-third of the genome is composed of compositionally nonhomogeneous domains and the remaining is a mixture of many short compositionally homogeneous domains and relatively few long ones.
    Authors Eran Elhaik, Dan Graur, Kresimir Josic, and Giddy Landan.
    Keywords IsoPlotter, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.
    Software Available here.
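
The single partitioning step that recursive Jensen-Shannon segmentation repeats can be sketched as follows. This is a generic illustration of the divergence-maximizing split only, not the IsoPlotter code: the function names are hypothetical, and the halting criterion and homogeneity test described in the abstract are deliberately left out.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a nucleotide count table."""
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)


def best_split(seq):
    """Return (position, divergence) of the split that maximizes the
    Jensen-Shannon divergence between the left and right segments."""
    n = len(seq)
    total = Counter(seq)
    h_all = entropy(total, n)
    left = Counter()
    best_pos, best_d = None, -1.0
    for i in range(1, n):
        left[seq[i - 1]] += 1
        right = total - left  # Counter subtraction keeps positive counts only
        d = (h_all
             - (i / n) * entropy(left, i)
             - ((n - i) / n) * entropy(right, n - i))
        if d > best_d:
            best_pos, best_d = i, d
    return best_pos, best_d


if __name__ == "__main__":
    # Toy sequence: an AT-only domain followed by a GC-only domain.
    print(best_split("AT" * 50 + "GC" * 50))
```

A recursive segmenter would call `best_split` again on each resulting segment; deciding when to stop is exactly the problem the abstract discusses.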
    10.

    Elhaik, E., Graur, D., and Josic, K. 2010. Comparative testing of DNA segmentation algorithms using benchmark simulations. Mol. Biol. Evol. 27: 1015-1024.

    Abstract Numerous segmentation methods for the detection of compositionally homogeneous domains within genomic sequences have been proposed. Unfortunately, these methods yield inconsistent results. Here, we present a benchmark consisting of two sets of simulated genomic sequences for testing the performances of segmentation algorithms. Sequences in the first set are composed of fixed-sized homogeneous domains, distinct in their between-domain GC-content variability. The sequences in the second set are composed of a mosaic of many short domains and a few long ones, distinguished by sharp GC content boundaries between neighboring domains. We use these sets to test the performance of seven segmentation algorithms in the literature. Our results show that recursive segmentation algorithms based on the Jensen-Shannon divergence outperform all other algorithms. However, even these algorithms perform poorly in certain instances because of the arbitrary choice of a segmentation-stopping criterion.
    Authors Eran Elhaik, Dan Graur, and Kresimir Josic.
    Keywords Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition, Benchmark simulations.
    Software Available here.
    Supplementary materials Available here.
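
A toy version of the first kind of benchmark sequence mentioned in the abstract (fixed-size domains that differ in GC content) might look like the sketch below. The function name, the parameters, and the independent per-site sampling model are all assumptions made for illustration; the published benchmark may be constructed differently.

```python
import random


def simulate_domains(gc_levels, domain_len, seed=0):
    """Concatenate fixed-size domains, one per entry in gc_levels.

    Each site is drawn independently: G or C with probability gc,
    otherwise A or T (hypothetical model, for illustration only).
    """
    rng = random.Random(seed)
    parts = []
    for gc in gc_levels:
        domain = [rng.choice("GC") if rng.random() < gc else rng.choice("AT")
                  for _ in range(domain_len)]
        parts.append("".join(domain))
    return "".join(parts)


if __name__ == "__main__":
    seq = simulate_domains([0.35, 0.55, 0.40], domain_len=1000, seed=42)
    print(len(seq))
```

Because the true domain borders are known by construction, the output of any segmentation algorithm on such a sequence can be scored against them.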
    9.

    Elhaik, E., Graur, D., and Josic, K. 2010. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences - a case study of the Z-curve. Biology Direct. 5:10.

    Abstract Background: The Z-curve is a three dimensional representation of DNA sequences proposed over a decade ago and has been extensively applied to sequence segmentation, horizontal gene transfer detection, and sequence analysis. Based on the Z-curve, a "genome order index" was proposed, which is defined as S = a^2 + c^2 + t^2 + g^2, where a, c, t, and g are the nucleotide frequencies of A, C, T, and G, respectively. This index was found to be smaller than 1/3 for almost all tested genomes, which was taken as support for the existence of a constraint on genome composition. A geometric explanation for this constraint has been suggested. Each genome was represented by a point P whose distance from the four faces of a regular tetrahedron was given by the frequencies a, c, t, and g. They claimed that an inscribed sphere of radius r = 1/3^0.5 contains almost all points corresponding to various genomes, implying that S < r^2. The distribution of the points P obtained by S was studied using the Z-curve.
    Results: In this work, we studied the basic properties of the Z-curve using the "genome order index" as a case study. We show that (1) the calculation of the radius of the inscribed sphere of a regular tetrahedron is incorrect, (2) the S index is narrowly distributed, (3) based on the second parity rule, the S index can be derived directly from the Shannon entropy and is, therefore, redundant, and (4) the Z-curve suffers from over-dimensionality, and the dimension that stands for GC content alone suffices to represent any given genome.
    Conclusion: The "genome order index" S does not represent a constraint on nucleotide composition. Moreover, S can be easily computed from the Gini-Simpson index and be directly derived from entropy and is redundant. Overall, the Z-curve and S are over-complicated measures to GC content and Shannon H index, respectively. Reviewers: This article was reviewed by Claus Wilke, Joel Bader, Marek Kimmel and Uladzislau Hryshkevich (nominated by Itai Yanai).
    Authors Eran Elhaik, Dan Graur, and Kresimir Josic.
    Keywords Nucleotide composition; Genomic GC content; Shannon H function; Z-curve; Genome order index; Gini-Simpson index
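
The relationship between S and the Gini-Simpson index mentioned in the conclusion can be checked numerically with a few lines. This is only a sketch of the elementary algebra, not the paper's derivation: the Gini-Simpson index is 1 minus the sum of squared frequencies, so S equals 1 minus that index, and if the second parity rule holds exactly (a = t, c = g), S depends on GC content alone. The function names are hypothetical.

```python
def genome_order_index(a, c, g, t):
    """S = a^2 + c^2 + t^2 + g^2 for nucleotide frequencies summing to 1."""
    return a * a + c * c + g * g + t * t


def gini_simpson(a, c, g, t):
    """Gini-Simpson index: 1 minus the sum of squared frequencies."""
    return 1.0 - genome_order_index(a, c, g, t)


def s_from_gc(gc):
    """S as a function of GC content, assuming a == t and c == g exactly."""
    at = 1.0 - gc
    return (at * at + gc * gc) / 2.0


if __name__ == "__main__":
    # A genome with 40% GC that obeys the second parity rule.
    print(genome_order_index(0.3, 0.2, 0.2, 0.3), s_from_gc(0.4))
```

Under that assumption S ranges from 0.25 at 50% GC up to 0.5 at 0% or 100% GC, and stays below 1/3 for GC contents between roughly 21% and 79%.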
    8.

    Werren J. H., Richards S., Desjardins C. A., Niehuis O., Gadau J., John K. J. K., Beukeboom L. W., Desplan C., Elsik C. G., Grimmelikhuijzen C. J. P., Kitts P., Lynch J., Murphy T., Oliveira D. C. S. G., Smith C. D., Zande L., Worley K. C., Zdobnov E. M., Aerts M., Albert S., Anaya V. H., Anzola J. M., Angel R., Barchuk A. R., Behura S. K., Bera A. N., Berenbaum M. R., Bertossa R. C., Bitondi M. M. G., Bordenstein S. R., Bork P., Bornberg-Bauer E., Brunain M., Cazzamali G., Chaboub L., Chacko J., Chavez D., Childers C. P., Choi J-H., Clark M. E., Claudianos C., Clinton R. A., Cree A. G., Cristino A. S., Dang P. M., Darby A. C., de Graaf D. C., Devreese B., Dinh H. H., Edwards R., Elango N., Elhaik, E. et al. 2010. Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science, 327:343-348.

    Abstract We report here genome sequences and comparative analyses of three closely related parasitoid wasps: Nasonia vitripennis, N. giraulti, and N. longicornis. Parasitoids are important regulators of arthropod populations, including major agricultural pests and disease vectors, and Nasonia is an emerging genetic model, particularly for evolutionary and developmental genetics. Key findings include the identification of a functional DNA methylation tool kit; hymenopteran-specific genes including diverse venoms; lateral gene transfers among Pox viruses, Wolbachia, and Nasonia; and the rapid evolution of genes involved in nuclear-mitochondrial interactions that are implicated in speciation. Newly developed genome resources advance Nasonia for genetic research, accelerate mapping and cloning of quantitative trait loci, and will ultimately provide tools and knowledge for further increasing the utility of parasitoids as pest insect-control agents.
    Authors Werren J. H., Richards S., Desjardins C. A., Niehuis O., Gadau J., John K. J. K., Beukeboom L. W., Desplan C., Elsik C. G., Grimmelikhuijzen C. J. P., Kitts P., Lynch J., Murphy T., Oliveira D. C. S. G., Smith C. D., Zande L., Worley K. C., Zdobnov E. M., Aerts M., Albert S., Anaya V. H., Anzola J. M., Angel R., Barchuk A. R., Behura S. K., Bera A. N., Berenbaum M. R., Bertossa R. C., Bitondi M. M. G., Bordenstein S. R., Bork P., Bornberg-Bauer E., Brunain M., Cazzamali G., Chaboub L., Chacko J., Chavez D., Childers C. P., Choi J-H., Clark M. E., Claudianos C., Clinton R. A., Cree A. G., Cristino A. S., Dang P. M., Darby A. C., de Graaf D. C., Devreese B., Dinh H. H., Edwards R., Elango N., Elhaik, E., Ermolaeva O., Evans J. D., Foret S., Fowler G. R., Gerlach D., Gibson J. D., Gilbert D. G., Graur D., Grunder S., Hagen D. E., Han Y., Hauser F., Hultmark D., Hunter H. C., Hurst G. D. D., Jhangian S. N., Jiang H., Johnson R. M., Jones A. K., Junier T., Kadowaki T., Kamping A., Kapustin Y., Kechavarzi B., Kim J., Kim J., Kiryutin B., Koevoets T., Kovar C. L., Kriventseva E. V., Kucharski R., Lee H., Lee S. L., Lees K., Lewis L. R., Loehlin D. W., Logsdon J. M., Lopez J. A., Lozado R. J., Maglott D., Maleszka R., Mayampurath A., Mazur D. J., McClure M. A., Moore A. D., Morgan M. B., Muller J., Munoz-Torres M. C., Muzny D. M., Nazareth L. V., Neupert S., Nguyen N. B., Nunes F. M. F., Oakeshott J. G., Okwuonu G. O., Pannebakker B. A., Pejaver V. R., Peng Z., Pratt S. C., Predel R., Pu L-L., Ranson H., Raychoudhury R., Rechtsteiner A., Reese J. T., Reid J. G., Riddle M., Robertson H. M., Romero-Severson J., Rosenberg M., Sackton T. B., Sattelle D. B., Schluns H., Schmitt T., Schneider M., Schuler A., Schurko A. M., Shuker D. M., Simoes Z. L. P., Sinha S., Smith Z., Solovyev V., Souvorov A., Springauf A., Stafflinger E., Stage D. E., Stanke M., Tanaka Y., Telschow A., Trent C., Vattathil S., Verhulst E. C., Viljakainen L., Wanner K. W., Waterhouse R. M., Whitfield J. B., Wilkes T. E., Williamson M., Willis J. H., Wolschin F., Wyder S., Yamada T., Yi S. V., Zecher C. N., Zhang L., Gibbs R. A.
    Keywords Nasonia, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.


    The Nightmare, Henry Fuseli (1781)

    2009

    7.

    Elhaik, E., Landan, G., and Graur, D. 2009. Can GC Content at Third-Codon Positions Be Used as a Proxy for Isochore Composition? Mol. Biol. Evol., 26: 1829-1833.

    Abstract The isochore theory depicts the genomes of warm-blooded vertebrates as a mosaic of long genomic regions that are characterized by relatively homogeneous GC content. In the absence of genomic data, the GC content at third-codon positions of protein-coding genes (GC3) was commonly used as a proxy for the GC content of isochores. Oddly, in the postgenomic era, GC3 is still sometimes used as a proxy for the GC composition of isochores. Here, we use genic and genomic sequences from human, chimpanzee, cow, mouse, rat, chicken, and zebrafish to show that GC3 only explains a very small proportion of the variation in GC content of long genomic sequences flanking the genes (GCf), and what little correlation there is between GC3 and GCf was found to decay rapidly with distance from the gene. The coefficient of variation of GC3 was found to be much larger than that of GCf and, therefore, GC3 and GCf values are not comparable with each other. Comparisons of orthologous gene pairs from 1) human and chimpanzee and 2) mouse and rat show strong correlations between their GC3 values, but very weak correlations between their GCf values. We conclude that the GC content of third-codon position cannot be used as a stand-in for isochoric composition.
    Authors Eran Elhaik, Giddy Landan, and Dan Graur.
    Keywords Isochores, GC3, GC content, Flanking regions, Genome composition, Compositional patterns
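
For readers unfamiliar with the two quantities compared in the abstract, the sketch below shows how GC3 and an overall GC fraction could be computed for a single sequence. It is a minimal illustration with hypothetical function names, assuming an in-frame coding sequence; it does not reproduce the paper's data handling or its choice of flanking-region lengths.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq) / len(seq)


def gc3(cds):
    """GC fraction at third-codon positions of an in-frame coding sequence.

    Trailing bases that do not complete a codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    return gc_fraction(cds[2:usable:3])


if __name__ == "__main__":
    cds = "ATGGCCAAATAA"  # codons: ATG GCC AAA TAA
    print(gc3(cds), gc_fraction(cds))
```

Applying `gc_fraction` to the genomic sequence flanking a gene would give the GCf value that the abstract compares against GC3.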
    6.

    Elsik, C. G., Tellam, R. L., Worley, K. C., Gibbs, R. A., Muzny, D. M., Weinstock, G. M., Adelson, D. L., Eichler, E. E., Elnitski, E., Guigo, G., Hamernik, D. L., Kappes, S. M., Lewin, H. A., Lynn, D. J., Nicholas, F. W., Reymond, R., Rijnkels, R., Skow, L. C., Zdobnov, E. M., Schook, S., Womack, W., Alioto, A., Antonarakis, S. E., Astashyn, A., Chapple, C. E., Chen, C., Chrast, C., Camara, C., Ermolaeva, E., Henrichsen, C. N., Hlavina, H., Kapustin, K., Kiryutin, K., Kitts, K., Kokocinski, K., Landrum, L., Maglott, M., Pruitt, P., Sapojnikov, S., Searle, S. M., Solovyev, S., Souvorov, S., Ucla, U., Wyss, W., Anzola, J. M., Gerlach, G., Elhaik, E. et al. 2009. The Genome Sequence of Taurine Cattle: A Window to Ruminant Biology and Evolution. Science, 324: 522-528.

    Abstract To understand the biology and evolution of ruminants, the cattle genome was sequenced to about sevenfold coverage. The cattle genome contains a minimum of 22,000 genes, with a core set of 14,345 orthologs shared among seven mammalian species of which 1217 are absent or undetected in noneutherian (marsupial or monotreme) genomes. Cattle-specific evolutionary breakpoint regions in chromosomes have a higher density of segmental duplications, enrichment of repetitive elements, and species-specific variations in genes associated with lactation and immune responsiveness. Genes involved in metabolism are generally highly conserved, although five metabolic genes are deleted or extensively diverged from their human orthologs. The cattle genome sequence thus provides a resource for understanding mammalian evolution and accelerating livestock genetic improvement for milk and meat production.
    Authors Elsik, C. G., Tellam, R. L., Worley, K. C., Gibbs, R. A., Muzny, D. M., Weinstock, G. M., Adelson, D. L., Eichler, E. E., Elnitski, E., Guigo, G., Hamernik, D. L., Kappes, S. M., Lewin, H. A., Lynn, D. J., Nicholas, F. W., Reymond, R., Rijnkels, R., Skow, L. C., Zdobnov, E. M., Schook, S., Womack, W., Alioto, A., Antonarakis, S. E., Astashyn, A., Chapple, C. E., Chen, C., Chrast, C., Camara, C., Ermolaeva, E., Henrichsen, C. N., Hlavina, H., Kapustin, K., Kiryutin, K., Kitts, K., Kokocinski, K., Landrum, L., Maglott, M., Pruitt, P., Sapojnikov, S., Searle, S. M., Solovyev, S., Souvorov, S., Ucla, U., Wyss, W., Anzola, J. M., Gerlach, G., Elhaik, E., Graur, G., Reese, J. T., Edgar, R. C., Mcewan, J. C., Payne, G. M., Raison, J. M., Junier, J., Kriventseva, E. V., Eyras, E., Plass, P., Donthu, D., Larkin, D. M., Reecy, R., Yang, M. Q., Chen, C., Cheng, C., Chitko-Mckown, C. G., Liu, G. E., Matukumalli, L. K., Song, S., Zhu, Z., Bradley, D. G., Brinkman, F. S., Lau, L. P., Whiteside, M. D., Walker, W., Wheeler, T. T., Casey, C., German, B. J., Lemay, D. G., Maqbool, N. J., Molenaar, A. J., Seo, S., Stothard, S., Baldwin, C. L., Baxter, B., Brinkmeyer-Langford, C. L., Brown, W. C., Childers, C. P., Connelley, C., Ellis, S. A., Fritz, F., Glass, E. J., Herzig, C. T., Iivanainen, I., Lahmers, K. K., Bennett, A. K., Dickens, M. C., Gilbert, J. G., Hagen, D. E., Salih, S., Aerts, A., Caetano, A. R., Dalrymple, D., Garcia, J. F., Gill, C. A., Hiendleder, S. G., Memili, M., Spurlock, S., Williams, J. L., Alexander, A., Brownstein, M. J., Guan, G., Holt, R. A., Jones, S. J., Marra, M. A., Moore, M., Moore, S. S., Roberts, R., Taniguchi, T., Waterman, R. C., Chacko, C., Chandrabose, M. M., Cree, C., Dao, M. D., Dinh, H. H., Gabisi, R. A., Hines, H., Hume, H., Jhangiani, S. N., Joshi, J., Kovar, C. L., Lewis, L. R., Liu, L., Lopez, L., Morgan, M. B., Nguyen, N. B., Okwuonu, G. O., Ruiz, S. J., Santibanez, S., Wright, R. A., Buhay, B., Ding, D., Dugan-Rocha, D., Herdandez, H., Holder, H., Sabo, S., Egan, E., Goodell, G., Wilczek-Boney, W., Fowler, G. R., Hitchens, M. E., Lozado, R. J., Moen, M., Steffen, S., Warren, J. T., Zhang, Z., Chiu, C., Schein, J. E., Durbin, J. K., Havlak, H., Jiang, J., Liu, L., Qin, Q., Ren, R., Shen, S., Song, S., Bell, S. N., Davis, D., Johnson, A. J., Lee, L., Nazareth, L. V., Patel, B. M., Pu, P., Vattathil, V., Williams, R. L., Curry, C., Hamilton, H., Sodergren, S., Wheeler, D. A., Barris, B., Bennett, G. L., Eggen, E., Green, R. D., Harhay, G. P., Hobbs, H., Jann, J., Keele, J. W., Kent, M. P., Lien, L., Mckay, S. D., Mcwilliam, M., Ratnakumar, R., Schnabel, R. D., Smith, S., Snelling, W. M., Sonstegard, T. S., Stone, R. T., Sugimoto, S., Takasuga, T., Taylor, J. F., Van Tassell, C. P., Macneil, M. D., Abatepaulo, A. R., Abbey, C. A., Ahola, A., Almeida, I. G., Amadio, A. F., Anatriello, A., Bahadue, S. M., Biase, F. H., Boldt, C. R., Carroll, J. A., Carvalho, W. A., Cervelatti, E. P., Chacko, C., Chapin, J. E., Cheng, C., Choi, C., Colley, A. J., de Campos, T. A., Donato, M. D., Santos, I. K., de Oliveira, C. J., Deobald, D., Devinoy, D., Donohue, K. E., Dovc, D., Eberlein, E., Fitzsimmons, C. J., Franzin, A. M., Garcia, G. R., Genini, G., Gladney, C. J., Grant, J. R., Greaser, M. L., Green, J. A., Hadsell, D. L., Hakimov, H. A., Halgren, H., Harrow, J. L., Hart, E. A., Hastings, H., Hernandez, H., Hu, H., Ingham, I., Iso-Touru, I., Jamis, J., Jensen, J., Kapetis, K., Kerr, K., Khalil, S. S., Khatib, K., Kolbehdari, K., Kumar, C. G., Kumar, K., Leach, L., Lee, J. C., Li, L., Logan, K. M., Malinverni, M., Marques, M., Martin, W. F., Martins, N. F., Maruyama, S. R., Mazza, M., Mclean, K. L., Medrano, J. F., Moreno, B. T., More, D. D., Muntean, C. T., Nandakumar, H. P., Nogueira, M. F., Olsaker, O., Pant, S. D., Panzitta, P., Pastor, R. C., Poli, M. A., Poslusny, P., Rachagani, R., Ranganathan, R., Razpet, R., Riggs, P. K., Rincon, R., Rodriguez-Osorio, R., Rodriguez-Zas, S. L., Romero, N. E., Rosenwald, R., Sando, S., Schmutz, S. M., Shen, S., Sherman, S., Southey, B. R., Lutzow, Y. S., Sweedler, J. V., Tammen, T., Telugu, B. P., Urbanski, J. M., Utsunomiya, Y. T., Verschoor, C. P., Waardenberg, A. J., Wang, W., Ward, W., Weikard, W., Welsh, T. H., White, S. N., Wilming, L. G., Wunderlich, K. R., Yang, Y., and Zhao, Z.
    Keywords Cow, Cattle, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.


    Tower of Babel, Pieter Bruegel the Elder (1563)

    2008

    5.

    Richards S., Gibbs R. A., Weinstock G. M., Brown S. J., Denell R. E., Beeman R. W., Bucher G., Friedrich M., Grimmelikhuijzen C. J. P., Klingler M., Lorenzen M., Roth S., Schroder R., Tautz D., Zdobnov E. M., Muzny D., Attaway T., Bell S., Buhay C. J., Chandrabose M. N., Chavez D., Clerk-Blankenburg K. P., Cree A., Dao M., Davis C., Chacko J., Dinh H., Dugan-Rocha S., Fowler G., Garner T. T., Garnes J., Gnirke A., Hawes A., Hernandez J., Hines S., Holder M., Hume J., Jhangiani S. N., Joshi V., Mohid Khan Z., Jackson L., Kovar C., Kowis A., Lee S., Lewis L. R., Margolis J., Morgan M., Nazareth L. V., Nguyen N., Okwuonu G., Parker D., Ruiz S-J., Santibanez J., Savard J., Scherer S. E., Schneider B., Sodergren E., Vattahil S., Villasana D., White C. S., Wright R., Park Y., Lord J., Oppert B., Wang L., Liu Y., Worley K., Elsik C. G., Reese J. T., Elhaik E. et al. 2008. The genome of the model beetle and pest Tribolium castaneum. Nature, 452: 949-955.

    Abstract Tribolium castaneum is a member of the most species-rich eukaryotic order, a powerful model organism for the study of generalized insect development, and an important pest of stored agricultural products. We describe its genome sequence here. This omnivorous beetle has evolved the ability to interact with a diverse chemical environment, as shown by large expansions in odorant and gustatory receptors, as well as P450 and other detoxification enzymes. Development in Tribolium is more representative of other insects than is Drosophila, a fact reflected in gene content and function. For example, Tribolium has retained more ancestral genes involved in cell-cell communication than Drosophila, some being expressed in the growth zone crucial for axial elongation in short-germ development. Systemic RNA interference in T. castaneum functions differently from that in Caenorhabditis elegans, but nevertheless offers similar power for the elucidation of gene function and identification of targets for selective insect control.
    Authors Richards S., Gibbs R. A., Weinstock G. M., Brown S. J., Denell R. E., Beeman R. W., Bucher G., Friedrich M., Grimmelikhuijzen C. J. P., Klingler M., Lorenzen M., Roth S., Schröder R., Tautz D., Zdobnov E. M., Muzny D., Attaway T., Bell S., Buhay C. J., Chandrabose M. N., Chavez D., Clerk-Blankenburg K. P., Cree A., Dao M., Davis C., Chacko J., Dinh H., Dugan-Rocha S., Fowler G., Garner T. T., Garnes J., Gnirke A., Hawes A., Hernandez J., Hines S., Holder M., Hume J., Jhangiani S. N., Joshi V., Mohid Khan Z., Jackson L., Kovar C., Kowis A., Lee S., Lewis L. R., Margolis J., Morgan M., Nazareth L. V., Nguyen N., Okwuonu G., Parker D., Ruiz S-J., Santibanez J., Savard J., Scherer S. E., Schneider B., Sodergren E., Vattahil S., Villasana D., White C. S., Wright R., Park Y., Lord J., Oppert B., Wang L., Liu Y., Worley K., Elsik C. G., Reese J. T., Elhaik E., Landan G., Graur D., Arensburger P., Atkinson P., Beidler J., Demuth J. P., Drury D. W., Du Y-Z., Fujiwara H., Maselli V., Osanai M., Robertson H. M., Tu Z., Wang J-J., Wang S., Song H., Zhang L., Werner D., Stanke M., Morgenstern B., Solovyev V., Kosarev P., Brown G., Chen H-C., Ermolaeva O., Hlavina W., Kapustin Y., Kiryutin B., Kitts P., Maglott D., Pruitt K., Sapojnikov V., Souvorov A., Mackey A. J., Waterhouse R. M., Wyder S., Kriventseva E. V., Kadowaki T., Bork P., Aranda M., Bao R., Beermann A., Berns N., Bolognesi R., Bonneton F., Bopp D., Butts T., Chaumot A., Ferrier D. E. K., Gordon C. M., Jindra M., Lan Q., Lattorff H. M. G., Laudet V., von Levetsow C., Liu Z., Lutz R., Lynch J. A., Nunes da Fonseca R., Posnien N., Reuter R., Schinko J. B., Schmitt C., Schoppmeier M., Shippy T. D., Simonnet F., Marques-Souza H., Tomoyasu Y., Trauner J., Van der Zee M., Vervoort M., Wittkopp N., Wimmer E. A., Yang X., Jones A. K., Sattelle D. B., Ebert P. R., Nelson D., Scott J. G., Muthukrishnan S., Kramer K. J., Arakane Y., Zhu Q., Hogenkamp D., Dixit R., Jiang H., Zou Z., Marshall J., Elpidina E., Vinokurov K., Oppert C., Evans J., Lu Z., Zhao P., Sumathipala N., Altincicek B., Vilcinskas A., Williams M., Hultmark D., Hetru C., Hauser F., Cazzamali G., Williamson M., Li B., Tanaka Y., Predel R., Neupert S., Schachtner J., Verleyen P., Raible F., Walden K. K. O., Angeli S., Foret S., Schuetz S., Maleszka R., Miller S. C., and Grossmann D.
    Keywords Tribolium, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.
    4.

    Elhaik, E., Graur, D., and Josić, K. 2008. 'Genome order index' should not be used for defining compositional constraints in nucleotide sequences. Comp. Biol. Chem., 32: 147

    Authors Eran Elhaik, Dan Graur, and Kresimir Josic.
    Keywords Nucleotide composition; Genomic G+C content; Shannon H-function; Genome order index; Isochores; Z-curve


    Springtime at Giverny, Claude Monet (1886)

    2006

    3.

    Weinstock, G. M., Robinson, G. E., Gibbs, R. A., Worley, K. C., Evans, J. D., Maleszka, R., Robertson, H. M., Weaver, D. B., Beye, M., Bork, P., Elsik, C. G., Hartfelder, K., Hunt, G. J., Zdobnov, E. M., Amdam, G. V., Bitondi, M. M. G., Collins, A. M., Cristino, A. S., Lattorff, H. M. G., Lobo, C. H., Moritz, R. F. A., Nunes, F. M. F., Page Jr., R. E., Simoes, Z. L. P., Wheeler, Diana, Carninci, P., Fukuda, S., Hayashizaki, Y., Kai, C., Kawai, J., Sakazume, Sasaki, D., Tagami, M., Albert, S., Baggerman, G., Beggs, K. T., Bloch, G., Cazzamali, G., Cohen, M., Drapeau, M. D., Eisenhardt, D., Emore, C., Ewing, M. A., Fahrbach, S. E., Foret, S., Grimmelikhuijzen, C. J. P., Hauser, F., Hummon, A. B., Huybrechts, J., Jones, A. K., Kadowaki, T., Kaplan, N., Kucharski, R., Leboulle, G., Linial, M., Littleton, J. T., Mercer, A. R., Richmond, T. A., Rodriguez-Zas, S. L., Rubin, E. B., Sattelle, D. B., Schlipalius, D., Schoofs, L., Shemesh, Y., Sweedler, J. V., Velarde, R., Verleyen, P., Vierstraete, E., Williamson, M. R., Ament, S. A., Brown, S. J., Corona, M., Dearden, P. K., Dunn, W. A., Elekonich, M. M., Fujiyuki, T., Gattermeier, I., Gempe, T., Hasselmann, M., Kage, E., Kamikouchi, A., Kubo, T., Kunieda, T., Lorenzen, M., Milshina, N. V., Morioka, M., Ohashi, K., Overbeek, R., Ross, C. A., Schioett, M., Shippy, T., Takeuchi, H., Toth, A. L., Willis, J. H., Wilson, M. J., Gordon, K. H. J., Letunic, I., Hackett, K., Peterson, J., Felsenfeld, A., Guyer, M., Solignac, M., Agarwala, R., Cornuet, J. M., Monnerot, M., Mougel, F., Reese, J. T., Vautrin, D., Gillespie, J. J., Cannone, J. J., Gutell, R. R., Johnston, J. S., Eisen, M. B., Iyer, V. N., Iyer, V., Kosarev, P., Mackey, A. J., Solovyev, V., Souvorov, A., Aronstein, K. A., Bilikova, K., Chen, Y. P., Clark, A. G., Decanini, L. I., Gelbart, W. M., Hetru, C., Hultmark, D., Imler, J.-L., Jiang, H., Kanost, M., Kimura, K., Lazzaro, B. P., Lopez, D. L., Simuth, J., Thompson, G. J., Zou, Z., de Jong, P., Sodergren, E., Csuros, M., Milosavljevic, A., Osoegawa, K., Richards, S., Shu, C.-L., Duret, L., Elhaik, E. et al. 2006. Insights into social insects from the genome of the honeybee Apis mellifera. Nature, 443: 931-949.

    Abstract Here we report the genome sequence of the honeybee Apis mellifera, a key model for social behaviour and essential to global ecology through pollination. Compared with other sequenced insect genomes, the A. mellifera genome has high A+T and CpG contents, lacks major transposon families, evolves more slowly, and is more similar to vertebrates for circadian rhythm, RNA interference and DNA methylation genes, among others. Furthermore, A. mellifera has fewer genes for innate immunity, detoxification enzymes, cuticle-forming proteins and gustatory receptors, more genes for odorant receptors, and novel genes for nectar and pollen utilization, consistent with its ecology and social organization. Compared to Drosophila, genes in early developmental pathways differ in Apis, whereas similarities exist for functions that differ markedly, such as sex determination, brain function and behaviour. Population genetics suggests a novel African origin for the species A. mellifera and insights into whether Africanized bees spread throughout the New World via hybridization or displacement.
    Authors Weinstock, G. M., Robinson, G. E., Gibbs, R. A., Worley, K. C., Evans, J. D., Maleszka, R., Robertson, H. M., Weaver, D. B., Beye, M., Bork, P., Elsik, C. G., Hartfelder, K., Hunt, G. J., Zdobnov, E. M., Amdam, G. V., Bitondi, M. M. G., Collins, A. M., Cristino, A. S., Lattorff, H. M. G., Lobo, C. H., Moritz, R. F. A., Nunes, F. M. F., Page Jr., R. E., Simões, Z. L. P., Wheeler, Diana, Carninci, P., Fukuda, S., Hayashizaki, Y., Kai, C., Kawai, J., Sakazume, Sasaki, D., Tagami, M., Albert, S., Baggerman, G., Beggs, K. T., Bloch, G., Cazzamali, G., Cohen, M., Drapeau, M. D., Eisenhardt, D., Emore, C., Ewing, M. A., Fahrbach, S. E., Forêt, S., Grimmelikhuijzen, C. J. P., Hauser, F., Hummon, A. B., Huybrechts, J., Jones, A. K., Kadowaki, T., Kaplan, N., Kucharski, R., Leboulle, G., Linial, M., Littleton, J. T., Mercer, A. R., Richmond, T. A., Rodriguez-Zas, S. L., Rubin, E. B., Sattelle, D. B., Schlipalius, D., Schoofs, L., Shemesh, Y., Sweedler, J. V., Velarde, R., Verleyen, P., Vierstraete, E., Williamson, M. R., Ament, S. A., Brown, S. J., Corona, M., Dearden, P. K., Dunn, W. A., Elekonich, M. M., Fujiyuki, T., Gattermeier, I., Gempe, T., Hasselmann, M., Kage, E., Kamikouchi, A., Kubo, T., Kunieda, T., Lorenzen, M., Milshina, N. V., Morioka, M., Ohashi, K., Overbeek, R., Ross, C. A., Schioett, M., Shippy, T., Takeuchi, H., Toth, A. L., Willis, J. H., Wilson, M. J., Gordon, K. H. J., Letunic, I., Hackett, K., Peterson, J., Felsenfeld, A., Guyer, M., Solignac, M., Agarwala, R., Cornuet, J. M., Monnerot, M., Mougel, F., Reese, J. T., Vautrin, D., Gillespie, J. J., Cannone, J. J., Gutell, R. R., Johnston, J. S., Eisen, M. B., Iyer, V. N., Iyer, V., Kosarev, P., Mackey, A. J., Solovyev, V., Souvorov, A., Aronstein, K. A., Bilikova, K., Chen, Y. P., Clark, A. G., Decanini, L. I., Gelbart, W. M., Hetru, C., Hultmark, D., Imler, J.-L., Jiang, H., Kanost, M., Kimura, K., Lazzaro, B. P., Lopez, D. L., Simuth, J., Thompson, G. J., Zou, Z., de Jong, P., Sodergren, E., Csuros, M., Milosavljevic, A., Osoegawa, K., Richards, S., Shu, C.-L., Duret, L., Elhaik, E., Graur, D., Anzola, J. M., Campbell, K. S., Childs, K. L., Collinge, D., Crosby, M. A., Dickens, C. M., Grametes, L. S., Grozinger, C. M., Jones, P. L., Jorda, M., Ling, X., Matthews, B. B., Miller, J., Mizzen, C., Peinado, M. A., Reid, J. G., Russo, S. M., Schroeder, A. J., St Pierre, S. E., Wang, Y., Zhou, P., Kitts, P., Ruef, B., Venkatraman, A., Zhang, L., Aquino-Perez, G., Whitfield, C. W., Behura, S. K., Berlocher, S. H., Sheppard, W. S., Smith, D. R., Suarez, A. V., Tsutsui, N. D., Wei, X., Wheeler, David, Havlak, P., Li, B., Liu, Y., Jolivet, A., Lee, S., Nazareth, L. V., Pu, L.-L., Thorn, R., Stolc, V., Newman, T., Samanta, M., Tongprasit, W. A., Claudianos, C., Berenbaum, M. R., Biswas, S., de Graaf, D. C., Feyereisen, R., Johnson, R. M., Oakeshott, J. G., Ranson, H., Schuler, M. A., Muzny, D., Chacko, J., Davis, C., Dinh, H., Gill, R., Hernandez, J., Hines, S., Hume, J., Jackson, L., Kovar, C., Lewis, L., Miner, G., Morgan, M., Nguyen, N., Okwuonu, G., Paul, H., Santibanez, J., Savery, G., Svatek, A., Villasana, D., and Wright, R.
    Keywords Bee, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.
    2.

    Sodergren, E., Weinstock, G. M., Davidson, E. H., Cameron, R. A., Gibbs, R. A., Angerer, R. C., Angerer, L. M., Arnone, M. I., Burgess, D. R., Burke, R. D., Coffman, J. A., Dean, M., Elphick, M. R., Ettensohn, C. A., Foltz, K. R., Hamdoun, A., Hynes, R. O., Klein, W. H., Marzluff, W., McClay, D. R., Morris, R. L., Mushegian, A., Rast, J. P., Smith, L. C., Thorndyke, M. C., Vacquier, V. D., Wessel, G. M., Wray, G., Zhang, L., Elsik, C. G., Ermolaeva, O., Hlavina, W., Hofmann, G., Kitts, P., Landrum, M. J., Mackey, A. J., Maglott, D., Panopoulou, G., Poustka, A. J., Pruitt, K., Sapojnikov, V., Song, X., Souvorov, A., Solovyev, V., Wei, Z., Whittaker, C. A., Worley, K., Durbin, K. J., Shen, Y., Fedrigo, O., Garfield, D., Haygood, R., Primus, A., Satija, R., Severson, T., Gonzalez-Garay, M. L., Jackson, A. R., Milosavljevic, A., Tong, M., Killian, C. E., Livingston, B. T., Wilt, F. H., Adams, N., Belle, R., Carbonneau, S., Cheung, R., Cormier, P., Cosson, B., Croce, J., Fernandez-Guerra, A., Geneviere, A.-M., Goel, M., Kelkar, H., Morales, J., Mulner-Lorillon, O., Robertson, A. J., Goldstone, J. V., Cole, B., Epel, D., Gold, B., Hahn, M. E., Howard-Ashby, M., Scally, M., Stegeman, J. J., Allgood, E. L., Cool, J., Judkins, K. M., McCafferty, S. S., Musante, A. M., Obar, R. A., Rawson, A. P., Rossetti, B. J., Gibbons, I. R., Hoffman, M. P., Leone, A., Istrail, S., Materna, S. C., Samanta, M. P., Stolc, V., Tongprasit, W., Tu, Q., Bergeron, K.-F., Brandhorst, B. P., Whittle, J., Berney, K., Bottjer, D. J., Calestani, C., Peterson, K., Chow, E., Yuan, Q. A., Elhaik, E. et al. 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science, 314: 941-952.

    Abstract We report the sequence and analysis of the 814-megabase genome of the sea urchin Strongylocentrotus purpuratus, a model for developmental and systems biology. The sequencing strategy combined whole-genome shotgun and bacterial artificial chromosome (BAC) sequences. This use of BAC clones, aided by a pooling strategy, overcame difficulties associated with high heterozygosity of the genome. The genome encodes about 23,300 genes, including many previously thought to be vertebrate innovations or known only outside the deuterostomes. This echinoderm genome provides an evolutionary outgroup for the chordates and yields insights into the evolution of deuterostomes.
    Authors Sodergren, E., Weinstock, G. M., Davidson, E. H., Cameron, R. A., Gibbs, R. A., Angerer, R. C., Angerer, L. M., Arnone, M. I., Burgess, D. R., Burke, R. D., Coffman, J. A., Dean, M., Elphick, M. R., Ettensohn, C. A., Foltz, K. R., Hamdoun, A., Hynes, R. O., Klein, W. H., Marzluff, W., McClay, D. R., Morris, R. L., Mushegian, A., Rast, J. P., Smith, L. C., Thorndyke, M. C., Vacquier, V. D., Wessel, G. M., Wray, G., Zhang, L., Elsik, C. G., Ermolaeva, O., Hlavina, W., Hofmann, G., Kitts, P., Landrum, M. J., Mackey, A. J., Maglott, D., Panopoulou, G., Poustka, A. J., Pruitt, K., Sapojnikov, V., Song, X., Souvorov, A., Solovyev, V., Wei, Z., Whittaker, C. A., Worley, K., Durbin, K. J., Shen, Y., Fedrigo, O., Garfield, D., Haygood, R., Primus, A., Satija, R., Severson, T., Gonzalez-Garay, M. L., Jackson, A., R., Milosavljevic, A., Tong, M., Killian, C. E., Livingston, B. T., Wilt, F. H., Adams, N., Bellé, R., Carbonneau, S., Cheung, R., Cormier, P., Cosson, B., Croce, J., Fernandez-Guerra, A., Genevière, A.-M., Goel, M., Kelkar, H., Morales, J., Mulner-Lorillon, O., Robertson, A. J., Goldstone, J. V., Cole, B., Epel, D., Gold, B., Hahn, M. E., Howard-Ashby, M., Scally, M., Stegeman, J. J., Allgood, E. L., Cool, J., Judkins, K. M., McCafferty, S. S., Musante, A. M., Obar, R. A., Rawson, A. P., Rossetti, B. J., Gibbons, I. R., Hoffman, M. P., Leone, A., Istrail, S., Materna, S. C., Samanta, M. P., Stolc, V., Tongprasit, W., Tu, Q., Bergeron, K.-F., Brandhorst, B. P., Whittle, J., Berney, K., Bottjer, D. J., Calestani, C., Peterson, K., Chow, E., Yuan, Q. A., Elhaik, E., Graur, D., Reese, J. T., Bosdet, I., Heesun, S., Marra, M. A., Schein, J., Anderson, M. K., Brockton, V., Buckley, K. M., Cohen, A. H., Fugmann, S. D., Hibino, T., Loza-Coll, M., Majeske, A. J., Messier, C., Nair, S. V., Pancer, Z., Terwilliger, D. P., Agca, C., Arboleda, E., Chen, N., Churcher, A. M., Hallböök, F., Humphrey, G. W., Idris, M. 
M., Kiyama, T., Liang, S., Mellott, D., Mu, X., Murray, G., Olinski, R. P., Raible, F., Rowe, M., Taylor, J. S., Tessmar-Raible, K., Wang, D., Wilson, K. H., Yaguchi, S., Gaasterland, T., Galindo, B. E., Gunaratne, H. J., Juliano, C., Kinukawa, M., Moy, G. W., Neill, A. T., Nomura, M., Raisch, M., Reade, A., Roux, M. M., Song, J. L., Su, Y.-H., Townley, I. K., Voronina, E., Wong, J. L., Amore, G., Branno, M., Brown, E. R., Cavalieri, V., Duboc, V., Duloquin, L., Flytzanis, C., Gache, C., Lapraz, F., Lepage, T., Locascio, A., Martinez, P., Matassi, G., Matranga, V., Range, R., Rizzo, F., Röttinger, E., Beane, W., Bradham, C., Byrum, C., Glenn, T., Hussain, S., Manning, G., Miranda, E., Thomason, R., Walton, K., Wikramanayke, A., Wu, S.-Y., Xu, R., Brown, C. T., Chen, L., Gray, R. F., Lee, P. Y., Nam, J., Oliveri, P., Smith, J., Muzny, D., Bell, S., Chacko, J., Cree, A., Curry, S., Davis, C., Dinh, H., Dugan-Rocha, S., Fowler, J., Gill, R., Hamilton, C., Hernandez, J., Hines, S., Hume, J., Jackson, L., Jolivet, A., Kovar, C., Lee, S., Lewis, L., Miner, G., Morgan, M., Nazareth, L. V., Okwuonu, G., Parker, D., Pu, L.-L., Thorn, R., and Wright, R.
    Keywords Sea urchin, Isochores, GC content, Segmentation algorithms, Jensen-Shannon divergence statistic, Entropy, Genome composition.
    1.

    Elhaik, E., Sabath, N., and Graur, D. 2006. The "inverse relationship between evolutionary rate and age of mammalian genes" is an artifact of increased genetic distance with rate of evolution and time of divergence. Mol. Biol. Evol., 23: 1-3.

    Abstract It has recently been claimed that older genes tend to evolve more slowly than newer ones (Alba and Castresana 2005). By simulation of genes of equal age, we show that the inverse correlation between age and rate is an artifact caused by our inability to detect homology when evolutionary distances are large. Since evolutionary distance increases with time of divergence and rate of evolution, homologs of fast-evolving genes are frequently undetected in distantly related taxa and are, hence, misclassified as "new." This misclassification causes the mean genetic distance of "new" genes to be overestimated and the mean genetic distance of "old" genes to be underestimated.
    Authors Eran Elhaik, Niv Sabath, and Dan Graur.
    Keywords Nonsynonymous substitutions; Novel genes; Divergence times.

     

    Book Chapters

    Elhaik, E. and Tatarinova, T. 2012. GC3 Biology in Eukaryotes and Prokaryotes. DNA Methylation - From Genomics to Technology. Free e-Book
    Chapter Introduction In this chapter we describe the distribution of guanine and cytosine (GC) content in the third codon position (GC3) in different species, analyze evolutionary trends and discuss differences between genes and organisms with distinct GC3 levels. We scrutinize previously published theoretical frameworks and construct a unified view of GC3 biology in eukaryotes and prokaryotes.
    Book Abstract Epigenetics is one of the most exciting and rapidly developing areas of modern genetics with applications in many disciplines from medicine to agriculture. The most common form of epigenetic modification is DNA methylation, which plays a key role in fundamental developmental processes such as embryogenesis and also in the response of organisms to a wide range of environmental stimuli. Indeed, epigenetics is increasingly regarded as one of the major mechanisms used by animals and plants to modulate their genome and its expression to adapt to a wide range of environmental factors. This book brings together a group of experts at the cutting edge of research into DNA methylation and highlights recent advances in methodology and knowledge of underlying mechanisms of this most important of genetic processes. The reader will gain an understanding of the impact, significance and recent advances within the field of epigenetics with a focus on DNA methylation.
    Keywords GC3, Methylation
    Authors Eran Elhaik and Tatiana Tatarinova

     

    Opinions

    Third-codon position - genomics magic eight-ball

     

    Abstracts

    Elhaik, E., Pirooznia M., Goes F.S., Parla J., Karchin R., Chakravarti A., Zandi P.P., McCombie R.W., and Potash J.B. 2012. Whole-exome sequencing study of four families with bipolar disorder. The 62nd Annual Meeting of the American Society of Human Genetics. Program #2305T
    Abstract Background: Bipolar disorder (BP) is a common mental disorder often associated with lifelong disability and premature mortality. We are conducting a whole-exome study of BP using next-generation sequencing to examine the whole exomes of up to 100 multiplex BP families (with at least 6 individuals from 2-3 generations of each family), 1,800 BP cases, and 1,800 controls with the goal of identifying rare and common genetic variants associated with the disease. Methods: Exome sequencing was performed in a pilot sample of 22 individuals from four multiplex BP families using solution-based capture and paired-end sequencing on the Illumina GA II. Alignment and variant calling were performed with BWA, SAMtools, and GATK. SNVs were annotated with the SIFT and PolyPhen tools. Families were analyzed separately for the segregation of functionally relevant variants with disease. Results: We identified a single common (MAF=0.2) deleterious splice-site variant (rs8373) in a zinc-finger protein gene (ZFP91) that segregated with all affected relatives and none of the unaffected married-in relatives in all four families. A family-based test indicated the variant was significantly associated with BP in these families (p=0.0026). ZFP91 is involved in the non-canonical nuclear factor kB (NF-kB) signaling pathway, which regulates the canonical NF-kB pathway. The non-canonical pathway is associated with adaptive immunity and protection against inflammation and apoptosis. Conclusions: Our initial analysis of four multiplex families with bipolar disorder revealed a common splice-site polymorphism in ZFP91 that segregates with disease in all pedigrees. Mutations inhibiting the non-canonical NF-kB, such as the one identified here, have been shown to induce apoptosis and inflammation due to the continuous activation of the complementary pathway.
This variant was imputed in the Psychiatric GWAS Consortium (PGC) mega-analysis of Bipolar Disorder, but it was not significantly associated with illness. However, if the current finding can be replicated in other sequenced families, it may provide evidence of a potential inflammatory etiology in bipolar disorder. Such replication efforts are ongoing.
    Keywords Complex Traits, Polygenic Disorders, brain/nervous system, candidate gene, Nf-kb
    Authors Eran Elhaik, Pirooznia M., Goes F.S., Parla J., Karchin R., Chakravarti A., Zandi P.P., McCombie R.W., and Potash J.B.

    Wells S., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Gaieski J.B., Melendez C., Vilar M.G., Owings A.C., Gomez R., Fujita R., Santos F., Comas D., Balanovsky O., Balanovska E., Zalloua P., Soodyall H., Pitchappan R., Kumar G.A., Hammer M.F., Greenspan B., and Elhaik, E. 2012. The GenoChip: a new tool for genetic anthropology. The 62nd Annual Meeting of the American Society of Human Genetics. Program #3377W
    Abstract Background: The Genographic Project is an international effort aimed at charting human history using genetic data. The project is non-profit and non-medical, and through the sale of its public participation kits it supports cultural preservation efforts in indigenous and traditional communities. To extend our knowledge of the human journey, interbreeding with ancient hominins, and modern human demographic history, we designed a genotyping chip optimized for genetic anthropology research. Methods: Our goal was to design, produce, and validate a SNP array dedicated to genetic anthropology. The GenoChip is an Illumina HD iSelect genotyping bead array with over 130,000 highly informative autosomal and X-chromosomal SNPs ascertained from over 450 worldwide populations, ~13,000 Y-chromosomal SNPs, and ~3,000 mtDNA SNPs. To determine the extent of gene flow from archaic hominins to modern humans, we included over 25,000 SNPs from candidate regions of interbreeding between extinct hominins (Neanderthal and Denisovan) and modern humans. To avoid any inadvertent medical testing we filtered out all SNPs that have known or suspected health or functional associations. We validated the chip by genotyping over 1,000 samples from 1000 Genomes, Family Tree DNA, and Genographic Project populations. Results: The concordance between the GenoChip and the 1000 Genomes data was over 99.5%. The GenoChip has a SNP density of approximately (1/100,000) bases over 92% of the human genome and is highly compatible with Illumina and Affymetrix commercial platforms. The ~10,000 novel Y SNPs included on the chip have greatly refined our understanding of the Y-chromosome phylogenetic tree. By including Y and mtDNA SNPs on an unprecedented scale, the GenoChip is able to delineate extremely detailed human migratory paths. The autosomal and X-chromosomal markers included on the GenoChip have revealed novel patterns of ancestry that shed a detailed new light on human history. 
Interbreeding analysis with extinct hominids confirmed some previous reports and allowed us to describe the modern geographical distribution of these markers in detail. Conclusions: The GenoChip is the first genotyping chip completely dedicated to genetic anthropology with no known medically relevant markers. We anticipate that the large-scale application of the GenoChip using the Genographic Project's diverse sample collection will provide new insights into genetic anthropology and human history.
    Keywords Evolutionary and Population Genetics, population genetics, population structure, SNP analysis/discovery, genomic methodologies, microarrays
    Authors Wells S., Greenspan E., Staats S., Krahn T., Tyler-Smith C., Xue Y., Tofanelli S., Francalacci P., Cucca F., Pagani L., Jin L., Li H., Schurr T.G., Gaieski J.B., Melendez C., Vilar M.G., Owings A.C., Gomez R., Fujita R., Santos F., Comas D., Balanovsky O., Balanovska E., Zalloua P., Soodyall H., Pitchappan R., Kumar G.A., Hammer M.F., Greenspan B., and Elhaik, E.

    Gaieski J.B., Elhaik E., Owings A.C., Vilar M.G., Walia A.T., Gaieski D.F., Wells R.S., Schurr T.G., and The Genographic Consortium. 2012. Genetic ancestry and admixture analysis in a Bermudian population reveals evidence of Native American origins consistent with oral histories and genealogies. The 62nd Annual Meeting of the American Society of Human Genetics. Program #3329W
    Abstract Background: Shortly after its colonization in the early 17th century, Bermuda became the first English-speaking dependency to forcibly import its labor by trafficking in enslaved Africans, European ethnic minorities, and indigenous Americans. Unlike the many ethnic groups that now call the island home, Bermuda's St. David's Islanders claim to be linked to Native American ancestors. In particular, their use of oral traditions and complex genealogies helps to reinforce their Native American identity. To elucidate the influence of historical events on genetic ancestry and native cultural identity among St. David's Islanders, we examined mtDNA and Y-chromosomal variation in over 100 individuals. We found that the majority of their mtDNA and Y-chromosome haplotypes (greater than 98%) were African and West Eurasian in origin. However, due to the limitations of this approach in reconstructing the genetic history of admixed populations, and because most participants were interested in learning more about their genetic genealogies, we expanded our analysis to include autosomal markers using a novel genotyping platform. Methods: To identify genetic contributions of putative indigenous American ancestors among the St. David's Islanders, we used the GenoChip to genotype Bermudians along with 200 samples from ~20 worldwide populations. Developed by Genographic Project scientists, the GenoChip is a SNP array ascertained from over 450 worldwide populations, and is dedicated to enhancing our knowledge of genetic anthropology. Results: Principal component analysis of the autosomal SNP data separated our participants into three discrete clusters. An admixture analysis identified up to 9% ancestry associated with Native Americans overall. The two largest clusters overlapped with African Americans and Puerto Ricans, and distributed evenly amongst the two main clusters (mean of 3% each).
Samples from the third cluster averaged an unusually high Native American ancestry (mean of 6%). Conclusions: The GenoChip enabled us to detect otherwise elusory Native American ancestry among the Bermudians of St. David's Island. We speculate that the uneven distribution of this ancestry is due to admixture of Africans, Europeans, and Native Americans in varying degrees in the different source populations for modern-day St. David's Islanders. Application of this novel genotyping platform has provided new insights into the complex history of the Bermudian population.
    Keywords Evolutionary and Population Genetics, genetic diversity, SNP analysis/discovery, ethical, legal and social issues, genome sequencing, genomic methodologies
    Authors J.B. Gaieski, E. Elhaik, A.C. Owings, M.G. Vilar, A.T. Walia, D.F. Gaieski, R.S. Wells, T.G. Schurr, The Genographic Consortium

    Elhaik, E. and Chakravarti, A. 2010. Empirical distributions of FST from large-scale polymorphism data. The 60th Annual Meeting of the American Society of Human Genetics. Program #1521
    Abstract Apportionment of human genetic variation has long established that most human variation is within groups and that the additional variation between groups is small but greatest when comparing continental populations. These studies have used Wright's FST that apportions the standardized variance in allele frequencies within and between groups in a hierarchical manner. High values of FST are unlikely in humans due to genetic drift and migration and are consequently used to identify genes undergoing directional or heterotic selection. The availability of the HapMap data from phases I - III now allows us to reexamine these questions. We analyzed data on ~3 million autosomal, X-linked, Y-linked, and mitochondrial SNPs from the HapMap database on 602 samples from 8 populations and a common subset of ~1 million autosomal and X-linked SNPs that have been genotyped in all populations. We identified two major features of the data. First, only a paucity (12%) of the total genetic variation is among populations of different continents and even a lesser (1%) amount among populations of the same continent. These data are remarkably consistent with the early observations of Lewontin in 1972. Second, we demonstrate that, although the overall distribution is similarly shaped (inverse J), the distribution of FST varies significantly by mean allele frequency. Since the mean allele frequency is a crude indicator of allele age, these distributions mark the time-dependent change in genetic differentiation. The change in mean FST of these distributions is linear in mean allele frequency suggesting the nature of allele frequency dynamics. These observations are true for autosomal, X-linked, and mitochondrial SNPs, but not Y-linked SNPs. These results suggest that investigating the extremes of the FST distribution for each allele frequency class may be more efficient for detection of selection. 
Consequently, we demonstrate that such extreme SNPs are more clustered than that expected from linkage disequilibrium for each allele frequency class. These genomic regions are likely candidates for natural selection.
    Keywords Wright's FST, human genetic variation, HapMap 3
    Authors Eran Elhaik and Aravinda Chakravarti

    Berlinger, M.J., Lebiush-Mordechi, S., Fridja, D., Khasdan, V., Elhaik, E., and Rodman R. 1997. Potato Tuber Moth Parasites in Potato and Processing Tomato Fields: Preliminary Results. The 10th Conference of the Entomological Society of Israel (Abstracts), p. 154.

    Abstract The potato tuberworm, Phthorimaea operculella (Zeller), is a major pest of potato in Israel. To control the larvae that bore into the exposed tubers, commercial fields are usually treated with insecticides, irrespective of pest density. It is therefore likely that there is excessive use of harmful chemicals against the pest. The ultimate goal of this research project is to develop an integrated pest control program for the potato tuberworm in potatoes and to reduce the use of insecticides. During the last 2 years the pest was studied in two agricultural systems in the western Negev in Israel: cv. 'Kara' in sandy soil and cv. Desiree in loessial soil. We examined (i) the pest's phenology; (ii) the infestation level in the tubers; (iii) the pest distribution pattern in the field; and (iv) the larval parasitism rate. Additionally, we tested the importance of roller treatment and volunteer host plants on infestation levels. Catches of adult moths peaked consistently at the end of April. Larval infestation in the foliage and in tubers was significantly higher in the edge rows of the field than in its center. Infestation rates in the exposed (green) tubers were generally higher than in the unexposed (white) tubers. Finally, the presence of nearby volunteer potato plants and the timing of roller treatments had a major influence on pest populations. (P)
    Keywords Potato Tuber Moth; Phthorimaea operculella; Biological control
    Authors Menachem J. Berlinger, Sara Lebiush-Mordechi, Dvora Fridja, Vadim Khasdan, Eran Elhaik, and Rafi Rodman.

     

    Posters

    Elhaik, E. and Desmond M. Ryan. A novel precision medicine approach to improve the outcome of clinical trials.
    Insigneo Showcase 2017, Sheffield, UK.


    Elhaik, E. and Desmond M. Ryan. A novel precision medicine approach to improve the outcome of clinical trials.
    HPC@Sheffield 2017, Sheffield, UK.


    Elhaik, E. and Desmond M. Ryan. A novel precision medicine approach to improve the outcome of clinical trials.
    Festival of Genomics 2017, London, UK.


    Elhaik, E., Das, R., Pirooznia, M., and Wexler, P. Localizing Ashkenazic Jews to primeval villages in the ancient Iranian lands of Ashkenaz.
    SMBE 2016, Queensland, Australia.


    Isiaq, A.J., Elhaik, E., and Chakravarti, A. Size Matters - Examining Mutation Enrichment In Relation To Gene Size. 2011.
    SIP program, Johns Hopkins University, MD.


    Elhaik, E. European and Asian Jews are proto-Khazars in origin. 2011.
    Johns Hopkins 3rd Annual Postdoc Symposium, Baltimore, MD.


    Elhaik, E. Empirical distributions of FST from large-scale polymorphism data. 2010.
    The 60th Annual Meeting of the American Society of Human Genetics (ASHG), Washington, DC.


    McCoy, E., Elhaik, E., and Chakravarti, A. The Extent of Genetic Variation in Human Genes. 2010.
    SIP program, Johns Hopkins University, MD.


    Sabath, N., Elhaik, E., and Graur D. Absence of similarity does not equal absence of homology: On the purported relationship between evolutionary rate and gene age. 2007.
    The Annual Meeting of the Society for Molecular Biology and Evolution, Dalhousie University, Halifax, Nova Scotia, Canada.


    Sabath, N., Elhaik, E., and Graur D. Is there a relationship between evolutionary rate and age of genes? 2007.
    Texas Genetics Society conference, San Antonio, Texas.


    Elhaik, E. and Graur D. Compositional heterogeneity and GC-content domains in animal genomes. 2007.
    Texas Genetics Society conference, San Antonio, Texas.


    Elhaik, E., Graur D., and Josic K. Nucleotides homogeneity within eukaryotes genomes: A comparison of three methods. 2006.
    SMBE conference, Arizona State University, Tempe, Arizona.


    Elhaik, E., Graur D., and Josic K. An improved Haar wavelet analysis of the human genome. 2006.
    Sigma Xi Research Day, University of Houston, Houston, TX.


     

    Popular Science

    Solving the mystery of the Druze - a 2,000-year-old odyssey. AEON. 2016.

    Solving the 1,000-year-old mystery of Druze origin with a genetic sat nav. The Conversation. 2016.

    Uncovering ancient Ashkenaz - the birthplace of Yiddish speakers. The Conversation. 2016.

    How DNA traced the Ashkenazic Jews to northeastern Turkey. AEON. 2016.

    Uncovering ancient Ashkenaz - the birthplace of Yiddish speakers. Atlas of science. 2016.