Evolutionarily conserved elements in vertebrate, insect, worm and yeast genomes
We have conducted a comprehensive search for conserved elements in vertebrate genomes, using genome-wide multiple alignments of five vertebrate species (human, mouse, rat, chicken, and Fugu rubripes). Parallel searches have been performed with multiple alignments of four insect species (three species of Drosophila and Anopheles gambiae), two species of Caenorhabditis, and seven species of Saccharomyces. Conserved elements were identified with a computer program called phastCons, which is based on a two-state phylogenetic hidden Markov model (phylo-HMM). PhastCons works by fitting a phylo-HMM to the data by maximum likelihood, subject to constraints designed to calibrate the model across species groups, and then predicting conserved elements based on this model. The predicted elements cover roughly 3%–8% of the human genome (depending on the details of the calibration procedure) and substantially higher fractions of the more compact Drosophila melanogaster (37%–53%), Caenorhabditis elegans (18%–37%), and Saccharomyces cerevisiae (47%–68%) genomes. From yeasts to vertebrates, in order of increasing genome size and general biological complexity, increasing fractions of conserved bases are found to lie outside of the exons of known protein-coding genes. In all groups, the most highly conserved elements (HCEs), by log-odds score, are hundreds or thousands of bases long. These elements share certain properties with ultraconserved elements, but they tend to be longer and less perfectly conserved, and they overlap genes of somewhat different functional categories. In vertebrates, HCEs are associated with the 3′ UTRs of regulatory genes, stable gene deserts, and megabase-sized regions rich in moderately conserved noncoding sequences. Noncoding HCEs also show strong statistical evidence of an enrichment for RNA secondary structure.
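To make the two-state idea concrete, here is a minimal sketch of how a "conserved" versus "nonconserved" HMM can segment an alignment into elements. It is not the phastCons implementation: the real program computes each column's likelihood from a phylogenetic model and fits its parameters by maximum likelihood, whereas this toy version assumes the per-column log-likelihoods are already given. The function names and the switching probabilities `p_leave_conserved` and `p_enter_conserved` (with their default values) are hypothetical and chosen only for illustration.

```python
import math


def viterbi_two_state(loglik_cons, loglik_non,
                      p_leave_conserved=0.1, p_enter_conserved=0.05):
    """Most likely state path for a toy two-state HMM.

    State 0 = conserved, state 1 = nonconserved.
    loglik_cons[i] and loglik_non[i] are the log-likelihoods of alignment
    column i under each state (assumed inputs, not computed here).
    """
    n = len(loglik_cons)
    if n == 0:
        return []
    mu, nu = p_leave_conserved, p_enter_conserved
    # log_trans[p][s] = log P(next state = s | previous state = p)
    log_trans = [
        [math.log(1 - mu), math.log(mu)],
        [math.log(nu), math.log(1 - nu)],
    ]
    # Start from the stationary distribution of the two-state chain.
    log_start = [math.log(nu / (mu + nu)), math.log(mu / (mu + nu))]
    emit = [loglik_cons, loglik_non]

    score = [log_start[s] + emit[s][0] for s in (0, 1)]
    backpointers = []
    for i in range(1, n):
        new_score, ptr = [], []
        for s in (0, 1):
            cands = [score[p] + log_trans[p][s] for p in (0, 1)]
            best = 0 if cands[0] >= cands[1] else 1
            ptr.append(best)
            new_score.append(cands[best] + emit[s][i])
        backpointers.append(ptr)
        score = new_score

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr in reversed(backpointers):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def conserved_elements(path):
    """Half-open (start, end) intervals of consecutive conserved columns."""
    elements, start = [], None
    for i, s in enumerate(path):
        if s == 0 and start is None:
            start = i
        elif s != 0 and start is not None:
            elements.append((start, i))
            start = None
    if start is not None:
        elements.append((start, len(path)))
    return elements


# Toy input: columns 0-2 and 6-7 fit the conserved state far better.
cons = [-1, -1, -1, -10, -10, -10, -1, -1]
non = [-10, -10, -10, -1, -1, -1, -10, -10]
path = viterbi_two_state(cons, non)
print(conserved_elements(path))
```

In this sketch the two switching probabilities control how readily the path changes state, so they trade off the number of predicted elements against their length. This is the kind of knob that a calibration procedure like the one mentioned in the abstract would have to constrain.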
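This paper will substantially change the prevailing view of noncoding intergenic sequence in eukaryotic genomes, the so-called "junk" (or "desert") regions, and will encourage further study of intergenic noncoding sequence. Researchers from the University of California, Santa Cruz and other institutions scanned and statistically analyzed genome-wide multiple alignments of yeasts, nematodes, fruit flies, and five vertebrates. They found that the so-called "junk" sequence between genes in fact contains a large number of previously unrecognized highly conserved elements (HCEs). Statistical analysis shows that the HCEs in noncoding regions are enriched for RNA secondary structure, and some may take part in gene regulation, although the details await further study. To support the study of noncoding sequence across genomes, the authors also developed phastCons, a program for evolutionary analysis based on a hidden Markov model.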