Eight Clusters, Synchrony of Evolution and Unique Symmetry in Chloroplast Genomes: The Offering from Triplets
Author:
Садовский, Михаил Георгиевич
Сенашова, Мария Юрьевна
Путинцева, Юлия Андреевна
Corporate Contributor:
Institute of Fundamental Biology and Biotechnology
Base Department of Forest Protection and Modern Forest Monitoring Technologies
Date:
2018-07
Journal Name:
Chloroplasts and evolution. Structure and function
Bibliographic Citation:
Садовский, Михаил Георгиевич. Eight Clusters, Synchrony of Evolution and Unique Symmetry in Chloroplast Genomes: The Offering from Triplets [Text] / Михаил Георгиевич Садовский, Мария Юрьевна Сенашова, Юлия Андреевна Путинцева // Chloroplasts and evolution. Structure and function. 2018. P. 25-97.
The full text of the article is not published in open access, in accordance with the journal's policy.
Abstract:
We studied the features of chloroplast genomes that can be retrieved solely from the analysis of triplet composition. To do that, two types of triplet frequency dictionaries were developed: the first lists all triplets with overlapping, so that every nucleotide starts a triplet; the second lists triplets that neither overlap nor have gaps between them. Two core questions were studied: the first is the structuredness of a genome, which manifests in the statistical properties of small fragments of the genome, each converted into a triplet frequency dictionary; the second is the relation between the triplet frequencies of genomes and their phylogeny, determined over a large ensemble of genomes. It was found that the great majority of chloroplast genomes exhibit a specific eight-cluster pattern formed by these fragments (converted into triplet frequency dictionaries). The first cluster comprises junk fragments. Six more clusters comprise fragments from coding regions, so that each cluster corresponds to a specific reading frame shift and strand (leading vs. lagging). Finally, the eighth cluster (called the ``tail'') differs from all those mentioned above and comprises the fragments with excessive $\mathsf{GC}$-content. In the observed pattern, the two clusters that correspond to the third position of a reading frame but belong to opposite strands always project one onto the other, while the other four clusters do not. Moreover, there is a mirror symmetry in the orientation of these two coinciding clusters relative to the four others: each genome has either a left-handed or a right-handed orientation of these six clusters. The cluster structuredness of chloroplast genomes found here differs from the similar one observed for bacterial or eukaryotic genomes.
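The two dictionary types described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `triplet_frequencies` and `fragments`, and the `window` and `stride` parameters, are hypothetical, and the abstract does not state the fragment length that was actually used. A step of 1 gives the overlapping dictionary (every nucleotide starts a triplet); a step of 3 gives the non-overlapping, gap-free one.

```python
from collections import Counter
from itertools import product

# All 64 triplets over the four-letter nucleotide alphabet.
TRIPLETS = ["".join(p) for p in product("ACGT", repeat=3)]


def triplet_frequencies(seq, step=1):
    """Return a triplet frequency dictionary for `seq`.

    step=1: overlapping triplets, every nucleotide starts a triplet.
    step=3: triplets that neither overlap nor leave gaps.
    Triplets containing symbols other than A, C, G, T are skipped
    (an assumption of this sketch).
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, step))
    total = sum(counts[t] for t in TRIPLETS)
    return {t: (counts[t] / total if total else 0.0) for t in TRIPLETS}


def fragments(seq, window, stride):
    """Cut `seq` into fragments of length `window`, moving by `stride`.

    Both parameters are hypothetical; each fragment would then be
    converted into its own triplet frequency dictionary.
    """
    return [seq[i:i + window] for i in range(0, len(seq) - window + 1, stride)]


if __name__ == "__main__":
    toy = "ACGTACGTAC"  # toy sequence, not real genome data
    overlapping = triplet_frequencies(toy, step=1)
    gapless = triplet_frequencies(toy, step=3)
    print(overlapping["ACG"], gapless["ACG"])
    print(len(fragments(toy, window=6, stride=2)))
```

Each dictionary is a point in a 64-dimensional frequency space, which is the representation the clustering described in the abstract would operate on.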
The second core question concerned the relation between the triplet composition of chloroplast genomes and the taxonomy of their bearers; the latter was determined from morphology and nuclear genomes. To reveal the relation, all the chloroplast genomes (approx. 900 entries) were converted into triplet frequency dictionaries of the first type and then clustered by $K$-means, elastic maps and some other clustering techniques into two to seven classes. The composition of the classes was the subject of interest: the distribution of clades over the classes produced by clustering was found to be far from random and, in general, to follow the natural taxonomy of the bearers. Some further perspectives and problems are discussed.
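As an illustration of the clustering step, the sketch below runs a plain $K$-means (Lloyd's algorithm) on frequency vectors. It is a simplified stand-in, not the authors' pipeline: the function name `kmeans` is hypothetical, the deterministic farthest-first initialization is an assumption made here for reproducibility, and the toy two-dimensional points replace the 64-dimensional triplet frequency vectors of real genomes. Elastic maps and the other techniques mentioned in the abstract are not covered.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Cluster `points` into `k` classes; return one class label per point.

    Initialization (an assumption of this sketch): the first point is the
    first center, and each further center is the point farthest from the
    centers chosen so far.
    """
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy data: two well-separated groups standing in for genome vectors.
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
    for k in range(2, 4):
        print(k, kmeans(toy, k))
```

Running the clustering for each number of classes from two to seven and tabulating which clades fall into which class would give the kind of clade-by-class distribution the abstract refers to.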