Is the genomic sequence too esoteric? This set of tools can read it

2018-10-09

Once a genome has been sequenced, determining which genes can be translated into proteins, and which proteins they produce, has long been both a hot topic and a difficult problem in life science research. Recently, Ge Feng's research group at the Institute of Hydrobiology, Chinese Academy of Sciences, made a breakthrough on this question. The results were published online in the international journal Molecular Plant.

The research team used proteogenomics to deeply annotate the genome of Phaeodactylum tricornutum and to construct a fine map of its proteome, extensively revising and supplementing the previous annotation. Even more striking, the experimental workflow and analysis software the team built along the way can be applied to any organism whose genome has been sequenced, providing an important tool for genome interpretation.

Deciphering the "wordless book" of a long nucleotide sequence

Sequencing a genome yields a "wordless book": one long sequence of nucleotides. Which short stretches make up a gene, whether that gene can direct protein synthesis, and which proteins it produces are questions the book itself does not answer. To answer them, scientists have to annotate the genome further.

Most past annotation work was done from a bioinformatics perspective: algorithms predict the genes that can be translated into proteins (known as coding genes) and their positions in the genome. But this approach misses many coding genes and annotates others incorrectly.

With the development of proteomics, a new research direction has quietly emerged: using proteomic data to deeply annotate genomes. Specifically, all the proteins in a cell are extracted, their amino acid sequences are determined by mass spectrometry, and these sequences are then compared with the genome sequence on a computer to infer, in reverse, the precise position of each coding gene.
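
The comparison step can be illustrated with a minimal sketch. It assumes the peptide sequence is already known and simply searches for it in all six reading frames of the genome. The function names (`translate`, `locate_peptide`) and the toy sequence are hypothetical and are not taken from the team's software. Real proteogenomic pipelines match mass spectra against sequence databases rather than performing an exact string search.

```python
from typing import List, Tuple

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame` (0, 1 or 2); '*' marks a stop codon."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def locate_peptide(genome: str, peptide: str) -> List[Tuple[str, int, int]]:
    """Find exact matches of a peptide in all six reading frames.

    Returns (strand, start, end) tuples in 0-based, end-exclusive
    coordinates on the forward strand.
    """
    genome = genome.upper()
    n = len(genome)
    hits = []
    for strand, seq in (("+", genome), ("-", reverse_complement(genome))):
        for frame in range(3):
            protein = translate(seq, frame)
            idx = protein.find(peptide)
            while idx != -1:
                start = frame + 3 * idx
                end = start + 3 * len(peptide)
                if strand == "-":
                    # Convert reverse-strand coordinates back to the forward strand.
                    start, end = n - end, n - start
                hits.append((strand, start, end))
                idx = protein.find(peptide, idx + 1)
    return hits


# Toy genome (an assumption for illustration): the peptide MAK is encoded at positions 2..11.
print(locate_peptide("CCATGGCTAAATGAGG", "MAK"))
```

A hit in a region with no annotated gene would suggest a candidate new coding gene, while a hit that disagrees with an annotated gene's boundaries would suggest that the annotation needs revising.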

“Seeing is believing, and that is more reliable than computational speculation.” According to Yang Mingkun, first author of the study, the experiment not only provided protein-level evidence of expression for known coding genes, but also found 606 new coding genes, 56 of which had previously been mispredicted as non-coding, and identified 506 coding genes that had been incorrectly annotated.

In addition, the ability to resolve post-translational modifications is another advantage of proteogenomics. Proteins synthesized from the 20 amino acids usually undergo further processing to become mature, functional proteins, and the types of processing are diverse. This means that the same amino acid sequence can give rise to different mature proteins. The post-translational modifications found on more than 20 proteins in this study attest to this advantage.

A set of experimental procedures for deep annotation

How to draw a finer proteome map and annotate esoteric genomic information in more detail has always been a major problem in proteogenomics. This study made a new breakthrough in refining alternative splice variants. Alternative splicing refers to the process in which certain segments of a gene's transcript are “cut out” and the remaining segments are joined together. The same gene, with different splice sites and arrangements, can produce different proteins.
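
To make the idea concrete: a peptide that spans a splice junction does not match the genome contiguously, but its two halves match on either side of the removed segment. The sketch below searches for such split matches. It is an illustration only, not the team's algorithm. The function names and parameters (`find_split_matches`, `min_frag`, `min_gap`, `max_gap`) are hypothetical, and the search is simplified to the forward strand, exact matches, and splits that fall on codon boundaries.

```python
from typing import List, Tuple

# Standard genetic code (NCBI translation table 1), built in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO))


def translate(seq: str, frame: int) -> str:
    """Translate seq starting at offset `frame` (0, 1 or 2)."""
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def nucleotide_hits(frames: List[str], fragment: str) -> List[Tuple[int, int]]:
    """Return (start, end) nucleotide spans where fragment occurs in any forward frame."""
    spans = []
    for frame, protein in enumerate(frames):
        idx = protein.find(fragment)
        while idx != -1:
            start = frame + 3 * idx
            spans.append((start, start + 3 * len(fragment)))
            idx = protein.find(fragment, idx + 1)
    return spans


def find_split_matches(genome: str, peptide: str, min_frag: int = 2,
                       min_gap: int = 4, max_gap: int = 1000):
    """Find ways to split a peptide so both parts match the genome with a gap between.

    Each result is (split_index, left_start, left_end, right_start, right_end,
    canonical), where canonical is True if the gap starts with GT and ends
    with AG, the most common intron boundary motif.
    """
    genome = genome.upper()
    frames = [translate(genome, f) for f in range(3)]
    results = []
    for k in range(min_frag, len(peptide) - min_frag + 1):
        left_hits = nucleotide_hits(frames, peptide[:k])
        right_hits = nucleotide_hits(frames, peptide[k:])
        for l_start, l_end in left_hits:
            for r_start, r_end in right_hits:
                gap = r_start - l_end
                if min_gap <= gap <= max_gap:
                    canonical = (genome[l_end:l_end + 2] == "GT"
                                 and genome[r_start - 2:r_start] == "AG")
                    results.append((k, l_start, l_end, r_start, r_end, canonical))
    return results


# Toy gene (an assumption for illustration): exon 1 + intron + exon 2.
toy = "ATGGCTAAA" + "GTAAGTTTTCAG" + "GAATGGTGA"
print(find_split_matches(toy, "MAKEW"))
```

In this toy case the peptide MAKEW has no contiguous match, but it matches once it is split after MAK, with a 12-nucleotide gap bounded by GT and AG. A peptide that matches contiguously, such as MAKVS, produces no split result.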

In an interview with Science and Technology Daily, Yang Mingkun said that previous studies verified known alternative splicing events by identifying the corresponding proteins. This time, the team set out to discover unknown ones.

"We designed algorithms to find the different splice sites that may exist within a protein's amino acid sequence, and then compared the resulting 'fragments' with the genomic data to find the corresponding alternative splicing events," Yang Mingkun said. The team reportedly found 21 new alternative splice variants and revised the alternative splice sites of 73 known genes.

Deep annotation of a genome involves a great deal of computation. To improve efficiency, the team integrated the algorithms used at each step into a single data analysis package applicable to all organisms. Given the raw data collected by the mass spectrometer and a few simple operating parameters, the software directly outputs the relevant genome annotation information. The team has also established an experimental workflow for other scientists to follow. This means that, in the future, the team's workflow and software can be used to quickly complete deep genome annotation of other species.

Yang Mingkun said the research team will further optimize the software and continue to improve its speed and accuracy. “Because the draft human proteome completed in 2014 contains too many errors and omissions, we are preparing to improve on that work. Only by knowing which proteins are present in the various tissues of the human body can we better practice precision medicine on that basis,” Yang Mingkun said. (Intern Liu Yuting)

Source: Science and Technology Daily
