TY - JOUR
T1 - Global pentapeptide statistics are far away from expected distributions
AU - Poznański, Jarosław
AU - Topiński, Jan
AU - Muszewska, Anna
AU - Dębski, Konrad J.
AU - Hoffman-Sommer, Marta
AU - Pawłowski, Krzysztof
AU - Grynberg, Marcin
N1 - Publisher Copyright: © 2018, The Author(s).
PY - 2018/12/1
Y1 - 2018/12/1
AB - The relationships between polypeptide composition, sequence, structure and function have been puzzling biologists ever since first protein sequences were determined. Here, we study the statistics of occurrence of all possible pentapeptide sequences in known proteins. To compensate for the non-uniform distribution of individual amino acid residues in protein sequences, we investigate separately all possible permutations of every given amino acid composition. For the majority of permutation groups we find that pentapeptide occurrences deviate strongly from the expected binomial distributions, and that the observed distributions are also characterized by high numbers of outlier sequences. An analysis of identified outliers shows they often contain known motifs and rare amino acids, suggesting that they represent important functional elements. We further compare the pentapeptide composition of regions known to correspond to protein domains with that of non-domain regions. We find that a substantial number of pentapeptides is clearly strongly favored in protein domains. Finally, we show that over-represented pentapeptides are significantly related to known functional motifs and to predicted ancient structural peptides.
UR - http://www.scopus.com/inward/record.url?scp=85054775034&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054775034&partnerID=8YFLogxK
U2 - 10.1038/s41598-018-33433-8
DO - 10.1038/s41598-018-33433-8
M3 - Article
C2 - 30310110
AN - SCOPUS:85054775034
SN - 2045-2322
VL - 8
JO - Scientific Reports
JF - Scientific Reports
IS - 1
M1 - 15178
ER -