Book of Abstracts: Albany 2003

Conversation 13
Abstract Book
June 17-21 2003

Computational Proteomics: Genome-scale Analysis of Protein Structure, Function, & Evolution

My talk will address two major post-genomic challenges: predicting protein function on a genomic scale and interpreting intergenic regions. I will approach both by analyzing the properties and attributes of proteins in a database framework. For function prediction, I will discuss the strengths and limitations of four approaches: (i) using sequence similarity; (ii) using structural similarity; (iii) clustering microarray experiments; and (iv) data integration. The last approach systematically combines information from the other three and holds the most promise for the future. For the sequence analysis, I will present a similarity threshold above which functional annotation can be transferred, and for the microarray analysis, I will present a new method of clustering expression timecourses that finds "time-shifted" relationships. In the second part of the talk, I will survey the occurrence of pseudogenes in several large eukaryotic genomes, concentrating on grouping them into families and functional categories and comparing these groupings with those of existing "living" genes.

In particular, we have found that duplicated pseudogenes are distributed very differently from what one would expect if they were derived at random from the population of genes in the genome. They tend to lie near the ends of chromosomes, have a composition intermediate between that of genes and intergenic DNA, and, most importantly, have environmental-response functions. This suggests that they may be resurrectable protein parts, and there is a potential mechanism for this in yeast.
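As an illustration of what a "time-shifted" relationship between two expression timecourses can look like, the following minimal sketch scans a small range of lags and reports the one with the highest Pearson correlation. It is a simple stand-in, not the clustering method presented in the talk; the function names, the `max_shift` and `min_overlap` parameters, and the assumption of equally spaced time points are all hypothetical choices made for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def best_time_shift(a, b, max_shift=3, min_overlap=4):
    """Return (shift, r) for the lag that maximizes the correlation.

    A positive shift s pairs a[t] with b[t + s], i.e. profile b lags
    profile a by s time points. Both parameters are illustrative defaults.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of time points")
    n = len(a)
    best_shift, best_r = 0, float("-inf")
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            part_a, part_b = a[:n - s], b[s:]
        else:
            part_a, part_b = a[-s:], b[:n + s]
        if len(part_a) < min_overlap:
            continue  # too few overlapping points to trust the correlation
        r = pearson(part_a, part_b)
        if r > best_r:
            best_shift, best_r = s, r
    return best_shift, best_r
```

Calling `best_time_shift(a, b)` on two profiles of equal length returns the lag and the correlation at that lag. Two genes whose unshifted correlation is modest can score highly at a non-zero lag, which is the kind of relationship simultaneous-expression clustering would miss.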

Mark Gerstein*
P. Harrison
J. Qian
R. Jansen
V. Alexandrov
P. Bertone
R. Das
D. Greenbaum
W. Krebs
Y. Liu
H. Hegyi
N. Echols
J. Lin
C. Wilson
A. Drawid
Z. Zhang
Y. Kluger
N. Lan
N. Luscombe
S. Balasubramanian

Molecular Biophysics & Biochemistry Department
Yale University
New Haven, CT 06520

References and Footnotes
  1. P. Harrison, H. Hegyi, P. Bertone, N. Echols, T. Johnson, S. Balasubramanian, N. Luscombe, M. Gerstein. Nucleic Acids Res. 29, 818-30 (2001).
  2. J. Qian, M. Dolled-Filhart, J. Lin, H. Yu, M. Gerstein. J. Mol. Biol. 314, 1053-1066 (2001).
  3. R. Jansen, D. Greenbaum, M. Gerstein. Genome Res. 12, 37-46 (2002).
  4. P. Harrison, H. Hegyi, P. Bertone, N. Echols, T. Johnson, S. Balasubramanian, N. Luscombe, M. Gerstein. Genome Res. 12, 273-281 (2002).
  5. Z. Zhang, P. Harrison, M. Gerstein. Identification and analysis of over 2000 ribosomal protein pseudogenes in the human genome. Genome Res. 12, 1466-82 (2002).
  6. Y. Kluger, R. Basri, J. T. Chang, M. Gerstein. Genome Res. (in press).
  7. A. M. Edwards, B. Kus, R. Jansen, D. Greenbaum, J. Greenblatt, M. Gerstein. Trends Genet. 18, 529-36 (2002).