Book of Abstracts: Albany 2003
June 17-21, 2003
Computational Proteomics: Genome-scale Analysis of Protein Structure, Function, & Evolution
My talk will address two major post-genomic challenges: predicting protein function on a genomic scale and interpreting intergenic regions. I will approach both by analyzing the properties and attributes of proteins in a database framework. For protein function prediction, I will discuss the strengths and limitations of four approaches: (i) using sequence similarity; (ii) using structural similarity; (iii) clustering microarray experiments; and (iv) data integration. The last approach systematically combines information from the other three and holds the most promise for the future. For the sequence analysis, I will present a similarity threshold above which functional annotation can be transferred, and for the microarray analysis, I will present a new method of clustering expression timecourses that finds "time-shifted" relationships. In the second part of the talk, I will survey the occurrence of pseudogenes in several large eukaryotic genomes, concentrating on grouping them into families and functional categories and comparing these groupings with those of existing "living" genes.
In particular, we have found that duplicated pseudogenes tend to have a very different distribution from the one expected if they were derived at random from the population of genes in the genome. They tend to lie at the ends of chromosomes, have a composition intermediate between that of genes and intergenic DNA, and, most importantly, have environmental-response functions. This suggests that they may be resurrectable protein parts, and there is a potential mechanism for this in yeast.
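As an illustration of what a "time-shifted" relationship between two expression timecourses means, the following minimal sketch scores a pair of profiles by lagged Pearson correlation and reports the shift that aligns them best. It is not the clustering method presented in the talk; the function names (`pearson`, `best_lag`) and the parameters `max_lag` and `min_overlap` are hypothetical, chosen only for this example.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    if len(x) != len(y):
        raise ValueError("profiles must have the same length")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def best_lag(a, b, max_lag=3, min_overlap=4):
    """Return (lag, r) for the shift of b relative to a with the highest correlation.

    A positive lag means b trails a: b[t + lag] is compared with a[t].
    max_lag and min_overlap are illustrative assumptions, not values from the talk.
    """
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    best = (0, pearson(a, b))
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = a[:len(a) - lag], b[lag:]
        else:
            xs, ys = a[-lag:], b[:len(b) + lag]
        if len(xs) < min_overlap:
            continue  # too few shared time points to trust the score
        r = pearson(xs, ys)
        if r > best[1]:
            best = (lag, r)
    return best


# Hypothetical profiles: gene_b repeats the pulse of gene_a two time points later.
gene_a = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
gene_b = [0, 0, 0, 1, 4, 9, 4, 1, 0, 0]
lag, r = best_lag(gene_a, gene_b)
```

Two genes whose unshifted profiles correlate only weakly can score highly once one profile is delayed, which is the kind of relationship a simultaneous-correlation clustering would miss.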