An index to the China series appears in the first post.
September 21st, 2016
Today represented the core of my work with the Institute for Computing Technology. I was here to visit the “pFinders,” a group of five faculty members who produce algorithms for computational proteomics. The day opened with an interview with two of the senior faculty, Si-Min He and Rui-Xiang Sun. We had interacted extensively from a distance over the years, but I had not previously met Professor He in person. We got along well from the word “go.” I have tremendous respect for the work the pFinders have produced over the last ten years. I began to understand the significance of this effort when I read their work adapting their database search engine pFind for electron transfer dissociation (ETD) data sets. Rather than make perfunctory changes to the set of fragment ions modeled for a given sequence, the team gave serious thought to the problems associated with neutral loss ions from the intact peptide ion. Their paper and one from Robert Chalkley at UCSF stood out in my mind as the clearest explanations of how to accommodate this data type. I was very lucky to see Dr. Sun again (we had met years before at the UCSD RECOMB Proteomics meetings), since in just one week he would leave to spend part of his sabbatical at the University of Wisconsin. It was a very collegial start to my day.
I chose to present an open problem for the proteomics community. What is the best way to identify proteins via tandem mass spectrometry from non-model organisms (i.e., those for which we have not already produced an annotated genome)? I presented a case study of a plant that my colleagues at the University of the Western Cape had been investigating. Which species would be the best reference for understanding its transcriptome? How can we leverage RNA-Seq for proteomics? From there, I moved to the problems posed by meta-proteomics, in which a sample contains the proteins of many species that coexist in an ecological niche. I spoke mostly about ecological problems that we can address with this technology, but the same problems arise in understanding the microbial communities that inhabit our bodies. We talked about three workflows that could potentially enable these investigations. I think the pFind team is quite well suited to implement software for one of those avenues. The group was highly interactive during the talk, and I hope that some of them will follow up with possible solutions.
We enjoyed lunch most thoroughly at a lovely restaurant just down the street from the ICT building. I would highlight two aspects of the meal. First, one of our dishes contained some fascinating knots; yes, they looked exactly like strips of fabric tied in knots, covered in a brown sauce! The lunch group reaffirmed many times that the "fabric" was actually tofu. I was dubious, but I took a bite anyway. Fabulous! Second, I had made a passing remark about a dessert on the menu, and the group took it as a prompt to add it to our meal. I am so glad they did! It appeared to be a gelatin of sorts, again made from soy. The flavors, though, were truly outstanding. The center line of the dessert consisted of tiny pieces of dates, and the white bands were coconut. I ate three of them!
After lunch, two of the faculty presented an overview of the program. I began to understand just how broad a scope this group has taken on. Their papers really have grown substantially in number. Just look at the major toolsets this team has generated:
- The database search engine is the fundamental tool for routine peptide identification. pFind is one of the most fully featured and discriminating tools of its type.
- Handling peptides that have been chemically cross-linked to each other is a daunting informatic task. They produced one of the earliest genuine successes in this space.
- Inferring a sequence directly from a tandem mass spectrum (de novo sequencing) is not easy, but this group has developed a powerful system for that purpose.
- If one does not digest the proteins before introducing them to the tandem mass spectrometer, the resulting spectra are far more complex than those of peptides. They have published a tool that competes well against others.
- This software is the subject of my latest effort to publish with this group. It quantifies proteins and peptides by comparing peptide intensities between pairs of LC-MS/MS experiments or within isotopically labeled experiments.
- Proteomics has shown itself to be a powerful complement to genomics; we call this field proteogenomics. Of course, this talented team has taken an interest in the possibilities of multi-omics!
- Recognizing the structure of sugars attached to a peptide is a formidable computational problem, but the group has tackled it with their usual flair.
Near the end of the day, graduate student Hao Yang presented the work he had been conducting to recognize uncharacterized chemical modifications that appear on individual peptides. He has been adapting the pNovo project to make these open searches possible. I asked him any number of questions, and he did quite a nice job of bouncing back from each challenge. I am always happy to see young scientists respond well to the scrutiny each advance receives. The future arrives bit by bit, with each brick laid atop one emplaced by earlier research.