I have now attended three HUPO-PSI (Human Proteome Organization Proteomics Standards Initiative) meetings: Ghent, Beijing, and Heidelberg. As an early skeptic about standard data formats for mass spectrometry, I confess I have substantially revised my opinion on the usefulness of HUPO-PSI. I have served as chair of the Quality Control Working Group for two full years now, and I understand much more about what these meetings accomplish.
Who is HUPO-PSI?
HUPO-PSI may receive less name recognition than the formats it has made possible. Thousands of biological mass spectrometrists have used the mzML format, frequently employing ProteoWizard software to produce it. The mzML format, then, is probably HUPO-PSI’s most conspicuous success. This format, though, was not the first XML-based attempt to capture proteomics data; that distinction probably belongs to mzXML. In fact, mzML wasn’t even the first file format HUPO-PSI standardized for this purpose; that was mzData. Because HUPO-PSI was humble enough to seek the input of the mzXML team, a far more fully fledged format, mzML, could be produced by merging the best aspects of mzData and mzXML. The Molecular Interactions side of HUPO-PSI has also made a big splash, though I have always been associated with the mass spectrometry side instead.
Two aspects of how HUPO-PSI operates have mystified me more than any others. The first is its reliance on controlled vocabularies (CVs) and ontologies. Controlled vocabularies are essentially sets of terms that have been rigorously defined for use in reporting a kind of information. An ontology relates these terms to one another, for example through the “IS_A,” “PART_OF,” “HAS_PART,” and “REGULATES” relationships employed in the Gene Ontology. HUPO-PSI makes use of ontologies that have been defined for other efforts, such as the Units of Measurement Ontology created by the European Bioinformatics Institute and the Phenotype and Trait Ontology. It also maintains its own controlled vocabularies, such as PSI-MS. For my Quality Control Working Group, creating a HUPO-PSI-compatible format means connecting to these information resources rather than “reinventing the wheel” with an altogether new ontology.
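To make the idea of an ontology a little more concrete, here is a minimal Python sketch, assuming a simplified OBO-style text layout (the text format in which vocabularies such as PSI-MS are distributed). The EX: accessions, the three terms, and the helper names `parse_terms` and `ancestors` are hypothetical illustrations; real PSI-MS entries carry many more fields and relationship types, which this sketch ignores. It shows how following IS_A links lets software infer that an “m/z array” is also a kind of “data array” without anyone stating that fact directly.

```python
# Hypothetical mini-vocabulary in a simplified OBO-style layout.
# The EX: accessions are invented for illustration; they are not PSI-MS terms.
OBO_TEXT = """
[Term]
id: EX:0000000
name: data array

[Term]
id: EX:0000001
name: binary data array
is_a: EX:0000000 ! data array

[Term]
id: EX:0000002
name: m/z array
is_a: EX:0000001 ! binary data array
"""


def parse_terms(text):
    """Collect each [Term] stanza into a dict keyed by its id."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "[Term]":
            current = {"is_a": []}  # start a new stanza
            continue
        if current is None or ":" not in line:
            continue  # skip blank lines and anything outside a stanza
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "id":
            current["id"] = value
            terms[value] = current
        elif key == "name":
            current["name"] = value
        elif key == "is_a":
            # Drop the trailing "! comment" that OBO files add for readability.
            current["is_a"].append(value.split("!", 1)[0].strip())
    return terms


def ancestors(terms, term_id):
    """Follow is_a links transitively and return every ancestor id."""
    found = []
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.append(parent)
            stack.extend(terms[parent]["is_a"])
    return found


TERMS = parse_terms(OBO_TEXT)
print(TERMS["EX:0000002"]["name"], "->", ancestors(TERMS, "EX:0000002"))
```

Because the relationships live in the shared vocabulary rather than in each file format, any tool that reads the same CV can draw the same inferences, which is the practical payoff of not reinventing the wheel.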
The second topic that mystifies me is the “Document Process.” Once a working group has resolved all the problems it can find in a proposed file type (a schema, or format to store the information, along with a CV that defines key terms for that file type), it submits the package to the Document Process, wherein more experienced standards creators draw attention to potential problems and external reviewers evaluate the extent to which the format meets the needs of the community for which it is intended. I will learn a lot more about the Document Process when our proposal is ready for this group!
The mzML format seems very stable and very capable in version 1.1.0. Mass spectrometry technologies, however, are always improving! For more than a decade, ion mobility technology has been maturing in technology development laboratories, and a few mass spectrometry vendors now offer instruments that incorporate this separation technology. The mzML CV and schema, however, have offered somewhat patchy support for the information this separation produces. At this year’s meeting, Eric Deutsch convened a small group to discuss the best way to support this technology within mzML, ideally without forcing a major update to the format. Hans Visser of Waters Corporation has made many contributions on this score, and Matt Chambers (a wunderkind whose company I enjoyed during my decade at Vanderbilt) has offered feedback on how to incorporate this information. Our meeting at HUPO-PSI set us on a course toward formal support for ion mobility!
The HUPO-PSI Quality Control Working Group
I was really proud of the Quality Control Working Group. I assigned everyone a bit of homework for this meeting. Three committee members create tools that generate quality metrics; each of them was asked to create a mock-up of the qcML their software should produce. Another member produced a database for storing quality metrics; he was asked to demonstrate what a qcML holding an analysis of these metrics should look like. As a result, this meeting was far more concrete about what we need to do to finalize this format. In particular, we grappled with the challenges of embedding information in JSON format within an XML wrapper. Our discussion of complex data structures for particular metrics, such as three-dimensional matrices, is now grounded in concrete examples rather than abstractions.
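The JSON-inside-XML idea can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under stated assumptions, not the qcML schema: the `metric` element, its `name` and `format` attributes, and the helpers `wrap_metric` and `unwrap_metric` are all hypothetical. It stores a small three-dimensional matrix as JSON text inside an XML element and recovers it intact on parsing.

```python
import json
import xml.etree.ElementTree as ET


def wrap_metric(name, value):
    """Serialize a metric value as JSON text inside a small XML element.

    The element and attribute names are hypothetical, not the qcML schema.
    """
    metric = ET.Element("metric", {"name": name, "format": "json"})
    metric.text = json.dumps(value)  # ElementTree escapes <, > and & on output
    return ET.tostring(metric, encoding="unicode")


def unwrap_metric(xml_text):
    """Parse the XML wrapper and decode the embedded JSON payload."""
    metric = ET.fromstring(xml_text)
    if metric.get("format") != "json":
        raise ValueError("expected a JSON payload")
    return metric.get("name"), json.loads(metric.text)


# A tiny 2 x 2 x 2 matrix standing in for a three-dimensional metric.
cube = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
xml_text = wrap_metric("example 3-D metric", cube)
name, restored = unwrap_metric(xml_text)
print(xml_text)
```

One design consequence is visible here: the XML parser treats the JSON as opaque text, so a validator that reads only the XML schema cannot check the matrix's shape; that check has to happen after the payload is decoded.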
The field of proteomics needs to improve its ability to communicate issues of quality control. A perception of irreproducibility hangs over the field. While these reproducibility claims have some basis in reality, a fair bit of the problem is that researchers shy away from discussing quality issues in their papers. A Ph.D. student in Shanghai has been leading the Quality Control Working Group’s recommendations for “MIAPE-QC”: the Minimal Information About a Proteomics Experiment for Quality Control. I was sad that she could not attend the meeting due to grad school requirements, but a colleague of hers from Beijing presented the current state of the MIAPE-QC document. We had a really good conversation about it, but I think our recommendations were a bit garbled on their way back to her; she was discouraged by our feedback and felt we were arguing for her to start over. We’re working to clear up the confusion, and we will support her valuable efforts to educate our community.
Next stop: Cape Town?
As the meeting drew to a close, I put on my presenter hat one last time. It was time to state my case for hosting the next HUPO-PSI meeting in Cape Town! Several sites are bidding to host: Adelaide, Tokyo, San Diego, and Cape Town are all in the mix. I started by taking the bull by the horns. Cape Town may seem very far away, but it is actually in the same time zone as Heidelberg! Flying from New York City is pretty rough, though, with a flight time of almost 15 hours. My friend Eric Deutsch would have one of the worst routes, since he would be coming from Seattle. Still, South Africa is seeing good growth in mass spectrometry, and we would love to see more of its laboratories engaging with HUPO-PSI. I highlighted some of the lovely attractions and hosting sites that we might visit as a group. Hopefully, the steering committee will see its way to Cape Town in the near future!