SP – You switched fields (or at least disciplines) between your graduate work and your post-doc work. Deciding how to choose a post-doc lab is a big decision for many young scientists. How did you make the decision? Was it tough to get up to speed in proteomics? And do you have any advice for graduate students out there looking to make their move to a post-doctoral position?
Mike – It is a long story on how I made the decision to get into protein mass spectrometry/proteomics for a post doc. The short version is Fred McLafferty gave a series of lectures in the Michigan State University Department of Chemistry a few months after I defended my Ph.D. I was inspired by his seminars to find a post-doc in proteomics. Seeing as though I had no experience in any type of mass spectrometry, few prominent labs had any interest in my candidacy for a position. Fortunately, John Yates, then at the University of Washington, was interested in hiring a biochemist and having people in the group who knew things other than protein mass spectrometry. John gave me an opportunity that I will forever be grateful for.
It was not terribly difficult to get up to speed in proteomics since it was not as large a field in 1999 as it is now, and I was in a great lab with great people. I learned the most from the people around me and by jumping into the project. Also, reading the Yates lab papers and the papers suggested by my colleagues in the Yates lab was valuable.
I think there are two important considerations for a graduate student looking for a post doc. The first is they should be inspired and fascinated by the research in the group they are applying to work in. People always do their best work when they enjoy what they are doing and learning. The second thing that a graduate student needs to consider is what sort of professional opportunities will exist once their post-doc is finished? Does the training in any particular group or discipline provide them with multiple career opportunities beyond academia? This actually was one of the biggest reasons I contacted John. He had a website where all of the people he had trained had jobs, mostly in industry. My original goal was to finish a post-doc and try to find a job in industry.
SP – Tell us about the genesis of the MudPIT approach. Who came up with the idea? Were there different opinions in the lab about how to do it and how effective it would be? And at the time did you realize it would have such a big impact on the field of proteomics?
Mike – There were many people who played key roles in the development of MudPIT. Clearly, John Yates played the critical role in providing an environment, the resources, and the people to set things up for success. Also, Andy Link’s 1999 Nature Biotechnology paper entitled “Direct analysis of protein complexes using mass spectrometry” was really the place where MudPIT was first described, see Figure 4 of that paper. Andy coined the MudPIT term, but if I recall correctly, he was not able to use it in that paper. After Andy left John’s lab, Dirk Wolters and I worked together to try to improve the capabilities of MudPIT, demonstrate the improved capabilities on a highly complex mixture, and try to understand why it worked. This led to our publications in 2001 in Nature Biotechnology and Analytical Chemistry that are both cited well. Dirk deserves equal credit for those two papers. He and I made a great team for several years. I think everyone in the lab was excited about MudPIT for quite a while before I joined the group. Finally, since I had no experience in proteomics or protein mass spectrometry before joining John’s group I had no idea it would have a big impact.
SP – Spectral counting–based quantitative approaches have become very widely used. Your lab has made major strides in advancing this technology. Intuitively it seems that ‘dynamic exclusion’ would cause spectral counting not to work. However, many papers have demonstrated that this does not cause a problem, including a paper you published in Analytical Chemistry in 2009. Why isn’t dynamic exclusion a problem? Is the quantitative accuracy of this method dependent on instrument settings? And do these settings affect protein identification?
Mike – The funny thing about this question is I get it all the time, but no one has ever given me an explanation why dynamic exclusion should cause spectral counting not to work. The reason that it doesn’t impact the method is that it is always on and treats all peptides the same way. If you turn it off you get sampling of highly abundant proteins. Turning on dynamic exclusion affects protein identification dramatically by increasing the number of protein identifications. We believe that there is a ‘sweet spot’ for dynamic exclusion where the optimal time can be determined that will balance protein ids with maximal spectral counts. The paper you mention that Tim Wen, Ying Zhang, myself, and Laurence Florens published in 2009 in Analytical Chemistry addresses these questions. Also, MudPIT and other similar methods result in many peptides from many proteins being seen many times since it is a good sampling method. As instruments become more and more sensitive, they detect and identify more and more peptides in a given unit of time. It is also important to keep in mind that most of our research is on protein complexes where the sample complexity is reduced and we are analyzing complexes with MudPIT so we easily get 100s to 1000s of spectral counts for proteins in complexes. Spectral counting has worked very well for us when analyzing protein complexes.
SP – A big concern regarding spectral counting is dynamic range. As we know, when the count values are less than a certain number (~10), the spectral count ratios become digitized, and it does not make much sense to judge significance. What are the smallest and the largest detectable differences you normally observe using a spectral counting method? And how many counts do you typically need to assess significance?
Mike – Depending on the nature of the study, I think the key word is significance. For all fields this should mean statistical significance. This means that three biological replicates should be generated for conclusions to be drawn. In a comparative analysis, 10 spectral counts under one circumstance versus 100, or possibly zero, in another can certainly be statistically significant. My group and Alexey Nesvizhskii, for example, both published papers in 2008 in Molecular and Cellular Proteomics describing methods to carry out statistical/significance analysis of spectral counting based approaches. Assessing significance is all relative in all disciplines. It depends on your dynamic range. It depends on how noisy your data set is. This can be technical or biological noise. Measuring biological noise is more important due to stochasticity in biological systems. One of the themes in these papers is that the lower the spectral counts, the bigger the difference between conditions needed to claim statistical significance. This is a theme also in mRNA analysis; it should be a theme for all of quantitative proteomics. If you have counts from 5 to 5000, which is easy to obtain with MudPIT, this is a 1000-fold difference. With MudPIT coupled to sensitive instruments like an LTQ, excellent dynamic ranges are readily obtained.
SP – For studies of post-translational modification in which individual peptides need to be quantified, spectral counting methods may suffer from low ‘per peptide’ counts. Have you had success applying spectral counting methods in PTM studies?
Mike – For estimating occupancy, high sequence coverage with strong spectral count numbers works well for us. We do not undertake post-translational searches on all proteins in a sample (e.g. on a whole cell lysate), but rather on only a few proteins of interest enriched by affinity purification. If proteins of interest are not detected by 100s to 1000s of spectra mapping to unmodified peptides, it’s not worth the CPU time to look for substoichiometric modified peptides. We would rather perform systematic PTM analyses to answer narrow biological questions as opposed to just providing a list of modified proteins that would be meaningless for our biochemical collaborators.
We use spectral counting methods for all our PTM studies. While we have only published 4 manuscripts in which we reported post-translational modifications, we quantified these PTMs using local spectral counts to estimate phosphorylation levels on Drosophila polo kinase and matrimony (Xiang et al. 2007), N-myristoylation on mouse neuralized (Koutelou et al. 2008), methylation status of human histone H3 K79 (Mohan et al. 2010), and phosphorylations on human DREAM complex (Litovchick et al. 2011). In the case of the H3K79 methylation paper, we quantified relative levels of mono-, di-, and trimethylations with ion intensities (Fig. 1C) and peak areas (Fig. 1D), but a panel that did not end up in this figure showed the same me1/me2/me3 ratios as calculated using spectral counts.
SP – Have you compared spectral counting to other quantitative methods such as label-free intensity-based quantitation, or metabolic labeling, or isobaric tagging? How do these compare in terms of effort required, cost, accuracy, and applicability to low-complexity and high-complexity samples?
Mike – My first papers in quantitative proteomics were with metabolic labeling and mixing two samples together prior to analysis. This at least doubles the complexity of the sample being analyzed, which decreases the number of proteins identified in a sample. When we analyze two samples separately we get more information than when we analyze two samples when they are mixed. This is a simple concept. Also, once two samples are mixed they cannot be unmixed in case you want to make a different comparison. A new experiment needs to be run. I have not ever used isobaric tagging. Quantitative analysis using MS1 scans in the LCQ or Deca, where I started, was very difficult due to the challenges of picking peak endpoints. The Orbitrap, for example, is a much better instrument for doing MS1 scan quantitation, but you still need good algorithms for picking peak endpoints. This is something that we are also working on. In 2005, we demonstrated in an Analytical Chemistry paper that spectral counting and MS1 scan quantitation correlate very well. Natalie Ahn’s group drew similar conclusions in a 2005 Molecular and Cellular Proteomics paper. Given instrumentation advances in sensitivity, speed, and resolution since that time, I imagine that revisiting this is worthwhile and the correlation would be even better. Label free analysis is cost effective, but it will depend on the objectives of the study. The biggest advantage of label free approaches is that any sample can be compared to any other sample, and data can be used retroactively for future comparisons so long as it was generated the same way. This can be a big cost savings since in general mixed samples cannot be used in this way.
SP – Some claim that label-based methods are actually cheaper than label-free because they allow you to multiplex, and the cost of extra reagents is actually lower than that of instrument time. Do you agree with this position?
Mike – In short, no. On a dollar per experiment basis, from a reagent perspective, multiplexing might be cheaper. It is possible. The cost of materials alone for a single MudPIT analysis is around $125. This does not include capital equipment cost and maintenance, for example. I don’t know what the cost of a 4-plex experiment is, but I’m sure someone else does. However, I think this is the wrong discussion. While mass spectrometers are very expensive to acquire and maintain, the greatest cost and most valuable asset in science is people’s time and effort. We are lucky to have several mass spectrometers in the lab that allow us to run MudPIT on everything. Running one sample for 22 hours to get as much information as possible about the sample is not that much of a cost when compared to the time that it may have taken to prepare the sample itself. A protein complex analysis that originated from a clone that was subcloned into a vector, that was transformed into a cell, that was selected after screening many transformants, where one transformant was expanded to a large cell population, where the protein complex was affinity purified is an expensive, valuable, and time consuming process. Given the efforts that go into excellent sample preparation, I want to know as much as possible about any given sample.
SP – Some studies have used spectral count information to determine absolute abundance of proteins or to compare abundances of proteins in the same sample. Is this a viable approach?
Mike – Personally speaking, I am currently skeptical about absolute quantitation, but I think it is possible. Right now, I think the only way to do absolute abundance is to use stable isotopes to make a standard curve. To take a totally label free approach, using spectral counts or MS1 scans, requires a substantial amount of additional research.
With regard to comparing abundances of proteins in the same sample, that’s what we use normalized spectral counts for: each protein’s spectral abundance factor (SpC/length) is normalized against the sum of all spectral abundance factors within a sample. A protein with an NSAF value closer to 1 will definitely be more abundant than a protein with an NSAF value closer to 0.
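The normalization described above can be sketched in a few lines of Python. This is a minimal illustration that assumes only what is stated in the answer (SpC divided by length, then divided by the sum over the sample); the function name `nsaf`, the protein names, and the counts and lengths are hypothetical and are not taken from any published dataset or software.

```python
from typing import Dict, Tuple


def nsaf(proteins: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Compute normalized spectral abundance factors for one sample.

    `proteins` maps a protein name to (spectral_count, length).
    SAF = spectral_count / length; NSAF = SAF / sum of all SAFs.
    """
    saf = {}
    for name, (spectral_count, length) in proteins.items():
        if length <= 0:
            raise ValueError(f"length must be positive for {name}")
        saf[name] = spectral_count / length

    total = sum(saf.values())
    if total == 0:
        raise ValueError("no spectral counts in sample")

    # Dividing by the sample-wide total makes the values sum to 1.
    return {name: value / total for name, value in saf.items()}


# Hypothetical sample: (spectral count, protein length in residues).
example = {
    "protA": (300, 500),
    "protB": (100, 1000),
    "protC": (30, 100),
}
values = nsaf(example)
for name, value in sorted(values.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.3f}")
```

Note that protC has fewer spectra than protB but a higher value, because dividing by length corrects for longer proteins producing more peptides.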
SP – Your group is one of few proteomics groups that typically performs multiple biological replicates. Can you speak to the importance of replicates? Are they always necessary? And why do you think they are so rare in current proteomic approaches?
Mike – To this day, I am stunned by the lack of biological replicates in quantitative proteomics. I believe that this hurts our field’s credibility. Given the stochasticity of biological systems, if you do something only once you have a strong chance of finding something that happened solely by chance. Also, given the stochasticity of biological systems, you need to know the biological variance, i.e. the standard deviations, to properly interpret data in a quantitative proteomics study. Variance and large error bars are not inherently bad, which is what most people in protein mass spectrometry seem to think; they can in fact be highly informative. An example is a stable interaction versus a transient protein-protein interaction. A transient interaction should be lower abundance and more highly variable from replicate to replicate, but it can still be a highly significant and functionally relevant interaction where two pathways are interacting, for example. We discuss this topic to a certain extent in a paper on RNA polymerases in Molecular and Cellular Proteomics in 2011.
Reproducibility of methods is critical for generating real biological insights. I was trained as a biochemist and enzymologist as a graduate student. In enzymology you simply cannot report data in the literature where experiments were run only once. In other fields, you cannot publish papers without biological replicates, why does proteomics have such a low standard? That being said, biological replicates of everything are not always necessary or possible. One example would be a large scale protein interaction network, is it feasible or reasonable to do three biological replicates of everything? No. Is it needed? No. However, there should be some proteins that were analyzed with at least three biological replicates to show the reproducibility of the method. If one is making quantitative and statistical claims in a manuscript and you do not have three biological replicates, there better be a great reason for it or the manuscript should be rejected. I have no idea why multiple biological replicates are rare in quantitative proteomics studies, it should not be the case and it is not acceptable.
SP – Are there any trends in proteomics that you find particularly troubling? What are we doing right in proteomics and what are we doing wrong? Or where could we improve?
Mike – Probably two things. First, the lack of biological replicates in quantitative proteomics studies bothers me deeply. Secondly, I think that the very large numbers reported in several studies these days are possibly due to accepting very low scoring tandem mass spectra that only serve to inflate identifications. This is also problematic. What if someone in a biological lab wastes 6 months chasing some insight from a proteomics study that was a false positive? A 5% false discovery rate on a dataset with 100,000 peptides can be interpreted to mean 5000 peptides were false positives. The trend toward making larger and larger claims of finding 1000s of proteins or peptides of some type bothers me. As a field, we need to be careful about what we are claiming so that biologists who use our data as inspiration for their own studies are not chasing false leads.
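The false discovery rate arithmetic in this answer is simple to reproduce. The sketch below assumes the plain interpretation given in the answer (expected false positives = number of identifications × FDR); the function name `expected_false_positives` is hypothetical and does not correspond to any particular search engine or validation tool.

```python
def expected_false_positives(n_identifications: int, fdr: float) -> float:
    """Expected number of false identifications at a given FDR.

    Uses the simple reading from the text: a fraction `fdr` of the
    accepted identifications is expected to be wrong.
    """
    if n_identifications < 0:
        raise ValueError("n_identifications must be non-negative")
    if not 0.0 <= fdr <= 1.0:
        raise ValueError("fdr must be between 0 and 1")
    return n_identifications * fdr


# The case from the interview: 100,000 peptides at a 5% FDR.
interview_case = expected_false_positives(100_000, 0.05)
# Same dataset size at a stricter, hypothetical 1% threshold.
stricter_case = expected_false_positives(100_000, 0.01)
print(round(interview_case), round(stricter_case))
```

The point of the comparison is that the absolute number of false leads scales with dataset size, so a fixed percentage threshold means more wrong identifications as reported lists grow.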
SP – Some researchers focus solely on proteomic methods while others neglect the methods and focus solely on biological questions. Your lab has found a way to balance the two; advancing proteomic methods while tackling important biological questions regarding the structure of transcriptional complexes. How do you balance these two aspects of your research? And do you find one or the other more interesting or more challenging?
Mike – From an NIH funding perspective, this has hurt me. My group’s work does not fit well anywhere since we try to balance method development and biological insight. I am always told I do not do enough of one or the other, but really, the key is in the balance. I am lucky to be at the Stowers Institute where we have been able to do both. Methods will often be developed when we want to answer a biological question or we are trying to find a way to be more efficient. Once a method is developed, it needs to be tested to see if it helps or is valuable. The NSAF approach, for example, has greatly facilitated our collaborations with biochemists when trying to figure out what is changing, and by how much, in different protein complexes. We use this approach in all of our collaborations and it has greatly facilitated other people’s research and sped up the path to publication. This is just one example. Finally, I think biologically relevant method development is more challenging. I find it very rewarding when we generate new biological insights or new ways of thinking about problems due to new methods or approaches.
Director of Proteomics Center
Stowers Institute for Medical Research
Affiliated Associate Professor, Department of Pathology & Laboratory Medicine
The University of Kansas Medical Center
During his post doc with John Yates, Mike participated in the development of MudPIT (multidimensional protein identification technology) which greatly enhanced the fraction of a proteome that could be identified in a given experiment. Since forming his own lab at the Stowers Institute for Medical Research he has advanced label free quantitative proteomics methods and applied these methods to characterize transcriptional complexes.
Lee KK, Sardiu ME, Swanson SK, Gilmore JM, Torok M, Grant PA, Florens L, Workman JL, Washburn MP. Combinatorial depletion analysis to assemble the network architecture of the SAGA and ADA chromatin remodeling complexes. Mol Syst Biol. 2011 Jul 5;7:503. doi: 10.1038/msb.2011.40. PubMed PMID: 21734642; PubMed Central PMCID: PMC3159981.
Sardiu ME, Cai Y, Jin J, Swanson SK, Conaway RC, Conaway JW, Florens L, Washburn MP. Probabilistic assembly of human protein interaction networks from label-free quantitative proteomics. Proc Natl Acad Sci U S A. 2008 Feb 5;105(5):1454-9. Epub 2008 Jan 24. PubMed PMID: 18218781; PubMed Central PMCID: PMC2234165.
Hobby – Old dudes soccer
Favorite thing about academic research – Tie between 1) The chance to make a difference doing something cool and 2) The people that you meet from all over the world.