Who are CPR?

We are the Centre for Proteome Research, located on the ground floor of the Biosciences Building. The group is led by Claire Eyers and Edward Emmott, both of whom have independent research programmes. In addition, we are the Proteomics Shared Research Facility (SRF). This means that we can also conduct proteomics experiments for colleagues in the Faculty, and beyond.

Accessing the MASCOT server at Liverpool

We no longer operate a MASCOT server for University of Liverpool users. Access is now limited to PFG and affiliates. This is regrettable, but there are no external input streams to support this package.

From the manufacturer's web site:

Mascot is a powerful search engine which uses mass spectrometry data to identify proteins from primary sequence databases. While a number of similar programs are available, Mascot is unique in that it integrates all of the proven methods of searching. These different search methods can be categorised as follows:

• Peptide Mass Fingerprint, in which the only experimental data are peptide mass values
• Sequence Query, in which peptide mass data are combined with amino acid sequence and composition information (a super-set of a sequence tag query)
• MS/MS Ion Search, using uninterpreted MS/MS data from one or more peptides

The general approach for all types of search is to take a small sample of the protein of interest and digest it with a proteolytic enzyme, such as trypsin. The resulting digest mixture is analysed by mass spectrometry.

Different types of mass spectrometer have different capabilities. A simple instrument will measure a set of molecular weights for the intact mixture of peptides. An instrument with MS/MS capability can additionally provide structural information by recording the fragment ion spectrum of a peptide. Usually, the digest mixture will be separated by chromatography prior to MS/MS analysis, so that MS/MS spectra from individual peptides can be measured.

The experimental mass values are then compared with calculated peptide mass or fragment ion mass values, obtained by applying cleavage rules to the entries in a comprehensive primary sequence database. By using an appropriate scoring algorithm, the closest match or matches can be identified. If the "unknown" protein is present in the sequence database, then the aim is to pull out that precise entry. If the sequence database does not contain the unknown protein, then the aim is to pull out those entries which exhibit the closest homology, often equivalent proteins from related species.
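
As a rough illustration of "applying cleavage rules" and comparing calculated with experimental masses, here is a minimal Python sketch of a peptide mass fingerprint lookup. It is not Mascot's algorithm: there is no scoring, no missed cleavages and no modifications. The function names, the example sequence and the 0.01 Da tolerance are all assumptions made for illustration. The cleavage rule used is the usual one for trypsin (cut after K or R, but not before P).

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free N- and C-termini


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(sequence):
        at_end = i == len(sequence) - 1
        if at_end or (residue in "KR" and sequence[i + 1] != "P"):
            peptides.append(sequence[start:i + 1])
            start = i + 1
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of the neutral, unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def match_masses(observed, sequence, tolerance_da=0.01):
    """Pair each observed mass with any calculated peptide mass within tolerance."""
    calculated = [(p, peptide_mass(p)) for p in tryptic_digest(sequence)]
    return [(obs, pep) for obs in observed
            for pep, mass in calculated if abs(obs - mass) <= tolerance_da]


# Hypothetical database entry and one hypothetical observed mass.
print(tryptic_digest("AGKPLRMEK"))
print(match_masses([406.1886], "AGKPLRMEK"))
```

A real search engine repeats this for every entry in the database and then ranks the candidates with a scoring model, which is the part this sketch leaves out.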

What capabilities do you have?

We have an extensive suite of instrumentation. These instruments have all been brought into CPR by grants awarded to Rob and Claire (with other colleagues) and are primarily directed towards the research programmes that they have to support. However, we are very willing to engage with other groups, as collaborators or in the context of the Shared Research Facility.

• 2006: Waters GC-TOF Premier GC/MS system

• 2009: Waters Xevo QqQ/nanoAcquity
• 2009: Waters Xevo QqQ/nanoAcquity

• 2010: Thermo Velos Orbitrap/nanoAcquity (upgraded to Elite in 2015 for metabolomics)
• 2010: Waters Synapt G2/nanoAcquity high resolution ion mobility QToF
• 2010: Bruker Amazon high speed ion trap/nanoAcquity
• 2010: Bruker Ultraflex Extreme 1kHz MALDI-TOF/TOF

• 2012: Thermo QExactive Orbitrap instrument/Dionex u3000 nano

• 2013: Waters G2si IM-QTOF for intact protein research
• 2013: Waters G2si IM-QTOF/nanoAcquity for proteomics
• 2013: Waters Xevo TQS QqQ/nanoAcquity

• 2014: Waters MALDI-Synapt G2si imaging system
• 2014: Waters LAESI-Synapt G2si imaging system, upgraded to include DESI in 2015

• 2015: Thermo Fusion tribrid mass spectrometer
• 2015: Thermo QExactive HF mass spectrometer
• 2015: Access to Fluidigm CyTOF mass cytometer


Is label-free quantification the way to go?

This is a tricky question. Mass spectrometry is not an intrinsically quantitative method, and it is difficult to predict the relationship between the amount of analyte and the intensity of the signal in the instrument. This is manifestly so when introducing a group of peptide ions into an instrument. For example, on MALDI-ToF, lysine-terminated peptides are known to give much weaker signals than arginine-terminated peptides (there are ways around this). When a complex mixture of peptides is electrosprayed into the source of an instrument, some peptides give very strong signals, but others can be pretty feeble.
It seems as though the major strength of label-free methods is in comparative (relative) quantification.

How much will it cost?

Impossible to answer until we have talked. However, we prefer to take responsibility for all aspects of the sample preparation (including the reduction, alkylation and digestion) as this greatly enhances the chances of useful data coming back. This takes a person's time, and we must recover that cost, as well as the cost of running the instrument, covering service charges, analysing the data and generating human-readable output. We're not cheap, but we're certainly not in microarray or nextgen sequencing territory!

A typical proteomics study might be a comparative analysis of a wild type and a knockout.

In 2018 we upgraded our engagement processes, and now, after initial discussions and some refinement, we will issue a formal quote for the work, which you will choose to accept or not. This has gone a long way to reducing misunderstanding of how we work. The only people who can authorise these analyses, and agree the attendant charges, are Claire, Rob or Ed.

What is the lowest limit of detection you can attain?

We would hope to be able to reach 100 attomol for the lowest abundance in a discovery experiment.
To know whether that is enough, let's do some quick calculations.
A detection limit of 100 attomol is equivalent to injecting 60 million molecules into the mass spectrometer - that sounds like a lot. If the sample was derived from yeast cells, we would have loaded of the order of 200,000 yeast cell equivalents onto the same column. Thus, we can measure 60 million molecules, derived from 200,000 cells. From this, you will see that the limit of detection in an unprocessed sample is 60,000,000/200,000 = 300 copies per cell.
That sounds pretty good, right? But, if we have been using HeLa cells, the numbers are very different. The limit to what can be loaded on the HPLC column is dictated by the total protein load: 200,000 yeast cells give us about 1000 nanograms of protein. However, each HeLa cell would contain approximately 50 times as much protein as a yeast cell. Therefore, for a 1000 ng column capacity, we can only load about 4,000 HeLa cell equivalents onto the column. For the same detection limit of 100 attomol, we obtain a limit of detection of 60,000,000/4,000 copies per cell, or 15,000 copies per cell. Rather different!
To overcome such difficulties, it is necessary to resort to sample prefractionation and concentration steps, which have the potential to introduce 'lossy' steps but also increase the number of subsequent LC-MS/MS analyses that need to be conducted, and hence the cost.
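
The arithmetic above can be reproduced in a few lines of Python, which also makes it easy to try other cell types or detection limits. This is only a sketch of the back-of-envelope calculation in the text: the function name is made up for illustration, and the figures (100 attomol, 200,000 yeast cells, a 50-fold difference in protein per cell) are the approximate values quoted above.

```python
AVOGADRO = 6.022e23  # molecules per mole


def copies_per_cell(lod_attomol, cells_on_column):
    """Copies per cell needed to reach the detection limit for a given load."""
    molecules = lod_attomol * 1e-18 * AVOGADRO  # attomol -> mol -> molecules
    return molecules / cells_on_column


yeast_cells = 200_000            # cell equivalents for ~1000 ng of protein
hela_cells = yeast_cells / 50    # HeLa cells carry ~50x more protein each

print(round(copies_per_cell(100, yeast_cells)))  # roughly 300 copies per cell
print(round(copies_per_cell(100, hela_cells)))   # roughly 15,000 copies per cell
```

The small differences from the rounded figures in the text come from using 6.022 x 10^23 rather than 6 x 10^23 for Avogadro's number.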

Can you measure the mass of my protein?

In short, probably yes!
We have set up a semi-automated system for the mass measurement of intact proteins, coupled to a very high quality instrument (Synapt G2si). This system will measure the mass of a protein to about 1 Da in 10,000 Da, and requires only microgram quantities of protein. The mass is measured by electrospray ionisation mass spectrometry, and thus the protein molecule acquires a large and variable number of charges (protons). Each protein thus creates a multiply-charged envelope of ions that needs to be deconvoluted by proprietary software using maximum entropy algorithms. The result is a true mass spectrum that can reveal the mass of the analyte protein and also the mass of associated contaminants and, possibly, fragments or modifications.
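
To give a feel for what a multiply-charged envelope is, the sketch below computes the m/z values a protein of a given mass would produce, and then recovers the charge and mass from two adjacent peaks. This is the simple textbook two-peak calculation, not the maximum entropy deconvolution performed by the instrument software. The function names and the example mass are assumptions for illustration.

```python
PROTON = 1.007276  # mass of a proton in Da


def mz(mass, charge):
    """m/z of a protein of neutral mass `mass` carrying `charge` protons."""
    return (mass + charge * PROTON) / charge


def mass_from_adjacent_peaks(mz_lower, mz_upper):
    """Recover (charge, mass) from two neighbouring peaks in an envelope.

    mz_lower carries one more proton than mz_upper. The returned charge
    belongs to the mz_upper peak.
    """
    z = round((mz_lower - PROTON) / (mz_upper - mz_lower))
    return z, z * (mz_upper - PROTON)


# Hypothetical protein of 16951.5 Da observed at charge states 12+ to 20+.
protein_mass = 16951.5
envelope = {z: mz(protein_mass, z) for z in range(12, 21)}

charge, mass = mass_from_adjacent_peaks(envelope[16], envelope[15])
print(charge, round(mass, 1))
```

Real spectra contain noise, overlapping envelopes from several species and adducts, which is why a statistical method applied across the whole envelope is used in practice rather than a pair of peaks.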

Can you perform protein sequencing de novo?

High quality sequencing de novo* is best achieved using an instrument that generates high mass accuracy precursor (peptide) and product (peptide fragment) ions. Thus, we would run such samples on the QExactive. Then, the data file is best processed using Peaks (BSI), which does an excellent job of reconstructing peptide sequences from the ion series.
To see examples where we have almost completely sequenced a protein from species where genome/transcriptome data are unavailable or poor, see [here] or [here].
However, even with a complete set of tryptic peptides, fully sequenced, it is still impossible to define the order of the peptides. This can be overcome in one of two ways. First, if there is a protein from a related species of known sequence, it may be possible to use homology matching to assemble the tryptic peptide sequences. Alternatively, if the same protein is digested with a different endopeptidase (such as GluC) we can generate a second set of peptide sequences that overlap with the tryptic series, and which can be used to reconstruct the protein sequence. One final point - some peptides will be too small to sequence, and it can also be helpful to digest a protein with endopeptidases LysC or ArgC.
Using these approaches, we have completely sequenced proteins up to about 200 amino acids. There is no reason why this cannot be extended to larger proteins.
And, one final caveat. Leu and Ile are isobaric, and cannot be discriminated by this approach.
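
The overlap idea can be sketched in Python as follows: given an unordered set of tryptic peptides and a set of peptides from a second digest that bridge the tryptic cleavage sites, find the ordering in which every bridging peptide appears in the assembled sequence. This is a brute-force illustration only, practical for a handful of peptides; the function name and all sequences are hypothetical, and it assumes every peptide has been sequenced completely and correctly.

```python
from itertools import permutations


def assemble(tryptic_peptides, bridging_peptides):
    """Return the first ordering of the tryptic peptides that contains
    every bridging peptide as a substring, or None if there is none."""
    for ordering in permutations(tryptic_peptides):
        candidate = "".join(ordering)
        if all(bridge in candidate for bridge in bridging_peptides):
            return candidate
    return None


# Hypothetical tryptic peptides, deliberately out of order.
tryptic = ["SMEPYK", "LLER", "AEGLK", "TDEVR"]
# Hypothetical GluC peptides (cleavage after E) that span the tryptic sites.
gluc = ["GLKTDE", "VRSME", "PYKLLE"]

print(assemble(tryptic, gluc))  # AEGLKTDEVRSMEPYKLLER
```

Because Leu and Ile have identical masses, any L in such a reconstruction could equally be an I unless other evidence is available.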
* I believe that it is formally correct to say 'sequencing de novo' rather than 'de novo sequencing'. This is akin to the common error of 'in vitro studies' which should be 'studies in vitro'. Classical scholars might wish to enlighten.

How many proteins can you detect in a typical proteomics analysis?

This is like the answer to 'how long is a piece of string?'. The four factors that dictate the length of the identification list in a typical proteomics experiment are:
• Complexity: the number of proteins in the sample
• Dynamic range: the range of concentrations of those proteins
• Available material
• LC-MS instrument being used
That being said, for a modest run, appropriately replicated, we’d expect to recover data for between 1500 and 2000 proteins with good confidence in a label-free analysis of mammalian cells. If the sample is plasma/serum or cell culture medium that contains serum, then the marked bias in specific proteins will impair the ability to reach to the depths of the proteome, and pre-fractionation is usually the way forward. We do not usually get involved in these pre-fractionation steps.

What is the best route to contact?

In the first instance, drop a quick email to Rob, Claire or Philip (r.beynon@, ceyers@ or philipjb@ liv.ac.uk). We’ll arrange for a preliminary meeting in which we discuss your project and advise on optimal routes to deliver the results you wish. In many projects, the bioinformatics analysis is a critical part of this discussion, and we will clarify the point at which we stop, and advise you how to engage with bioinformaticians here at Liverpool.

Proteomics as a service

How can I have proteomics samples run?
We are always willing to talk to colleagues about the potential for running new analyses. These can vary from simple 'quick look' analyses to complete and complex, fully biologically replicated analyses. In all circumstances, we adopt a model of 'defend the mass spectrometer from the sample'! In fact, mass spectrometers are remarkably robust; it is the delivery of peptides through a nanoflow high pressure chromatography system that causes the problems.
Biological samples can be delivered in exotic and complex matrices, either reflecting the biological context of the sample or the sample work-up chemistry imposed by the user (detergent, PEG and glycerol might seem like a dream extraction buffer to you, but we are never going to run that sample for you!). We are very reluctant to receive samples that contain insoluble material, high concentrations of detergents, or polymers such as polyethylene glycol (PEG). A nanoflow high resolution column (75 µm diameter and 150 mm long) costs, with trap, nearly £1,000 and is time consuming to exchange and optimise. You can see why we're reluctant to take anonymous samples!
It is far, far better if you come to talk to us before you attempt to prepare proteomics samples.

Can I have a copy of XXXXX software?

The answer is, in most instances, no. The proprietary software that is provided with instruments often includes a free results viewer, but will lack most analytical capabilities. Further, the raw data that derive from the instrument are usually subject to high level processing for discovery or comparative analyses. We have paid a great deal of money for these packages, and they are usually copy-protected and restricted to single computers. These computers are usually very highly specified, running a 64-bit OS, with every enhancement for handling large data files, and also for speed.

In general, external users will expect us to perform most of the analyses as well, and thus, we will access these packages on your behalf. If you wish to become adept at running the software yourself, that is no problem, but access to the computers is chargeable as well, and we will also have to make a charge for training. This is the only way we can operate without PFG subsidising other groups' research.

What is QconCAT?

QconCATs are standards for protein quantification, based on the two principles of surrogacy and stable isotope labelled internal standardisation. They were invented by Rob Beynon, Simon Gaskell and Julie Pratt.
The QconCAT itself is an artificial protein, created by de novo gene synthesis and prepared by heterologous expression, usually in bacteria.