phosphorylation site database/resource

Search algorithms, post-searching processing, quantitation software, etc. Share and discuss software here.
Phosphoserine Member
Posts: 20
Joined: Wed Aug 17, 2011 10:05 pm

Postby jke000 » Fri Dec 07, 2012 9:00 am

A colleague of mine asked about resources for phosphoproteomics. If a researcher has a list of proteins of interest and wants to find evidence of phosphorylation, where should they look? In what repository, database or other resource would you start an investigation yourself? Thanks.

- Jimmy

E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Fri Dec 07, 2012 10:19 am

A few places to start:

PHOSIDA---phospho, acetyl, and glyco for several organisms

UniProt---Many different modifications
---It definitely doesn't have all sites in there but they do have an API that lets you grab them programmatically if you want

StemCellOmicsRepository---phospho sites
---If you are interested in pluripotent human cell lines
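A minimal sketch of pulling phosphosite annotations out of a UniProt entry programmatically. The endpoint URL and the JSON field names (`features`, `type`, `description`, `location.start.value`) are assumptions based on the current UniProt REST service, not something stated in this thread, and `example_entry` is a hand-built stand-in rather than real data.

```python
import json
import urllib.request

# Assumed endpoint of the current UniProt REST service (not given in the thread).
UNIPROT_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def fetch_entry(accession):
    """Download one UniProtKB entry as a dict (requires network access)."""
    with urllib.request.urlopen(UNIPROT_URL.format(accession=accession)) as resp:
        return json.load(resp)


def phospho_sites(entry):
    """Return (position, description) pairs for annotated phosphosites.

    Assumes each feature looks like
    {"type": "Modified residue", "description": "Phosphoserine",
     "location": {"start": {"value": 15}, ...}}.
    """
    sites = []
    for feature in entry.get("features", []):
        if feature.get("type") != "Modified residue":
            continue
        description = feature.get("description", "")
        if not description.startswith("Phospho"):
            continue
        sites.append((feature["location"]["start"]["value"], description))
    return sites


# Hand-built stand-in for a downloaded entry, for illustration only.
example_entry = {
    "features": [
        {"type": "Modified residue", "description": "Phosphoserine",
         "location": {"start": {"value": 15}, "end": {"value": 15}}},
        {"type": "Modified residue", "description": "N6-acetyllysine",
         "location": {"start": {"value": 120}, "end": {"value": 120}}},
        {"type": "Domain", "description": "Example domain",
         "location": {"start": {"value": 1}, "end": {"value": 90}}},
    ]
}

print(phospho_sites(example_entry))
```

For a real protein, `phospho_sites(fetch_entry(accession))` would be the equivalent call. As noted above, an empty result does not mean the protein is unphosphorylated, only that no site is annotated.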

Angiotensin Member
Posts: 37
Joined: Wed Jun 29, 2011 9:26 am

Postby daniswan » Fri Dec 07, 2012 10:21 am

PhosphoSite is a good place to start.
UniProt entries often list PTMs as well.
Or, if you are working in yeast, SGD links to PTMs from the summary page for each protein.

Glycine Member
Posts: 6
Joined: Wed Jul 13, 2011 7:54 am

Postby edhuttlin » Mon Dec 10, 2012 12:48 pm

Hi Jimmy,

I generally agree with the suggestions from Doug and Danielle as far as phosphoproteomics databases are concerned. Here are two more PTM databases:

Phosphomouse (a list of ~36K phosphorylation sites and their abundance distributions across 9 mouse tissues)
Human Ubiquitination (a list of ubiquitination sites from cultured human cell lines)

Whenever I point people to these resources, particularly databases like PhosphoSite and UniProt that aggregate the results of many studies, I always warn users to approach the results with a healthy dose of skepticism, because there's a good chance that the proportion of false positives is quite high. The problem, as I'm sure you're aware, is that even if each individual dataset is of reasonably high quality (FDR around 1%, etc.), as you combine more and more datasets, the right answers tend to be identified repeatedly, while the wrong answers tend to be distributed more randomly and so rarely coincide. The incorrect sites therefore add up almost one for one, while the set of distinct correct sites grows much more slowly. Eventually, if you just look at the non-redundant set of sites in the database, the FDR is often surprisingly high unless the curators apply additional filtering. (In many cases there is no additional filtering, though this is starting to change.)
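To put rough numbers on that argument, here is a small back-of-the-envelope model. It is only a sketch under stated assumptions: every dataset reports the same number of sites at the same FDR, correct sites are drawn uniformly from a limited pool of real sites, and incorrect sites are drawn uniformly from a much larger pool of possible wrong assignments. All pool sizes and defaults are made up for illustration.

```python
def expected_unique(pool_size, per_dataset, n_datasets):
    """Expected number of distinct items seen after n_datasets independent
    uniform draws of per_dataset items each from a pool of pool_size."""
    p_never_seen = (1.0 - per_dataset / pool_size) ** n_datasets
    return pool_size * (1.0 - p_never_seen)


def merged_fdr(n_datasets, sites_per_dataset=10_000, fdr=0.01,
               true_pool=20_000, false_pool=1_000_000):
    """FDR of the non-redundant site list after merging n_datasets.

    All default values are hypothetical, chosen only to illustrate the trend.
    """
    n_false = sites_per_dataset * fdr
    n_true = sites_per_dataset - n_false
    true_unique = expected_unique(true_pool, n_true, n_datasets)
    false_unique = expected_unique(false_pool, n_false, n_datasets)
    return false_unique / (true_unique + false_unique)


for k in (1, 5, 20, 50):
    print(f"{k:>3} datasets -> non-redundant FDR ~ {merged_fdr(k):.3f}")
```

With these made-up numbers a single dataset sits at 1% FDR by construction, while the merged non-redundant list drifts toward roughly 20% after 50 datasets. The exact figures depend entirely on the assumed pool sizes; the direction of the trend is the point.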

Another potential source of error is the inclusion of older or less stringently assembled datasets that may have higher FDRs than would currently be acceptable. Collectively, the entire field has become a lot more rigorous about PTM identification over the past few years, and that's great. But there are certainly datasets in the literature that, though they met accepted standards when originally published, would not be viewed as favorably today. Some databases are starting to take steps to filter out questionable data, though it's definitely not standard yet.

With these things in mind, I generally encourage people to look carefully at the results and to focus their attention on sites that have been reported many times by multiple groups using high throughput techniques, and ideally by low throughput techniques as well. (Phosphosite, in particular, does a good job of breaking out references into 'low-throughput' and 'high-throughput' groups.) Other sites should be evaluated on a case-by-case basis. If a site is only identified in a single paper, I'd be suspicious and would look very carefully at the methodology before deciding how much I'd believe it.
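The triage described above could be sketched roughly as follows. The `SiteEvidence` record, the site identifiers, the labels, and the threshold of three high-throughput references are all hypothetical choices for illustration. Reference counts are also only a proxy, since they do not guarantee that the reports come from independent groups.

```python
from dataclasses import dataclass


@dataclass
class SiteEvidence:
    site: str      # hypothetical identifier, e.g. "PROT1_S15"
    ltp_refs: int  # papers reporting the site by low-throughput methods
    htp_refs: int  # papers reporting the site by high-throughput methods


def triage(evidence, min_htp_refs=3):
    """Sort a site into a rough confidence bucket (threshold is arbitrary)."""
    if evidence.htp_refs >= min_htp_refs and evidence.ltp_refs >= 1:
        return "strong"        # repeated HTP reports plus LTP support
    if evidence.htp_refs >= min_htp_refs:
        return "good"          # repeated HTP reports only
    if evidence.ltp_refs + evidence.htp_refs <= 1:
        return "suspect"       # single report: check the methodology
    return "case-by-case"


candidates = [
    SiteEvidence("PROT1_S15", ltp_refs=2, htp_refs=12),
    SiteEvidence("PROT1_T42", ltp_refs=0, htp_refs=4),
    SiteEvidence("PROT2_Y7", ltp_refs=0, htp_refs=1),
    SiteEvidence("PROT3_S88", ltp_refs=1, htp_refs=1),
]
for c in candidates:
    print(c.site, "->", triage(c))
```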

Ubiquitin Member
Posts: 53
Joined: Thu Dec 15, 2011 7:58 am

Postby karthikskamath » Mon Dec 10, 2012 4:22 pm

Hi JKE000,

The table of tools and databases in this paper might help you:

"Proteomic databases and tools to decipher post-translational modifications"

Angiotensin Member
Posts: 31
Joined: Fri May 04, 2012 1:06 am

Postby ranio » Thu May 16, 2013 6:18 pm

Phosphosite and PhosphoELM are recommended. As Doug said, PHOSIDA and Uniprot are also good places to start :D

Serine Member
Posts: 8
Joined: Mon Apr 01, 2013 7:55 am

Postby ChiLee » Fri May 17, 2013 12:29 pm

PhosphoELM is worth looking at. It contains a mixture of phosphorylation sites derived from low-throughput (LTP, probably antibody-based) and high-throughput (HTP, mass spectrometry) experiments.
This is particularly nice because the PhosphoELM tab-delimited file actually marks whether each phosphosite was LTP- or HTP-derived, in case you don't quite trust localisation via MS, which is a fair position to take.
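A minimal sketch of using that LTP/HTP flag to keep only low-throughput sites. The column names (`acc`, `position`, `code`, `source`) are assumptions about the layout of the tab-delimited file, and `SAMPLE_DUMP` is invented data standing in for the real download.

```python
import csv
import io

# Invented rows; the header names are an assumption about the dump layout.
SAMPLE_DUMP = (
    "acc\tposition\tcode\tsource\n"
    "PROT1\t15\tS\tLTP\n"
    "PROT1\t42\tT\tHTP\n"
    "PROT2\t7\tY\tLTP\n"
)


def sites_by_source(handle, wanted="LTP"):
    """Return (accession, position, residue) for rows whose source matches."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["acc"], int(row["position"]), row["code"])
        for row in reader
        if row["source"].strip().upper() == wanted
    ]


ltp_sites = sites_by_source(io.StringIO(SAMPLE_DUMP))
print(ltp_sites)
```

For the real file, pass an open file handle instead of the `io.StringIO` wrapper and adjust the column names to match its header.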
