Best place to deposit proteomic data sets


Best place to deposit proteomic datasets

PRIDE
1
17%
PeptideAtlas
0
No votes
gpmDB
0
No votes
Tranche
1
17%
Human Proteinpedia
0
No votes
Other
4
67%
 
Total votes: 6

Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Best place to deposit proteomic data sets

Postby Doug » Mon Jul 11, 2011 11:03 pm

We are in the process of submitting a paper, and at the editor's request we are going to deposit the data in a public repository. I am a big fan of sharing data; I just want to make sure we do it in the best way possible. Basically, I want the data to be easily and quickly downloadable. I have little experience with this, so I am curious which repositories you might recommend. Some possible options include:

PRoteomics IDEntifications database (PRIDE)

PeptideAtlas

Global Proteome Machine Database

Tranche

Human Proteinpedia

I have also added a poll so that you can vote for your favorite option.

Craig
E. Coli Lysate Member
Posts: 220
Joined: Sun Jun 26, 2011 6:49 pm

Postby Craig » Tue Jul 12, 2011 9:20 am

I have to vote for other (people posting data on their own web or FTP servers) because the current state of proteomics data sharing is so pathetic. Tranche was widely accepted, but unfortunately it seems not to have been maintained over the past few years, so downloading and uploading data with it can be hit-or-miss.

I have not tried to upload data to PRIDE, but it doesn't look to be a streamlined process (http://www.ebi.ac.uk/pride/easySubmitData.do?). I am also puzzled by their choice to use their own PRIDE XML format instead of simply adopting the community mzML standard. PeptideAtlas says it doesn't even accept data anymore, and encourages people to use Tranche or PRIDE instead (http://www.peptideatlas.org/upload/). I can't even tell whether you can upload data to the Global Proteome Machine Database. And I don't know much about Human Proteinpedia, but if it's limited to human data, that is not a good general alternative.

So unfortunately I think that for the time being, people should just stick with sharing data via their own servers. It is not a great solution, but perhaps the best available for now. Molecular & Cellular Proteomics noted the limitations of proteomics data sharing when they recently made the deposition of raw data optional instead of required with publication (http://www.mcponline.org/site/home/news/index.xhtml#rawdata). I have noticed, however, that there is a new data sharing service called ProteomeXchange being developed, which we can only hope will go a long way toward solving the problems in this area.

mvlee
Serine Member
Posts: 9
Joined: Tue Jun 28, 2011 1:27 pm

Postby mvlee » Tue Jul 12, 2011 11:37 pm

MSB recommends two options: PRIDE (http://www.ebi.ac.uk/pride/) and MIAPE (http://www.psidev.info/index.php?q=node/91). A lot of manuscripts list Tranche as the repository. However, I have found it difficult to upload data to and download data from Tranche.

Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Thu Jul 21, 2011 10:18 pm

Here is a really well-written blog post about why sharing proteomic data sets isn't all that straightforward.

http://blog.dannynavarro.net/2010/07/31/sharing-proteomics-data-trickier-than-it-seems/

Craig
E. Coli Lysate Member
Posts: 220
Joined: Sun Jun 26, 2011 6:49 pm

Postby Craig » Wed Aug 24, 2011 8:34 pm

There was a round-table discussion on this topic at the Tenth International Symposium on Mass Spectrometry in the Health and Life Sciences. Apparently Phil Andrews had a P41 center grant which was funding Tranche, but it was not renewed, so there are minimal resources supporting it now. A lot of people had interesting ideas but there doesn't seem to be a consensus about what to do yet. So unfortunately we can probably count on this issue lingering for some time...

aky
Albumin Member
Posts: 90
Joined: Sat Sep 10, 2011 2:33 pm

Postby aky » Sun Oct 16, 2011 12:59 am

Lack of funding is the main reason for repository outages. NCBI Peptidome was clean and easy to use, at least as far as downloading was concerned, with none of the pain of converting data from orphan formats; I never tried to upload there. For uploading, Tranche is the easiest, but a growing number of technical glitches are making it a pain to use.

Javier
Phosphoserine Member
Posts: 16
Joined: Fri Oct 14, 2011 6:24 am

Postby Javier » Sun Oct 16, 2011 7:01 am

I like the idea behind Tranche (the freedom to upload any format), but the lack of maintenance of the service forces one to look for alternatives such as custom FTP repositories.

daniswan
Angiotensin Member
Posts: 37
Joined: Wed Jun 29, 2011 9:26 am

Postby daniswan » Fri Apr 27, 2012 6:55 pm

Another discussion about where to deposit proteomics raw data:

A home for raw proteomics data
Nature Methods 9, 419 (2012) doi:10.1038/nmeth.2011
Published online 27 April 2012


gabe
Angiotensin Member
Posts: 25
Joined: Sat Nov 19, 2011 5:56 pm

Postby gabe » Sat Apr 28, 2012 11:16 am

daniswan wrote:Another discussion about where to deposit proteomics raw data:

A home for raw proteomics data
Nature Methods 9, 419 (2012) doi:10.1038/nmeth.2011
Published online 27 April 2012


I was excited when I saw this article, but I couldn't figure out how to upload RAW data to EBI. Does anyone know where this should go?

Derek
Phosphoserine Member
Posts: 18
Joined: Tue Jun 28, 2011 5:01 pm

Postby Derek » Mon Apr 30, 2012 8:27 am

I am worried that uploading raw data collected in a non-centroided (i.e., profile) mode will severely limit the amount of data that can be easily stored. Some MS systems only produce non-centroided data, and a lot of quantitative MS relies on profile-mode acquisition and analysis. With the trend toward longer analyses (more fractions, longer gradients, replication, etc.), the sheer size of the raw data will grow considerably. In less than 24 hours, one MS could collect >15 GB of raw data, and the time commitment to transfer these files across TCP/IP networks is large and burdensome.

Have people given much thought to peer-to-peer file-sharing protocols such as torrents? I know that Tranche operated on the same principle of a distributed filesystem, but they seem to have reinvented the wheel and made it work like a square: it would function for a while, but often got stuck in the middle of a download. Services such as BitTorrent have done well, as evidenced by all the illegal pirating, and could be used to facilitate the transfer of data in a distributed manner. Any thoughts?
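
In the meantime, even labs sharing data from their own servers could publish a checksum next to each raw file so downloaders can confirm a multi-gigabyte transfer arrived intact. A minimal Python sketch (purely illustrative, not any repository's actual tooling; the chunked read keeps memory use flat for >15 GB files):

```python
import hashlib


def file_md5(path, chunk_size=8 * 1024 * 1024):
    """Compute the MD5 checksum of a (possibly huge) raw file
    by streaming it in 8 MiB chunks instead of loading it whole."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

The lab publishes `file_md5("run01.raw")` alongside the download link; anyone who fetches the file recomputes it and compares.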

Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Mon Apr 30, 2012 9:15 am

This seems like a really good idea. But would that mean that the owner of the data would always have to have the data on a specific computer and that computer must be turned on all the time? It seems like access to the data would be sporadic. I realize that for illegal torrent content many people have copies of the data available so this isn't a problem. But I don't think this will be true for proteomics data. In most cases the data will only exist in a single location.

Derek
Phosphoserine Member
Posts: 18
Joined: Tue Jun 28, 2011 5:01 pm

Postby Derek » Mon Apr 30, 2012 10:20 am

Yes, initially the person who collected the data would have to "seed" it by providing a computer that is always connected to the internet. Once another user (a "leech") downloads the data, they would be "required" to seed it to other users. This, of course, cannot be enforced, but "share ratios" (the amount uploaded / downloaded) could be tracked to basically shame users who download without seeding. Once a dataset has three or four people seeding it, you should have pretty good fault tolerance and much improved download speeds. Additionally, this type of system allows pausing/resuming capabilities that normal FTP/HTTP downloads lack. Need to download a 100 GB dataset but also need to restart your computer? No problem with this system, since individual pieces are downloaded, not the whole file. Error checking, such as MD5 checksums, could easily be employed too so that data integrity is maintained. And a dataset's popularity can be gauged by the number of people seeding the particular torrent.
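
The piece-level bookkeeping behind that pause/resume behavior is simple to sketch in Python (a toy illustration, assuming SHA-1 piece hashes as BitTorrent uses; real clients store these hashes in the torrent metainfo file):

```python
import hashlib

PIECE_SIZE = 4 * 1024 * 1024  # 4 MiB pieces, a typical torrent piece size


def piece_hashes(path, piece_size=PIECE_SIZE):
    """Hash each fixed-size piece of a file. This list plays the role
    of the 'pieces' field in a torrent metainfo: a downloader can
    verify each piece independently as it arrives."""
    hashes = []
    with open(path, "rb") as f:
        while True:
            piece = f.read(piece_size)
            if not piece:
                break
            hashes.append(hashlib.sha1(piece).hexdigest())
    return hashes


def verify_piece(path, index, expected, piece_size=PIECE_SIZE):
    """Re-check a single piece, e.g. after resuming an interrupted
    download; only bad pieces need re-fetching, not the whole file."""
    with open(path, "rb") as f:
        f.seek(index * piece_size)
        piece = f.read(piece_size)
    return hashlib.sha1(piece).hexdigest() == expected
```

Because each piece is verified on its own, a restart only costs you the piece that was in flight, not the 100 GB already on disk.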

aky
Albumin Member
Posts: 90
Joined: Sat Sep 10, 2011 2:33 pm

Postby aky » Tue May 01, 2012 2:39 am

Great idea. In fact, last year PLoS ONE had an article on this topic: "BioTorrents: A File Sharing Service for Scientific Data". This has practical difficulties, though: (1) no one can be forced to "seed"; (2) many institutions (like mine) block torrents, and it is difficult to persuade system admins to change those policies.

There are some file-sharing sites which provide a torrent file that, I believe, works for a limited time, or until your IP address changes. I don't fully understand it, but it tracks the IP of the leecher in order to provide the data. If something of this sort, i.e., "on-demand" availability, could be made possible for proteomics, it might be helpful.

