Difference between Human reviewed and Homo sapiens reviewed databases

Biomarker
Albumin Member
Posts: 88
Joined: Thu Aug 09, 2012 3:28 am

Post by Biomarker » Wed Aug 07, 2013 11:12 pm

Hello,

This may sound like a stupid question, but what is the difference between the Human and Homo sapiens databases? When I searched for each one separately on the UniProt site, I got two different FASTA files. Which one should we pick for analyzing human serum MS data?

Looking forward to hearing back from you.

Post by Biomarker » Wed Aug 07, 2013 11:16 pm

One more thing: should I choose reviewed or unreviewed? Does either one make any difference in terms of identifications?

Craig
E. Coli Lysate Member
Posts: 220
Joined: Sun Jun 26, 2011 6:49 pm

Post by Craig » Thu Aug 08, 2013 9:46 am

Homo sapiens would be better than Human, because the latter also finds things like "Human immunodeficiency virus". You also definitely want to restrict the search by organism, e.g. organism:"Homo sapiens"; otherwise you'll also get proteins from viruses for which human is the host, for example. However, I think the best way to get a FASTA database from UniProt is not to run a search at all but to use the complete and reference proteome sets, which are linked from the main UniProt page.
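
To make the keyword-versus-organism point concrete, here is a minimal sketch that filters FASTA headers two ways: a naive free-text match on "human", and an exact match on the organism field. It assumes the header layout UniProtKB uses today (organism name in an OS= field, followed by other KEY=value fields such as OX=, GN=, PE=); the three headers are simplified illustrations, not copied from a release, and the function names are hypothetical.

```python
import re
from typing import List, Optional

# UniProtKB FASTA headers carry the organism name in an "OS=" field, followed
# by other KEY=value fields such as OX=, GN=, PE= and SV=.
OS_FIELD = re.compile(r"OS=(.+?)(?= [A-Z]{2}=|$)")


def organism(header: str) -> Optional[str]:
    """Return the OS= organism name of a UniProtKB FASTA header, or None."""
    match = OS_FIELD.search(header)
    return match.group(1) if match else None


def keyword_hits(headers: List[str], keyword: str = "human") -> List[str]:
    """Naive free-text match: keep any header that mentions the keyword anywhere."""
    return [h for h in headers if keyword.lower() in h.lower()]


def organism_hits(headers: List[str], species: str = "Homo sapiens") -> List[str]:
    """Organism restriction: keep only headers whose OS= field equals the species."""
    return [h for h in headers if organism(h) == species]


# Simplified, illustrative headers in UniProtKB style.
EXAMPLE_HEADERS = [
    ">sp|P02768|ALBU_HUMAN Serum albumin OS=Homo sapiens OX=9606 GN=ALB PE=1 SV=2",
    ">sp|P04585|POL_HV1H2 Gag-Pol polyprotein OS=Human immunodeficiency virus type 1 PE=1",
    ">sp|P07724|ALBU_MOUSE Serum albumin OS=Mus musculus GN=Alb PE=1",
]

if __name__ == "__main__":
    print(len(keyword_hits(EXAMPLE_HEADERS)))   # free-text "human": also catches HIV
    print(len(organism_hits(EXAMPLE_HEADERS)))  # OS=Homo sapiens only
```

The free-text match keeps the HIV entry because "Human" appears in its organism name, while the organism restriction keeps only the Homo sapiens entry, which is the same reason organism:"Homo sapiens" beats a plain "human" search.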

As for reviewed (Swiss-Prot) versus unreviewed (TrEMBL), I did a comparison a few months back:

[TABLE="class: grid"]
[TR]
[TD][/TD]
[TD]IPI version 3.87[/TD]
[TD]UniProt (Swiss-Prot + TrEMBL; canonical + isoform) version 2013_03[/TD]
[TD]Swiss-Prot (canonical + isoforms) version 2013_03[/TD]
[TD]Swiss-Prot (canonical) version 2013_03[/TD]
[/TR]
[TR]
[TD]proteins[/TD]
[TD]91464[/TD]
[TD]87656[/TD]
[TD]38193[/TD]
[TD]20248[/TD]
[/TR]
[/TABLE]
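
If you want to see how a downloaded FASTA file breaks down into the categories compared above, a rough sketch is to count header lines by source and isoform status. It assumes the UniProtKB conventions that Swiss-Prot (reviewed) headers start with ">sp|", TrEMBL (unreviewed) headers with ">tr|", and isoform accessions carry a "-N" suffix (e.g. P02768-2). The tiny FASTA below is illustrative: the sequences are truncated and the TrEMBL accession is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

SOURCES = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def classify(header: str) -> Tuple[str, str]:
    """Classify a UniProtKB FASTA header as (source, "canonical" or "isoform").

    Swiss-Prot headers start with ">sp|", TrEMBL headers with ">tr|";
    isoform accessions have a "-N" suffix such as P02768-2.
    """
    db, accession, _rest = header[1:].split("|", 2)
    source = SOURCES.get(db, "other")
    kind = "isoform" if "-" in accession else "canonical"
    return source, kind


def tally(fasta_lines: Iterable[str]) -> Counter:
    """Count entries per (source, kind), looking only at header lines."""
    return Counter(classify(line) for line in fasta_lines if line.startswith(">"))


# Tiny illustrative FASTA; sequences are truncated, the TrEMBL accession is hypothetical.
EXAMPLE_FASTA = [
    ">sp|P02768|ALBU_HUMAN Serum albumin OS=Homo sapiens",
    "MKWVTFISLLFLFSSAYS",
    ">sp|P02768-2|ALBU_HUMAN Isoform 2 of Serum albumin OS=Homo sapiens",
    "MKWVTFISLLFLFSSAYS",
    ">tr|A0A0HYPO01|A0A0HYPO01_HUMAN Uncharacterized protein OS=Homo sapiens",
    "MSTNPKPQRK",
]

if __name__ == "__main__":
    for (source, kind), n in sorted(tally(EXAMPLE_FASTA).items()):
        print(source, kind, n)
```

For a real file you could pass the open file handle directly, e.g. `with open(path) as f: print(tally(f))`, since only lines starting with ">" are inspected.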

Using Morpheus, I searched triplicate LTQ Orbitrap Velos and Q-Exactive data from http://www.mcponline.org/content/10/9/M111.011015.abstract against these four databases. These are the numbers of protein groups identified at 1% FDR:

[TABLE="class: grid"]
[TR]
[TD][/TD]
[TD]IPI version 3.87[/TD]
[TD]UniProt (Swiss-Prot + TrEMBL; canonical + isoform) version 2013_03[/TD]
[TD]Swiss-Prot (canonical + isoforms) version 2013_03[/TD]
[TD]Swiss-Prot (canonical) version 2013_03[/TD]
[/TR]
[TR]
[TD]LTQ Orbitrap Velos[/TD]
[TD]1928±23[/TD]
[TD]1915±28[/TD]
[TD]1917±22[/TD]
[TD]1922±14[/TD]
[/TR]
[TR]
[TD]Q-Exactive[/TD]
[TD]2404±148[/TD]
[TD]2392±137[/TD]
[TD]2389±132[/TD]
[TD]2388±133[/TD]
[/TR]
[/TABLE]
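
One quick way to put a number on "very little difference" is to compare the gap between the best and worst database with the ± values in the table. The sketch below just re-uses the figures from the table; the post doesn't say whether ± is a standard deviation or another measure of spread, so treat the comparison as rough.

```python
from typing import Dict, Tuple

# (mean, ±) protein-group counts at 1% FDR, copied from the table above.
RESULTS: Dict[str, Dict[str, Tuple[int, int]]] = {
    "LTQ Orbitrap Velos": {
        "IPI 3.87": (1928, 23),
        "UniProt 2013_03 (Swiss-Prot + TrEMBL, + isoforms)": (1915, 28),
        "Swiss-Prot 2013_03 (+ isoforms)": (1917, 22),
        "Swiss-Prot 2013_03 (canonical)": (1922, 14),
    },
    "Q-Exactive": {
        "IPI 3.87": (2404, 148),
        "UniProt 2013_03 (Swiss-Prot + TrEMBL, + isoforms)": (2392, 137),
        "Swiss-Prot 2013_03 (+ isoforms)": (2389, 132),
        "Swiss-Prot 2013_03 (canonical)": (2388, 133),
    },
}


def spread(by_database: Dict[str, Tuple[int, int]]) -> Tuple[int, float, int]:
    """Return (max - min of the means, that gap as % of the smallest mean,
    smallest ± value among the databases)."""
    means = [mean for mean, _ in by_database.values()]
    gap = max(means) - min(means)
    return gap, 100.0 * gap / min(means), min(pm for _, pm in by_database.values())


if __name__ == "__main__":
    for instrument, by_database in RESULTS.items():
        gap, percent, smallest_pm = spread(by_database)
        print(f"{instrument}: gap {gap} groups ({percent:.2f}%), smallest ± {smallest_pm}")
```

For both instruments the gap between the best and worst database (13 and 16 protein groups, under 1% of the mean) is smaller than every ± value reported for that instrument.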

As you can see, there is very little difference. IPI actually looks best, but it has been closed, and its maintainers recommend using UniProt's complete and reference proteome sets instead. Searching all of UniProt (including unreviewed TrEMBL proteins) with isoforms takes longer and produces protein groups containing multiple proteins, but it won't hurt your identifications, so that is my recommendation for most searches.
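
The "protein groups with multiple proteins" effect comes from proteins that cannot be told apart by the peptides actually identified, such as a canonical entry, its isoform and a TrEMBL duplicate. A minimal sketch of the simplest grouping rule (proteins matched by exactly the same peptide set become one group) is below; the protein and peptide names are hypothetical, and this is not necessarily the exact algorithm Morpheus uses.

```python
from collections import defaultdict
from typing import Dict, List, Set


def group_indistinguishable(protein_peptides: Dict[str, Set[str]]) -> List[List[str]]:
    """Collapse proteins whose sets of identified peptides are identical.

    Only the simplest rule is shown; real tools usually also handle proteins
    whose peptides are a subset of another protein's (parsimony).
    """
    groups = defaultdict(list)
    for protein, peptides in protein_peptides.items():
        groups[frozenset(peptides)].append(protein)
    return sorted(sorted(members) for members in groups.values())


# Hypothetical identifications: a canonical entry, its isoform and a TrEMBL
# duplicate matched by the same peptides, plus one unrelated protein.
EXAMPLE = {
    "canonical_A": {"PEPTIDEK", "SAMPLER"},
    "isoform_A-2": {"PEPTIDEK", "SAMPLER"},
    "trembl_A": {"SAMPLER", "PEPTIDEK"},
    "protein_B": {"ANOTHERK"},
}

if __name__ == "__main__":
    for group in group_indistinguishable(EXAMPLE):
        print(group)
```

In this toy example the extra isoform and TrEMBL entries only enlarge the first group; the number of groups (two) is the same as it would be with the canonical entries alone.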

Post by Biomarker » Thu Aug 08, 2013 10:22 pm

Hey Craig,

Many thanks for this detailed explanation...

:)
