Database Search Algorithms

Search algorithms, post-searching processing, quantitation software, etc. Share and discuss software here.
Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Tue Jul 12, 2011 10:39 pm

I am starting this thread to document, in one place, all available database search algorithms. If you have one to add, post it and I will update the list.

Sequest - Written by Jimmy Eng when he was in John Yates's lab and published in 1994. I believe this was the first search engine, and it is still widely used. It costs money and is distributed by Thermo Scientific.

Mascot - Another widely used commercial database search program, which has been around since at least 1999.

OMSSA - Freely available open source software for database searching from NCBI.

X!Tandem - Another popular open source option.

ProteinLynx Global Server (PLGS) - From Waters Corporation.

ProteinPilot - Available from AB Sciex.

Phenyx - From GeneBio.

SpectrumMill - From Agilent Technologies.

MaxQuant/Andromeda - Freely available software from Matthias Mann's lab at the Max Planck Institute.

aky
Albumin Member
Posts: 89
Joined: Sat Sep 10, 2011 2:33 pm

Postby aky » Tue Oct 04, 2011 10:40 am

[url=pubs.acs.org/doi/abs/10.1021/pr200031z]MassWiz[/url] from our lab
ProbID - Ning Zhang
Crux - C. Park (W.S. Noble lab)
MyriMatch - D. Tabb
GreyLag - Mike Coleman
MassMatrix - Hua Xu
ZCore - R. Sadygov (for ETD spectra search)

[Thanks David for pointing out ZCore]

David
Angiotensin Member
Posts: 28
Joined: Sat Jul 09, 2011 1:27 am

Postby David » Tue Oct 04, 2011 11:59 am

One quick addendum to aky's list: ZCore doesn't actually come from Josh Coon's lab, though testing was done there and ideas/input were given. It was actually developed by Rovshan Sadygov, who is now a professor at the University of Texas Medical Branch.

Toxic
Angiotensin Member
Posts: 39
Joined: Sun Sep 25, 2011 9:53 pm

Postby Toxic » Tue Oct 04, 2011 6:12 pm

PEAKS Studio from Bioinformatics Solutions. It performs de novo sequencing to generate sequence tags to assist its search engine. Useful if working on organisms with little or no genome information, as you can look through the unmatched de novo spectra and BLAST search those.

Snowcast
Phosphoserine Member
Posts: 12
Joined: Wed Oct 05, 2011 12:22 am

Postby Snowcast » Wed Oct 12, 2011 12:02 am

A super compendium of MS software can be found at this site:
ms-utils.org

If you scroll down to the 'protein identifications' section, there are quite a number of search algorithms listed that aren't currently on 'our' list.

Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Wed Oct 12, 2011 12:07 am

For what it's worth, I go to the Protein Prospector site all the time, not for database searching but for calculating theoretical fragment ions, peptide masses, etc. It's a nice resource.

Snowcast
Phosphoserine Member
Posts: 12
Joined: Wed Oct 05, 2011 12:22 am

Postby Snowcast » Wed Oct 12, 2011 12:11 am

Sorry Doug, just edited my previous post!

woa
Ubiquitin Member
Posts: 62
Joined: Sun Sep 25, 2011 6:29 am

Postby woa » Wed Oct 12, 2011 6:14 am

Can anybody explain the differences between the various versions of Sequest, such as Turbo-Sequest, Sorcerer Sequest, etc.?

David
Angiotensin Member
Posts: 28
Joined: Sat Jul 09, 2011 1:27 am

Postby David » Wed Oct 12, 2011 11:04 pm

Hi woa,

Admittedly, I don't use SEQUEST very often, but here is what I have found:

As far as I know, TurboSEQUEST is simply the name of SEQUEST provided by Thermo which is notable for having a user-friendly graphical interface for both searching and interpretation of results. I do not think that any changes to the actual algorithm exist in this version (please correct me if I am wrong).

The "SORCERER" of which you speak is actually the name of a group of different platforms and does not have to do with the algorithm itself:
http://www.proteomics2.com/?p=153

However, the "newest" SEQUEST version, I believe, is SEQUEST 3G. Without rehashing the details, I will simply point you towards Sage-N Research's description: http://www.proteomics2.com/?p=153


I hope this helps!

jke000
Phosphoserine Member
Posts: 20
Joined: Wed Aug 17, 2011 10:05 pm

Postby jke000 » Fri Oct 14, 2011 10:19 am

Actually, the term "Turbo" was added to TurboSEQUEST when Thermo released a version, back in the late '90s, that sped up searching quite a bit (compared to the original version) by preprocessing or indexing the sequence databases. Back then, there were two indexing schemes, for 'size' (supported variable mods) and 'speed' (no variable mods but faster). Sometime around the mid-2000s, Sage-N obtained a license to SEQUEST and re-implemented it, originally accelerated on FPGA hardware, but now simply optimized in software to run on multi-core CPUs. Features of the two versions (Thermo's and Sage-N's) will likely diverge over time, as I believe the two tools are being developed independently. I don't use either tool, so I can't help answer the original question on differences between the two. Presumably the core underlying algorithm is the same and the differences are in the ancillary features available.

There are academic versions of SEQUEST still being developed, at UW and presumably at Scripps. The Gerber lab published MacroSEQUEST, their implementation aimed at significant speed improvements.
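
To make the idea of indexing a sequence database a bit more concrete, here is a minimal sketch of a generic peptide-mass index. It is not how TurboSEQUEST's 'size' or 'speed' indexes work internally (I don't know those details); it only illustrates why digesting and sorting once makes each later lookup cheap. The function names are made up for illustration, and the digestion rule is simplified (cleave after K or R, no missed cleavages, no proline rule, no modifications).

```python
import bisect

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the residue sum


def tryptic_peptides(protein):
    """Cleave after every K or R (simplified rule, see assumptions above)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        if aa in "KR":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def build_index(proteins):
    """Digest once, then sort unique (mass, peptide) pairs for bisection."""
    entries = {(peptide_mass(p), p)
               for prot in proteins
               for p in tryptic_peptides(prot)}
    return sorted(entries)


def candidates(index, neutral_mass, tol_da):
    """Peptides whose mass lies within +/- tol_da of neutral_mass."""
    # "" sorts before any peptide string and "~" after any uppercase one,
    # so both ends of the mass window are inclusive.
    lo = bisect.bisect_left(index, (neutral_mass - tol_da, ""))
    hi = bisect.bisect_right(index, (neutral_mass + tol_da, "~"))
    return [pep for _, pep in index[lo:hi]]
```

Without the index, every spectrum would require re-digesting and re-scanning the whole database; with it, each precursor mass lookup is a binary search over a sorted list.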

Craig
E. Coli Lysate Member
Posts: 220
Joined: Sun Jun 26, 2011 6:49 pm

Postby Craig » Fri Oct 14, 2011 10:29 am

Hi jke000,

Thanks a lot for that clarification. If one wanted to use Sequest for their research, what version would you recommend? Preferably a free version.

-Craig

Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Wed Nov 02, 2011 6:59 am

We have added an organized list of these software packages to our Resources page. Each program includes the contributing authors, a brief description, a link to the paper, and a link to the webpage. Please post here to suggest additions or make us aware of any mistakes.

Murat
Proton Member
Posts: 1
Joined: Thu Nov 03, 2011 7:55 am

Postby Murat » Thu Nov 03, 2011 8:02 am

There is a review that comprehensively describes a lot of proteome-related software and databases:
Nesvizhskii AI. A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics. J Proteomics. 2010 Oct 10;73(11):2092-123. Epub 2010 Sep 8. Review. PMID:20816881

jke000
Phosphoserine Member
Posts: 20
Joined: Wed Aug 17, 2011 10:05 pm

Postby jke000 » Tue Nov 08, 2011 4:47 pm

Craig wrote:Hi jke000,

Thanks a lot for that clarification. If one wanted to use Sequest for their research, what version would you recommend? Preferably a free version.

-Craig


Craig,

There's effectively no free version of Sequest; it's commercially licensed and available from Thermo or Sage-N Research.

There are tools like Crux and MyriMatch that both reimplement the cross-correlation score. The k-score plug-in to X!Tandem does a similar calculation, but it's not the same; the other two tools implement the xcorr more faithfully (based on looking at their source code).
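
For readers who haven't seen the cross-correlation score written out, here is a rough sketch of the idea as it is commonly described: the match at zero offset minus the average match over a window of shifted offsets (typically 75 bins either side). This is not taken from the SEQUEST, Crux, or MyriMatch source. The function name and the plain-list representation are assumptions for illustration, and the spectrum preprocessing that real implementations perform (binning, intensity normalization, and so on) is left out entirely.

```python
def xcorr(observed, theoretical, max_offset=75):
    """Simplified cross-correlation score between two binned spectra.

    Both inputs are equal-length lists of intensities, one value per m/z bin.
    The score is the dot product at zero offset minus the mean dot product
    over offsets -max_offset..+max_offset (zero offset excluded).
    """
    n = len(observed)
    if len(theoretical) != n:
        raise ValueError("spectra must be binned to the same length")

    def dot_at(offset):
        # Pair each observed bin with the theoretical bin `offset` positions
        # away; pairs that fall outside the array contribute nothing.
        total = 0.0
        for i in range(n):
            j = i + offset
            if 0 <= j < n:
                total += observed[i] * theoretical[j]
        return total

    background = sum(dot_at(t)
                     for t in range(-max_offset, max_offset + 1)
                     if t != 0)
    return dot_at(0) - background / (2 * max_offset)
```

Subtracting the shifted-offset average is what penalizes spectra that would correlate with almost anything (dense or noisy spectra); a good match stands out only if it is much better aligned than misaligned.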

- Jimmy

Artur
Angiotensin Member
Posts: 31
Joined: Tue Sep 20, 2011 2:50 am

Postby Artur » Fri Nov 25, 2011 1:21 am

Is anyone aware of a software package that really makes use of high mass accuracy (e.g., <20 ppm for MS and MS/MS) data?

arledvina
Phosphoserine Member
Posts: 14
Joined: Wed Jun 29, 2011 6:09 am

Postby arledvina » Mon Nov 28, 2011 8:01 am

Hi Artur,

I don't know if this is what you are thinking of, but our group uses OMSSA to search high-res (Orbitrap) MS/MS data. To take advantage of the high mass accuracy, we set the product ion m/z tolerance quite low (0.01 Th). This is done by modifying the search string variable " -to ".
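
To relate an absolute tolerance like 0.01 Th to the ppm figures in the question, here is a small sketch of the conversion. The helper names are made up for illustration and are not part of OMSSA or any other package.

```python
def ppm_to_th(mz, ppm):
    """Absolute m/z window (Th) equivalent to a ppm tolerance at m/z `mz`."""
    return mz * ppm / 1e6


def th_to_ppm(mz, delta_th):
    """Relative tolerance (ppm) equivalent to an absolute window at `mz`."""
    return delta_th / mz * 1e6


def matches(observed_mz, theoretical_mz, tol_th=0.01):
    """True if a fragment peak lies within +/- tol_th of the theoretical m/z."""
    return abs(observed_mz - theoretical_mz) <= tol_th
```

A fixed 0.01 Th window corresponds to 20 ppm at m/z 500, 10 ppm at m/z 1000, and 50 ppm at m/z 200, so an absolute tolerance is relatively looser for low-m/z fragments than for high-m/z ones.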

gabe
Angiotensin Member
Posts: 25
Joined: Sat Nov 19, 2011 5:56 pm

Postby gabe » Mon Nov 28, 2011 12:13 pm

There's also ProLuCID, which is a new-ish algorithm from the Yates lab that, I think, is pretty similar to SEQUEST:

http://fields.scripps.edu/prolucid/index.html

sesf43
Serine Member
Posts: 8
Joined: Wed Nov 30, 2011 9:25 am

Postby sesf43 » Wed Nov 30, 2011 9:55 am

Doug,
I think a great addition to this already wonderful list of resources would be a column that lists the available input and output file formats, e.g., mzXML, dta, mzML, etc. for input and out, protXML, dat for output.

Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Wed Nov 30, 2011 11:31 am

That's a great idea. Anybody want to help me gather the info?

If you are familiar with any of the software packages listed here, please post the compatible input and output formats.

Just post something like this...

MASCOT INPUT
Finnigan (.ASC)
Micromass (.PKL)
Sequest (.DTA)
PerSeptive (.PKS)
Sciex API III
Bruker (.XML)
mzData (.XML)
mzML (.mzML)

MASCOT OUTPUT
XML
CSV
pepXML
mzIdentML
DTASelect
DAT
MGF

aky
Albumin Member
Posts: 89
Joined: Sat Sep 10, 2011 2:33 pm

Postby aky » Wed Nov 30, 2011 11:53 pm

Hi Doug,

Please redirect the MassWiz link to the SourceForge page here, as the web server is running an old version. And you can add the pFind algorithm to that list.

MassWiz INPUT
mgf
pkl
dta

MassWiz OUTPUT
csv
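
If it helps with collecting these, the format lists could be kept in a small machine-readable table rather than free text. Below is a minimal sketch using only the formats posted in this thread so far (lowercased). The dictionary layout and the `tools_supporting` helper are just one possible arrangement, not something the Resources page actually uses.

```python
# Formats exactly as posted in this thread; more tools can be added
# in the same shape as people report them.
FORMATS = {
    "Mascot": {
        "input": {"asc", "pkl", "dta", "pks", "sciex api iii",
                  "bruker xml", "mzdata", "mzml"},
        "output": {"xml", "csv", "pepxml", "mzidentml",
                   "dtaselect", "dat", "mgf"},
    },
    "MassWiz": {
        "input": {"mgf", "pkl", "dta"},
        "output": {"csv"},
    },
}


def tools_supporting(fmt, direction="input"):
    """Names of tools that list `fmt` as an input or output format."""
    fmt = fmt.lower()
    return sorted(tool for tool, formats in FORMATS.items()
                  if fmt in formats[direction])


print(tools_supporting("DTA"))            # tools that read .dta
print(tools_supporting("csv", "output"))  # tools that write CSV
```

A table like this could be rendered as the extra columns sesf43 suggested, and it makes questions like "which tools can read my files?" easy to answer.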

aky
Albumin Member
Posts: 89
Joined: Sat Sep 10, 2011 2:33 pm

Postby aky » Thu Dec 01, 2011 4:16 am

The Algorithms page is commendable. You have taken a lot of pains to collate that info.
I have a suggestion, Doug. It would be great if you could add the download links for the software as well. I can help you with that if you take it up.

Doug
E. Coli Lysate Member
Posts: 307
Joined: Sun Jun 26, 2011 7:20 pm

Postby Doug » Thu Dec 01, 2011 9:09 am

This is a great suggestion. Perhaps we need a better way to organize/update this list. We had tried a wiki but preferred the clean look and integration of the resources page instead (i.e., the wiki was not good for reading). But the resources page restricts the ability of our members to edit it, thus slowing down the process (i.e., HTML was not good for editing). Any suggestions about how we could organize this so that it is both reader and editor friendly?

Maybe we should consider a wiki again. In fact, a similar (but much much larger) site SEQanswers.com recently published a paper on their wiki of genomic software tools (see below).

If we could figure that out I would like to start other lists for publicly available databases etc.

Li JW, Robison K, Martin M, Sjödin A, Usadel B, Young M, Olivares EC, Bolser DM. The SEQanswers wiki: a wiki database of tools for high-throughput sequencing analysis. Nucleic Acids Res. 2011 Nov 15. [Epub ahead of print] PubMed PMID: 22086956.

aky
Albumin Member
Posts: 89
Joined: Sat Sep 10, 2011 2:33 pm

Postby aky » Thu Dec 01, 2011 9:09 pm

It was heartening to see this. Scientific publishing acknowledging the contribution of an educational, community-driven resource is a good step.
We can take inspiration from SEQanswers and see what is applicable to this community. We can also modify, tweak or add new things based on proteomics needs specifically.

I am in favor of community editing (of course with some level of control too). About wiki, I am not sure if everyone is fine with that, but I am comfortable with both wiki and HTML.

brdankiw
Proton Member
Posts: 2
Joined: Wed Apr 18, 2012 10:25 am

Postby brdankiw » Wed Apr 18, 2012 10:33 am

Hello,

I noticed that your product information and link for PEAKS Studio are outdated. Please find the updated information below:

PEAKS Studio Link: http://www.bioinfor.com/
Paper: http://www.mcponline.org/content/early/2011/12/20/mcp.M111.010587.full.pdf+html
Price: $$$ (free trial)
Authors: Bioinformatics Solutions Inc.

Cheers!

Ben

brdankiw
Proton Member
Posts: 2
Joined: Wed Apr 18, 2012 10:25 am

Postby brdankiw » Wed Apr 18, 2012 10:35 am

Oh and I would certainly agree with adding a download page section, as well as possible screenshots to allow users to take a quick peek at the software. This way they can have a quick look at the software interface to see if it is something they would like.

Ben

