Open Source Libraries for Proteomics

Proton Member
Posts: 2
Joined: Sun Aug 07, 2011 8:51 am

Postby parag » Sun Aug 07, 2011 9:25 am

Getting started in developing tools for proteomics can be daunting. There is a significant overhead in writing tools to read the standard formats, writing code to perform in silico digests, etc. There are a number of libraries out there that can help. In addition, many of the larger, open-source tools have extensive code-bases that can be a good starting point.
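To give a feel for the kind of boilerplate involved, here is a minimal sketch of an in silico tryptic digest in Python. It assumes the common simplified trypsin rule (cleave after K or R unless the next residue is P); the function name and its parameters are made up for illustration and are not taken from any of the libraries discussed in this thread.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Digest a protein sequence with a simplified trypsin rule.

    Cleaves C-terminal to K or R, except when the next residue is P.
    Returns peptides with up to `missed_cleavages` missed cleavage sites.
    """
    sequence = sequence.strip().upper()
    # Zero-width split points: just after K/R and not just before P.
    # Splitting on an empty match requires Python 3.7 or newer.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]

    peptides = []
    for i in range(len(fragments)):
        # Join this fragment with up to `missed_cleavages` following fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides


if __name__ == "__main__":
    for peptide in tryptic_digest("AAKRPGGRCCKPDD", missed_cleavages=1):
        print(peptide)
```

A real tool would also need to handle other enzymes, modifications and mass calculation, which is exactly the sort of work the libraries below take off your hands.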

Here are just a few of the projects with useful code-bases that I've interacted with. Please do add others!


ProteoWizard - I'm biased, but I think it's useful. It has a bunch of libraries and tools that are meant to do all the boring stuff so you can do something useful. When we were starting it, we were trying to create a Boost for proteomics. It's not quite comprehensive, but it's getting there. Certainly for reading/writing the open formats, or reading vendor formats, it's a great Swiss Army knife: {{paper}} We are working on bindings to other languages (Python, R), but that work is still quite preliminary. We also recently made a GUI for msConvert!

TOPP - A full pipeline for proteomics written using modern principles. The focus is definitely more on the pipeline aspects, but it's an excellent package. {Paper}

TPP - Though this suite is heavily focused on providing a beginning-to-end pipeline for MS interpretation, there is a lot of useful code in there that can be reused.

X! Tandem - X! Tandem is mostly a search engine. However, it's open source and, again, there is a lot of useful code tucked away in there. {Paper}


msInspect - A useful set of libraries and tools for Java. {Paper}

~ Parag M ~

E. Coli Lysate Member
Posts: 220
Joined: Sun Jun 26, 2011 6:49 pm

Postby Craig » Sun Aug 07, 2011 1:39 pm

I'll add that I've been using ProteoWizard to convert proprietary mass spec data (Agilent .d and Thermo .raw) to mzML and it has worked great. It is a command-line tool but very intuitive to use. Best of all, it does a really good job of dealing with all the vendor libraries so you don't have to mess with that. In my experience you can just download and extract it, and it works right off the bat.
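For anyone who wants to script the conversion, here is a minimal Python sketch that wraps the ProteoWizard `msconvert` command line. It assumes `msconvert` is on your PATH and uses only the `--mzML` and `-o` options; the helper names and the example file names are hypothetical.

```python
import subprocess
from pathlib import Path


def build_msconvert_command(input_file, output_dir, executable="msconvert"):
    """Build the argument list for converting one vendor file to mzML."""
    return [executable, str(input_file), "--mzML", "-o", str(output_dir)]


def convert_to_mzml(input_file, output_dir, executable="msconvert"):
    """Run msconvert on one input file; raises CalledProcessError on failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    command = build_msconvert_command(input_file, output_dir, executable)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical file name; replace with a real Thermo .raw or Agilent .d path.
    print(" ".join(build_msconvert_command("sample.raw", "converted")))
```

Keeping the command construction separate from the call to `subprocess.run` makes it easy to print or log the exact command before running it on a batch of files.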

And I can throw in a couple more open-source proteomics tools:

OMSSA - Another open-source search algorithm like X! Tandem. It was developed in C++ by Lewis Geer of NCBI and has a very permissive "public domain" license, so, as I understand it, you can use it in any software without restriction. {paper}

COMPASS - Since Parag got the ball rolling I'll mention my own software again. It is written around OMSSA, and aside from that it's 100% C#, which is high level and should be pretty easy to understand. {paper}

I have a lot of experience with both of the above so if anybody has any questions feel free to post them.

Phosphoserine Member
Posts: 10
Joined: Wed Dec 28, 2011 10:29 am

A joint effort for a Java library?

Postby MSENC » Sat Jan 14, 2012 9:23 am

We've also used ProteoWizard for file conversion with great success. It's a great idea to have a joint effort from several labs that actually consolidates the know-how, rather than each lab going its own way.

In our own work, we use Java exclusively in server/web mode. I'd love to contribute to a public effort similar to ProteoWizard for Java people. There are already many projects, such as JPL, compomics and msInspect, but none of them has gone down the same path as ProteoWizard. The biggest hurdle is probably the design of the "core" classes.

Wen Yu

Albumin Member
Posts: 91
Joined: Sat Sep 10, 2011 2:33 pm

Postby aky » Sun Jan 15, 2012 3:25 am

Wen Yu,

You are right, a public effort goes a long way in establishing a library and gradually ironing out the kinks, if any. You have so many eyes to catch errors.

There are some other libraries too.

@Craig - can we put up a resource page for libraries/APIs, as was done for the Search Algorithms?


E. Coli Lysate Member
Posts: 220
Joined: Sun Jun 26, 2011 6:49 pm

Postby Craig » Sun Jan 15, 2012 11:38 am

I have added a Computational Libraries page here, and started it off with a few of the libraries mentioned above, but more can be added anytime. If you have any corrections, or know of other libraries that should be added, just reply in this thread.

Proton Member
Posts: 2
Joined: Mon Jan 16, 2012 4:25 am

Postby _Chris_ » Mon Jan 16, 2012 6:05 am

For high-res top-down proteomics, I sometimes use "Hardklör" and its sibling "Krönik". It's a feature-finding tool that can detect highly charged proteins/peptides quite nicely. Most software can deal with lower charges (tryptic peptides), but the higher the charge gets, the more problems you usually have. Hardklör works quite nicely there.
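One commonly cited reason high charge states are harder (my own illustration, not something stated in the post above) is that the spacing between isotope peaks shrinks as 1/z, so the peaks get harder to resolve. A small Python sketch of the arithmetic, using standard values for the proton mass and the 13C-12C mass difference; the function names are made up for illustration:

```python
PROTON_MASS = 1.007276    # Da
C13_C12_DELTA = 1.003355  # Da, mass difference between 13C and 12C


def mz(neutral_mass, charge):
    """m/z of a positive ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def isotope_spacing(charge):
    """Approximate m/z distance between neighbouring isotope peaks."""
    return C13_C12_DELTA / charge


if __name__ == "__main__":
    # Compare a typical tryptic-peptide charge with top-down-like charges.
    for z in (2, 10, 30):
        print(f"z={z:2d}  isotope spacing = {isotope_spacing(z):.4f} m/z")
```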

OpenMS/TOPP (see above) can read/convert Hardklör/Krönik result files to other formats.

Carbon Member
Posts: 4
Joined: Wed Jul 11, 2012 7:20 am

Postby ypriverol » Mon Oct 14, 2013 4:31 am

We recently published a manuscript about open-source libraries and frameworks in MS-based proteomics.

Carbon Member
Posts: 3
Joined: Mon Nov 18, 2013 1:16 am

Postby hlfernandez » Mon Nov 18, 2013 1:28 am

We have just released Mass-Up 1.0, a new open-source standalone Java application for mass spectrometry data analysis.

Mass-Up (i) loads data from mzML, mzXML and csv files, (ii) allows the user to preprocess data via R packages and (iii) provides different types of analyses such as quality control, biomarker discovery, hierarchical clustering or principal component analysis.

Moreover, you can download a Virtual Machine that comes with Mass-Up installed and ready to use.

With best regards,

The Mass-Up team.
