I am new to proteomics and have a question about testing for overlap between protein lists.
I have 2 protein lists I would like to compare: a subset (n) of the full list of proteins (N) that I identified in my experiment, and a list of proteins from the literature belonging to a specific category (R).
I would like to know whether my subset (n) is enriched for the proteins in the literature list compared to other subsets from my experiment. I tried to use the hypergeometric test to determine the significance of the overlap between the 2 sets.
I am not sure what to use as the background list. I thought of using the total proteins identified in the experiment (N) as the background. However, I realized that about 50% of the proteins in the literature list (R) were not identified in my experiment, so they cannot be present in the subset that I want to test for overlap with the literature list.
Would it be acceptable for me to filter the literature list (R) for only those proteins that were identified in my experiment (r) and then compare my subset list (n) with the subset literature protein list (r) and use my total proteins identified as the background?
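To make the proposed comparison concrete, here is a minimal Python sketch of that hypergeometric test, assuming the background is the N identified proteins and the literature list is first filtered down to the identified ones (r). The function name and the example protein IDs are hypothetical, and only the standard library is used.

```python
from math import comb

def hypergeom_enrichment_p(background, subset, literature):
    """Return (k, p): observed overlap and P(overlap >= k) under random draws.

    Assumes the background is the full list of identified proteins (N) and
    that the subset (n) was drawn from it.
    """
    background = set(background)
    subset = set(subset) & background
    # Keep only literature proteins that were actually identified (r).
    lit_identified = set(literature) & background
    N, n, r = len(background), len(subset), len(lit_identified)
    k = len(subset & lit_identified)
    # Upper tail of the hypergeometric distribution: overlaps k..min(n, r).
    tail = sum(comb(r, i) * comb(N - r, n - i) for i in range(k, min(n, r) + 1))
    return k, tail / comb(N, n)

# Hypothetical example: 10 identified proteins, a subset of 3, and a
# literature list in which X1 and X2 were never identified.
background = [f"P{i}" for i in range(1, 11)]
subset = ["P1", "P2", "P5"]
literature = ["P1", "P2", "P3", "P4", "X1", "X2"]
k, p = hypergeom_enrichment_p(background, subset, literature)
print(f"overlap = {k}, p = {p:.4f}")
```

Literature proteins that were never identified drop out when the list is intersected with the background, so they do not affect the p-value in this setup.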
- Proton Member
- Posts: 1
- Joined: Wed Sep 02, 2015 6:46 am
- E. Coli Lysate Member
- Posts: 107
- Joined: Wed Dec 21, 2011 8:22 pm
Hi, I'm not a statistician, but here is what I would do. As I understand it, you identified a total of N proteins, of which n were "regulated", and you want to test the overlap of those n with the list R obtained from the literature; let's call the size of that overlap "k". You could use the entire proteome as the background, but this would not be fair, since MS is biased towards identifying high-abundance proteins. What if you randomly choose n proteins from your list of N and calculate their overlap with R? This gives you the overlap that would be obtained by chance (x). Repeat this step many times (~1,000) to get x1-x1000; this lets you assess both the "average random overlap" and its variance. Then you would use a variation of the 1-sample t-test to see whether your real overlap k is significantly different from what would be obtained randomly.
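A rough Python sketch of the resampling idea, in case it is useful. Note that instead of the t-test step it reports an empirical p-value (the fraction of random draws whose overlap is at least k); that is a substitution on my part, not the procedure described above. The function name, the seed, and the example IDs are hypothetical.

```python
import random

def resampled_overlap_p(background, subset, literature, n_draws=1000, seed=0):
    """Return (k, mean_random_overlap, empirical_p) from random draws of size n."""
    rng = random.Random(seed)          # fixed seed so the sketch is reproducible
    pool = sorted(set(background))     # the N identified proteins
    lit = set(literature)
    chosen = set(subset)
    n = len(chosen)
    k = len(chosen & lit)              # observed overlap
    # Draw n proteins at random from the background and record each overlap x.
    random_overlaps = [
        len(set(rng.sample(pool, n)) & lit) for _ in range(n_draws)
    ]
    mean_overlap = sum(random_overlaps) / n_draws
    # Empirical one-sided p-value, with +1 so it is never exactly zero.
    hits = sum(x >= k for x in random_overlaps)
    return k, mean_overlap, (hits + 1) / (n_draws + 1)

# Hypothetical example: 100 identified proteins, 5 of them in the literature
# list, and a "regulated" subset that happens to contain all 5.
background = [f"P{i}" for i in range(100)]
literature = [f"P{i}" for i in range(5)]
subset = [f"P{i}" for i in range(5)]
k, mean_overlap, p = resampled_overlap_p(background, subset, literature)
print(k, mean_overlap, p)
```

Because the random draws come from the N identified proteins rather than the whole proteome, the abundance bias of MS is the same for the real subset and the random ones.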
Hope this helps.