Monday, April 22, 2013

Common pitfalls of FDR estimation, part two


The second pitfall of the traditional target-decoy strategy is caused by another popular technique used to increase peptide identification sensitivity.

The idea is clever: if a weakly identified peptide happens to be on a highly confident protein, then the peptide is likely to be correct despite its low score. So, to increase sensitivity, the software can add a score bonus to each peptide on a multiple-hit protein. Indeed, this protein bonus will save some weak true hits, but it will save some weak false hits at the same time. The bigger problem is that the target database will provide more multiple-hit proteins than the decoy database, because true hits cluster on the proteins that are really in the sample while decoy hits are scattered at random. As a result, more weak false hits will be saved from the target database than from the decoy database. The decoy hits then undercount the false target hits, so the FDR is underestimated.
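A toy calculation can make the imbalance concrete. The sketch below is only an illustration under stated assumptions, not how any particular search engine works: the hits, scores, bonus size, threshold, and the function names `apply_protein_bonus` and `fdr_summary` are all made up, and the FDR is estimated as accepted decoy hits divided by accepted target hits, which is one common convention.

```python
from collections import Counter

# Each hit: (protein_id, is_decoy, is_correct, score). All values are invented.
HITS = [
    ("T1", False, True, 9.0),
    ("T1", False, True, 8.0),
    ("T1", False, False, 4.0),  # weak false hit on a multiple-hit target protein
    ("T2", False, True, 7.0),
    ("T3", False, False, 4.5),  # weak false hit on a single-hit target protein
    ("D1", True, False, 4.0),   # decoy hits are scattered, one per protein
    ("D2", True, False, 4.5),
]


def apply_protein_bonus(hits, bonus=2.0, min_hits=2):
    """Add a score bonus to every hit on a protein with at least min_hits hits."""
    counts = Counter(protein for protein, _, _, _ in hits)
    return [
        (p, decoy, correct, score + bonus if counts[p] >= min_hits else score)
        for p, decoy, correct, score in hits
    ]


def fdr_summary(hits, threshold):
    """Return (estimated FDR from decoys, actual FDR from the known labels)."""
    accepted = [h for h in hits if h[3] >= threshold]
    targets = [h for h in accepted if not h[1]]
    decoys = [h for h in accepted if h[1]]
    false_targets = [h for h in targets if not h[2]]
    if not targets:
        return 0.0, 0.0
    return len(decoys) / len(targets), len(false_targets) / len(targets)


before = fdr_summary(HITS, threshold=5.0)
after = fdr_summary(apply_protein_bonus(HITS), threshold=5.0)
print("without bonus (estimated, actual):", before)
print("with bonus    (estimated, actual):", after)
```

In this toy data the bonus lifts the weak false hit on T1 above the threshold, while no decoy protein has enough hits to earn a bonus. The estimated FDR stays at 0 while the actual FDR rises to 0.25.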


In PEAKS, the decoy fusion approach solves this problem effectively.

Because the target and decoy sequences are concatenated into a single protein sequence, when a protein bonus is added to a multiple-hit protein, the same bonus is added to its target and decoy hits equally. So, weak false hits are saved with approximately equal probability on the target and decoy parts. This restores the balance and provides an accurate FDR estimate.
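The balancing effect can be sketched with a small made-up example. Everything here is an assumption for illustration: the fused protein IDs, scores, bonus size, threshold, and the function names `bonus_by_fused_protein` and `estimated_and_actual_fdr` are hypothetical and are not taken from PEAKS. The FDR is again estimated as accepted decoy hits divided by accepted target hits.

```python
from collections import Counter

# Each hit: (fused_protein_id, on_decoy_part, is_correct, score). Values are invented.
FUSED_HITS = [
    ("F1", False, True, 9.0),
    ("F1", False, True, 8.0),
    ("F1", False, False, 4.0),  # weak false hit on the target part of F1
    ("F1", True, False, 4.0),   # weak false hit on the decoy part of the same entry
    ("F2", False, True, 7.0),
]


def bonus_by_fused_protein(hits, bonus=2.0, min_hits=2):
    """Count hits per fused entry, so target and decoy parts share one bonus."""
    counts = Counter(h[0] for h in hits)
    return [
        (p, decoy, correct, score + bonus if counts[p] >= min_hits else score)
        for p, decoy, correct, score in hits
    ]


def estimated_and_actual_fdr(hits, threshold):
    """Return (decoy-based FDR estimate, actual FDR from the known labels)."""
    accepted = [h for h in hits if h[3] >= threshold]
    targets = [h for h in accepted if not h[1]]
    decoys = [h for h in accepted if h[1]]
    false_targets = [h for h in targets if not h[2]]
    if not targets:
        return 0.0, 0.0
    return len(decoys) / len(targets), len(false_targets) / len(targets)


fused_result = estimated_and_actual_fdr(bonus_by_fused_protein(FUSED_HITS), threshold=5.0)
print("fused, with bonus (estimated, actual):", fused_result)
```

Here the bonus rescues one false target hit and one decoy hit on the same fused entry, so the decoy count tracks the false target count and the estimated and actual FDR agree at 0.25 in this toy data.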

By using decoy fusion as the validation method, we can safely apply the protein bonus. We gain the sensitivity without compromising the FDR estimation.

 
*The content of this post is extracted from "Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy" by Dr. Bin Ma, CTO of Bioinformatics Solutions Inc. You can find the link to the guide on this page.
