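In this approach, a decoy database containing the same number of proteins as the target database is searched together with the target database to identify peptides. As illustrated in the figure, blue indicates target hits and orange indicates decoy hits; squares are false hits and circles are true hits. Because a false match is roughly equally likely to land in either database, the number of decoy hits above a score threshold estimates the number of false target hits, and the FDR is estimated as the number of decoy hits divided by the number of target hits.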
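To make the estimate concrete, here is a minimal Python sketch of how decoy hits are turned into an FDR estimate at a given score cutoff. It assumes a concatenated search against equal-sized target and decoy databases and uses the simple decoys/targets ratio, which is one common variant; the `PSM` class and `estimate_fdr` function are hypothetical names for illustration, not part of any search engine's API.

```python
from dataclasses import dataclass


@dataclass
class PSM:
    """A peptide-spectrum match (hypothetical minimal record)."""
    score: float
    is_decoy: bool


def estimate_fdr(psms: list[PSM], threshold: float) -> float:
    """Estimate the FDR at a score threshold as (#decoy hits) / (#target hits).

    Assumes the decoy database is the same size as the target database, so the
    decoy count above the threshold approximates the false target count.
    """
    targets = sum(1 for p in psms if p.score >= threshold and not p.is_decoy)
    decoys = sum(1 for p in psms if p.score >= threshold and p.is_decoy)
    if targets == 0:
        return 0.0  # nothing accepted, so nothing to be wrong about
    return decoys / targets


# Toy data: blue (target) and orange (decoy) hits with made-up scores.
psms = [
    PSM(90.0, False), PSM(85.0, False), PSM(80.0, True),
    PSM(75.0, False), PSM(70.0, False), PSM(60.0, True),
]
print(estimate_fdr(psms, 70.0))  # 0.25
```

With this toy list, four target hits and one decoy hit pass a cutoff of 70, giving an estimated FDR of 25%.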
The target-decoy strategy is a powerful method for FDR estimation. However, as we will see shortly, it must be used with caution to avoid underestimating the FDR.
The first pitfall in using the target-decoy approach for FDR estimation comes from the multiple-round search strategy used in today's database search software.
This multiple-round search was popularized by the X!Tandem program, published in 2004, as a way to speed up computation. The first round uses a fast but less sensitive search method to quickly identify a shortlist of proteins from the large database. The second round then uses a more sensitive but slower search method to identify peptides, but only from the shortlisted proteins. This speeds up the search considerably without sacrificing much sensitivity; indeed, X!Tandem is one of the fastest search algorithms in use today.
However, as pointed out in a paper published in JPR in 2010, this multiple-round search strategy undermines target-decoy estimation of the FDR. After the first round, the shortlist contains more target proteins than decoy proteins, because every protein with genuine peptide matches is a target, while decoy proteins can enter the shortlist only through chance matches. So when the second round makes a mistake, the wrong match is more likely to fall on a target protein than on a decoy protein. We therefore end up with fewer decoy hits than actual false target hits, which causes the FDR to be underestimated.
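The imbalance can be illustrated with a back-of-the-envelope model. The Python sketch below is a simplified assumption, not the analysis from the JPR paper: round one shortlists every protein that has a genuine peptide match (all targets) plus some proteins picked up by chance matches, split evenly between target and decoy; round two assigns each wrong identification to a shortlisted protein uniformly at random. The function name and its parameters are hypothetical.

```python
def expected_hits_after_shortlist(
    n_true_proteins: int, n_chance_proteins: int, n_false_psms: int
) -> tuple[float, float]:
    """Expected (false target hits, decoy hits) in a toy two-round search.

    Hypothetical model: round one shortlists n_true_proteins target proteins
    that contain real peptides, plus n_chance_proteins proteins picked up by
    chance matches, half target and half decoy. Round two assigns each of the
    n_false_psms wrong identifications to a shortlisted protein uniformly.
    """
    shortlisted_targets = n_true_proteins + n_chance_proteins / 2
    shortlisted_decoys = n_chance_proteins / 2
    shortlisted_total = shortlisted_targets + shortlisted_decoys

    # A wrong match lands on a target protein with probability equal to the
    # targets' share of the shortlist, rather than 50% as in a one-round search.
    false_target_hits = n_false_psms * shortlisted_targets / shortlisted_total
    decoy_hits = n_false_psms * shortlisted_decoys / shortlisted_total
    return false_target_hits, decoy_hits


false_targets, decoys = expected_hits_after_shortlist(300, 100, 200)
print(false_targets, decoys)  # 175.0 25.0
```

With 300 genuinely present proteins, 100 chance shortlistings, and 200 wrong identifications, the shortlist holds 350 target and 50 decoy proteins, so the model expects 175 false target hits but only 25 decoy hits. The decoy count, and hence the estimated FDR, is far too low. A single-round search of the full, equal-sized databases would instead split the wrong identifications roughly evenly.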
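In PEAKS, we use a new approach, called decoy fusion, to solve this problem.
Instead of mixing separate target and decoy databases, we append a decoy sequence to each target protein. Because each target protein and its decoy now form a single database entry, the first round keeps or discards them together, so the shortlist no longer favors target sequences over decoy sequences.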
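The fusion step itself is simple to sketch. The Python below is a minimal illustration under stated assumptions, not PEAKS's implementation: the decoy is taken to be the reversed target sequence, the database is a plain dict from accession to sequence, and `fuse_decoys`, `reversed_decoy`, and `classify_hit` are hypothetical names. A match counts as a decoy hit when it falls entirely in the appended half; matches that straddle the boundary are only flagged, since the post does not say how they are handled.

```python
def reversed_decoy(sequence: str) -> str:
    # Assumption: the decoy is the reversed target sequence. The post does not
    # say how PEAKS generates its decoys.
    return sequence[::-1]


def fuse_decoys(proteins: dict[str, str]) -> dict[str, str]:
    """Append a decoy sequence to each target protein (hypothetical helper).

    Each fused entry is target + decoy, so a multiple-round search that keeps
    or drops an entry always keeps or drops its decoy along with it.
    """
    return {acc: seq + reversed_decoy(seq) for acc, seq in proteins.items()}


def classify_hit(start: int, end: int, target_length: int) -> str:
    """Classify a peptide match at positions [start, end) of a fused entry."""
    if end <= target_length:
        return "target"
    if start >= target_length:
        return "decoy"
    # The match spans the junction between the target and decoy halves.
    return "junction"


fused = fuse_decoys({"P1": "MKTAYIAK"})
print(fused["P1"])  # MKTAYIAKKAIYATKM
print(classify_hit(8, 12, len("MKTAYIAK")))  # decoy
```

Because every entry carries its own decoy, whatever subset of entries survives the first round contains as much decoy sequence as target sequence, so, roughly speaking, wrong matches in the second round are again as likely to land on decoy sequence as on target sequence.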
*The content of this post is extracted from "Practical Guide to Significantly Improve Peptide Identification Sensitivity and Accuracy" by Dr. Bin Ma, CTO of Bioinformatics Solutions Inc. You can find the link to the guide on this page.