Wednesday, March 27, 2013

PEAKS de novo sequencing on 3.4 million MS2 spectra

It is remarkable how fast mass spectrometry data can grow in size. A decade ago, when Dr. Bin Ma created PEAKS, a data file containing hundreds of spectra was already considered large. Now we regularly see datasets of over a million spectra. At this scale, it is quite challenging for proteomics software to analyze the data within a reasonable amount of time with finite computing resources.

I recently received a large dataset from one of our collaborators. The data contains 3.4 million MS2 spectra in total (about 160 hours of LC across all the samples) and was generated on a Thermo Orbitrap instrument.

I have created the project and started de novo sequencing on all the spectra in one go. It will take a while for the job to complete.

Stay tuned!

Tuesday, March 26, 2013

How to run PEAKS Studio/Viewer on Mac OS X or Linux?

Occasionally, users ask us whether PEAKS can run on Mac OS X or Linux. PEAKS is written in Java, so in theory it should work. This post shows the steps needed to make PEAKS run on Mac OS X. The steps for Linux should be very similar.

Disclaimer: PEAKS does not officially support any OS other than Windows at the time of writing. The software may not be fully functional. Activating the software on OS X or Linux will consume the license, which means the same license cannot be used again. I strongly recommend following these steps only to configure PEAKS Viewer (the unlicensed Studio) on OS X or Linux for sharing and presenting PEAKS results.

Before you start

PEAKS is a Java program, so before porting PEAKS to the Mac, make sure that Java is installed. Open a terminal window and type in the command:
java -version
If Java is installed, the version information will be displayed. In OS X Mountain Lion, if Java is not installed, this command will also trigger a window for Java installation.
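The same check can be scripted, for example before copying files around. This is only a minimal sketch; the helper name have_command is my own and not part of PEAKS or Java.

```shell
#!/bin/sh
# Return success if the named command can be found on the PATH.
# (have_command is a hypothetical helper name used for illustration.)
have_command() {
    command -v "$1" >/dev/null 2>&1
}

if have_command java; then
    # java -version writes its report to stderr, so redirect it to stdout.
    java -version 2>&1
else
    echo "Java not found on PATH; install a JRE before continuing."
fi
```

On a machine without Java this prints the reminder instead of the version information.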

Get the files

Since PEAKS only has an installer for Windows, you will need a Windows computer to install PEAKS and then copy the installed files over to the Mac.

To proceed, download PEAKS from the website on a Windows PC. Run the installer and follow the on-screen instructions to complete the installation. By default, PEAKS 6 is installed in the C:\PeaksStudio6 directory. Copy the directory to a USB drive and then onto the Mac, e.g. to /Users/userx/PeaksStudio6.

Configure PEAKS on OS X

Open a Terminal window and change the directory to the PEAKS directory, for example, /Users/userx/PeaksStudio6, by typing the command:
cd /Users/userx/PeaksStudio6
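We need to replace the Windows JRE bundled with PEAKS with the Java installed on OS X:
# Remove the bundled Windows JRE and recreate the directory layout PEAKS expects
rm -rf jre
mkdir -p jre/bin
# Make the java.exe name that PEAKS looks for point to the system Java
ln -s /usr/bin/java jre/bin/java.exe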
We want to make sure PEAKS starts in one JVM (type the following on one line with a white space after ".jar"):
jre/bin/java.exe -cp peaksstudio.jar
In the performance configuration dialog, select the "Manually configure PEAKS performance" option and make sure that the "Start Client Separately" and "Start Compute Node Separately" checkboxes are unchecked. Click "Apply" and close the dialog.

Use a text editor, e.g. vim, to create a startup script containing the following line:
jre/bin/java.exe -Xmx12000m -splash:splash.png -jar peaksstudio.jar
The number 12000 means that PEAKS can use up to 12GB of RAM. You can change this value based on your computer configuration. A higher amount is generally preferred, provided the machine actually has that much physical memory.
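If you are unsure what value to use, the sketch below derives one from the installed RAM. The 75% share and the function names are my own illustrative assumptions, not a PEAKS recommendation; adjust the share to leave enough memory for the OS and other programs.

```shell
#!/bin/sh
# Suggest a -Xmx value in MB from the total RAM in MB.
# The 3/4 share is an illustrative assumption, not an official guideline.
suggest_xmx_mb() {
    total_mb=$1
    echo $(( total_mb * 3 / 4 ))
}

# Print total physical RAM in MB: sysctl on OS X, /proc/meminfo on Linux.
total_ram_mb() {
    case "$(uname -s)" in
        Darwin) echo $(( $(sysctl -n hw.memsize) / 1024 / 1024 )) ;;
        Linux)  awk '/^MemTotal:/ { printf "%d\n", $2 / 1024 }' /proc/meminfo ;;
        *)      return 1 ;;
    esac
}

# Print the flag to paste into the startup script, if RAM could be detected.
if ram=$(total_ram_mb); then
    echo "-Xmx$(suggest_xmx_mb "$ram")m"
fi
```

For a machine with 16 GB (16384 MB) of RAM, suggest_xmx_mb gives 12288, which is close to the 12000 used above.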

We need to make the script executable:
chmod u+x
Now you can start PEAKS by simply running the script:
There is one more thing to do. When PEAKS opens, go to Preferences. In the "General" section, change the default project folder to a valid directory on OS X.
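For reference, the script steps above can be put together roughly as follows. The script name start_peaks.sh and the helper write_start_script are hypothetical names chosen for illustration, since the post does not name the file. The cd line is my addition so that the relative paths still resolve when the script is launched from another directory.

```shell
#!/bin/sh
# Hypothetical location and script name; adjust to your own setup.
PEAKS_DIR="${PEAKS_DIR:-/Users/userx/PeaksStudio6}"
SCRIPT="$PEAKS_DIR/start_peaks.sh"

# Write the startup script to the path given as $1 and make it executable.
write_start_script() {
    cat > "$1" <<'EOF'
#!/bin/sh
# Run from the script's own directory so jre/ and splash.png resolve.
cd "$(dirname "$0")" || exit 1
exec jre/bin/java.exe -Xmx12000m -splash:splash.png -jar peaksstudio.jar
EOF
    chmod u+x "$1"
}
```

With the PEAKS directory in place, write_start_script "$SCRIPT" creates the script, and running "$SCRIPT" starts PEAKS.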

Now you can view your PEAKS results on a Mac!

Tuesday, March 19, 2013

New discovery or an error?

We have seen an interesting search result in which the precursor mass error plot forms two clusters. The dataset was generated with a Thermo LTQ-Orbitrap instrument. Is this related to a new scientific discovery, or is it caused by an error? Our scientist took a closer look to find out.

In the PTM profile section of the result summary view, we noticed that many PSMs have the Deamidation modification.
We then looked at some of those PSMs and found that the precursor mass reported by the instrument is wrong. Instead of reporting the m/z of the monoisotopic peak, in many cases the instrument reported the m/z of the highest isotopic peak. This results in a mass shift of about 1 Da on the precursor. Here is an example. For this charge 3 peptide, the instrument reports precursor m/z 1057.17, but the monoisotopic peak is at m/z 1056.83.
At this point, we know why there are two clusters on the plot. The software tries to explain the wrong precursor mass by adding a Deamidation.
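The arithmetic behind this can be checked with a small sketch. The m/z values are the ones quoted above; the constants (13C isotope spacing of about 1.003355 Da, Deamidation shift of about 0.984016 Da, proton mass of about 1.007276 Da) are standard values, and the function names are my own. The second function is only an estimate of how far apart the two clusters should sit, assuming the wrong precursor is exactly one isotope too heavy.

```shell
#!/bin/sh
# Neutral mass difference (Da) implied by two m/z values at a given charge.
# $1: reported m/z, $2: monoisotopic m/z, $3: charge
mass_shift_da() {
    awk -v a="$1" -v b="$2" -v z="$3" 'BEGIN { printf "%.2f\n", (a - b) * z }'
}

# Residual error (ppm) left when a one-isotope shift (1.003355 Da) is
# explained by a Deamidation (0.984016 Da).
# $1: monoisotopic m/z, $2: charge
cluster_offset_ppm() {
    awk -v mz="$1" -v z="$2" 'BEGIN {
        neutral = (mz - 1.007276) * z          # approximate neutral peptide mass
        printf "%.1f\n", (1.003355 - 0.984016) / neutral * 1e6
    }'
}

mass_shift_da 1057.17 1056.83 3     # prints 1.02
cluster_offset_ppm 1056.83 3        # prints 6.1
```

So the reported precursor is about one isotope (roughly 1 Da) too heavy, and a Deamidation absorbs all but a few ppm of that shift, which is consistent with the second cluster appearing offset from the first.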

So how can we fix this? We re-ran the search. This time, we checked the correct precursor mass option in the data refine stage. PEAKS then tries to determine the correct precursor mass by looking at the survey scan instead of blindly trusting the values the instrument reported. Here are the new plots, and you no longer see two clusters.

Comparing the two search results, we not only removed the false Deamidation but were also able to explain 10% more spectra at 1% FDR.

Monday, March 18, 2013

Decoy Fusion on traditional target + decoy database

A PEAKS user asked us a question today about FDR result validation. He used PEAKS DB for peptide identification and enabled the built-in decoy fusion method to estimate the FDR. When examining the result, he realized that the FASTA database used for the search was already a concatenation of target and decoy proteins. His question was whether the FDR control is still valid or whether he has to re-run the search.

The decoy fusion method concatenates the decoy and target sequences of the same protein into a single "fused" sequence (a detailed explanation can be found here). This ensures that the target and decoy lengths are always the same. If the decoy length in the searched database is the same as the target length, then PEAKS DB with decoy fusion effectively searched three times as much decoy sequence as true target sequence.

As long as the decoy proteins in the searched database are distinguishable, the user can simply discard those hits. The FDR reported by PEAKS is still safe to use, as it only becomes more conservative.
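A toy calculation illustrates why the reported FDR can only be too high. It assumes that FDR is estimated as decoy hits divided by target hits, and that false hits spread evenly over the four equal-length parts of the search space (true targets, the database's own decoys, and the fused decoy of each). The hit counts are made-up numbers for illustration only.

```shell
#!/bin/sh
# FDR in percent, taken here as decoy hits divided by target hits.
# $1: decoy hits, $2: target hits
fdr_percent() {
    awk -v d="$1" -v t="$2" 'BEGIN { printf "%.2f\n", 100 * d / t }'
}

true_hits=900   # hypothetical number of correct PSMs
f=10            # hypothetical number of false PSMs landing in each part

# As reported: the database's own decoys are counted as targets,
# and both fused decoy parts are counted as decoys.
reported=$(fdr_percent $(( 2 * f )) $(( true_hits + 2 * f )))

# After discarding hits on the database's own decoys, only true targets
# remain on the target side, with f false hits among them.
actual=$(fdr_percent "$f" $(( true_hits + f )))

echo "reported ${reported}% vs actual ${actual}%"
```

With these numbers the reported value is 2.17% while the remaining hits have an error rate of about 1.10%, so the reported figure errs on the safe side.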

Friday, March 15, 2013

'Experiment Control' is a valuable tool

There are many factors that can make a database search engine fail to produce a decent result. Today I saw one such interesting case.

A user sent in a dataset generated on an Orbitrap instrument. When he used PEAKS to analyze the data, surprisingly, he got very few PSMs. I examined the figures in the 'Experiment Control' section of the 'Summary View' and noticed that the precursor mass error distribution of the PSMs was strange. So I increased the parent ion error tolerance from 10ppm to 20ppm and got a very good result. One third of the spectra were identified at 1% FDR.

Then I looked at the figures again. It is very clear that there is a 12ppm mass accuracy shift!

The figures in the 'Summary View' can sometimes help identify the problem when the search result is not ideal. In this case, the instrument was not well calibrated.
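To see why the 10ppm tolerance failed here, the sketch below computes a precursor error in ppm and checks it against a tolerance. The m/z values are made-up numbers chosen to give a 12ppm shift, and the function names are my own. This is not how PEAKS implements its filter, only the underlying arithmetic.

```shell
#!/bin/sh
# Mass error in ppm between an observed and a theoretical m/z.
# $1: observed m/z, $2: theoretical m/z
ppm_error() {
    awk -v o="$1" -v t="$2" 'BEGIN { printf "%.1f\n", (o - t) / t * 1e6 }'
}

# Succeed if the absolute error ($1, ppm) is within the tolerance ($2, ppm).
within_tolerance() {
    awk -v e="$1" -v tol="$2" 'BEGIN { if (e < 0) e = -e; exit !(e <= tol) }'
}

err=$(ppm_error 1000.012 1000.000)   # hypothetical values: 12.0 ppm
for tol in 10 20; do
    if within_tolerance "$err" "$tol"; then
        echo "error ${err}ppm passes a ${tol}ppm tolerance"
    else
        echo "error ${err}ppm fails a ${tol}ppm tolerance"
    fi
done
```

A correct match measured 12ppm off is rejected at 10ppm but accepted at 20ppm, which matches what happened with this dataset.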