PEAKS Blog: May 2013

Monday, May 27, 2013

CHAMPS - Antibody Sequencing Service

The PEAKS team is proud to announce the launch of the Antibody Sequencing Service - CHAMPS.

Backed by the market-leading de novo sequencing algorithm and the PEAKS complete analysis workflow, our scientists offer a fast and professional service for obtaining primary sequences of monoclonal antibodies with modifications.

Please contact us at champs@bioinfor.com for a test drive! More information about the service is here.

Wednesday, May 22, 2013

More than a decade of PEAKS

Staring at the calendar, I cannot believe we are almost halfway through 2013. The PEAKS team is hard at work and strives to better serve the mass spec proteomics community by adding tons of new features to every release.

Since the first release, PEAKS has gone through many iterations and dramatic changes to become what it is today.

What new features will be in this year's release? Stay tuned!

Thursday, May 16, 2013

PEAKS User Meeting co-located with ASMS 2013

We will be hosting the 7th Annual PEAKS User Meeting on June 9th, 2013 at the Minneapolis Convention Center, Room 103A, co-located with the ASMS conference.

Please join us and register today to reserve your seat!

http://www.bioinfor.com/peaks/corp/conferences/peaks-asms-2013.html

Here is the tentative agenda.

12:30 - 1:00	Lunch
1:00 - 1:30	Facts and Fallacies about de Novo Sequencing and Database Search	Dr. Bin Ma, CTO at BSI and Professor at the University of Waterloo
1:30 - 1:50	Automatic Validation of de Novo Sequencing Result	Lian Yang, Research Scientist at BSI
1:50 - 2:10	Antibody Sequencing with LC-MS/MS	Dr. Baozhen Shan, Senior Application Scientist at BSI
2:10 - 2:30	Common Use Cases of PEAKS Studio	Dan Maloney, Application Scientist at BSI
2:30 - 3:30	Free Discussion. Ask the onsite BSI employees your questions and learn best practices for using PEAKS in your specific application.

Monday, May 13, 2013

Multiple enzymes support in PEAKS - Full Protein Coverage

PEAKS 6 introduced a new feature specifically targeting experiments that use multiple enzyme digestions to increase protein coverage.

In the past, users had to either search each sample separately and combine the results manually afterwards, or use the "none" enzyme option to analyze all samples in one go, which may produce more false positives. Now in PEAKS, users can specify the enzyme for each sample when creating a multi-sample project. Then, in de novo sequencing and PEAKS DB, users can choose 'sample enzyme' in the enzyme list as the search option. PEAKS will use the correct enzyme when analyzing each sample.
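
As a rough illustration of what a per-sample enzyme setting means, the sketch below digests the same sequence in silico with a different enzyme for each sample. It is not how PEAKS implements the feature. The cleavage rules are textbook simplifications (trypsin cuts after K or R unless followed by P, LysC after K, GluC after E only), no missed cleavages are allowed, and the names `CLEAVAGE_RULES`, `digest`, `digest_by_sample_enzyme`, the sample names, and the toy sequence are all made up for the example.

```python
import re

# Simplified cleavage rules (assumptions for illustration only).
# Each pattern is a zero-width match at a position where the chain is cut.
CLEAVAGE_RULES = {
    "Trypsin": re.compile(r"(?<=[KR])(?!P)"),  # after K/R, not before P
    "LysC": re.compile(r"(?<=K)"),             # after K
    "GluC": re.compile(r"(?<=E)"),             # after E only (simplified)
}


def digest(sequence, enzyme):
    """Split a protein sequence into peptides, with no missed cleavages."""
    rule = CLEAVAGE_RULES[enzyme]
    cuts = [m.start() for m in rule.finditer(sequence)]
    # Drop cut positions at the very ends so no empty peptides are produced.
    bounds = [0] + [c for c in cuts if 0 < c < len(sequence)] + [len(sequence)]
    return [sequence[a:b] for a, b in zip(bounds, bounds[1:])]


def digest_by_sample_enzyme(sequence, sample_enzymes):
    """Digest the sequence once per sample, using that sample's own enzyme."""
    return {
        sample: digest(sequence, enzyme)
        for sample, enzyme in sample_enzymes.items()
    }


if __name__ == "__main__":
    toy_sequence = "GAKPLERDEKVR"  # made-up sequence, not a real protein
    samples = {"sample 1": "Trypsin", "sample 2": "LysC", "sample 3": "GluC"}
    for sample, peptides in digest_by_sample_enzyme(toy_sequence, samples).items():
        print(sample, samples[sample], peptides)
```

Each enzyme cuts the chain at different places, so the three digests yield overlapping peptides. That overlap is why combining samples can cover regions that a single digest misses.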

According to feedback from our users, this feature is extremely useful when you want to fully characterize a single protein. The following example shows how big a difference this feature can make.

ALBU_BOVIN protein ordered from a reputable vendor was digested with trypsin, LysC, and GluC. The dataset was generated on a Thermo Orbitrap instrument. Three searches were performed. The first used the inChorus function to launch a Mascot search (version 1.4) on the trypsin sample only. The second used a standard PEAKS DB search on the trypsin sample. The third used the complete analysis workflow, including PEAKS PTM and SPIDER, on all three samples, with "sample enzyme" as the enzyme option. All results were filtered to keep only the very confident PSMs at the 0.1% FDR level.

Using only the trypsin sample, Mascot and PEAKS DB achieved 73% and 86% protein coverage, respectively. In the protein coverage view below, the blue bars are the PSMs that matched the protein sequence at that position.

PEAKS complete analysis on all three samples reported 96% coverage of the protein. The uncovered 4% is in the protein N-terminal region, which was most likely cleaved off and is not present in the purchased sample¹.
¹specific binding site (Asp-Thr-His-Lys) for Cu(II) ions. T. Peters Jr., F.A. Blumenstock. J. Biol. Chem., 242 (1967), p. 1574
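
Protein coverage here is the share of residues covered by at least one matched peptide. Below is a minimal sketch of that calculation, assuming peptides are located by exact substring match. Real search engines also handle modifications and sequence variants, which this ignores. The function name `sequence_coverage` and the toy sequence are hypothetical.

```python
def sequence_coverage(protein, peptides):
    """Return the fraction of residues covered by at least one peptide."""
    if not protein:
        return 0.0
    covered = [False] * len(protein)
    for peptide in peptides:
        if not peptide:
            continue  # ignore empty strings
        # Mark every occurrence of the peptide, including overlapping ones.
        start = protein.find(peptide)
        while start != -1:
            for i in range(start, start + len(peptide)):
                covered[i] = True
            start = protein.find(peptide, start + 1)
    return sum(covered) / len(protein)


if __name__ == "__main__":
    toy_protein = "GAKPLERDEKVR"  # made-up sequence, not a real protein
    one_digest = ["GAKPLER", "VR"]
    combined = one_digest + ["PLERDEK"]  # peptide from a second digest
    print(f"{sequence_coverage(toy_protein, one_digest):.0%}")
    print(f"{sequence_coverage(toy_protein, combined):.0%}")
```

Overlapping peptides are only counted once per residue, so adding a second digest raises coverage only where it reaches residues the first one missed.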

Monday, May 6, 2013

Configure FASTA database in PEAKS

Configuring FASTA databases in PEAKS is fairly easy, especially if the FASTA file has the same header format as one of the public databases (e.g. NR, Swiss-Prot, IPI). It is just a matter of selecting the pre-defined format, and the parsing rules will be filled in automatically.

A large number of users also use PEAKS to search their in-house, customized FASTA databases. In this situation, the header format is very hard to predict and varies case by case.

In PEAKS, the parsing rule is defined using a regular expression. While regular expressions are very powerful, they take quite a bit of time to master. Since we run tons of searches every week against FASTA files with many different header formats, I created this lazy, generic parsing rule for internal use, and in most cases it works well enough.

Accession. The regular expression uses everything before the first white space or "|" character as the accession. If neither is found within the first 30 characters, the first 30 characters are used as the accession.

>\([^\s|]{1,30}\)

Description. The whole line after ">" will be used as the description.

>\(.*\)

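To see what the two rules capture, here is a small sketch that applies them with Python's `re` module. One assumption is made in the translation: the escaped parentheses `\(` and `\)` in the rules above are taken to mark a capture group, which Python writes as plain `(` and `)`. The function name `parse_header` and the sample headers are made up for illustration.

```python
import re

# Python equivalents of the two parsing rules (assumed translation:
# \( ... \) in the PEAKS rule becomes ( ... ) in Python).
ACCESSION_RE = re.compile(r">([^\s|]{1,30})")
DESCRIPTION_RE = re.compile(r">(.*)")


def parse_header(header):
    """Return (accession, description) for one FASTA header line."""
    accession = ACCESSION_RE.match(header)
    description = DESCRIPTION_RE.match(header)
    if accession is None or description is None:
        raise ValueError(f"not a usable FASTA header: {header!r}")
    return accession.group(1), description.group(1)


if __name__ == "__main__":
    headers = [
        ">prot_001 hypothetical in-house protein",
        ">" + "A" * 40,                 # no white space: capped at 30 chars
        ">lab|prot_002 another entry",  # stops at the first "|"
    ]
    for header in headers:
        print(parse_header(header))
```

Note that the character class `[^\s|]` also stops at a `|`, so a header such as `>lab|prot_002 another entry` yields the accession `lab`.
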
Wednesday, May 1, 2013

100% vs 50% CPU usage, twice as fast? Not really!

Some users observed that when performing a search, the CPU usage for PEAKS would only go up to 50%. Why doesn't PEAKS use 100% of the CPU?

The observation is certainly valid, but the CPU usage reported by Windows Task Manager is somewhat misleading. 100% CPU usage does not mean the program is running twice as fast as at 50% CPU usage. The reason, in my opinion, is the Hyper-Threading technology that most Intel CPUs have enabled by default. While the technology can improve the performance and responsiveness of a computer in some situations, it does not help much for computation-heavy applications like PEAKS.

I did a performance test on a desktop PC with the following specification: a quad-core CPU with lots of RAM.

Intel i7 3770 3.4GHz CPU (quad core with hyperthreading)
16GB RAM
System drive SSD, data drive 7200RPM HDD
Windows 8 Pro 64bit

The dataset contains about 9000 MS/MS spectra from a Thermo Orbitrap instrument. PEAKS de novo was manually configured to run with 1, 2, 4 and 8 threads, respectively. For each configuration, two searches were done, one with hyper-threading enabled and one with hyper-threading disabled. In total, there were 8 runs; a clean project was created for each run and the PC was rebooted between runs.
With HT enabled, there is only about a 10% performance gain from running 8 threads (100% CPU usage) compared with running 4 threads (50% CPU usage). The search does run slightly faster, but the computer is not very responsive for even the simplest tasks like email, Excel, etc.
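
The 50% figure itself follows from simple arithmetic. The sketch below assumes that Task Manager reports the average load over all logical processors, that hyper-threading exposes two logical processors per physical core, and that each busy thread saturates one logical processor. The helper name `reported_cpu_usage` is hypothetical.

```python
def reported_cpu_usage(busy_threads, physical_cores, hyperthreading):
    """Estimate the overall CPU usage percentage shown for a given load."""
    logical = physical_cores * (2 if hyperthreading else 1)
    # Threads beyond the number of logical processors cannot add more load.
    return min(busy_threads, logical) / logical * 100


if __name__ == "__main__":
    for ht in (True, False):
        for threads in (1, 2, 4, 8):
            usage = reported_cpu_usage(threads, physical_cores=4, hyperthreading=ht)
            print(f"HT={'on' if ht else 'off'} threads={threads} usage={usage:.1f}%")
```

Under these assumptions, 4 busy threads on the quad-core test machine show up as 50% with hyper-threading on and as 100% with it off, even though the same four physical cores are doing the work in both cases.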