Scoring Module of Mass Spectrometry-Based

Background

Tandem mass spectrometry-based database searching is currently the main method for protein identification in shotgun proteomics. The explosive growth of protein and peptide databases, driven by genome translations, enzymatic digestions, and post-translational modifications (PTMs), makes the computational efficiency of database searching a serious challenge. Profiling shows that most search engines spend 50%-90% of their total time in the scoring module, and that scoring based on the spectral dot product (SDP) is the most widely used approach. As general-purpose, high-performance parallel hardware, graphics processing units (GPUs) are a promising platform for speeding up database searches in protein identification.

Versions

The first version, GPUScorer 1.0:

Generates theoretical and experimental spectra from X!Tandem
Formats the spectrum data with our Indexer
Supports both a single GPU and a GPU cluster

User's guide

We tested the code on a GPU cluster consisting of one master node (mu01) and four computing nodes (Fermi.1-4), as shown in Figure 1 and Table 1 of our paper. Each node has a Xeon E5620 CPU and two Nvidia GeForce GTX580 cards. Each GTX580 has 512 cores running at 1.54 GHz and a peak memory bandwidth of 192.4 GB/s. The CPU-based programs are written in C++, and the GPU-based program uses CUDA 4.2.

To try GPUScorer, run our simulation program and choose option 1, which generates the theoretical and experimental spectra randomly and compares the CPU and GPU versions.
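Random spectra of the kind used by option 1 could be produced along the following lines. This is a minimal C++ sketch, not the simulation program's actual code: the `Peak` struct, the `randomSpectrum` function, and the choice of uniform distributions for m/z and intensity are all assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical peak representation (not GPUScorer's real data layout).
struct Peak {
    double mz;         // mass-to-charge ratio
    double intensity;  // peak intensity
};

// Generates a random spectrum with numPeaks peaks, sorted by ascending m/z.
// A fixed seed makes the result reproducible, so that two implementations
// can be compared on identical input.
std::vector<Peak> randomSpectrum(std::size_t numPeaks, double maxMz, unsigned seed) {
    assert(maxMz > 0.0);
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> mzDist(0.0, maxMz);
    std::uniform_real_distribution<double> intensityDist(0.0, 1.0);

    std::vector<Peak> peaks(numPeaks);
    for (Peak& p : peaks) {
        p.mz = mzDist(rng);
        p.intensity = intensityDist(rng);
    }
    std::sort(peaks.begin(), peaks.end(),
              [](const Peak& a, const Peak& b) { return a.mz < b.mz; });
    return peaks;
}
```

Calling `randomSpectrum(50, 2000.0, 42u)` twice returns the same 50 peaks, which is what lets a CPU run and a GPU run be checked against each other.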

In addition, you can follow these three steps to test real data.

The first step is to download SpecGenerater, which generates exactly the same theoretical and experimental spectra as those used in the "dot()" function of X!Tandem. We developed the program from the source code of X!Tandem (2011.12.01); see the X!Tandem guide for the input parameters.

The output of SpecGenerater is a group of files containing the original theoretical and experimental spectra.

The second step is to download the Indexer, which reorganizes the raw data generated by SpecGenerater and creates the input data for GPUScorer. Once the whole protein identification search engine is complete, we will restructure SpecGenerater and treat the Indexer as a bridge between SpecGenerater and GPUScorer. In future work, we will also extend the Indexer to handle other spectrum formats, so that other users can use GPUScorer more conveniently.

The input of the Indexer is the names of the files output by SpecGenerater.
The output of the Indexer is a meta file, together with a group of sorted theoretical and experimental spectra.
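The indexing step can be pictured with a small C++ sketch. It assumes the spectra are sorted by precursor mass and that the meta information holds a count and a mass range; the text above states neither the sort key nor the meta file layout, so `SpectrumRecord`, `IndexMeta`, and `buildIndex` are hypothetical names used only for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical record for one spectrum: its precursor mass and the id of
// the SpecGenerater output file it came from.
struct SpectrumRecord {
    double precursorMass;
    std::size_t fileId;
};

// Hypothetical summary that a meta file might carry.
struct IndexMeta {
    std::size_t count;
    double minMass;
    double maxMass;
};

// Sorts the records by ascending precursor mass (assumed sort key) and
// returns summary information about the sorted set.
IndexMeta buildIndex(std::vector<SpectrumRecord>& records) {
    assert(!records.empty());
    std::stable_sort(records.begin(), records.end(),
                     [](const SpectrumRecord& a, const SpectrumRecord& b) {
                         return a.precursorMass < b.precursorMass;
                     });
    return IndexMeta{records.size(),
                     records.front().precursorMass,
                     records.back().precursorMass};
}
```

Sorting once up front means a scorer can later find all candidates inside a mass window with a binary search instead of scanning every spectrum.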

The third step is to download GPUScorer, which calculates the spectral dot product between theoretical and experimental spectra.

The input of GPUScorer is the meta file from the Indexer, together with the precursor and ion mass tolerances.

The output of GPUScorer is the top-scoring theoretical spectrum for each experimental spectrum.
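The scoring step can be illustrated with a CPU-only C++ sketch. It is a simplified reference under stated assumptions, not the GPUScorer kernel or X!Tandem's "dot()" function: peaks are assumed sorted by m/z, theoretical spectra are assumed sorted by precursor mass, two peaks are taken to match when their m/z values differ by at most the ion mass tolerance, and the names `Peak`, `Spectrum`, `spectralDotProduct`, and `bestMatch` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Peak {
    double mz;
    double intensity;
};

struct Spectrum {
    double precursorMass;
    std::vector<Peak> peaks;  // assumed sorted by ascending m/z
};

// Sums intensity products over matched peak pairs. Two peaks match when
// their m/z values differ by at most ionTol; each peak is used at most once.
double spectralDotProduct(const Spectrum& experimental,
                          const Spectrum& theoretical, double ionTol) {
    assert(ionTol >= 0.0);
    double score = 0.0;
    std::size_t i = 0, j = 0;
    while (i < experimental.peaks.size() && j < theoretical.peaks.size()) {
        const double diff = experimental.peaks[i].mz - theoretical.peaks[j].mz;
        if (std::fabs(diff) <= ionTol) {
            score += experimental.peaks[i].intensity * theoretical.peaks[j].intensity;
            ++i;
            ++j;
        } else if (diff < 0.0) {
            ++i;  // experimental peak is too light, advance it
        } else {
            ++j;  // theoretical peak is too light, advance it
        }
    }
    return score;
}

// Returns the index of the best-scoring theoretical spectrum whose precursor
// mass lies within precursorTol of the experimental one, or -1 if none does.
// theoreticals is assumed sorted by ascending precursor mass.
int bestMatch(const Spectrum& experimental,
              const std::vector<Spectrum>& theoreticals,
              double precursorTol, double ionTol) {
    assert(precursorTol >= 0.0);
    auto first = std::lower_bound(
        theoreticals.begin(), theoreticals.end(),
        experimental.precursorMass - precursorTol,
        [](const Spectrum& s, double mass) { return s.precursorMass < mass; });
    int best = -1;
    double bestScore = -1.0;
    for (auto it = first; it != theoreticals.end() &&
                          it->precursorMass <= experimental.precursorMass + precursorTol;
         ++it) {
        const double score = spectralDotProduct(experimental, *it, ionTol);
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(it - theoreticals.begin());
        }
    }
    return best;
}
```

Each experimental spectrum is scored independently of the others, which is the property that makes this workload a natural fit for parallel execution on a GPU.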

Contact us

E-mail: youli(AT)comp.hkbu.edu.hk

Reference

Listed in the paper.