- Background
- Tandem mass spectrometry-based database searching is currently the main
method for protein identification in shotgun proteomics. The explosive growth
of protein and peptide databases, which is a result of genome translations,
enzymatic digestions, and post-translational modifications (PTMs), is making
computational efficiency in database searching a serious challenge. Profiling
shows that most search engines spend 50%-90% of their total time in the
scoring module, and that spectrum dot product (SDP)-based scoring is the most
widely used. As general-purpose, high-performance parallel hardware, graphics
processing units (GPUs) are promising platforms for speeding up database
searches in the protein identification process.
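For readers new to SDP scoring, here is a minimal C++ sketch of a spectrum dot product. It assumes both spectra are discretized into fixed-width m/z bins and scored by summing intensity products over the bins they share. The names `Peak`, `binSpectrum`, and `spectrumDotProduct` and the bin width are hypothetical. This is the generic technique, not GPUScorer's code or X!Tandem's exact scoring function.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// One peak after m/z discretization: an integer bin index and its intensity.
struct Peak {
    long bin;
    double intensity;
};

// Converts raw (m/z, intensity) pairs into peaks sorted by bin index.
// Peaks that fall into the same bin have their intensities summed.
// binWidth is an assumed parameter (e.g. 1.0 Da), not a GPUScorer setting.
std::vector<Peak> binSpectrum(const std::vector<std::pair<double, double>>& raw,
                              double binWidth) {
    assert(binWidth > 0.0);
    std::vector<Peak> peaks;
    peaks.reserve(raw.size());
    for (const auto& p : raw) {
        peaks.push_back({std::lround(p.first / binWidth), p.second});
    }
    std::sort(peaks.begin(), peaks.end(),
              [](const Peak& a, const Peak& b) { return a.bin < b.bin; });
    // Merge adjacent entries that share a bin.
    std::vector<Peak> merged;
    for (const Peak& p : peaks) {
        if (!merged.empty() && merged.back().bin == p.bin) {
            merged.back().intensity += p.intensity;
        } else {
            merged.push_back(p);
        }
    }
    return merged;
}

// Spectrum dot product: sum of intensity products over bins present in both
// spectra, computed with a linear merge of the two sorted peak lists.
double spectrumDotProduct(const std::vector<Peak>& a, const std::vector<Peak>& b) {
    double score = 0.0;
    std::size_t i = 0, j = 0;
    while (i < a.size() && j < b.size()) {
        if (a[i].bin < b[j].bin) {
            ++i;
        } else if (a[i].bin > b[j].bin) {
            ++j;
        } else {
            score += a[i].intensity * b[j].intensity;
            ++i;
            ++j;
        }
    }
    return score;
}
```

Suppose the bins are 1.0 Da wide. Theoretical peaks at 100.2, 200.4 and 300.0 (intensity 1 each) are scored against experimental peaks at 99.9 (intensity 5), 200.1 (intensity 2) and 450.0. The two spectra share only bins 100 and 200, so the score is 5 + 2 = 7.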
- Versions
- The first version, GPUScorer 1.0
- Generates theoretical and experimental spectra from X!Tandem
- Formats the above spectrum data with our Indexer
- Provides both single-GPU and GPU-cluster modules
- User's guide
- We tested the code on a GPU cluster consisting of one master node (mu01)
and four computing nodes (Fermi.1-4), as shown in Figure 1 and Table 1 in
our paper. Each node has a Xeon E5620 CPU and two Nvidia GeForce GTX580
cards. Each GTX580 has 512 cores running at 1.54 GHz and a peak memory
bandwidth of 192.4 GB/s. The CPU-based programs are written in C++, and the
GPU-based program uses CUDA 4.2.
- To try GPUScorer, you can run our simulation program and choose option 1.
It generates random theoretical and experimental spectra and compares the
running times of the CPU and GPU versions.
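The CPU side of the simulation-mode comparison can be pictured with a sketch like the one below. It makes several assumptions: spectra are dense (one intensity per m/z bin), a seeded `std::mt19937` gives CPU and GPU runs identical inputs, and peak positions and intensities are uniformly random. The names `randomSpectrum`, `denseDot`, and `cpuScoreAll` are hypothetical, and the real simulation program's spectrum sizes and distributions may differ.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <random>
#include <vector>

// Dense spectrum: one intensity per m/z bin (zero where there is no peak).
using DenseSpectrum = std::vector<float>;

// Hypothetical generator: places numPeaks peaks with intensities in [0, 1)
// at uniformly random bins. A later peak overwrites an earlier one in the same bin.
DenseSpectrum randomSpectrum(std::mt19937& rng, std::size_t numBins, std::size_t numPeaks) {
    assert(numBins > 0);
    std::uniform_int_distribution<std::size_t> binDist(0, numBins - 1);
    std::uniform_real_distribution<float> intensityDist(0.0f, 1.0f);
    DenseSpectrum s(numBins, 0.0f);
    for (std::size_t k = 0; k < numPeaks; ++k) {
        s[binDist(rng)] = intensityDist(rng);
    }
    return s;
}

// Dot product of two dense spectra of equal length.
float denseDot(const DenseSpectrum& a, const DenseSpectrum& b) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) sum += a[i] * b[i];
    return sum;
}

// CPU baseline: scores every experimental spectrum against every theoretical
// one (row-major by experimental spectrum) and reports wall-clock time in ms.
std::vector<float> cpuScoreAll(const std::vector<DenseSpectrum>& theo,
                               const std::vector<DenseSpectrum>& expt,
                               double& elapsedMs) {
    auto start = std::chrono::steady_clock::now();
    std::vector<float> scores;
    scores.reserve(theo.size() * expt.size());
    for (const auto& e : expt) {
        for (const auto& t : theo) {
            scores.push_back(denseDot(e, t));
        }
    }
    auto end = std::chrono::steady_clock::now();
    elapsedMs = std::chrono::duration<double, std::milli>(end - start).count();
    return scores;
}
```

A GPU version would compute the same `scores` array in a kernel. To get the speedup, check that the two arrays agree within floating-point tolerance, then divide the two elapsed times.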
- In addition, you can follow these three steps to test on real data.
- The first step is to download SpecGenerater,
which generates exactly the same theoretical and experimental spectra as
those used in the "dot()" function of X!Tandem. We developed the program from
the source code of X!Tandem (2011.12.01), so users can refer to the X!Tandem
guide for the input parameters.
- The output of SpecGenerater is a group of files containing the original
theoretical and experimental spectra.
- The second step is to download the Indexer,
which reorganizes the raw data generated by SpecGenerater and creates the
input data for GPUScorer. Once the complete protein identification search
engine is finished, we will restructure SpecGenerater and treat the Indexer
as a bridge between SpecGenerater and GPUScorer. In future work, we will also
extend the Indexer to process other spectrum formats, so that other users can
use GPUScorer more conveniently.
- The input of the Indexer is the list of file names output by SpecGenerater.
The output of the Indexer is a meta file, together with a group of sorted
theoretical and experimental spectra.
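The meta file format is not specified here, so the sketch below only illustrates the idea under stated assumptions. Each spectrum is described by a hypothetical `SpectrumRecord` (precursor mass, source file, offset), records are sorted by precursor mass, and the meta file is a plain-text listing. None of these names, fields, or the file layout come from the actual Indexer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iomanip>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical index record for one spectrum produced by SpecGenerater.
struct SpectrumRecord {
    double precursorMass;    // precursor mass in Da (assumed sort key)
    std::string sourceFile;  // SpecGenerater output file holding the spectrum
    std::size_t offset;      // position of the spectrum inside that file
};

bool lighterThan(const SpectrumRecord& a, const SpectrumRecord& b) {
    return a.precursorMass < b.precursorMass;
}

// Sorts records by precursor mass. stable_sort keeps the original file order
// for spectra with equal masses.
void sortByPrecursorMass(std::vector<SpectrumRecord>& records) {
    std::stable_sort(records.begin(), records.end(), lighterThan);
}

// Builds an illustrative plain-text meta file: a header with the two counts,
// then one "T|E mass file offset" line per spectrum.
std::string buildMetaFile(const std::vector<SpectrumRecord>& theoretical,
                          const std::vector<SpectrumRecord>& experimental) {
    // Candidate lookup downstream relies on this order.
    assert(std::is_sorted(theoretical.begin(), theoretical.end(), lighterThan));
    std::ostringstream out;
    out << std::setprecision(10);  // keep enough significant digits for masses in Da
    out << theoretical.size() << ' ' << experimental.size() << '\n';
    for (const auto& r : theoretical)
        out << "T " << r.precursorMass << ' ' << r.sourceFile << ' ' << r.offset << '\n';
    for (const auto& r : experimental)
        out << "E " << r.precursorMass << ' ' << r.sourceFile << ' ' << r.offset << '\n';
    return out.str();
}
```

Sorting by precursor mass lets a scorer find candidate theoretical spectra with a binary search instead of scanning every entry.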
- The third step is to download GPUScorer, which calculates the spectrum
dot product between theoretical and experimental spectra.
- The input of GPUScorer is the meta file from the Indexer, together with
the precursor mass tolerance and the ion mass tolerance.
- The output of GPUScorer is the top-scoring (top 1) theoretical spectrum for
each experimental spectrum.
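The scoring step can be sketched on the CPU as follows. The sketch assumes theoretical spectra sorted by precursor mass and peaks sorted by m/z. It also assumes a tolerance-based dot product in which each theoretical peak is paired with the most intense experimental peak within the ion mass tolerance. The matching rule and all names (`toleranceDotProduct`, `topOneMatches`, `Match`) are hypothetical. On a GPU, one natural mapping is one thread or block per experimental spectrum, but GPUScorer does not necessarily distribute the work this way.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Peak { double mz; double intensity; };

struct Spectrum {
    double precursorMass;     // Da
    std::vector<Peak> peaks;  // assumed sorted by mz
};

// Tolerance-based dot product: each theoretical peak is multiplied by the
// most intense experimental peak within +/- ionTol (0 if there is none).
double toleranceDotProduct(const Spectrum& theo, const Spectrum& expt, double ionTol) {
    double score = 0.0;
    for (const Peak& t : theo.peaks) {
        auto lo = std::lower_bound(expt.peaks.begin(), expt.peaks.end(), t.mz - ionTol,
                                   [](const Peak& p, double v) { return p.mz < v; });
        double best = 0.0;
        for (auto it = lo; it != expt.peaks.end() && it->mz <= t.mz + ionTol; ++it)
            best = std::max(best, it->intensity);
        score += t.intensity * best;
    }
    return score;
}

// theoIndex is -1 when no theoretical spectrum falls in the precursor window.
struct Match { std::ptrdiff_t theoIndex; double score; };

// Top-1 search: candidates are located by binary search on precursor mass
// (theoretical must be sorted by precursorMass), then scored one by one.
std::vector<Match> topOneMatches(const std::vector<Spectrum>& theoretical,
                                 const std::vector<Spectrum>& experimental,
                                 double precursorTol, double ionTol) {
    assert(precursorTol >= 0.0 && ionTol >= 0.0);
    auto byMass = [](const Spectrum& s, double v) { return s.precursorMass < v; };
    std::vector<Match> result;
    result.reserve(experimental.size());
    for (const Spectrum& e : experimental) {
        Match best{-1, 0.0};
        auto first = std::lower_bound(theoretical.begin(), theoretical.end(),
                                      e.precursorMass - precursorTol, byMass);
        for (auto it = first; it != theoretical.end() &&
                              it->precursorMass <= e.precursorMass + precursorTol; ++it) {
            double s = toleranceDotProduct(*it, e, ionTol);
            if (best.theoIndex < 0 || s > best.score)
                best = {it - theoretical.begin(), s};
        }
        result.push_back(best);
    }
    return result;
}
```

Each experimental spectrum is scored independently of the others. This independence makes the search a good fit for data-parallel hardware.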
- Contact us
- E-mail: youli(AT)comp.hkbu.edu.hk
- Reference
- Listed in the paper