Utilities
Documentation -> Proteome Docs
FAQ
About
The proteome predictions module, available from the utilities page, takes as input an email address, a PoPS specificity model, a selected organism and a number of output parameters. All the known proteins from the selected organism are processed, and a number of files containing different interpretations of the output are returned to the user.
This is very straightforward: type the full email address at which you would like to receive the results of your analysis. Note that the email itself will be very small: it does not contain the results files themselves, only web links to the location of those files on our server.
Again, this is straightforward: select your organism of interest. On submitting the form data, your PoPS specificity model will be processed against all the known proteins of the selected organism. These proteins are obtained from the RefSeq Project available from the NCBI web site. Currently, PoPS provides proteome analysis for the following organisms:
Select the preferred threshold option
There are four settings available for this option:
Set the threshold to an integer value N:
For each protein in the proteome, every score greater than or
equal to the specified value N is returned, i.e.:
Set the threshold to the minimum score (all predictions in each substrate except -Infinity):
For every protein in the proteome, all scores greater than or equal
to the minimum score calculated for the respective protein are returned, i.e.:
Set a limit on scores above the threshold
Setting this value limits the results to proteins with no more than the specified number of scores above the threshold.
For example, assume that the threshold has been set to 15.0, and the limit has been set to 3. For a given protein in the proteome, all the scores for the protein are calculated, and then all the scores >= 15.0 are collated, as described above. Finally, the protein will only be returned in the results if the number of scores >= 15.0 (above the threshold) is <=3, i.e. if there are 4 or more scores >= 15.0, this protein will NOT be returned in the results. This process is repeated for every protein in the proteome.
If the default setting "No Limit" is used, this option is ignored.
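The interaction between the threshold and the limit can be sketched in Python as follows. This is only an illustration of the rule described above, not the PoPS implementation: the function name `filter_proteome` and the dictionary layout are hypothetical, and the sketch assumes that a protein with no scores at or above the threshold is left out of the results.

```python
from typing import Dict, List, Optional


def filter_proteome(
    scores_by_protein: Dict[str, List[float]],
    threshold: float,
    limit: Optional[int] = None,
) -> Dict[str, List[float]]:
    """Return, per protein, the scores >= threshold, honouring the limit.

    limit=None stands for the "No Limit" setting (hypothetical encoding).
    """
    results: Dict[str, List[float]] = {}
    for name, scores in scores_by_protein.items():
        # Collate every score greater than or equal to the threshold.
        hits = [s for s in scores if s >= threshold]
        if not hits:
            # Assumption: proteins without any hit are not reported.
            continue
        # Drop the protein if it has more hits than the limit allows.
        if limit is not None and len(hits) > limit:
            continue
        results[name] = hits
    return results


example = {
    "protein_1": [16.0, 15.0, 12.0],        # 2 hits, kept
    "protein_2": [15.0, 16.0, 17.0, 18.0],  # 4 hits, dropped when limit is 3
    "protein_3": [10.0],                    # no hits
}
kept = filter_proteome(example, threshold=15.0, limit=3)
```

With a threshold of 15.0 and a limit of 3, only `protein_1` remains; with the limit left at `None`, `protein_2` is returned as well.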
Supply a PoPS specificity model
Use the "Browse ..." button to search for a model file. This file should be consistent with the PoPS protease model format.
Proteome predictions results files
The results are returned to the user as an email containing links to all the output files. By clicking on each link, it is possible to retrieve each output file from the server. The output files are available for 24 hours after the analysis is performed, after which they are removed. If you have not saved your results to disk by this time, you will need to resubmit your data.
In processing the proteome for possible cleavage sites, PoPS takes into account the accessibility data stored in the ASA database (described in the PoPS manual). Thus the output contains information about the possible accessibility of the predicted cleavages, as described below. For the purpose of the proteome analysis, a residue/amino acid is predicted as accessible to the active site if it is at least 33% solvent accessible.
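As a rough illustration of the 33% rule, the check could look like the Python sketch below. The function name `is_accessible` and the use of `None` for a residue with no accessibility data are assumptions made for this example; treating unknown accessibility as accessible follows the note given with the reasoning table description.

```python
from typing import Optional

ACCESSIBILITY_CUTOFF = 33.0  # percent solvent accessibility, from the text


def is_accessible(percent_accessible: Optional[float]) -> bool:
    """Classify a residue as accessible (True) or buried (False)."""
    if percent_accessible is None:
        # Unknown accessibility is reported as accessible.
        return True
    return percent_accessible >= ACCESSIBILITY_CUTOFF
```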
There are two types of output file, histograms and summary files. The histogram files consist of the following:
The frequency of the total number of predictions:
This histogram reports the number of substrates vs. the number of
scores greater than or equal to the threshold in each substrate. In
this histogram 834 proteins have one score greater than or equal to
the threshold, and 25 proteins have 4 scores greater than or equal
to the threshold. The title reports the name of the histogram, the
requested threshold, the limit on the number of scores per protein,
and the number of proteins shown out of the number of proteins that
were processed.
This histogram is available with or without the buried predictions included in the output. When the buried predictions are not included, they are reported in the '0' bar, as shown in this example (642 proteins).
Frequency of maximum scores:
This histogram reports the number of substrates vs. the maximum
score in each substrate. For example, in this histogram 14
proteins have a maximum score between 31.25 and 31.75, while 3
proteins have a maximum score between 32.25 and 32.75. The title
reports the name of the histogram, the requested threshold, the limit
on the number of scores per protein, and the number of proteins shown
out of the number of proteins that were processed.
When the difference between the maximum score and the threshold is < 10, the bars are calculated at intervals of 0.5; between 10 and 20, at intervals of 1.0; and > 20, at intervals of 5.0. This histogram is available with or without the buried predictions included in the output. In the latter case, the buried predictions are not included in the results.
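The choice of bar interval can be written as a small Python function. This is a sketch of the rule as stated, with a hypothetical function name; the text does not say which interval applies when the difference is exactly 10 or exactly 20, so the sketch assumes both fall in the 1.0 range.

```python
def bar_interval(max_score: float, threshold: float) -> float:
    """Pick the histogram bar width from the score range being plotted."""
    spread = max_score - threshold
    if spread < 10:
        return 0.5
    if spread <= 20:
        # Assumption: the boundaries 10 and 20 are both inclusive here.
        return 1.0
    return 5.0
```

For instance, the example histogram above, whose bars span 31.25 to 31.75 and 32.25 to 32.75, uses the 0.5 interval.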
The summary files consist of the following:
The reasoning table file:
For each protein analysed, a modified reasoning table is produced for each substrate containing the top 5 scores/locations. Each table begins with summary data:
Following this information is the table itself. The first column reports the location of the cleavage, described as the P1/P1' residues (single letter encoding) together with their sequence location. This column is followed by the score for the respective location. Finally, for each of those scores, two columns report the number of times the predicted cleavage is reported as accessible (A) or buried (B) in the ASA database, described in the viewing results section of the PoPS manual. Note that when a cleavage is reported as accessible, it may be that accessibility is unknown for that cleavage (and is therefore reported as accessible).
>gi|16127998|ref|NP_414545.1| threonine synthase [Escherichia coli K12]
Number of scores: 421
Number of scores buried: 411
Maximum score: 19.24
Minimum score: 12.66
Number of scores above threshold: 4
Number of scores above threshold buried: 3
Pos Score A B
L6-K7 -Inf 2 0
R218-L219 12.66 0 2
R341-E342 19.24 0 2
R362-D363 14.85 0 2
R421-K422 17.07 2 0
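If you want to post-process a reasoning table file, a record like the one above could be read with a short Python script along these lines. The parser is a sketch based only on the sample shown here: the function name `parse_reasoning_table` is hypothetical, and the layout of a real file may differ in details such as spacing or additional fields.

```python
import re
from typing import Dict, List, Optional, Tuple

# One table row: P1/P1' location, score, accessible count, buried count.
ROW = re.compile(r"^([A-Z]\d+-[A-Z]\d+)\s+(\S+)\s+(\d+)\s+(\d+)$")

SAMPLE = """\
>gi|16127998|ref|NP_414545.1| threonine synthase [Escherichia coli K12]
Number of scores: 421
Number of scores buried: 411
Maximum score: 19.24
Minimum score: 12.66
Number of scores above threshold: 4
Number of scores above threshold buried: 3
Pos Score A B
L6-K7 -Inf 2 0
R218-L219 12.66 0 2
R341-E342 19.24 0 2
R362-D363 14.85 0 2
R421-K422 17.07 2 0
"""


def parse_reasoning_table(text: str):
    """Split one record into its header, summary values and table rows."""
    header: Optional[str] = None
    summary: Dict[str, float] = {}
    rows: List[Tuple[str, float, int, int]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif ":" in line:
            key, value = line.split(":", 1)
            summary[key.strip()] = float(value)
        else:
            match = ROW.match(line)
            if match:  # the "Pos Score A B" heading does not match
                pos, score, acc, bur = match.groups()
                # float() accepts "-Inf" and returns negative infinity.
                rows.append((pos, float(score), int(acc), int(bur)))
    return header, summary, rows


header, summary, rows = parse_reasoning_table(SAMPLE)
```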
The list of hits (names only):
This file reports the RefSeq Project identifier for each protein returned in the results set, and the maximum score for that protein. For example:
...
>gi|4502103|ref|NP_003559.1| (NM_003568) annexin A9; annexin 31; annexin XXXI [Homo sapiens]
Maximum score: 32.24
>gi|34222121|ref|NP_076962.2| (NM_024057) nucleoporin Nup37 [Homo sapiens]
Maximum score: 32.65
>gi|13376170|ref|NP_079074.1| (NM_024798) sorting nexin 22 [Homo sapiens]
Maximum score: 32.29
...
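The list of hits can be turned into (accession, maximum score) pairs with a few lines of Python. This is a sketch based on the excerpt above: the function name `parse_hit_list` is hypothetical, and it assumes each header line follows the `gi|...|ref|...|` pattern shown, with the RefSeq accession in the fourth `|`-separated field.

```python
from typing import List, Optional, Tuple

HITS = """\
>gi|4502103|ref|NP_003559.1| (NM_003568) annexin A9; annexin 31; annexin XXXI [Homo sapiens]
Maximum score: 32.24
>gi|34222121|ref|NP_076962.2| (NM_024057) nucleoporin Nup37 [Homo sapiens]
Maximum score: 32.65
>gi|13376170|ref|NP_079074.1| (NM_024798) sorting nexin 22 [Homo sapiens]
Maximum score: 32.29
"""


def parse_hit_list(text: str) -> List[Tuple[str, float]]:
    """Pair each protein accession with its maximum score."""
    hits: List[Tuple[str, float]] = []
    accession: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            fields = line.split("|")
            # Fall back to the whole header if the pattern is different.
            accession = fields[3] if len(fields) > 3 else line[1:]
        elif line.startswith("Maximum score:") and accession is not None:
            hits.append((accession, float(line.split(":", 1)[1])))
            accession = None
    return hits


hits = parse_hit_list(HITS)
best = max(hits, key=lambda hit: hit[1])  # protein with the highest score
```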