Home       PoPS       Utilities       Documentation -> Proteome Docs       FAQ       About


Proteome Predictions Documentation

The proteome predictions module, available from the utilities page, takes as input an email address, a PoPS specificity model, a selected organism and a number of output parameters. All the known proteins from the selected organism are processed, and a number of files containing different interpretations of the output are returned to the user.


Contents
Your email address  

Select the organism  

Select the preferred threshold option  

Set a limit on scores above the threshold  

Supply a PoPS specificity model  

Proteome predictions results files  

Your email address

Type the full email address at which you would like to receive the results of your analysis. The email itself will be very small, because it does not contain the results files themselves; instead, it contains web links to the location of those files on our server.

Select the organism

Again, this is straightforward: select your organism of interest. On submitting the form data, your PoPS specificity model will be processed against all the known proteins of the selected organism. These proteins are obtained from the RefSeq Project available from the NCBI web site. Currently, PoPS provides proteome analysis for the following organisms:

  • Homo sapiens (human):     25,835 proteins
  • Saccharomyces cerevisiae (baker's yeast):     5,857 proteins
  • Escherichia coli K12:     4,311 proteins
  • Drosophila melanogaster (fruit fly):     18,267 proteins
  • Arabidopsis thaliana (thale cress):     28,767 proteins
  • Rattus norvegicus (Norway rat):     21,948 proteins
  • Mus musculus (house mouse):     40,534 proteins
  • Danio rerio (zebrafish):     1,170 proteins

Select the preferred threshold option

There are four settings available for this option:

  • Set the threshold to an integer value N:
    For each protein in the proteome, every score greater than or equal to the specified value N is returned, i.e.:

    For each protein, all scores >= N are returned.

  • Set the threshold to the maximum score in each protein (substrate):
    For every protein in the proteome, the score(s) equal to the maximum score calculated for that protein are returned, i.e.:

    For each protein p, all scores = maximum score of p are returned.

    For example, if the first protein has a maximum score of 12.0, all scores in this protein that are equal to 12.0 are returned. If the second protein has a maximum of 15.0, all scores in that protein equal to 15.0 are returned, and so on.

  • Set the threshold to the minimum score (all predictions in each substrate except -Infinity):
    For every protein in the proteome, all scores greater than or equal to the minimum score calculated for that protein are returned, i.e.:

    For each protein p, all scores >= minimum score of p are returned.

    The effect of this option is to return all scores for all proteins, except scores equal to -Infinity, i.e. those scores involving the '#' symbol, or those scores at the ends of each protein where there are more subsites (defined in the model) than amino acids. See the PoPS manual for more details on how the score is calculated.

  • Return all scores (includes -Infinity):
    This option returns every score calculated for every protein in the proteome. Although this option is provided, it is not recommended, as the results set will be extremely large.
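The four threshold options can be summarised in a short sketch. This is an illustration only, not the PoPS implementation: the function name `select_scores` and the mode labels "value", "max", "min" and "all" are hypothetical names chosen for this example, and the scores of one protein are assumed to be available as a plain list of numbers.

```python
import math

NEG_INF = -math.inf


def select_scores(scores, mode, n=None):
    """Return the scores of one protein that a threshold option keeps.

    The mode labels are hypothetical, chosen for this sketch:
      "value" -> all scores >= n
      "max"   -> all scores equal to the protein's maximum score
      "min"   -> all scores except -Infinity
      "all"   -> every score, including -Infinity
    """
    if mode == "all":
        return list(scores)
    if mode == "min":
        return [s for s in scores if s != NEG_INF]
    if mode == "max":
        if not scores:
            return []
        top = max(scores)
        return [s for s in scores if s == top]
    if mode == "value":
        if n is None:
            raise ValueError("mode 'value' needs a threshold n")
        return [s for s in scores if s >= n]
    raise ValueError(f"unknown mode: {mode}")


# Made-up scores for a single protein.
example_scores = [NEG_INF, 12.0, 9.5, 12.0, 7.25]
by_value = select_scores(example_scores, "value", n=9)
by_max = select_scores(example_scores, "max")
by_min = select_scores(example_scores, "min")
everything = select_scores(example_scores, "all")
```

With these made-up scores, the "max" option keeps both occurrences of 12.0, and the "min" option keeps every score except -Infinity.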

Set a limit on scores above the threshold

Setting this value restricts the results to proteins that have no more than the specified number of scores at or above the threshold.

For example, assume that the threshold has been set to 15.0 and the limit has been set to 3. For a given protein in the proteome, all the scores for the protein are calculated, and then all the scores >= 15.0 are collated, as described above. The protein is returned in the results only if the number of scores >= 15.0 is <= 3; if there are 4 or more scores >= 15.0, the protein will NOT be returned. This process is repeated for every protein in the proteome.

If the default setting "No Limit" is used, this option is ignored.
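The limit check described in this section can be sketched as follows. The function name `passes_limit` is hypothetical, and representing "No Limit" as `None` is an assumption made for this example; the sketch covers only the limit test itself, not how PoPS treats proteins with no scores at or above the threshold.

```python
def passes_limit(scores, threshold, limit=None):
    """Return True if a protein passes the limit on scores above the threshold.

    limit=None stands in for the "No Limit" setting (an assumption of this sketch).
    """
    if limit is None:
        return True
    hits = sum(1 for s in scores if s >= threshold)
    return hits <= limit


# Threshold 15.0 and limit 3, as in the example in the text.
three_hits = passes_limit([15.0, 16.5, 20.0, 9.0], threshold=15.0, limit=3)
four_hits = passes_limit([15.0, 16.5, 20.0, 15.5], threshold=15.0, limit=3)
no_limit = passes_limit([15.0, 16.5, 20.0, 15.5], threshold=15.0)
```

Here the protein with three scores >= 15.0 passes, the protein with four such scores is excluded, and with no limit set every protein passes the check.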

Supply a PoPS specificity model

Use the "Browse ..." button to search for a model file. This file should be consistent with the PoPS protease model format.

Proteome predictions results files

The results are returned to the user as an email containing links to all the output files. Click a link to retrieve the corresponding output file from the server. The output files are available for 24 hours after the analysis is performed, after which they are removed. If you have not saved your results to disk by this time, you will need to resubmit your data.

In processing the proteome for possible cleavage sites, PoPS takes into account the accessibility data stored in the ASA database (described in the PoPS manual). The output therefore contains information about the possible accessibility of the predicted cleavages, as described below. For the purpose of the proteome analysis, a residue/amino acid is predicted as accessible to the active site if it is at least 33% solvent accessible.

There are two types of output file: histograms and summary files. The histogram files consist of the following:

Frequency of the total number of predictions:
[Figure: example predictions histogram]
This histogram reports the number of substrates vs. the number of scores at or above the threshold in each substrate. In the example histogram, 834 proteins have one score greater than or equal to the threshold, and 25 proteins have 4 scores greater than or equal to the threshold. The title reports the name of the histogram, the requested threshold, the limit on the number of scores per protein, and the number of proteins shown out of the number of proteins that were processed.
This histogram is available with or without the buried predictions included in the output. When the buried predictions are not included, they are reported in the '0' bar, as shown in the example (642 proteins).

Frequency of maximum scores:
[Figure: example predictions histogram]
This histogram reports the number of substrates vs. the maximum score in each substrate. For example, in the example histogram 14 proteins have a maximum score between 31.25 and 31.75, while 3 proteins have a maximum score between 32.25 and 32.75. The title reports the name of the histogram, the requested threshold, the limit on the number of scores per protein, and the number of proteins shown out of the number of proteins that were processed.
The bar width depends on the difference between the maximum score and the threshold: when the difference is < 10, the bars are calculated at intervals of 0.5; between 10 and 20, at intervals of 1.0; and > 20, at intervals of 5.0.
This histogram is available with or without the buried predictions included in the output. In the latter case, the buried predictions are omitted from the results.
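The rule for the bar intervals of the maximum scores histogram can be written as a small function. This is a sketch of the rule as stated above, not code taken from PoPS; the name `bin_width` is hypothetical, and treating a difference of exactly 10 or exactly 20 as part of the middle range is an assumption based on the wording "between 10 and 20".

```python
def bin_width(max_score, threshold):
    """Return the histogram bar interval for a given maximum score and threshold."""
    spread = max_score - threshold
    if spread < 10:
        return 0.5
    if spread <= 20:  # boundary handling is an assumption of this sketch
        return 1.0
    return 5.0


narrow = bin_width(32.75, 25.0)   # difference 7.75
medium = bin_width(35.0, 20.0)    # difference 15.0
wide = bin_width(50.0, 20.0)      # difference 30.0
```

For these illustrative values the intervals are 0.5, 1.0 and 5.0 respectively.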

The summary files consist of the following:

The reasoning table file:

For each protein analysed, a modified reasoning table containing the top 5 scores/locations is produced. Each table begins with summary data:

  • The description of the protein from the RefSeq database
  • The number of scores calculated for the protein sequence (includes buried scores)
  • The number of scores predicted as buried (solvent accessibility below the 33% minimum)
  • Maximum score for the given protein sequence
  • Minimum score for the given protein sequence
  • Number of scores above the threshold (includes buried scores)
  • Number of scores above the threshold that are buried (again, solvent accessibility below the 33% minimum)

Following this information is the table itself. The first column reports the location of the cleavage, described as the P1/P1' residues (single letter encoding) together with their sequence location. This column is followed by the score for the respective location. In addition, for each of those scores, two columns report the number of times the predicted cleavage is reported as accessible (A) or buried (B) in the ASA database, described in the viewing results section of the PoPS manual. Note that when a cleavage is reported as accessible, it may be that accessibility is unknown for that cleavage (and it is therefore reported as accessible).

    >gi|16127998|ref|NP_414545.1| threonine synthase [Escherichia coli K12]
    Number of scores: 421
    Number of scores buried: 411
    Maximum score: 19.24
    Minimum score: 12.66
    Number of scores above threshold: 4
    Number of scores above threshold buried: 3
    Pos           Score     A     B
    L6-K7         -Inf     2     0
    R218-L219     12.66     0     2
    R341-E342     19.24     0     2
    R362-D363     14.85     0     2
    R421-K422     17.07     2     0
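If you want to post-process a saved reasoning table file, a record like the one above could be read with a short script. This is a minimal sketch that assumes the layout is exactly as in the example (a ">" description line, "name: value" summary lines, a "Pos" column header, then whitespace-separated rows); the function name `parse_reasoning_record` is hypothetical and not part of PoPS.

```python
def parse_reasoning_record(text):
    """Parse one reasoning-table record into (header, summary, rows)."""
    header = None
    summary = {}
    rows = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
        elif ":" in line:
            key, value = line.split(":", 1)
            summary[key.strip()] = float(value)
        elif line.startswith("Pos"):
            continue  # column header line
        else:
            pos, score, accessible, buried = line.split()
            # float() accepts "-Inf" and returns negative infinity
            rows.append((pos, float(score), int(accessible), int(buried)))
    return header, summary, rows


sample = """\
>gi|16127998|ref|NP_414545.1| threonine synthase [Escherichia coli K12]
Number of scores: 421
Number of scores buried: 411
Maximum score: 19.24
Minimum score: 12.66
Number of scores above threshold: 4
Number of scores above threshold buried: 3
Pos           Score     A     B
L6-K7         -Inf     2     0
R218-L219     12.66     0     2
R341-E342     19.24     0     2
R362-D363     14.85     0     2
R421-K422     17.07     2     0
"""

header, summary, rows = parse_reasoning_record(sample)
# Rows whose cleavage is reported as accessible at least once
accessible_rows = [row for row in rows if row[2] > 0]
```

Each row becomes a tuple of position, score, accessible count and buried count, so filtering on the A or B column is a one-line list comprehension.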

The list of hits (names only):

This file reports the RefSeq Project identifier for each protein returned in the results set, and the maximum score for that protein. For example:

    ...

    >gi|4502103|ref|NP_003559.1| (NM_003568) annexin A9; annexin 31; annexin XXXI [Homo sapiens]
    Maximum score: 32.24

    >gi|34222121|ref|NP_076962.2| (NM_024057) nucleoporin Nup37 [Homo sapiens]
    Maximum score: 32.65

    >gi|13376170|ref|NP_079074.1| (NM_024798) sorting nexin 22 [Homo sapiens]
    Maximum score: 32.29

    ...
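A saved list of hits can be ranked by maximum score with a few lines of code. The sketch below assumes each entry is a ">" description line followed by a "Maximum score:" line, as in the example, and that the RefSeq accession is the fourth "|"-separated field of the description line; the function name `parse_hits` is hypothetical.

```python
def parse_hits(text):
    """Return a list of (accession, description, max_score) tuples."""
    hits = []
    description = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            description = line[1:]
        elif line.startswith("Maximum score:") and description is not None:
            score = float(line.split(":", 1)[1])
            # Assumed layout: gi|<number>|ref|<accession>| <description>
            accession = description.split("|")[3]
            hits.append((accession, description, score))
            description = None
    return hits


sample = """\
>gi|4502103|ref|NP_003559.1| (NM_003568) annexin A9; annexin 31; annexin XXXI [Homo sapiens]
Maximum score: 32.24

>gi|34222121|ref|NP_076962.2| (NM_024057) nucleoporin Nup37 [Homo sapiens]
Maximum score: 32.65

>gi|13376170|ref|NP_079074.1| (NM_024798) sorting nexin 22 [Homo sapiens]
Maximum score: 32.29
"""

hits = parse_hits(sample)
# Highest maximum score first
ranked = sorted(hits, key=lambda hit: hit[2], reverse=True)
```

For the three entries shown above, the ranking places nucleoporin Nup37 first, followed by sorting nexin 22 and annexin A9.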




    Victorian Bioinformatics Consortium PoPS Logo Monash University