

Batch Predictions Documentation

The batch predictions module, available from the Utilities page, takes as input a PoPS specificity model and a list of substrates in fasta format. Each substrate is scored against the model, and a file is returned containing one reasoning table per substrate. Each table lists the scores of the top N predicted cleavage sites in that substrate, where N is supplied by the user.


Contents
Specifying the number of scores per reasoning table  

Loading a PoPS specificity model  

Loading a substrate file  

The results file  

Specifying the number of scores per reasoning table

The first input to the batch module is an integer N that limits the output to N scores per substrate. For example, entering '3' returns the top 3 scores for each substrate. These are reported in the 'reasoning table' format described below.
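As an illustration of how an N parameter restricts each table, the sketch below keeps only the N highest-scoring cleavage sites. It assumes the scores for one substrate are already available as hypothetical (position, total score) pairs; the function name `top_n_scores` is made up for this example and is not part of PoPS. Three of the entries come from the example reasoning table in "The results file"; the fourth (Ala12-Ser13) is invented for illustration.

```python
import heapq


def top_n_scores(scores, n):
    """Return the n highest-scoring (position, score) pairs, best first.

    Hypothetical helper: `scores` is a list of (position_label, total_score)
    tuples for a single substrate. This is a sketch, not the PoPS code.
    """
    if not isinstance(n, int) or n < 1:
        raise ValueError("N must be a positive integer")
    # heapq.nlargest returns the selected items in descending order of the key
    return heapq.nlargest(n, scores, key=lambda item: item[1])


scores = [("His40-Tyr41", 10.0), ("Tyr58-Gly59", 15.0),
          ("His79-Asp80", 11.0), ("Ala12-Ser13", 4.0)]
print(top_n_scores(scores, 3))
# [('Tyr58-Gly59', 15.0), ('His79-Asp80', 11.0), ('His40-Tyr41', 10.0)]
```

If N is larger than the number of scored sites, `heapq.nlargest` simply returns all of them.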

Loading a PoPS specificity model

Use the "Browse ..." button to search for a model file. This file should be consistent with the PoPS protease model format.

Loading a substrate file

Use the "Browse ..." button to locate a file containing one or more substrates. There is no limit on the number of substrates that can be analysed, but larger input files take longer to process and produce larger output files.

The substrate file should be in fasta format. In this format, each substrate entry follows these rules:

  • A description line is followed by the substrate's amino acid sequence in single-letter code.
  • The description starts with the ">" symbol.
  • The ">" is usually followed immediately by the sequence ID and then a description, although both of these are optional.
  • Sequence lines should conventionally be no longer than 80 characters, although the batch predictions module also accepts longer lines.
  • The current substrate sequence ends when a line is found that begins with the ">" symbol, indicating a description for a new substrate.
  • The batch predictions module will accept blank lines between entries.

For example, here are some protein sequences in fasta format, as they might appear in the substrate file:

    >gi|4557777|ref|NP_000249.1| myosin light chain 3 [Homo sapiens]
    MAPKKPEPKKDDAKAAPKAAPAPAPPPEPERPKEVEFDASKIKIEFTPEQIEEFKEAFMLFDRTPKCEMK
    ITYGQCGDVLRALGQNPTQAEVLRVLGKPRQEELNTKMMDFETFLPMLQHISKNKDTGTYEDFVEGLRVF
    DKEGNGTVMGAELRHVLATLGERLTEDEVEKLMAGQEDSNGCINYEAFVKHIMSS

    >Some Description
    MAEPRQEFEVMEDHAGTYGLGDRKDQGGYTMHQDQEGDTDAGLKESPLQTPTEDGSEEPGSETSDAKSTP
    TAEDVTAPLVDEGAPGKQAAAQPHTEIPEGTTAEEAGIGDTPSLEDEAAGHVTQARMVSKSKDGTGSDDK
    KAKGADGKTKIATPRGAAPPGQKGQANATRIPAKTPPAPKTPPSSGEPPKSGDRSGYSSPGSPGTPGSRS
    >
    EESKPKPRSFMPNLVPPKIPDGERVDFDDIHRKRMEKDLNELQALIEAHFENRKKEEEELVSLKDRIERR
    RAERAEQQRIRNEREKERQNRLAEERARREEEENRRKAEDEARKKKALSNMMHFGGYIQKQAQTERKSGK
    NQKVSKTRGKAKVTGRWK
    >gi|2276312|emb|CAB06656.1| CIP1 (WAF1, P21, MDA-6) [Homo sapiens]
    MSEPAGDVRQNPCGSKACRRLFGPVDSEQLSRDCDALMAGCIQEARERWNFDFVTETPLEGDFAWERVRG
    LGLPKLYLPTGPRRGRDELGGGRRPGTSPALLQGTAEEDHVDLSLSCTLVPRSGEQAEGSPGGPGDSQGR
    KRRQTSMTDFYHSKRRLIFSKRKP
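The fasta rules listed above can be captured in a small reader. The sketch below is a minimal illustration under those rules only: a line starting with ">" begins a new entry, the text after ">" may be empty (as in the fourth entry of the example), blank lines are skipped, and sequence lines of any length are joined. The function `parse_fasta` is hypothetical and is not taken from the PoPS source; the short sequences in the usage example are truncated excerpts of the entries above.

```python
def parse_fasta(text):
    """Parse fasta text into a list of (description, sequence) pairs.

    Hypothetical sketch following the rules above; not the PoPS reader.
    """
    records = []
    description = None  # None means no ">" line has been seen yet
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # blank lines between entries are allowed
        if line.startswith(">"):
            if description is not None:
                records.append((description, "".join(chunks)))
            # The ID and description after ">" are optional, so this may be ""
            description = line[1:].strip()
            chunks = []
        elif description is None:
            raise ValueError("sequence data found before the first '>' line")
        else:
            chunks.append(line)  # lines longer than 80 characters are accepted
    if description is not None:
        records.append((description, "".join(chunks)))
    return records


example = """>seq1 first
MAPKK
PEPKK

>
EESKP
"""
print(parse_fasta(example))
# [('seq1 first', 'MAPKKPEPKK'), ('', 'EESKP')]
```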

The results file

For each substrate, the results are printed in the reasoning table format described in the viewing results section of the PoPS manual. Assuming the number of results per reasoning table is three, an example reasoning table looks like this:
        Minimum Score: -3.0         Maximum Score: 15.0
        Position         S2     S1    S1'   Total
        Tyr58-Gly59     3.0    2.0   10.0    15.0
        His79-Asp80     0.0    2.0    9.0    11.0
        His40-Tyr41     5.0   -3.0    8.0    10.0
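In every row of this example, the Total column equals the sum of the S2, S1 and S1' scores (for instance 5.0 + (-3.0) + 8.0 = 10.0). The sketch below rebuilds such a table from per-subsite scores under the assumption that this holds in general; `format_reasoning_table` and its fixed column widths are illustrative choices, not the exact layout PoPS produces.

```python
SUBSITES = ["S2", "S1", "S1'"]


def format_reasoning_table(rows, min_score, max_score):
    """Render (position, subsite_scores) rows as a fixed-width text table.

    Assumption: Total is the sum of the subsite scores, as in the example.
    Hypothetical helper; the column widths are arbitrary.
    """
    out = [f"Minimum Score: {min_score}    Maximum Score: {max_score}"]
    header = f"{'Position':<12}" + "".join(f"{s:>6}" for s in SUBSITES)
    out.append(header + f"{'Total':>7}")
    for position, subsite_scores in rows:
        total = sum(subsite_scores)  # assumed additive scoring
        cells = "".join(f"{x:>6.1f}" for x in subsite_scores)
        out.append(f"{position:<12}" + cells + f"{total:>7.1f}")
    return "\n".join(out)


rows = [("Tyr58-Gly59", [3.0, 2.0, 10.0]),
        ("His79-Asp80", [0.0, 2.0, 9.0]),
        ("His40-Tyr41", [5.0, -3.0, 8.0])]
print(format_reasoning_table(rows, -3.0, 15.0))
```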

Each table contains the top N scores (in this case the top 3) for the current substrate. Note, however, that unlike the main PoPS program output, this table is sorted from highest to lowest score, regardless of where each cleavage occurs in the sequence. The main PoPS program, in contrast, lists predicted cleavages in the order in which they occur in the substrate sequence. Therefore, in the example above, the predicted cleavage at His40-Tyr41 is listed third because it has the third-highest score, even though it occurs before Tyr58-Gly59 and His79-Asp80 in the substrate amino acid sequence.
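The difference between the two orderings can be shown with the three predictions from the example table. This sketch assumes position labels of the form "His40-Tyr41", where the number after the first residue name is the sequence position of the residue N-terminal to the cleaved bond; `position_key` is a hypothetical helper, not a PoPS function.

```python
import re


def position_key(label):
    """Extract the first residue number from a label such as 'His40-Tyr41'."""
    match = re.match(r"[A-Za-z]+(\d+)-", label)
    if match is None:
        raise ValueError(f"unrecognised position label: {label!r}")
    return int(match.group(1))


predictions = [("His40-Tyr41", 10.0), ("Tyr58-Gly59", 15.0), ("His79-Asp80", 11.0)]

# Batch predictions order: highest total score first
by_score = sorted(predictions, key=lambda p: p[1], reverse=True)
# Main PoPS order: position in the substrate sequence
by_position = sorted(predictions, key=lambda p: position_key(p[0]))

print([p[0] for p in by_score])     # ['Tyr58-Gly59', 'His79-Asp80', 'His40-Tyr41']
print([p[0] for p in by_position])  # ['His40-Tyr41', 'Tyr58-Gly59', 'His79-Asp80']
```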

One table is created for each substrate (each containing the top N scores for that substrate), and the tables are returned to the user in a text file, which can be downloaded and saved to disk.
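A minimal sketch of collecting one table per substrate into a single text file is shown below. The "Substrate:" heading line and the blank line between tables are assumptions made for illustration; the actual layout of the PoPS results file may differ. `render_results` and `write_results` are hypothetical names, and the table strings are placeholders.

```python
import tempfile
from pathlib import Path


def render_results(tables):
    """Join per-substrate tables into one text document.

    `tables` maps a substrate description to its formatted table text.
    The heading and separator layout are assumptions, not the PoPS format.
    """
    parts = [f"Substrate: {description}\n{table}\n"
             for description, table in tables.items()]
    return "\n".join(parts)  # a blank line separates consecutive tables


def write_results(tables, path):
    """Write all tables to a single UTF-8 text file."""
    Path(path).write_text(render_results(tables), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "batch_results.txt"
    write_results({"myosin light chain 3": "(table text)",
                   "CIP1": "(table text)"}, out_path)
    print(out_path.read_text(encoding="utf-8"))
```

Building the text in memory keeps the example short; for very large substrate files, writing each table to the file as soon as it is produced would keep memory use flat.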




Victorian Bioinformatics Consortium, Monash University