
The PoPS Manual
Contents
Providing a Substrate  
How to supply your substrate to PoPS, and how it is handled by the program.
Specifying the Protease Model  
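Describes how to provide a model of the protease. Includes the following subsections: Defining the number of subsites; Defining the specificity profile for each subsite; Defining the relative contribution of each subsite.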
Viewing the results  
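How to view and interpret your results. Includes the following subsections: Display of the Results; Using the Stringency Setting; Predicting Buried Cleavages; Predicting Secondary Structure.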
The PoPS Models Database  
Discusses the PoPS database. Includes the following subsections: Saving Models to the Database; Loading Models From the Database; Rating Models From the Database.
Submitting a Protease Model as a File  
How to load a file containing a protease specificity model into PoPS.
Saving a Protease Model to a File  
How to save a protease specificity model from PoPS.
Saving the predictions to a File  
How to save your predictions to a file.
Batch Processing of Predictions  
How to predict cleavage of multiple substrates by one protease.


Providing a Substrate

The substrate sequence is supplied to PoPS by pasting it into the text area at the top of the program. The amino acids must be represented using the one-letter code; for example, glycine is represented by G, alanine by A, and so on. Lower-case letters are converted to upper case before prediction. All characters other than those representing the 20 standard alpha amino acids are removed before the calculations (for example, B, J, O, U, X, Z and punctuation marks are all treated as illegal characters). The text area can be cleared (for example, to paste another sequence) by using the "Clear Substrate" button.
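The clean-up rule described above can be sketched in a few lines of Python. This is an illustration under the assumption that only the 20 standard one-letter codes are kept; the function name `clean_substrate` is hypothetical and is not part of PoPS.

```python
# The 20 standard amino acids in one-letter code; every other character
# (B, J, O, U, X, Z, digits, punctuation, whitespace) is discarded.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def clean_substrate(raw: str) -> str:
    """Upper-case the input and keep only the 20 standard residues."""
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AA)


if __name__ == "__main__":
    print(clean_substrate("mkt-ayIA kqr\n12 BXZ"))  # MKTAYIAKQR
```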

Text area for submitting the substrate sequence.

The text area where the substrate sequence is pasted. The button at the top right (Clear Substrate) is used to clear the text area.


Specifying the Protease Model

The specificity model of a protease is built by specifying, for each subsite, the preferred amino acid(s) and the subsite's relative importance. (The contribution of these to substrate binding and cleavage is discussed on the Background page.)

Defining the number of subsites.

The user must specify how many subsites determine cleavage. Subsites can be added or deleted using the relevant add/delete subsite buttons to the right of the display. There must be at least one S and one S' subsite. (The S and S' notation is described in detail elsewhere in the PoPS documentation.) Initially, the specificity profile has two subsites: S1 and S1'.

The default protease specificity model, before editing.

The number of subsites that can be defined is not limited, but the more subsites are defined, the longer the computation takes, so it is advisable not to specify more subsites than necessary. Subsites must be contiguous: for example, it is not possible to define the S3 and S5 subsites without also specifying the S4 subsite. To force the program to ignore a subsite, set its weight to 0 (the weights and their meaning are discussed below).
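As an illustration of the contiguity rule, the following Python sketch (the function `subsite_labels` is hypothetical, not part of PoPS) generates the labels for a model with a given number of S and S' subsites, ordered from the outermost S subsite to the outermost S' subsite.

```python
def subsite_labels(n_unprimed: int, n_primed: int) -> list[str]:
    """Return contiguous subsite labels, e.g. S2 S1 S1' S2'.

    Because the labels are generated from counts, gaps such as
    S5 S3 (without S4) cannot arise.
    """
    if n_unprimed < 1 or n_primed < 1:
        raise ValueError("at least one S and one S' subsite is required")
    unprimed = [f"S{i}" for i in range(n_unprimed, 0, -1)]  # outermost first
    primed = [f"S{i}'" for i in range(1, n_primed + 1)]      # S1' outwards
    return unprimed + primed


if __name__ == "__main__":
    print(subsite_labels(1, 1))  # ['S1', "S1'"]  (the default model)
```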

Defining the specificity profile for each subsite.

For each subsite, it is necessary to define the preferences for the 20 alpha amino acids by assigning each of them a score within the subsite. This is known as the specificity profile. To create/edit the specificity profile, click on the profile button in the matrix panel (which is initially set to "Any Amino Acid"). This will open a new dialog which allows the user to edit the profile.

Window to edit the specificity profile.

To create a profile, a number of predefined profiles are available in the drop-down list on the left side of the panel. Clicking on one of these profiles will automatically insert the values for each of the amino acids on the right side of the panel. Alternatively, the user can manually customise a profile by entering values directly. Of course, it is also possible to first select a predefined profile, then edit the values.

Each specificity profile contains 20 scores, one for each amino acid. Scores can take any value from -5.0 to +5.0. Negative values indicate an inhibitory effect on binding, ranging from -1.0 (slightly unfavourable) to -5.0 (very unfavourable). Positive values indicate a favourable effect on binding, ranging from +1.0 (slightly favourable) to +5.0 (very favourable). A score of 0.0 indicates that the amino acid has neither a positive nor a negative effect on binding. A score outside this range is set to the nearest bound (i.e. values >5.0 become 5.0, and values <-5.0 become -5.0). Finally, the hash (or pound) symbol "#" is reserved to indicate that the given amino acid completely prevents cleavage, i.e. cleavage will never occur when that amino acid occupies the subsite.
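A minimal sketch of how a single profile entry could be interpreted under the rules above, assuming the entry arrives as text. Representing "#" as negative infinity mirrors how the reasoning display reports such cleavages (see Viewing the Results); the name `parse_score` is hypothetical.

```python
import math

MIN_SCORE, MAX_SCORE = -5.0, 5.0


def parse_score(entry: str) -> float:
    """Interpret one profile entry: '#' blocks cleavage, numbers are clamped."""
    entry = entry.strip()
    if entry == "#":
        return -math.inf  # this residue at this subsite prevents cleavage
    value = float(entry)
    # Out-of-range values are set to the nearest bound.
    return max(MIN_SCORE, min(MAX_SCORE, value))


if __name__ == "__main__":
    print([parse_score(s) for s in ("7.2", "-9", "1.5", "#")])  # [5.0, -5.0, 1.5, -inf]
```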

Initially, each specificity profile is set to "Any amino acid", which means the score is set to 0.0 for all the amino acids. If this profile is to be used, it is recommended that the weight also be set to 0, to speed up computation time by causing the program to ignore this subsite.

To prevent numbers outside the allowable range (-5.0 to 5.0) from being truncated to the closest value (as described above), it is possible to scale the profile values, using the "Scale Values" button. There are two scaling options. The first option is to scale between a lower and upper bound. In this case, the minimum value is set to the lower bound, the maximum value to the upper bound, and then all the other values are scaled between the bounds appropriately. If the lower and upper bounds are set to the same value, then the user is alerted to this fact. If the user chooses to continue, then all the values in the specificity profile will be set to the same value, i.e. the upper/lower bound that was entered.
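The first scaling option can be sketched as a linear min-max rescaling. The manual does not state that the rescaling is linear, so treat this as an assumption; the sketch also assumes the profile contains only numeric scores (no "#" entries), and raising an error when all the values are equal is a choice made here, not documented PoPS behaviour. `scale_to_bounds` is a hypothetical name.

```python
def scale_to_bounds(values, lower, upper):
    """Map min(values) to lower, max(values) to upper, others linearly between."""
    if lower == upper:
        # PoPS warns the user; if they continue, every value becomes the bound.
        return [float(lower)] * len(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot stretch a profile whose values are all equal")
    span = hi - lo
    return [lower + (v - lo) * (upper - lower) / span for v in values]


if __name__ == "__main__":
    print(scale_to_bounds([0, 5, 10], -5, 5))  # [-5.0, 0.0, 5.0]
```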

The second scaling option is to choose a specific value that will equal 0.0 after scaling, and to set the upper bound. In this case, the lower bound is, by default, the negative of the upper bound. If the upper bound is entered as a negative number, that number becomes the lower bound, and its absolute value becomes the upper bound. Remember to tick the check box of the scaling option to be used.
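For the second option, one plausible reading is a piecewise-linear mapping: the chosen value goes to 0.0, the profile maximum to the upper bound, and the profile minimum to its negative. The manual does not specify the exact mapping, so this Python sketch (with the hypothetical function `scale_around_zero`) is an assumption, not PoPS's implementation.

```python
def scale_around_zero(values, zero_value, upper):
    """Piecewise-linear scaling: zero_value -> 0.0, max -> +|upper|, min -> -|upper|."""
    bound = abs(upper)  # a negative entry just swaps which bound the user typed
    lo, hi = min(values), max(values)
    scaled = []
    for v in values:
        if v > zero_value:
            # hi >= v > zero_value, so the denominator is positive
            scaled.append((v - zero_value) / (hi - zero_value) * bound)
        elif v < zero_value:
            # lo <= v < zero_value, so the denominator is positive
            scaled.append((v - zero_value) / (zero_value - lo) * bound)
        else:
            scaled.append(0.0)
    return scaled


if __name__ == "__main__":
    print(scale_around_zero([0.0, 2.0, 10.0], 2.0, 5.0))  # [-5.0, 0.0, 5.0]
```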

Window for scaling the specificity profile.

Defining the relative contribution of each subsite.

The last feature of the specificity model is the relative contribution of each subsite to cleavage, defined by setting the subsite's weight. This value reflects the subsite's contribution, or importance, in determining cleavage relative to all the other subsites. There is no limit on the values that the weights can take, and the default value is 1. As mentioned above, to force PoPS to ignore a subsite, set its weight to 0.

The weights are used in conjunction with the specificity profiles to calculate a score reflecting the likelihood of a cleavage occurring. More detailed information on how the scores are calculated can be found in the How Scores are Calculated section of the Background page.
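The precise scoring formula is given on the Background page. Purely as a sketch, the following assumes that each subsite's subtotal is its weight multiplied by the profile score of the residue occupying it, that the total is the sum of the subtotals, and that a "#" entry (stored as negative infinity) makes the total negative infinity. `score_site` and its data layout are hypothetical.

```python
import math


def score_site(window, weights, profiles):
    """Return (subtotals, total) for one candidate cleavage site.

    window   -- residues in the subsites, outermost S to outermost S'
    weights  -- one weight per subsite, in the same order
    profiles -- one dict per subsite: residue letter -> score
                (-math.inf represents a '#' entry)
    """
    if not (len(window) == len(weights) == len(profiles)):
        raise ValueError("window, weights and profiles must have the same length")
    subtotals = []
    for residue, weight, profile in zip(window, weights, profiles):
        score = profile[residue]
        if weight == 0:
            subtotals.append(0.0)        # weight 0: the subsite is ignored
        elif score == -math.inf:
            subtotals.append(-math.inf)  # '#': this site can never be cleaved
        else:
            subtotals.append(weight * score)
    return subtotals, sum(subtotals)


if __name__ == "__main__":
    profiles = [{"A": 1.0, "L": 3.0}, {"A": -2.0, "L": 0.5}]
    print(score_site("LA", [2.0, 1.0], profiles))  # ([6.0, -2.0], 4.0)
```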


Viewing the Results

Display of the Results

The results of a prediction are displayed in the results section of the program:

Graphical display of the prediction.

The first representation of the results is textual and is known as the reasoning display; it explains why the displayed predictions have been selected. The top line contains the maximum and minimum scores over all predictions (including those that are not displayed). The minimum is taken over all possible cleavages and therefore excludes cleavages that can never occur, i.e. those involving a "#" score; these cleavages score negative infinity (-Infinity). For each predicted cleavage, the position of the cleavage (the residues between which it occurs), the subtotal for each subsite and the total for the prediction are displayed on one line.
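To make the reasoning display's summary line concrete, the sketch below scans every position at which all subsites fall on the substrate (whether PoPS also scores positions where subsites overhang the ends is not stated, so this is an assumption), uses the same assumed weighted-sum scoring, and reports the maximum score together with the minimum over finite scores. All names are hypothetical.

```python
import math


def scan(substrate, n_unprimed, weights, profiles):
    """Score each site; position p means cleavage between residues p and p+1 (1-based).

    Returns (results, maximum, minimum); the minimum ignores -inf sites,
    i.e. cleavages blocked by a '#' entry.
    """
    width = len(weights)
    results = []
    for start in range(len(substrate) - width + 1):
        window = substrate[start:start + width]
        total = 0.0
        for residue, weight, profile in zip(window, weights, profiles):
            if weight == 0:
                continue  # ignored subsite
            score = profile[residue]
            total += -math.inf if score == -math.inf else weight * score
        results.append((start + n_unprimed, total))
    finite = [t for _, t in results if t != -math.inf]
    maximum = max((t for _, t in results), default=None)
    minimum = min(finite, default=None)
    return results, maximum, minimum


if __name__ == "__main__":
    p = {"G": 0.0, "A": 2.0, "V": -1.0, "L": -math.inf}
    print(scan("GAVGL", 1, [1.0, 1.0], [p, p]))
```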

The other representation of the results is graphical. The substrate is printed across the panel, centered vertically. The displayed substrate is the sequence after all illegal characters have been removed, exactly as it was used to calculate the predictions. Every tenth amino acid of this cleaned sequence is numbered, with the number appearing directly below the relevant amino acid.

Predicted cleavages are indicated by downward-pointing arrows, centered between the amino acids on either side of the cleavage. Both the size and the colour of an arrow reflect the likelihood of cleavage. A positive prediction (positive score) is one where cleavage is likely to occur; it is drawn in green, and the more intense the green, the more positive the score. A negative prediction (negative score) is one where cleavage is unlikely to occur; it is drawn in red, and the more intense the red, the more negative the score. In both cases, the greater the absolute value of the score, the thicker the arrow. Predictions with a score of 0 are drawn as a straight black line. Information on how the scores are calculated can be found in the How Scores are Calculated section of the Background page.
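One way the described colour and thickness rules could be implemented is sketched below. The manual only says that intensity and thickness grow with the absolute score; normalising by the largest absolute score and the width range in pixels are assumptions, and `arrow_style` is a hypothetical name.

```python
def arrow_style(score, max_abs, max_width=6.0):
    """Return (colour, intensity in [0, 1], line width) for one prediction arrow."""
    if score == 0 or max_abs == 0:
        return ("black", 0.0, 1.0)  # score 0: plain black line
    intensity = min(abs(score) / max_abs, 1.0)  # relative strength of the score
    colour = "green" if score > 0 else "red"
    width = 1.0 + intensity * (max_width - 1.0)  # thicker for larger |score|
    return (colour, intensity, width)


if __name__ == "__main__":
    print(arrow_style(5.0, 10.0))  # ('green', 0.5, 3.5)
```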

A larger display of the results can be viewed by clicking on the "Bigger Results Display" button, which will bring up a new window with only the graphical results. This window can be resized to display more of the graphical results at one time (as opposed to the window in the PoPS main page, where only thirty residues can be viewed at a time).

Using the Stringency Setting

The results displayed are subject to the stringency setting. Initially, the stringency is set to 0, meaning that only predicted cleavages with a score greater than or equal to 0 are displayed. The stringency can be set to any value (positive or negative), and all predictions with a score greater than or equal to that value will be displayed. To specify a stringency, click on the "Specify Stringency" button. A dialog box will appear in which the user can enter the preferred stringency value. Clicking "OK" filters the results so that only predictions with a score greater than or equal to the new stringency are displayed. The current stringency is displayed in orange immediately above the graphics panel.
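The stringency rule amounts to a simple threshold filter. In this hypothetical sketch, predictions are (position, score) pairs; sites that can never be cleaved (score negative infinity) are excluded by any finite stringency.

```python
def apply_stringency(predictions, stringency=0.0):
    """Keep only predictions whose score is >= the stringency (default 0)."""
    return [(pos, score) for pos, score in predictions if score >= stringency]


if __name__ == "__main__":
    preds = [(1, 2.0), (2, -1.0), (3, 0.0), (4, float("-inf"))]
    print(apply_stringency(preds))        # [(1, 2.0), (3, 0.0)]
    print(apply_stringency(preds, -1.0))  # [(1, 2.0), (2, -1.0), (3, 0.0)]
```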

Button to alter stringency.

Predicting Buried Cleavages

PoPS has a database, called the ASA, that contains proteins of known three-dimensional structure obtained from the Protein Data Bank (PDB) (H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids Research, 28 pp. 235-242(2000)). Each protein in this database has been processed with DSSP (Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features, http://www.cmbi.kun.nl/gv/dssp/, ref: Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983 Dec;22(12):2577-637). Using a probe molecule of radius 1.4 angstroms, DSSP calculates the solvent accessibility of each residue in the proteins in the database.

When a user checks the "Shade buried predictions" option, a dialog box appears asking the user to enter a value. This value is the minimum percentage that a residue must be exposed to solvent before it will be considered to be accessible to the protease. The default value is 33%.

Set the minimum percentage that is solvent accessible.

If the user clicks OK, the substrate is aligned with the proteins in the ASA database using one iteration of PSI-BLAST. All alignments with an expect value (e-value) below 0.001 are selected, and the surface accessibility of those alignments is returned together with the predicted cleavages. The e-value, reported by PSI-BLAST, measures the significance of the match between the PDB and substrate sequences: the smaller the e-value, the more likely it is that the two sequences are homologous. A new dialog then opens with a list from which the user can select one of the returned alignments (if no alignments are returned, the list will be empty).

Dialog containing the ASA alignments.

Each line of the list contains one result, with several pieces of information about the alignment. First, the values in brackets indicate the indices of the first and last buried residues in the alignment. All residues outside this range are predicted to be accessible, so the most useful information (what is buried) lies within this range. The second part of the result is the PDB descriptor of the protein that has been aligned with the substrate. This includes the PDB ID, so the user can look up the protein in the PDB. After the PDB descriptor, in brackets, is the e-value of the alignment between the PDB protein and the submitted substrate. The list is sorted from the most to the least homologous sequence (smallest to greatest e-value).
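The selection and ordering of alignments described above can be sketched as follows. The `AsaHit` record and its field names are hypothetical stand-ins for whatever PoPS stores for each PSI-BLAST hit; only the e-value cutoff (below 0.001) and the ascending sort come from the text.

```python
from dataclasses import dataclass


@dataclass
class AsaHit:
    pdb_descriptor: str  # PDB descriptor, including the PDB ID
    evalue: float        # expect value reported by PSI-BLAST
    first_buried: int    # index of the first buried residue in the alignment
    last_buried: int     # index of the last buried residue in the alignment


def select_hits(hits, cutoff=0.001):
    """Keep hits with e-value < cutoff, most homologous (smallest e-value) first."""
    kept = [h for h in hits if h.evalue < cutoff]
    return sorted(kept, key=lambda h: h.evalue)


if __name__ == "__main__":
    hits = [AsaHit("hit B", 1e-5, 10, 40), AsaHit("hit A", 1e-20, 3, 12),
            AsaHit("hit C", 0.5, 1, 2)]
    print([h.pdb_descriptor for h in select_hits(hits)])  # ['hit A', 'hit B']
```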

Selecting a homologue and clicking "Apply selected" shades buried residues and predictions grey in both the reasoning and graphical displays. A prediction is shaded grey if one or more of the residues occupying its subsites is buried. That is, if any residue that would sit in the active site for the current prediction is buried, the prediction is shaded grey, even if all the other residues are exposed. If the exposed/buried status of a residue is unknown, the residue is treated as exposed.
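The shading rule can be expressed as a small Python sketch. It assumes accessibility is available as percent solvent exposure per 1-based substrate position (missing positions are unknown), uses the default 33% threshold, and assumes that a cleavage after residue p with n S subsites and m S' subsites covers residues p-n+1 to p+m. All function names are hypothetical.

```python
def window_positions(cleavage_pos, n_unprimed, n_primed):
    """1-based positions of the residues sitting in the subsites for this cleavage."""
    return list(range(cleavage_pos - n_unprimed + 1, cleavage_pos + n_primed + 1))


def is_buried(percent_exposed, threshold=33.0):
    """Buried if exposure is below the threshold; unknown (None) counts as exposed."""
    return percent_exposed is not None and percent_exposed < threshold


def shade_prediction(cleavage_pos, n_unprimed, n_primed, accessibility, threshold=33.0):
    """Shade the prediction grey if any residue in its subsites is buried."""
    return any(
        is_buried(accessibility.get(p), threshold)
        for p in window_positions(cleavage_pos, n_unprimed, n_primed)
    )


if __name__ == "__main__":
    acc = {3: 10.0, 4: 50.0, 5: 80.0}   # illustrative values only
    print(shade_prediction(4, 2, 2, acc))  # True: residue 3 is buried
```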

Sometimes a large number of alignments are returned. Instead of trying each alignment in turn, the user can click the "Apply all ASA" button. This combines all the ASA data returned, so that a residue predicted to be buried in any one of the alignments is shaded grey in the results displays. This gives an overall picture of the accessible surface area, but individual alignments should still be checked, because the inaccessible residue sometimes shifts by one position between alignments. For example, given the sequence THYVD, H may be buried in the first alignment and Y in the second. When the alignments are combined, both H and Y are shown as inaccessible, when in fact it is likely to be one or the other, but not both.
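Combining alignments as "Apply all ASA" does amounts to taking the union of the buried positions from each alignment. This hypothetical sketch assumes accessibility is given as percent solvent exposure per 1-based position, with the default 33% threshold; the accessibility values in the THYVD example are invented for illustration.

```python
def buried_positions(accessibility, threshold=33.0):
    """Positions whose exposure is known and below the threshold."""
    return {p for p, pct in accessibility.items() if pct is not None and pct < threshold}


def apply_all_asa(alignments, threshold=33.0):
    """A residue buried in ANY alignment is treated as buried."""
    combined = set()
    for accessibility in alignments:
        combined |= buried_positions(accessibility, threshold)
    return combined


if __name__ == "__main__":
    # THYVD: H (position 2) buried in one alignment, Y (position 3) in the other.
    first = {1: 70.0, 2: 12.0, 3: 60.0}
    second = {2: 55.0, 3: 8.0}
    print(sorted(apply_all_asa([first, second])))  # [2, 3]
```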

Results display with hidden predictions shaded grey.

Note that if a homologue is not selected, no predictions will be hidden, and the check box will become unselected again. Hidden predictions can be displayed again, and the selected alignment can be changed by reselecting the "Shade buried predictions" check box.


Predicting Secondary Structure

In addition to the accessibility of the cleavage site, it is useful to have some idea of the likely secondary structure of the substrate, for example to identify helical regions, which are less likely to be cleaved. PoPS uses PSIPRED (Jones, D.T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202 (1999)) to predict the secondary structure of the substrate. In testing, PSIPRED achieved an average Q3 score of nearly 78%. To produce the prediction, the substrate is first searched with BLAST against the SwissProt non-redundant database. The results are displayed below the substrate sequence in the graphical results display. Helices are represented by blue coils, sheets by red arrows, and random coil by green wavy lines. The intensity of the colouring represents PSIPRED's confidence in the respective prediction: the bolder and more intense the colouring, the more confident the prediction.

To predict the secondary structure, check the "Predict secondary struct." check box. A dialog will appear while waiting for the results from the server, and when that disappears, the secondary structure will be drawn in the graphical results display. To hide the secondary structure, simply uncheck the box.

Results display with secondary structure drawn.


The PoPS Models Database

PoPS contains a database whose content is derived from the MEROPS database (merops.sanger.ac.uk; Rawlings, N.D., O'Brien, E. A. & Barrett, A.J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343-346). Consequently, the PoPS database contains all known proteases (as catalogued in MEROPS), which can be searched by name, clan, family and MEROPS identifier. This database provides a repository for protease models, allowing users to save and load models.

Users are strongly encouraged to save models to the database. Protease specificity knowledge is currently scattered throughout the literature or, worse, is not published at all but retained as expert knowledge within individual laboratories. The PoPS database therefore has the potential to become an extremely useful source of information for people working with proteases.

Saving Models to the Database

Once a model has been created in the program, it can be saved by clicking on the "Save Model To Database" button. The first window to appear will be a Login window.

Login Window

To acquire a login name and password, it is necessary to register your details. In the registration process, the user supplies some basic details, and selects a name and password, which is used on future occasions to log in (bypassing the registration process).

Registration Window

The registration information required from the user is simple, and is only used to ensure the integrity of the models that are saved into the database. Note that only the name of a registered user is made available to other users, as part of the model information. No other registration information is made publicly available, either through the program, or any other means.

Once registered (or logged in), it is possible to save the model to the database. First, the protease to which the model belongs must be selected. Then, if the model being saved is based on a previously existing model, the "parent" model needs to be indicated.

Save Model Window

When ready, selecting the "Save Model" button brings up a new window with the information that will be saved together with the model. While some of this information is automatically generated, the user can add information about which species the model is specific to (if applicable), any bibliographic information that might have been used in the generation of the model, and general comments that should be included with the model (for example, how the model was created, or whether it is specific to a particular pH).

Model Save Verification Window

Clicking on the "Save Model" button will save the model to the database. The save can be cancelled at any time by using the "Cancel" buttons in the verification and save windows.

Loading Models From the Database

To load a model from the database, click on the "Load Database Model" button, and a dialog box will appear with a list of all the proteases. This list can be searched by a number of methods, all of which are available in the right-hand panel next to the list of names.

Dialog to load a model from the database.

Select the protease by clicking on it, and then click the "List Models" button. This produces the list of models available for the given protease. To select a model, click on the name of the model. If necessary, check the selected model's details before loading by clicking on the "Display Model Details" button. The model can be loaded without checking the details by clicking on the "Load Model" button. The "Cancel" button will return the user to the main program without loading a model.

The "Show Model Details" button produces a new window displaying all the information that was saved with the model. This includes the creation details (e.g. bibliographic details), species details, and ratings from other users. To see any specific comments provided with a model rating, double click on the relevant rating and the comments (if any) will appear in the box below. The "Load This Model" button loads the model, while the "Close" button closes the window, and returns the user to the list of models, to select a different model.

Model Information Window.

Rating Models From the Database

To keep track of the usefulness of database models, users can rate models and save comments about them. Rating models does not require registration or login. To rate a model, click on the "Rate Database Model" button:

Button to Rate a Database Model

This button brings up a dialog very similar to the load and save model dialogs. Select the required model (by selecting the relevant protease and then the model), then click on the "Rate This Model" button. In the new dialog window that opens, the user can rate the model on a scale from 0 to 5, where 5 is the best and 0 is the worst. To help the user choose a rating, each number is given a meaning: the scale runs from 5 (the model performs extremely well, getting almost all, or all, of the expected predictions correct) through to 0 (the model failed to get any expected predictions correct). In addition to selecting a rating, the user can enter comments, e.g. an explanation of why the rating was chosen, or notes about the situation in which the model was used, such as the substrate(s) involved.

Dialog Window for Rating a Database Model

The model rating and the comments are saved to the database, together with the user's surname and first initial and the date the rating was entered. This information is available to users loading models from the database (see Loading Models From the Database), to help them select the most appropriate or useful model. Keeping track of this information also helps to assess which features of the models are most important for determining a protease's substrate specificity. This will help in the long-term goal of building "best" models for individual proteases, and of identifying patterns of specificity within clans and families of proteases.


Submitting a Protease Model as a File

To load a file into PoPS, click on the "Load User Model" button in the program. This will bring up a window that allows the user to select the relevant file. On clicking the "Load Protease File" button, PoPS will load and display your protease model in the Applet. The file must be in the PoPS file format, which is a text file with a *.prts extension. The layout is as follows:

Subsites S2 S1 S1' S2'
Weights 2.0 1.0 0.0 0.0
Gly 0.0 0.0 0.0 0.0
Ala 0.0 1.0 3.0 1.7
Val 2.0 0.0 3.0 3.2
Leu 3.0 0.0 0.0 3.3
Ile 1.0 2.0 0.0 3.0
Pro -3.0 3.0 0.0 0.0
Phe 3.0 3.0 0.0 0.2
Tyr 2.0 2.0 3.0 0.0
Trp 1.0 0.0 3.0 0.0
Ser -1.0 0.0 3.0 0.0
Thr -1.0 0.0 0.0 0.0
Cys 0.0 0.0 0.0 0.0
Met 0.0 0.0 0.0 3.0
Asn -1.0 3.8 0.0 3.1
Gln -1.0 4.2 0.0 3.0
Asp -2.0 0.0 0.0 3.0
Glu -2.0 0.0 -1.5 3.0
Lys -2.0 0.0 -2.0 0.0
Arg -2.0 0.0 -1.2 0.5
His -1.0 0.0 -1.7 0.0

The first column of text is the row label (tag). Each tag must start with an upper-case letter, with all other letters in lower case. The order of the tags is also important: the first tag must be Subsites, the next Weights, and then the 20 amino acids must follow in the same order as above, namely: Gly, Ala, Val, Leu, Ile, Pro, Phe, Tyr, Trp, Ser, Thr, Cys, Met, Asn, Gln, Asp, Glu, Lys, Arg, His.

The other columns contain the values for each subsite. Note that subsites must be listed from the outermost S subsite to the outermost S' subsite. The S may be upper or lower case, but every S' subsite label must end with the prime character ('). If values in the subsite profiles lie outside the range -5 to 5 (see Specifying the Protease Model for more information on these limits), the user will be prompted to scale or re-enter the values as the file is loaded.
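A hypothetical sketch of checking the Subsites row against the ordering rules above. Requiring at least one S and one S' subsite carries over the rule from Specifying the Protease Model; whether the file loader enforces it is an assumption, as is the name `check_subsite_order`.

```python
import re

# One label: S or s, a subsite number, and an optional trailing prime.
LABEL_RE = re.compile(r"^[Ss](\d+)(')?$")


def check_subsite_order(labels):
    """Validate labels such as S2 S1 S1' S2'; return (n_unprimed, n_primed)."""
    parsed = []
    for label in labels:
        match = LABEL_RE.match(label)
        if match is None:
            raise ValueError(f"unrecognised subsite label: {label!r}")
        parsed.append((int(match.group(1)), match.group(2) is not None))
    n_unprimed = sum(1 for _, primed in parsed if not primed)
    n_primed = len(parsed) - n_unprimed
    if n_unprimed == 0 or n_primed == 0:
        raise ValueError("at least one S and one S' subsite is required")
    # Expected: outermost S down to S1, then S1' out to the outermost S'.
    expected = [(i, False) for i in range(n_unprimed, 0, -1)]
    expected += [(i, True) for i in range(1, n_primed + 1)]
    if parsed != expected:
        raise ValueError("subsites must run contiguously from outermost S to outermost S'")
    return n_unprimed, n_primed


if __name__ == "__main__":
    print(check_subsite_order(["S2", "s1", "S1'", "S2'"]))  # (2, 2)
```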

The columns should be separated by white space, such as single spaces, tabs or line breaks (using the Enter key); tabs give the most readable alignment. Because the file is plain text, it can be opened and edited in any text editor (for example Notepad, Emacs, Vi or TextEdit), as long as it is always saved as plain text. Alternatively, the file can be loaded into PoPS, the protease model altered within PoPS, and the file saved back to the user's directory (see below for information on saving files from PoPS).

Comments can be included in the protease file. They have no specific format and can contain any text (letters, numbers or symbols). The important point is that comments must come before and/or after the tagged rows shown above, not between them.
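The following Python sketch parses the layout above. It is an illustration, not PoPS's own loader: it treats the file as a stream of whitespace-separated tokens (matching the statement that spaces, tabs and line breaks all separate columns), ignores text before the first "Subsites" token and after the His row, and assumes a "#" entry may appear in the file as it can in the profile editor. A leading comment containing the word "Subsites" would confuse it. The names `parse_prts` and `AA_ORDER` are hypothetical.

```python
AA_ORDER = ["Gly", "Ala", "Val", "Leu", "Ile", "Pro", "Phe", "Tyr", "Trp", "Ser",
            "Thr", "Cys", "Met", "Asn", "Gln", "Asp", "Glu", "Lys", "Arg", "His"]


def parse_prts(text):
    """Return (subsite labels, weights, per-subsite profiles keyed by 3-letter code)."""
    tokens = text.split()  # any white space separates columns
    try:
        i = tokens.index("Subsites") + 1  # leading comments are skipped
    except ValueError:
        raise ValueError("no 'Subsites' tag found") from None
    labels = []
    while i < len(tokens) and tokens[i] != "Weights":
        labels.append(tokens[i])
        i += 1
    n = len(labels)
    if n == 0 or i == len(tokens):
        raise ValueError("missing subsite labels or 'Weights' tag")

    def read_row(tag, pos):
        # Each row is its tag followed by exactly one value per subsite.
        if pos >= len(tokens) or tokens[pos] != tag:
            raise ValueError(f"expected tag {tag!r}")
        values = tokens[pos + 1:pos + 1 + n]
        if len(values) != n:
            raise ValueError(f"row {tag!r} has too few values")
        return values, pos + 1 + n

    weight_tokens, i = read_row("Weights", i)
    weights = [float(w) for w in weight_tokens]
    profiles = [{} for _ in range(n)]
    for tag in AA_ORDER:  # amino acid rows must appear in this order
        values, i = read_row(tag, i)
        for col, value in enumerate(values):
            profiles[col][tag] = float("-inf") if value == "#" else float(value)
    return labels, weights, profiles  # tokens after the His row are trailing comments
```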


Saving a Protease Model to a File

Click on the button labelled "Save Model To Disk". This will bring up a dialog window that will allow the user to save the file to the local computer system.


Saving the predictions to a File

There are two results formats that can be saved to a file. The first is the reasoning table format, which can be obtained by clicking on the button "Save Results to Disk". A dialog box will appear, allowing the user to save the results to a file on the local machine.

The other results format that can be saved is the graphical results format which can be obtained by clicking on the "Save Graphics to Web Page" button.






Victorian Bioinformatics Consortium, Monash University