Utilities
Documentation -> PoPS Manual
FAQ
About
The substrate sequence is supplied to PoPS by pasting
it into the text area at the top of the program. The
amino acids must be represented using the 1-letter encoding. For
example, glycine is represented by G, alanine by A and so on. Lower
case letters will be converted to upper case letters before
prediction. All other characters apart from those representing the
alpha amino acids will be removed for the purposes of calculations
(for example B,J,O,U,X,Z and punctuation marks are all considered
illegal characters). The text area can be cleared (for example, to
paste another sequence) by using the "Clear Substrate" button.
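
The clean-up rules above can be sketched in a few lines of Python. This is only an illustration of the described behaviour, not the PoPS source code; the names STANDARD_AMINO_ACIDS and clean_substrate are hypothetical.

```python
# The 20 standard amino acids in 1-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_substrate(raw: str) -> str:
    """Upper-case the input, then drop every character that is not
    one of the 20 standard residues (B, J, O, U, X, Z, digits,
    whitespace and punctuation are all removed)."""
    return "".join(ch for ch in raw.upper() if ch in STANDARD_AMINO_ACIDS)


print(clean_substrate("mk-tx ay*B z"))  # MKTAY
```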

The specificity model of a protease is built by specifying for each subsite the preferred amino acid(s), and the subsite's relative importance. (The contribution of these to substrate binding and cleavage is discussed on the Background page).
Defining the number of subsites.
The user must specify how many subsites determine cleavage. Any
number of subsites can be added or deleted by using the relevant
add/delete subsite buttons, to the right of the display. There must
be at least one S and S' subsite. A detailed description of the S and
S' notation is here.
Initially, the specificity profile has two subsites: S1 and S1'.

The number of subsites that can be defined is not limited, but the more subsites that are defined, the longer the computation takes. It is therefore advisable not to specify more subsites than necessary. The subsites must be contiguous, which means, for example, that it is not possible to define the S3 and S5 subsites without also specifying the S4 subsite. To force the program to ignore a subsite, set the weight to 0 (the weights and their meaning are discussed below).
Defining the specificity profile for each subsite.
For each subsite, it is necessary to define the preferences for the
20 alpha amino acids by assigning each of them a score within the
subsite. This is known as the specificity profile. To
create/edit the specificity profile, click on the profile button in
the matrix panel (which is initially set to "Any Amino Acid"). This
will open a new dialog which allows the user to edit the profile.

To create a profile, a number of predefined profiles are available in the drop-down list on the left side of the panel. Clicking on one of these profiles will automatically insert the values for each of the amino acids on the right side of the panel. Alternatively, the user can manually customise a profile by entering values directly. Of course, it is also possible to first select a predefined profile, then edit the values.
Each specificity profile contains 20 scores, one for each amino acid. Scores can assume any value ranging from -5.0 to +5.0. Negative values indicate an inhibitory effect on binding, with -1.0 being slightly unfavourable through to -5.0 being very unfavourable. Positive values represent amino acids that have a positive effect on binding, with +1.0 being slightly favourable, through to +5.0 being very favourable. A score of 0.0 indicates the amino acid has neither a positive nor negative effect on binding. If a score is outside of this range, it will be set to the nearest bound (i.e. >5.0 is set to 5.0, and <-5.0 is set to -5.0). Finally, the hash (or pound) symbol "#" is reserved to indicate that the given amino acid completely prevents cleavage.
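
As a rough illustration of how a single profile entry is interpreted, the sketch below clamps numeric scores to the allowed range and treats "#" as blocking cleavage. Representing "#" as negative infinity is an assumption made for this example (it matches the -Infinity score reported for such cleavages in the results); the function name normalise_score is hypothetical.

```python
import math

BLOCK = "#"  # reserved symbol: this residue prevents cleavage


def normalise_score(entry):
    """Return the effective score for one profile entry.

    Numeric entries are clamped to the range -5.0 to +5.0.
    "#" is mapped to -infinity (an assumption for this sketch).
    """
    if entry == BLOCK:
        return -math.inf
    return max(-5.0, min(5.0, float(entry)))


print(normalise_score("7.2"))  # 5.0
print(normalise_score(-9))     # -5.0
print(normalise_score("#"))    # -inf
```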
Initially, each specificity profile is set to "Any amino acid", which means the score is set to 0.0 for all the amino acids. If this profile is to be used, it is recommended that the weight also be set to 0, to speed up computation time by causing the program to ignore this subsite.
To prevent numbers outside the allowable range (-5.0 to 5.0) from being truncated to the closest value (as described above), it is possible to scale the profile values, using the "Scale Values" button. There are two scaling options. The first option is to scale between a lower and upper bound. In this case, the minimum value is set to the lower bound, the maximum value to the upper bound, and then all the other values are scaled between the bounds appropriately. If the lower and upper bounds are set to the same value, then the user is alerted to this fact. If the user chooses to continue, then all the values in the specificity profile will be set to the same value, i.e. the upper/lower bound that was entered.
The second scaling option is to set a specific value to be equal to
0.0 after scaling, and to set the upper bound. In this case, the
lower bound will be the negative of the upper bound, by default. If
the upper bound is entered as a negative number, this will be set to
the lower bound, and the upper bound will become the positive of the
number entered. It is important to remember to check the box
of the scaling option to be used.
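
The first scaling option (scaling between a lower and an upper bound) amounts to a linear rescaling, which might look like the sketch below. The function name scale_between is hypothetical, and the handling of a profile whose values are all identical is an assumption, since that case is not described here. The second option is not sketched because its exact formula is not given.

```python
def scale_between(values, lower, upper):
    """Linearly rescale values so the minimum maps to `lower`
    and the maximum maps to `upper`."""
    if lower == upper:
        # Equal bounds: every value collapses to that bound,
        # as described for the first scaling option.
        return [float(lower)] * len(values)
    lo, hi = min(values), max(values)
    if lo == hi:
        # Assumption: a flat profile is mapped to the midpoint.
        return [(lower + upper) / 2.0] * len(values)
    span = upper - lower
    return [lower + (v - lo) * span / (hi - lo) for v in values]


print(scale_between([0, 5, 10], -5, 5))  # [-5.0, 0.0, 5.0]
```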

Defining the relative contribution of each subsite.
The last feature of the specificity model is the relative contribution of each of the subsites to cleavage. This is defined by setting the weight value of a subsite. This value reflects the subsite's contribution, or importance, in determining cleavage, relative to all the other subsites. There is no limit on the values that the weights can take. The default value is 1. As mentioned above, to force PoPS to ignore a subsite, set the weight to 0.
The weights are used in conjunction with the specificity profiles to calculate a score reflecting the likelihood of a cleavage occurring. More detailed information on how the scores are calculated can be found in the How Scores are Calculated section of the Background page.
Display of the Results
The results of a prediction are displayed in the results section of the program:

The first representation of the results is a textual representation, known as the reasoning display. This information explains why the displayed predictions have been selected. The top line contains the maximum and minimum scores for all the predictions (even those that have not been displayed). The minimum score is the minimum score from all possible cleavages, and therefore does not include cleavages that could never occur, i.e. cleavages that include a "#". The score for these cleavages is negative infinity (-Infinity). For each predicted cleavage, the position of the cleavage (the residues between which it occurs), the subtotals for each subsite, and the total for the prediction are all displayed on one line.
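
To make the per-subsite subtotals and the total more concrete, here is a minimal sketch of one way such a table could be produced. It ASSUMES that each subtotal is the subsite weight multiplied by the profile score and that the total is their sum; the actual formula is the one given on the Background page and may differ. The function name score_cleavages, the data layout, the default of 0.0 for residues missing from a profile, and the choice to score only windows that lie fully inside the substrate are all assumptions for illustration.

```python
import math


def score_cleavages(substrate, subsites, weights, profiles):
    """Slide the subsites along the substrate and score each position.

    subsites: names ordered from the outermost S to the outermost S',
              e.g. ["S2", "S1", "S1'", "S2'"].
    profiles: {subsite name: {residue: score or "#"}}.
    Returns (P1 position, P1' position, subtotals, total) tuples,
    with 1-based residue positions.
    """
    n_s = sum(1 for name in subsites if not name.endswith("'"))
    results = []
    for start in range(len(substrate) - len(subsites) + 1):
        window = substrate[start:start + len(subsites)]
        subtotals = []
        for residue, name, weight in zip(window, subsites, weights):
            if weight == 0:
                subtotals.append(0.0)  # weight 0: subsite is ignored
                continue
            entry = profiles[name].get(residue, 0.0)
            subtotals.append(-math.inf if entry == "#" else weight * entry)
        p1 = start + n_s  # 1-based index of the residue in S1
        results.append((p1, p1 + 1, subtotals, sum(subtotals)))
    return results


profiles = {"S1": {"A": 1.0, "G": "#"}, "S1'": {"G": 3.0}}
for row in score_cleavages("AAGA", ["S1", "S1'"], [2.0, 1.0], profiles):
    print(row)
```

In this toy run the cleavage between residues 3 and 4 places G in S1, which is marked "#", so its total is negative infinity and it would be excluded when the minimum score is reported.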
The other representation of the results is a graphical representation. The substrate is printed across the panel, centered vertically. Note that the substrate printed in the results will not contain any illegal characters. The substrate displayed is the substrate after all illegal characters have been removed, immediately prior to the predictions being calculated. Within the substrate sequence, every tenth amino acid (after illegal characters have been removed) is numbered, with the digit appearing directly below the relevant amino acid.
Predicted cleavages are indicated by the downward-pointing arrows, centered between the amino acids either side of the cleavage. The size and colour of the arrows are both used to reflect the likelihood of a cleavage. A positive prediction (positive score) is one where a cleavage is likely to occur, and is drawn in green, where the more intense the green, the more positive the score. A negative prediction (negative score) is one where a cleavage is not likely to occur, and is drawn in red, where the more intense the red, the more negative the score. In both cases, the greater the absolute value of the score, the thicker the arrow. Predictions with a score of 0 are represented as a straight black line. Information on how the scores are calculated can be found in the How Scores are Calculated section of the Background page.
A larger display of the results can be viewed by clicking on the "Bigger Results Display" button, which will bring up a new window with only the graphical results. This window can be resized to display more of the graphical results at one time (as opposed to the window in the PoPS main page, where only thirty residues can be viewed at a time).
Using the Stringency Setting
The results displayed are subject to the stringency setting.
Initially, the stringency is set to 0, meaning that only predicted
cleavages that have a score greater than or equal to 0 will be
displayed. The stringency can be set to any value (positive or
negative) and all predictions with a score greater than or equal to
the specified stringency will be displayed. To specify a stringency,
click on the "Specify Stringency" button. A dialog box will appear,
where the user can enter the preferred stringency value. Clicking
"OK" will screen the displayed results to display those predictions
with a score greater than or equal to the new stringency. The current
stringency is displayed in orange immediately above the graphics
panel.
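
The stringency rule is a simple threshold on the score. A minimal sketch, assuming predictions are held as (position, score) pairs (a hypothetical layout chosen for this example):

```python
def apply_stringency(predictions, stringency=0.0):
    """Keep only predictions whose score is greater than or
    equal to the stringency (default 0, as in PoPS)."""
    return [(pos, score) for pos, score in predictions if score >= stringency]


predictions = [(3, 4.5), (7, 0.0), (12, -2.0)]
print(apply_stringency(predictions))        # [(3, 4.5), (7, 0.0)]
print(apply_stringency(predictions, -2.0))  # all three predictions
```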

Predicting Buried Cleavages
PoPS has a database, called the ASA, that contains the proteins of known three-dimensional structure, obtained from the Protein Data Bank (PDB) (H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne: The Protein Data Bank. Nucleic Acids Research, 28 pp. 235-242 (2000)). Each protein in this database has been passed through a program called DSSP (http://www.cmbi.kun.nl/gv/dssp/; Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers. 1983 Dec;22(12):2577-637). Using a probe with a radius of 1.4 angstroms, DSSP calculates the solvent accessibility of each residue in the proteins in the database.
When a user checks the "Shade buried predictions" option, a dialog
box appears asking the user to enter a value. This value is the
minimum percentage that a residue must be exposed to solvent before it
will be considered to be accessible to the protease. The default value is 33%.

If the user clicks OK, the substrate is then aligned with the
proteins in the ASA database using 1 iteration of PSI-BLAST. All
alignments with an expect value of < 0.001 are selected, and the
surface accessibility data for those alignments is returned together with
the predicted cleavages. The expect value (e-value) is returned by PSI-BLAST,
and indicates the homology between the PDB and substrate sequences.
The smaller the e-value, the more likely the two sequences are
homologous. The user can select any of the alignments returned by
clicking on the "Shade Buried Predictions" check box. A new dialog
opens with a list to allow the users to select an alignment (if no
alignments are returned, the list will be empty).

Each line of the list contains one result, with a lot of information about the alignment. Firstly, the values in brackets indicate the indices of the first and last buried residues in the alignment. All residues outside of this range will be predicted as accessible, so the most useful information (what is buried) only occurs within this range. The second part of the result is the PDB (Protein Data Bank) descriptor of the protein that has been aligned with the substrate. This includes the PDB id, which means a user can go to PDB and look up the protein. After the PDB descriptor, in brackets, is the expect value (e-value) of the alignment of the PDB protein and the substrate submitted. The output list of ASA results is sorted in order of most homologous sequence to least homologous (smallest e-value to greatest e-value).
Selecting a homologue and clicking "Apply selected" will cause buried residues and predictions to be shaded in grey, in both the reasoning and graphical displays. A prediction will be shaded in grey if one or more residues across the alignment are buried. That is, if there is a residue that would be sitting in the active site for the current prediction, and that residue is buried, the prediction will be shaded grey, even if all the other residues are exposed. If the exposed/buried status of a given residue is unknown, then the residue is treated as exposed.
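
The shading rule can be expressed as a short check over the residues that occupy the subsites for one prediction. This is a sketch only: the function name is_shaded is hypothetical, accessibility is assumed to be available as a percentage per residue, None stands for an unknown status, and treating a residue exactly at the threshold as exposed is an assumption.

```python
def is_shaded(window_accessibility, min_exposed_percent=33.0):
    """Return True if the prediction should be shaded grey.

    window_accessibility: percentage solvent exposure for each residue
    occupying a subsite, or None where the status is unknown.
    A single buried residue is enough; unknown counts as exposed.
    """
    return any(
        acc is not None and acc < min_exposed_percent
        for acc in window_accessibility
    )


print(is_shaded([80.0, 10.0, None, 55.0]))  # True: one residue is buried
print(is_shaded([80.0, None, 33.0]))        # False
```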
Sometimes a large number of alignments are returned. A shortcut to
trying each alignment in turn is to select the "Apply all ASA" button.
This combines all of the ASA data returned, so that if a residue is
predicted as being buried in any one of the ASA alignments, it will be
shaded grey in the results displays. This can be used to get an
overall picture of accessible surface area, but individual alignments
should be checked, because sometimes the inaccessible residue shifts
across by one between one alignment and another. For example, given
the sequence THYVD, H may be buried in the first alignment, and Y
buried in the second. When all the ASA alignments are combined, both
H and Y are indicated as being inaccessible, when in fact it is likely
to be one or the other, but not both.
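
Combining the alignments in this way is a set union over the buried positions. The sketch below reproduces the THYVD example, assuming each alignment is represented as a set of 1-based buried residue positions (a hypothetical layout; combine_buried is likewise a hypothetical name).

```python
def combine_buried(alignments):
    """Union of buried positions across all ASA alignments."""
    combined = set()
    for buried_positions in alignments:
        combined |= set(buried_positions)
    return combined


sequence = "THYVD"
first = {2}   # H buried in the first alignment
second = {3}  # Y buried in the second alignment
combined = combine_buried([first, second])
print([sequence[i - 1] for i in sorted(combined)])  # ['H', 'Y']
```

Both H and Y end up marked as buried, which is exactly the over-estimate described above.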

Note that if a homologue is not selected, no predictions will be hidden, and the check box will become unselected again. Hidden predictions can be displayed again, and the selected alignment can be changed by reselecting the "Shade buried predictions" check box.
Predicting Secondary Structure
In addition to the accessibility of the cleavage site, it is also useful to have some idea of the possible secondary structure of the substrate, for example to identify helical regions, which would be less likely to be cleaved. PoPS uses PSIPRED (Jones, D.T. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292, 195-202 (1999)) to predict the secondary structure of the substrate. In testing, PSIPRED achieved an average Q3 score of nearly 78%. To produce the predicted secondary structure, the substrate is first searched against the SwissProt non-redundant database using BLAST. The results of the prediction are displayed below the substrate sequence, in the graphical results display. Helices are represented by blue coils, sheets by red arrows and random coil by green wavy lines. The intensity of the colouring of these lines represents the confidence with which PSIPRED predicted the respective secondary structure. The bolder and more intense the colouring, the more confident the prediction.
To predict the secondary structure, check the "Predict secondary
struct." check box. A dialog will appear while waiting for the
results from the server, and when that disappears, the secondary
structure will be drawn in the graphical results display. To hide the
secondary structure, simply uncheck the box.

PoPS contains a database whose content is derived from the MEROPS database (merops.sanger.ac.uk; Rawlings, N.D., O'Brien, E. A. & Barrett, A.J. (2002) MEROPS: the protease database. Nucleic Acids Res. 30, 343-346). Consequently, the PoPS database contains all known proteases, which can be searched for by name, clan, family, and MEROPS identifier. This database provides a repository for protease models, allowing users to save and load models.
Users are strongly encouraged to save models to the database. Protease specificity knowledge is currently scattered throughout journals or, worse, not published at all and held only as expert knowledge within laboratories. Therefore, the PoPS database has the potential to become an extremely useful source of information for people working with proteases.
Saving Models to the Database
Once a model has been created in the program, it can be saved by
clicking on the "Save Model To Database" button. The first window to
appear will be a Login window.

To acquire a login name and password, it is necessary to register
your details. In the registration process, the user supplies some
basic details, and selects a name and password, which are used on
future occasions to log in (bypassing the registration process).

The registration information required from the user is simple, and is only used to ensure the integrity of the models that are saved into the database. Note that only the name of a registered user is made available to other users, as part of the model information. No other registration information is made publicly available, either through the program, or any other means.
Once registered (or logged in), it is possible to save the model to
the database. First, the protease to which the model belongs must be
selected. Then, if the model being saved is based on a previously
existing model, the "parent" model needs to be indicated.

When ready, selecting the "Save Model" button brings up a new
window with the information that will be saved together with the
model. While some of this information is automatically generated, the
user can add information about which species the model is specific to
(if applicable), any bibliographic information that might have been
used in the generation of the model, and general comments that should
be included with the model (for example, how the model was created, or
whether it is specific to a particular pH).

Clicking on the "Save Model" button will save the model to the database. The save can be cancelled at any time by using the "Cancel" buttons in the verification and save windows.
Loading Models From the Database
To load a model from the database, click on the "Load Database
Model" button, and a dialog box will appear with a list of all the
proteases. This list can be searched by a number of methods, all of
which are available in the right-hand panel next to the list of names.

Select the protease by clicking on it, and then click the "List Models" button. This produces the list of models available for the given protease. To select a model, click on the name of the model. If necessary, check the selected model's details before loading by clicking on the "Display Model Details" button. The model can be loaded without checking the details by clicking on the "Load Model" button. The "Cancel" button will return the user to the main program without loading a model.
The "Show Model Details" button produces a new window displaying
all the information that was saved with the model. This includes the
creation details (e.g. bibliographic details), species details, and
ratings from other users. To see any specific comments provided with
a model rating, double click on the relevant rating and the comments
(if any) will appear in the box below. The "Load This Model" button
loads the model, while the "Close" button closes the window, and
returns the user to the list of models, to select a different model.

Rating Models From the Database
To keep track of the utility of database models, it is possible for
users to save comments about a model. Rating models does not require
registration or login. To rate a model, click on the "Rate Database
Model" button:

This button brings up a dialog very similar to the load and
save model dialogs. Select the required model (by selecting the
relevant protease then model), then click on the "Rate This Model"
button. In the new dialog window that opens, it is possible to choose
a number out of 5 to rate the model, where 5 is the best, and 0 is the
worst. To help the user select the best number to rate the model,
some quantification is given to the meaning of the numbers. The
ratings range from 5 (the model performs extremely well, getting
almost all, or all, of the expected predictions correct) through to
0 (the model failed to get any expected predictions correct). In
addition to selecting a
rating, it is also possible to enter comments, e.g. explanations as to
why the rating was chosen, or specific notes about the situation in
which the model was being used, e.g. the substrate(s) being used.

The model rating and the comments are saved to the database, together with the user's surname and first initial, and the date the rating was entered. This information is available to users loading models from the database (see Loading Models From the Database), to help users select the most appropriate or useful model. Keeping track of this information also helps assess which features of the models are most important for determining a protease's substrate specificity. This will help in the long-term goal of building "best" models for individual proteases, and identifying patterns of specificity within clans and families of proteases.
Submitting a Protease Model as a File
To load a file into PoPS, click on the "Load User Model" button in the program. This will bring up a window that allows the user to select the relevant file. On clicking the "Load Protease File" button, PoPS will load and display the protease model in the applet. The file format is the PoPS file format, which is a text file with a *.prts extension. The layout is as follows:
| Subsites | S2 | S1 | S1' | S2' |
| Weights | 2.0 | 1.0 | 0.0 | 0.0 |
| Gly | 0.0 | 0.0 | 0.0 | 0.0 |
| Ala | 0.0 | 1.0 | 3.0 | 1.7 |
| Val | 2.0 | 0.0 | 3.0 | 3.2 |
| Leu | 3.0 | 0.0 | 0.0 | 3.3 |
| Ile | 1.0 | 2.0 | 0.0 | 3.0 |
| Pro | -3.0 | 3.0 | 0.0 | 0.0 |
| Phe | 3.0 | 3.0 | 0.0 | 0.2 |
| Tyr | 2.0 | 2.0 | 3.0 | 0.0 |
| Trp | 1.0 | 0.0 | 3.0 | 0.0 |
| Ser | -1.0 | 0.0 | 3.0 | 0.0 |
| Thr | -1.0 | 0.0 | 0.0 | 0.0 |
| Cys | 0.0 | 0.0 | 0.0 | 0.0 |
| Met | 0.0 | 0.0 | 0.0 | 3.0 |
| Asn | -1.0 | 3.8 | 0.0 | 3.1 |
| Gln | -1.0 | 4.2 | 0.0 | 3.0 |
| Asp | -2.0 | 0.0 | 0.0 | 3.0 |
| Glu | -2.0 | 0.0 | -1.5 | 3.0 |
| Lys | -2.0 | 0.0 | -2.0 | 0.0 |
| Arg | -2.0 | 0.0 | -1.2 | 0.5 |
| His | -1.0 | 0.0 | -1.7 | 0.0 |
The first column of text is the row label (tag). Each tag should start with an upper-case letter, and all the other letters should be lower case. The order of the tags is also important. The first tag should be the Subsites tag, the next the Weights tag, and then the 20 amino acids should be defined in the same order as above. The ordering of the amino acids (as above) is the following: Gly, Ala, Val, Leu, Ile, Pro, Phe, Tyr, Trp, Ser, Thr, Cys, Met, Asn, Gln, Asp, Glu, Lys, Arg, His.
The other columns should contain the values for the respective labels for each subsite. Note that the subsites must be defined from the outermost S subsite to the outermost S' subsite. It does not matter if the representation of the subsite has an upper- or lower-case S, but all S' subsites must end with the " ' " (prime) character. If values in the subsite profiles are outside of the range -5 to 5 (see Specifying the Protease Model for more information about the limits on these values), the user will be prompted to scale or re-enter the values as the file is loaded.
The columns should be separated by white space, for example single spaces (using the space bar), tab spacing and carriage returns (using the Enter key). Tab spacing produces the best alignment for reading the file. Because the file is a text file, it can be opened and edited in any text editor (for example Notepad, Emacs, Vi, TextEdit), as long as the file is always saved in text format. Alternatively, the file can be loaded into PoPS, the protease model altered within PoPS, and then the file saved back to the user's directory again (see below for information on saving files from PoPS).
Comments can be included in the protease file. There is no specific format for the comments, and they can contain any text (any letters, numbers or symbols). The most important factor to remember is that the comments must come before and/or after the tags (as above), but not in between.
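
For readers who want to generate or check *.prts files outside PoPS, the following is a minimal sketch of a reader for the layout described above. It is not the PoPS loader. It assumes whitespace-separated tokens, assumes that any comment text before the tags does not itself contain the words "Subsites" or "Weights", and does not handle the "#" symbol, since its representation in files is not described here. The names parse_prts and AMINO_ACID_TAGS are hypothetical.

```python
AMINO_ACID_TAGS = [
    "Gly", "Ala", "Val", "Leu", "Ile", "Pro", "Phe", "Tyr", "Trp", "Ser",
    "Thr", "Cys", "Met", "Asn", "Gln", "Asp", "Glu", "Lys", "Arg", "His",
]


def parse_prts(text):
    """Parse PoPS-style model text into (subsites, weights, profiles)."""
    tokens = text.split()  # spaces, tabs and line breaks all separate columns
    start = tokens.index("Subsites")  # leading comments are skipped
    weights_at = tokens.index("Weights", start)
    subsites = tokens[start + 1:weights_at]
    n = len(subsites)
    pos = weights_at + 1
    weights = [float(t) for t in tokens[pos:pos + n]]
    if len(weights) != n:
        raise ValueError("expected one weight per subsite")
    pos += n
    profiles = {name: {} for name in subsites}
    for tag in AMINO_ACID_TAGS:  # rows must appear in this fixed order
        row = tokens[pos:pos + n + 1]
        if len(row) != n + 1 or row[0] != tag:
            raise ValueError(f"expected a complete row for {tag}")
        for name, cell in zip(subsites, row[1:]):
            profiles[name][tag] = float(cell)
        pos += n + 1
    # Anything after the His row is treated as trailing comments.
    return subsites, weights, profiles
```

A typical use would be to read the file with open(path).read(), pass the text to parse_prts, and then check that every value lies between -5 and 5 before loading the file into PoPS.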
Saving a Protease Model to a File
Click on the button labelled "Save Model To Disk". This will bring up a dialog window that will allow the user to save the file to the local computer system.
Saving the Predictions to a File
There are two results formats that can be saved to a file. The first is the reasoning table format, which can be obtained by clicking on the button "Save Results to Disk". A dialog box will appear, allowing the user to save the results to a file on the local machine.
The other results format that can be saved is the graphical results format which can be obtained by clicking on the "Save Graphics to Web Page" button.