Utilities
Documentation -> ROC Curve Docs
FAQ
About
The ROC curves module, available from the utilities page, takes as input a file of cleavage sites and their associated PoPS scores, and uses it to generate a receiver operating characteristic (ROC) curve.
A ROC curve provides a graphical representation of the relationship between the true-positive and false-positive prediction rates of a model. The y-axis corresponds to the sensitivity of the model, i.e. how well the model identifies real cleavages (true positives) among all sites that are actually cleaved, and the y-coordinates are calculated as:
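Using the standard definition of sensitivity, where TP is the number of true positives and FN the number of false negatives at a given score threshold (symbols introduced here for illustration):

$$
\text{sensitivity} = \frac{TP}{TP + FN}
$$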

The x-axis corresponds to the specificity (expressed on the curve as 1-specificity), i.e. the ability of the model to identify true negatives. An increase in specificity (i.e. a decrease along the x-axis) generally comes at the cost of a decrease in sensitivity, since a stricter threshold rejects more real cleavages as well as more non-cleaved sites. The x-coordinates are calculated as:
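Using the standard definition of specificity, where TN is the number of true negatives and FP the number of false positives at a given score threshold (symbols introduced here for illustration):

$$
1 - \text{specificity} = 1 - \frac{TN}{TN + FP} = \frac{FP}{TN + FP}
$$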

The greater the sensitivity at high specificity values (i.e. high y-axis values at low x-axis values), the better the model. A numerical measure of the accuracy of the model can be obtained from the area under the curve, where an area of 1.0 signifies perfect separation of cleaved and non-cleaved sites, an area of 0.5 corresponds to random guessing, and an area of less than 0.5 indicates that the model is worse than random. The quantitative-qualitative relationship between area and accuracy follows a fairly linear pattern, such that the following could be used as a guide:
Input data and generation of the ROC curve
This module takes as input a set of PoPS scores, each flagged as belonging either to a real cleavage site (true positive, indicated using '+') or to a site that is known not to be cleaved (true negative, indicated using '-'). The file contains three columns, with the following headers on the first line: CleavageSite, TrueCleavage? and PoPSScore. An example input file is:
| CleavageSite | TrueCleavage? | PoPSScore |
| --- | --- | --- |
| Arg18-Gly19 | - | 11.55365703 |
| Arg211-Asp212 | - | 19.88672768 |
| Arg219-Thr220 | - | 12.06521739 |
| Arg227-Val228 | - | 10.75178455 |
| Arg254-Leu255 | - | 11.68750841 |
| Arg444-Thr445 | + | 18.3065967 |
In this (very small) example, there are five true negatives (sites that are not cleaved) and one true positive (a site that is cleaved). There is no provision for comments in the file, i.e. the file should contain only the data for the ROC curve. The headers must be included. They are not case-sensitive (upper- or lower-case can be used), but each header MUST be a single word, for example 'CleavageSite' (no space), not 'Cleavage Site' (space between the two words).
The above example produces the following ROC curve (download the example input file here):

The red line is the ROC curve for the input data, the gray line is the reference line. The output is discussed in detail in the section below.
The ROC curve is generated using the following steps:
If a site's score is greater than or equal to the threshold value, the site is classified as positive; otherwise it is classified as negative.
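The thresholding rule above can be sketched in Python as follows. This is a minimal illustration, not the module's actual code. It assumes that every distinct score is tried as a threshold, from highest to lowest, and that a point at the origin is added for a threshold above all scores. The function name `roc_points` is hypothetical.

```python
from typing import List, Tuple


def roc_points(scores: List[float], labels: List[str]) -> List[Tuple[float, float]]:
    """Return (1 - specificity, sensitivity) pairs, one per threshold."""
    n_pos = labels.count("+")
    n_neg = labels.count("-")
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one '+' site and one '-' site")

    # A threshold above every score classifies nothing as positive.
    points = [(0.0, 0.0)]
    # Assumption: each distinct score is used as a threshold, highest first.
    for threshold in sorted(set(scores), reverse=True):
        # A score >= threshold is classified as positive (rule from the text).
        tp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == "+")
        fp = sum(1 for s, l in zip(scores, labels) if s >= threshold and l == "-")
        points.append((fp / n_neg, tp / n_pos))
    return points


# Scores and flags from the example input file.
scores = [11.55365703, 19.88672768, 12.06521739,
          10.75178455, 11.68750841, 18.3065967]
labels = ["-", "-", "-", "-", "-", "+"]
points = roc_points(scores, labels)
for x, y in points:
    print(f"1-specificity={x:.1f}  sensitivity={y:.1f}")
```

For the example data, the single true positive is outscored by one true negative (Arg211-Asp212), so the curve moves right to x = 0.2 before rising to y = 1.0.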

Load a ROC data file.
To load a ROC data file, click on the "Load a data file" button, and select the appropriate file. Note that the input file must follow the format described above.
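To check a file against the format before loading it, a small reader along the following lines may help. This is a sketch under stated assumptions: the column delimiter is not specified above, so whitespace-separated columns are assumed, and blank lines are tolerated. The function name `read_roc_file` is hypothetical and is not part of the module.

```python
import io
from typing import List, TextIO, Tuple

EXPECTED_HEADERS = ["cleavagesite", "truecleavage?", "popsscore"]


def read_roc_file(handle: TextIO) -> List[Tuple[str, str, float]]:
    """Parse (site, flag, score) rows from a ROC input file."""
    headers = handle.readline().split()
    # Headers are compared case-insensitively; each must be a single word.
    if [h.lower() for h in headers] != EXPECTED_HEADERS:
        raise ValueError(f"unexpected headers: {headers}")

    rows = []
    for number, line in enumerate(handle, start=2):
        fields = line.split()  # assumption: whitespace-separated columns
        if not fields:
            continue  # assumption: tolerate blank lines, e.g. a trailing newline
        if len(fields) != 3 or fields[1] not in ("+", "-"):
            raise ValueError(f"line {number}: expected site, +/- flag and score")
        rows.append((fields[0], fields[1], float(fields[2])))
    return rows


example = io.StringIO(
    "CleavageSite TrueCleavage? PoPSScore\n"
    "Arg18-Gly19 - 11.55365703\n"
    "Arg444-Thr445 + 18.3065967\n"
)
rows = read_roc_file(example)
print(rows)
```

A header written as two words, such as 'Cleavage Site', yields four header fields and is rejected with a `ValueError`.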
Manipulating the output.
On loading a data file, the data will be plotted in the main window, and the name of the file will appear in the "Graph" list box at the bottom left of the module. To remove a ROC curve from the main graph, click on the name of the curve in the "Graph" list and use the ">>" arrow to transfer it to the "Don't Graph" list. The curve will automatically be removed from the graph in the main window. The curve can be added back into the main window by clicking on the name of the curve in the "Don't Graph" list and using the "<<" button to transfer it back to the "Graph" list.
Each curve is automatically named using the input file name, and assigned a colour and marker type. The curve settings can be edited by double-clicking on the name of the curve (in either the "Graph" or "Don't Graph" lists). The following window will appear, allowing the settings of curve name, colour, and line and marker styles to be changed:

This dialog allows you to set the name, colour, and line and marker style of the curve.
As mentioned above, the performance of the model can be interpreted using the area under the curve. This is displayed in the legend:

The graph contains a reference line, drawn in gray, which marks the performance expected from random classification (sensitivity equal to 1-specificity at every point, giving an area of 0.5). When the curve falls below this line, the model is performing poorly, worse than random.
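The comparison between a curve and the reference line can be made concrete by computing the area under each. The sketch below uses the trapezoidal rule, which is an assumption: the documentation does not state how the module computes the area shown in the legend. The function name `area_under_curve` is hypothetical.

```python
from typing import List, Tuple


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under (1 - specificity, sensitivity) points."""
    ordered = sorted(points)  # sort by x, then by y for vertical segments
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Curve for the six-site example: right to x = 0.2, up to y = 1.0, then flat.
example_curve = [(0.0, 0.0), (0.2, 0.0), (0.2, 1.0), (1.0, 1.0)]
# The gray reference line runs diagonally from (0, 0) to (1, 1).
reference_line = [(0.0, 0.0), (1.0, 1.0)]

example_area = area_under_curve(example_curve)
reference_area = area_under_curve(reference_line)
print(round(example_area, 3), reference_area)
```

Under these assumptions the example curve has an area of about 0.8, above the 0.5 of the reference line.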
Finally, the zoom buttons enlarge ("zoom +") or shrink ("zoom -") the graph within the graphing window.