Function to create prediction objects

Every classifier evaluation using ROCR starts with creating a prediction object. This function transforms the input data (which can be in vector, matrix, data frame, or list form) into a standardized format.

Usage

prediction(predictions, labels, label.ordering = NULL)
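A minimal sketch of the basic call, assuming the ROCR package is installed. The toy scores and labels are made up for illustration. `performance()` and its "auc" measure belong to ROCR but are not described in this section; they appear here only to show what the prediction object is typically passed to.

```r
library(ROCR)

# Scores from a scoring classifier and the true 0/1 class labels (toy data).
scores <- c(0.1, 0.4, 0.35, 0.8)
truth  <- c(0, 0, 1, 1)

# Plain vectors of the same length: the simplest input format.
pred <- prediction(scores, truth)

# The result is an S4 object of class "prediction".
inherits(pred, "prediction")

# The prediction object is the input to performance(), e.g. for the AUC.
auc <- performance(pred, measure = "auc")@y.values[[1]]
auc   # 0.75
```

Here 0 < 1, so 0 is taken as the negative class and 1 as the positive class.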

Arguments

predictions: A vector, matrix, list, or data frame containing the predictions.
labels: A vector, matrix, list, or data frame containing the true class labels. Must have the same dimensions as predictions.
label.ordering: The default ordering of the classes (cf. Details) can be changed by supplying a vector containing the negative and the positive class label, in that order.
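As an illustration of label.ordering, the sketch below builds the same data twice: once with the default ordering and once with the classes swapped. It assumes ROCR is installed and uses the ROCR function `performance()` with the "auc" measure (not documented in this section) to make the effect visible. Swapping the classes turns an AUC of 0.75 into 0.25.

```r
library(ROCR)

scores <- c(0.1, 0.4, 0.35, 0.8)
truth  <- c(0, 0, 1, 1)

# Default: 0 < 1, so 0 is the negative and 1 the positive class.
pred_default <- prediction(scores, truth)

# Override: the vector lists the negative class first, then the positive one,
# so here 1 becomes the negative and 0 the positive class.
pred_flipped <- prediction(scores, truth, label.ordering = c(1, 0))

auc_default <- performance(pred_default, "auc")@y.values[[1]]   # 0.75
auc_flipped <- performance(pred_flipped, "auc")@y.values[[1]]   # 0.25
```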

Value

An S4 object of class prediction.

Details

predictions and labels can simply be vectors of the same length. For cross-validation data, however, the individual cross-validation runs can be provided as the columns of a matrix or data frame, or as the entries of a list. In a matrix or data frame, all cross-validation runs must have the same length, whereas in a list the lengths can vary across runs. Internally, all of these input formats are converted to a list representation, with one list entry per cross-validation run.
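The sketch below shows the two cross-validation input formats with toy data, assuming ROCR is installed. It uses a list when the runs have different lengths and matrix columns when they have the same length. It reads the `predictions` slot of the returned object to show the one-entry-per-run list representation. The AUC call uses the ROCR function `performance()`, which this section does not document.

```r
library(ROCR)

# Two cross-validation runs of different length: use a list, one entry per run.
cv_scores <- list(c(0.1, 0.4, 0.35, 0.8), c(0.2, 0.9, 0.6))
cv_labels <- list(c(0, 0, 1, 1),         c(0, 1, 1))
pred_cv <- prediction(cv_scores, cv_labels)

# Runs of equal length can also be supplied as the columns of a matrix.
m_scores <- cbind(run1 = c(0.1, 0.4, 0.35, 0.8), run2 = c(0.3, 0.7, 0.2, 0.9))
m_labels <- cbind(run1 = c(0, 0, 1, 1),          run2 = c(0, 1, 0, 1))
pred_mat <- prediction(m_scores, m_labels)

# Either way, the object stores one list entry per run.
length(pred_cv@predictions)    # 2
length(pred_mat@predictions)   # 2

# Per-run evaluation: one AUC value per cross-validation run.
unlist(performance(pred_cv, "auc")@y.values)   # 0.75 and 1
```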

Since scoring classifiers express a relative tendency towards a negative (low scores) or positive (high scores) class, it must be declared which class label denotes the negative class and which the positive class. Ideally, labels should be supplied as ordered factor(s), with the lower level corresponding to the negative class and the upper level to the positive class. If the labels are unordered factors, numeric, logical, or character vectors, the ordering of the labels is inferred from R's built-in < relation (e.g. 0 < 1, -1 < 1, 'a' < 'b', FALSE < TRUE). Use label.ordering to override this default ordering. Note that the ordering can be locale-dependent, e.g. for the character labels '-1' and '1'.
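The following base-R sketch illustrates the ordering rule. It does not call ROCR. It assumes that sorting the distinct label values is a faithful stand-in for the < relation described above. It also shows how an explicitly ordered factor removes the locale dependence for character labels such as '-1' and '1'. A factor like `ord_labels` could then be passed as the labels argument of prediction().

```r
# Numeric labels: -1 < 1, so -1 would be treated as the negative class.
num_labels <- c(1, -1, -1, 1)
sort(unique(num_labels))   # -1 1

# Logical labels: FALSE < TRUE, so FALSE would be the negative class.
lgl_labels <- c(TRUE, FALSE, TRUE)
sort(unique(lgl_labels))   # FALSE TRUE

# Character labels are compared as strings, so their order depends on the
# collation rules of the current locale. An ordered factor fixes the order
# explicitly: the first level is the negative class, the second the positive.
chr_labels <- c("1", "-1", "-1", "1")
ord_labels <- factor(chr_labels, levels = c("-1", "1"), ordered = TRUE)
levels(ord_labels)         # "-1" "1"
```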

Currently, ROCR supports only binary classification (extensions toward multiclass classification are, however, scheduled for the next release). If there are more than two distinct label symbols, execution stops with an error message. If all predictions use the same two symbols as the labels, categorical predictions are assumed. If there are more than two distinct predicted values and all of them are numeric, continuous predictions (i.e. those of a scoring classifier) are assumed. Otherwise, if more than two symbols occur in the predictions and not all of them are numeric, execution stops with an error message.
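To make the decision rules above concrete, here is a sketch built around a hypothetical helper, `infer_prediction_type()`, which is not part of ROCR. It restates the documented rules in plain R and is not ROCR's internal code, so the installed ROCR version may behave differently. The text does not cover all-numeric predictions that are not label symbols and take only two distinct values. The sketch assumes these are also treated as continuous.

```r
# Hypothetical helper (not part of ROCR): restates the documented rules for
# deciding how predictions are interpreted in a binary classification task.
infer_prediction_type <- function(predictions, labels) {
  label_symbols <- unique(as.character(unlist(labels)))
  if (length(label_symbols) > 2) {
    stop("more than two distinct label symbols: only binary tasks are supported")
  }
  pred_values  <- unlist(predictions)
  pred_symbols <- unique(as.character(pred_values))
  if (all(pred_symbols %in% label_symbols)) {
    return("categorical")   # predictions reuse the label symbols
  }
  if (is.numeric(pred_values)) {
    # Assumption: every all-numeric prediction vector that is not categorical
    # is treated as the output of a scoring classifier.
    return("continuous")
  }
  stop("predictions are neither the label symbols nor all numeric")
}

infer_prediction_type(c(1, 0, 1), c(0, 1, 1))         # "categorical"
infer_prediction_type(c(0.2, 0.9, 0.4), c(0, 1, 1))   # "continuous"
```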

Author

Tobias Sing tobias.sing@gmail.com, Oliver Sander osander@gmail.com