An important application of genomic
studies is to discover genomic biomarkers, among tens of
thousands of genes assayed, for disease classification. There is
a need for statistical methods that can efficiently use such
high-throughput data, select biomarkers with discriminant power
and construct classification rules. The ROC (receiver operating
characteristic) technique has been widely used in disease
classification with low-dimensional biomarkers because (1) it
does not assume a parametric form of the class probability;
(2) it accommodates case-control designs; and (3) it allows
treating false positives and false negatives differently.
However, due to computational difficulties, ROC-based
classification has not been used with genomic data. Moreover,
the standard ROC technique does not incorporate built-in
biomarker selection.
We propose a novel method for biomarker selection and
classification using the ROC technique for genomic data. The
proposed method uses a sigmoid approximation to the area under
the ROC curve as the objective function for classification and
the threshold gradient descent regularization method for
estimation and biomarker selection. Tuning parameter selection
based on V-fold cross-validation and the evaluation of predictive
performance are also investigated. The proposed approach is
demonstrated with a colon cancer study and an HIV vaccine
study. The proposed approach yields parsimonious models with
excellent classification performance.
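
To make the two ingredients named above concrete, the following is a minimal sketch, not the authors' implementation. It assumes a linear score x'beta, replaces the indicator in the empirical AUC with a sigmoid of the case-minus-control score difference, and maximizes that smooth objective with thresholded gradient steps. The function names, the scale `sigma`, the threshold `tau`, the step size and the number of steps are all illustrative assumptions, and no identifiability constraint on beta is imposed here.

```python
import numpy as np

def sigmoid(u):
    # Clip the argument to avoid overflow in exp for extreme scores.
    return 1.0 / (1.0 + np.exp(-np.clip(u, -50.0, 50.0)))

def pair_diffs(beta, x_case, x_ctrl):
    # Score difference for every (case, control) pair; shape (n_case, n_ctrl).
    return (x_case @ beta)[:, None] - (x_ctrl @ beta)[None, :]

def smoothed_auc(beta, x_case, x_ctrl, sigma=1.0):
    # Sigmoid approximation to the AUC: mean of s(diff / sigma) over all pairs.
    return sigmoid(pair_diffs(beta, x_case, x_ctrl) / sigma).mean()

def smoothed_auc_grad(beta, x_case, x_ctrl, sigma=1.0):
    # Gradient of smoothed_auc with respect to beta.
    s = sigmoid(pair_diffs(beta, x_case, x_ctrl) / sigma)
    w = s * (1.0 - s) / sigma  # derivative of the sigmoid for each pair
    g = w.sum(axis=1) @ x_case - w.sum(axis=0) @ x_ctrl
    return g / w.size

def empirical_auc(beta, x_case, x_ctrl):
    # Unsmoothed AUC (ties count one half), used only for evaluation.
    d = pair_diffs(beta, x_case, x_ctrl)
    return (d > 0).mean() + 0.5 * (d == 0).mean()

def tgdr_fit(x_case, x_ctrl, tau=0.9, step=0.05, n_steps=50, sigma=1.0):
    # Threshold gradient steps (ascent here, since the AUC is maximized):
    # start from beta = 0 and, at each step, update only the coordinates whose
    # gradient magnitude is at least tau times the largest magnitude.
    # tau (in [0, 1]) and n_steps act as the tuning parameters (assumed names).
    beta = np.zeros(x_case.shape[1])
    for _ in range(n_steps):
        g = smoothed_auc_grad(beta, x_case, x_ctrl, sigma)
        g_max = np.abs(g).max()
        if g_max == 0.0:
            break
        keep = np.abs(g) >= tau * g_max
        beta += step * g * keep
    return beta

if __name__ == "__main__":
    # Synthetic illustration only: 2 informative features among 50.
    rng = np.random.default_rng(0)
    x_ctrl = rng.normal(size=(40, 50))
    x_case = rng.normal(size=(40, 50))
    x_case[:, :2] += 2.0
    beta = tgdr_fit(x_case, x_ctrl)
    print("selected features:", np.flatnonzero(beta))
    print("in-sample AUC:", empirical_auc(beta, x_case, x_ctrl))
```

Coordinates that are never updated stay exactly zero, which is how this kind of procedure performs selection and estimation together. In practice `tau` and the number of steps would be chosen by V-fold cross-validation rather than fixed as in this sketch, and the in-sample AUC printed here is optimistic compared with a cross-validated estimate.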