TY - JOUR
T1 - Accelerated and Interpretable Oblique Random Survival Forests
AU - Jaeger, Byron C.
AU - Welden, Sawyer
AU - Lenoir, Kristin
AU - Speiser, Jaime L.
AU - Segar, Matthew W.
AU - Pandey, Ambarish
AU - Pajewski, Nicholas M.
N1 - Publisher Copyright:
© 2023 American Statistical Association and Institute of Mathematical Statistics.
PY - 2024
Y1 - 2024
N2 - The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles have high prediction accuracy, but assessing many linear combinations of predictors induces high computational overhead. In addition, few methods have been developed for estimation of variable importance (VI) with oblique RSFs. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate VI with the oblique RSF. Our computational approach uses Newton-Raphson scoring in each non-leaf node. We estimate VI by negating each coefficient used for a given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In benchmarking experiments, we find our implementation of the oblique RSF is hundreds of times faster, with equivalent prediction accuracy, compared to existing software for oblique RSFs. We find in simulation studies that "negation VI" discriminates between relevant and irrelevant numeric predictors more accurately than permutation VI, Shapley VI, and a technique to measure VI using analysis of variance. All oblique RSF methods in the current study are available in the aorsf R package, and additional supplemental materials are available online.
AB - The oblique random survival forest (RSF) is an ensemble supervised learning method for right-censored outcomes. Trees in the oblique RSF are grown using linear combinations of predictors, whereas in the standard RSF, a single predictor is used. Oblique RSF ensembles have high prediction accuracy, but assessing many linear combinations of predictors induces high computational overhead. In addition, few methods have been developed for estimation of variable importance (VI) with oblique RSFs. We introduce a method to increase computational efficiency of the oblique RSF and a method to estimate VI with the oblique RSF. Our computational approach uses Newton-Raphson scoring in each non-leaf node. We estimate VI by negating each coefficient used for a given predictor in linear combinations, and then computing the reduction in out-of-bag accuracy. In benchmarking experiments, we find our implementation of the oblique RSF is hundreds of times faster, with equivalent prediction accuracy, compared to existing software for oblique RSFs. We find in simulation studies that "negation VI" discriminates between relevant and irrelevant numeric predictors more accurately than permutation VI, Shapley VI, and a technique to measure VI using analysis of variance. All oblique RSF methods in the current study are available in the aorsf R package, and additional supplemental materials are available online.
KW - Computational efficiency
KW - Supervised learning
KW - Variable importance
UR - http://www.scopus.com/inward/record.url?scp=85166982895&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85166982895&partnerID=8YFLogxK
U2 - 10.1080/10618600.2023.2231048
DO - 10.1080/10618600.2023.2231048
M3 - Article
AN - SCOPUS:85166982895
SN - 1061-8600
VL - 33
SP - 192
EP - 207
JO - Journal of Computational and Graphical Statistics
JF - Journal of Computational and Graphical Statistics
IS - 1
ER -