Robust (or "resistant") methods for statistical modelling have been
available in S from the very beginning in the 1980s; and then in R in
mean(*, trim = .),
fivenum(), the statistic
loess()) for robust
nonparametric regression, which had been complemented
Much more important functionality has been made available in the
recommended (and hence present in all R versions) package
(by Bill Venables and Brian Ripley, see
Statistics with S
Most importantly, they provide
for robust regression and
robust multivariate scatter and covariance.
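The sketch below illustrates the robust summaries that ship with R itself. It contaminates a small sample with one gross outlier and compares classical and robust location and scale. The data vector `x` is made up. The choice of `mad()` and `IQR()` alongside the functions named above is ours.

```r
# Ten well-behaved observations plus one gross outlier (made-up data)
x <- c(9.8, 10.1, 10.0, 9.9, 10.2, 10.3, 9.7, 10.0, 10.1, 9.9, 100)

mean(x)               # classical mean, pulled towards the outlier
mean(x, trim = 0.1)   # trimmed mean: drops floor(0.1 * n) = 1 value at each end here
median(x)             # robust location with 50% breakdown point
sd(x)                 # classical scale, inflated by the outlier
mad(x)                # median absolute deviation, scaled for consistency at the normal
IQR(x)                # interquartile range
fivenum(x)            # Tukey's five-number summary (min, hinges, median, max)
boxplot.stats(x)$out  # points outside the 1.5 * IQR whiskers (uses the hinges)
```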
This task view is about R add-on packages providing newer, faster, or
more efficient algorithms, notably for (the robustification of) new models.
Please send suggestions for additions and extensions to the
task view maintainer.
An international group of scientists working in the field of robust
statistics has made efforts (since October 2005) to coordinate several of
the scattered developments and make the important ones available
through a set of R packages complementing each other.
These should build on a basic package with "Essentials",
with (potentially many) other packages
building on top and extending the essential functionality to particular
models or applications.
Further, there is the quite comprehensive package
robust, a version of the robust library of S-PLUS,
available as an R package, now GPL-licensed thanks to Insightful and Kjell Konis.
Originally, there was much overlap between 'robustbase'
and 'robust', now
robustbase, the former providing convenient routines for
the casual user, while the latter contains the underlying
functionality and provides the more advanced statistician with a
large range of options for robust modeling.
We structure the packages roughly into the following topics, and
will typically first mention functionality in packages
Regression (Linear, Generalized Linear, Nonlinear Models,
incl. Mixed Effects)
(robust) where the former uses the latest of the
fast-S algorithms and heteroscedasticity and autocorrelation corrected
(HAC) standard errors, the latter makes use of the M-S algorithm of
Maronna and Yohai (2000), automatically when there are factors
among the predictors (where S-estimators (and hence MM-estimators)
based on resampling typically fail badly).
are available in
robustbase, but rather for comparison
Note that Koenker's quantile regression package
contains L1 (aka LAD, least absolute deviations)-regression as a
special case, doing so also for nonparametric regression via
Quantile regression (and hence L1 or LAD) for mixed-effects models
is available in package
lqmm, whereas an
approach for robust linear
is available from package
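L1 (LAD) regression is a special case of quantile regression. This follows from the check loss that quantile regression minimizes, written below in our own notation for observations $(x_i, y_i)$ and quantile level $\tau \in (0, 1)$:

```latex
\begin{aligned}
\rho_\tau(u) &= u\,\bigl(\tau - \mathbf{1}\{u < 0\}\bigr), \qquad
\hat\beta(\tau) = \arg\min_{\beta} \sum_{i=1}^{n} \rho_\tau\bigl(y_i - x_i^\top \beta\bigr),\\
\rho_{1/2}(u) &= \tfrac12\,|u|
\quad\Longrightarrow\quad
\hat\beta(\tfrac12) = \arg\min_{\beta} \sum_{i=1}^{n} \bigl|y_i - x_i^\top \beta\bigr| .
\end{aligned}
```

For $u < 0$ we have $\rho_{1/2}(u) = u(\tfrac12 - 1) = -u/2$, so the median-regression fit is exactly the LAD fit. With an intercept only, that fit is a sample median.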
Generalized linear models (GLMs) are provided both via
Robust Nonlinear model fitting is available through
fits overdispersed multinomial regression
models for count data.
fits robust GAMs, i.e., robust Generalized Additive Models.
fits "Doubly Robust" Generalized Estimating Equations (GEEs)
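The next sketch contrasts least squares with robust linear regression. It uses only `MASS::rlm()` with `method = "MM"` and the built-in `stackloss` data. This is an illustration with a recommended package. It does not show the interface of the newer fitting functions discussed above.

```r
library(MASS)

set.seed(1)  # the initial S-estimate in rlm(method = "MM") relies on subsampling

# Classical least-squares fit
fit_ls <- lm(stack.loss ~ ., data = stackloss)

# MM-estimate: high-breakdown S-estimate start, then a bisquare M-step
fit_mm <- rlm(stack.loss ~ ., data = stackloss, method = "MM")

# Compare the coefficients side by side
round(cbind(LS = coef(fit_ls), MM = coef(fit_mm)), 3)

# Final IRLS weights in [0, 1]; small weights mark down-weighted observations
w <- fit_mm$w
names(w) <- rownames(stackloss)
head(sort(w), 5)
```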
package which builds ("
provides nice S4 class based methods,
more methods for robust multivariate variance-covariance estimation,
and adds robust PCA methodology.
It is extended by
rrcovNA, providing robust multivariate
or missing (NA) data, and by
rrcovHD, providing robust multivariate methods for
contains a slightly more flexible
fastmcd(), and similarly for
has automatically chosen
for large dimensionality p.
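A typical use of the robust scatter estimators mentioned here is flagging multivariate outliers through robust Mahalanobis distances. The sketch below uses `MASS::cov.rob(method = "mcd")` as a stand-in for the MCD implementations named above. The 97.5% chi-squared cutoff is a common convention, not a fixed rule.

```r
library(MASS)

X <- as.matrix(stackloss)
set.seed(1)  # cov.rob() uses random subsampling

# Robust (MCD) location and scatter versus the classical estimates
rob    <- cov.rob(X, method = "mcd")
d2_rob <- mahalanobis(X, center = rob$center, cov = rob$cov)
d2_cls <- mahalanobis(X, center = colMeans(X), cov = cov(X))

# Flag points beyond the 97.5% quantile of chi^2 with p degrees of freedom
cutoff <- qchisq(0.975, df = ncol(X))
which(d2_rob > cutoff)  # flags based on robust distances
which(d2_cls > cutoff)  # classical distances are prone to masking
```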
for experimental, or other not yet
established procedures, contains
covNCC(), the latter providing the
neighbor variance estimation (NNVE) method of Wang and Raftery (2002),
also available in
provides a robust Regularized Singular Value Decomposition.
several methods for outlier identification in high dimensions.
estimates multivariate location and scatter in the presence of missing data.
performs robust inference based on
bootstrapping robust estimators, including
multivariate regression, PCA and Hotelling tests.
computes robust pairwise correlations based on scale estimates,
nearest neighbor variance estimation (NNVE) method of Wang and Raftery (2002).
Note that robust PCA can be performed by using standard
X <- stackloss
pc.rob <- princomp(X, covmat = MASS::cov.rob(X))  # PCA on a robust (by default MVE) covariance estimate
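Robust pairwise correlations "based on scale estimates", mentioned above, can be built from the Gnanadesikan-Kettenring identity. Standardize $X$ and $Y$ by a scale $s$, and let $U$ and $V$ be their sum and difference. Then $\rho = (s(U)^2 - s(V)^2)/(s(U)^2 + s(V)^2)$. The helper `gk_cor()` below is hypothetical. Using `mad()` as the scale is our assumption, and packages may use more efficient scale estimators instead.

```r
# Hypothetical helper: robust correlation from a robust scale estimator s(),
# via the identity rho = (s(U)^2 - s(V)^2) / (s(U)^2 + s(V)^2),
# where U = X/s(X) + Y/s(Y) and V = X/s(X) - Y/s(Y).
gk_cor <- function(x, y, s = mad) {
  a <- x / s(x)
  b <- y / s(y)
  su2 <- s(a + b)^2
  sv2 <- s(a - b)^2
  (su2 - sv2) / (su2 + sv2)
}

x <- 1:20
y <- 2 * x
y[20] <- -1000   # a single gross outlier

cor(x, y)        # classical correlation, even negative here
gk_cor(x, y)     # robust version stays close to 1
```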
See also the CRAN task views
Large Data Sets
should be applicable for larger (n,p) than traditional robust
covariance based outlier detectors.
detects outliers for replicated high-throughput data.
(See also the CRAN task view
Descriptive Statistics / Exploratory Data Analysis
boxplot.stats(), etc. mentioned above
Note however that these (last two items) are not yet available from CRAN.
running median filtering.
contains robust regression and
filtering methods for univariate time series, typically based on
repeated (weighted) median regressions.
provides several methods for robust
periodogram estimation, notably for irregularly spaced time series.
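Running median filtering can be illustrated with base R's `stats::runmed()`. The sketch compares it with a running mean of the same window on a made-up series containing two isolated spikes.

```r
# A slowly varying signal with two isolated spikes (made-up data)
x <- c(1, 2, 3, 50, 5, 6, 7, 8, -40, 10, 11)

# Running median with window k = 3 (k must be odd)
rm3 <- runmed(x, k = 3)

# Running mean with the same window, for comparison (NA at both ends)
ma3 <- stats::filter(x, rep(1/3, 3), sides = 2)

cbind(x, running_median = rm3, running_mean = round(as.numeric(ma3), 2))
```

The running median removes each spike entirely. The running mean smears each spike over three neighbouring values instead.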
Peter Ruckdeschel has started to lead an effort for a robust
time-series package, see
"Routines for Robust Kalman
Filtering --- the ACM- and rLS-filter"
, is being developed, see
Econometricians tend to like HAC (heteroscedasticity and
autocorrelation corrected) standard errors. For a broad class of
models, these are provided by package
also uses a version of HAC
standard errors for its robustly estimated linear models.
See also the CRAN task view
Robust Methods for Bioinformatics
There are several packages in the
providing specialized robust methods.
provides infinitesimally robust
estimators for preprocessing omics data.
Robust Methods for Survival Analysis
provides robust estimation in the Cox model.
does robust signature selection for
detects outliers using quantile regression for
Robust Methods in Biostatistics (outside of Survival A.)
for robust and efficient analysis of the effect of
exposure on a secondary outcome in a case-control study.
Robust Methods for Surveys
On R-forge only, package
provides a robust
aims at robust geostatistical
analysis of spatial data, such as kriging and more.
Other approaches to robust and resistant methodology
and its several child packages
also allow exploring robust estimation concepts, see e.g.,
Notably, based on these,
aims at the implementation of R
packages for the computation of optimally robust estimators and
tests as well as the necessary infrastructure (mainly S4 classes
and methods) and diagnostics; cf. M. Kohl (2005).
It includes the R packages
computes Robust Accelerated Failure
Time Regression for Gaussian and logWeibull errors.
Weighted Likelihood Estimation
robustified likelihood estimation for a range of models,
notably (generalized) regression, and time series (AR and
for robust variance meta-regression
provides robust estimation and inference in sample selection models.