R/simple_ensemble.R
simple_ensemble.Rd
Compute ensemble model outputs by summarizing component model outputs for each combination of model task, output type, and output type id. Supported output types include mean, median, quantile, cdf, and pmf.
simple_ensemble(
  model_outputs,
  weights = NULL,
  weights_col_name = "weight",
  agg_fun = "mean",
  agg_args = list(),
  model_id = "hub-ensemble",
  task_id_cols = NULL
)
model_outputs
an object of class model_out_tbl with component model outputs (e.g., predictions).
weights
an optional data.frame with component model weights. If provided, it should have a column named model_id and a column containing model weights. Optionally, it may contain additional columns corresponding to task id variables, output_type, or output_type_id, if weights are specific to values of those variables. The default is NULL, in which case an equally-weighted ensemble is calculated.
weights_col_name
character string naming the column in weights with model weights. Defaults to "weight".
agg_fun
a function or character string name of a function to use for aggregating component model outputs into the ensemble outputs. See the details for more information.
agg_args
a named list of any additional arguments that will be passed to agg_fun.
model_id
character string with the identifier to use for the ensemble model.
task_id_cols
character vector with names of columns in model_outputs that specify modeling tasks. Defaults to NULL, in which case all columns in model_outputs other than "model_id", "output_type", "output_type_id", and "value" are used as task ids.
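As a rough illustration of the weights argument described above, the following sketch builds two weights data frames in base R: one with a single weight per model, and one where weights vary by output type. The model names and weight values are made up for illustration, and the commented calls assume hubEnsembles is installed and that a model_outputs object already exists.

```r
# One weight per component model. The column name "weight" matches the
# default value of weights_col_name. Model names are hypothetical.
weights <- data.frame(
  model_id = c("model_a", "model_b"),
  weight   = c(0.75, 0.25)
)

# Weights that differ by output type: add an output_type column so each
# (model_id, output_type) pair carries its own weight.
weights_by_type <- data.frame(
  model_id    = c("model_a", "model_b", "model_a", "model_b"),
  output_type = c("mean", "mean", "quantile", "quantile"),
  weight      = c(0.5, 0.5, 0.9, 0.1)
)

# With hubEnsembles installed, these could be passed as, for example:
# simple_ensemble(model_outputs, weights = weights)
# simple_ensemble(model_outputs, weights = weights_by_type)
```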
a model_out_tbl object of ensemble predictions. Note that any additional columns in the input model_outputs are dropped.
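To make the shape of the result concrete, here is a base R sketch of what an equally-weighted mean ensemble computes. It is not the package's implementation; it only mimics the grouping by task id, output type, and output type id on a plain data.frame. The task id column location and all values are invented for the example.

```r
# Toy component model outputs in the model output column layout.
# The task id column `location` and every value here are hypothetical.
model_outputs <- data.frame(
  model_id       = c("model_a", "model_b", "model_a", "model_b"),
  location       = c("US", "US", "US", "US"),
  output_type    = c("quantile", "quantile", "quantile", "quantile"),
  output_type_id = c(0.25, 0.25, 0.75, 0.75),
  value          = c(10, 14, 20, 30)
)

# Average `value` within each combination of task id, output type,
# and output type id. The model_id column is not part of the grouping.
ensemble <- aggregate(
  value ~ location + output_type + output_type_id,
  data = model_outputs,
  FUN = mean
)

# Label the rows with the ensemble's identifier (the default model_id).
ensemble <- data.frame(model_id = "hub-ensemble", ensemble)
print(ensemble)
```

The result has one row per group rather than one row per component model, which is why only the task id, output type, output type id, and value columns carry over.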
The default for agg_fun is "mean", in which case the ensemble's output is the average of the component model outputs within each group defined by a combination of values in the task id columns, output type, and output type id. The provided agg_fun should have an argument x for the vector of numeric values to summarize and, for weighted methods, an argument w with a numeric vector of weights. To use an aggregation function that does not accept these arguments, write a wrapper around it. For weighted methods, agg_fun = "mean" and agg_fun = "median" are translated to use matrixStats::weightedMean and matrixStats::weightedMedian, respectively.
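The wrapper idea can be sketched as follows. Both functions are hypothetical examples, not part of the package: they wrap a geometric mean so that it exposes the x argument (and, for the weighted version, the w argument) described above. The sketch assumes strictly positive values, since it takes logarithms.

```r
# Hypothetical unweighted aggregation function: summarizes the values in `x`.
geometric_mean <- function(x, ...) {
  exp(mean(log(x)))
}

# Hypothetical weighted counterpart: values in `x`, weights in `w`.
weighted_geometric_mean <- function(x, w, ...) {
  exp(sum(w * log(x)) / sum(w))
}

x <- c(1, 4, 16)
unweighted <- geometric_mean(x)
# Equal weights reproduce the unweighted result.
equal_w <- weighted_geometric_mean(x, w = c(1, 1, 1))
# Putting all weight on the last value returns that value.
last_only <- weighted_geometric_mean(x, w = c(0, 0, 1))

# With hubEnsembles installed, such a function could be supplied as, e.g.:
# simple_ensemble(model_outputs, agg_fun = geometric_mean)
```

Any extra settings for a custom function would go through agg_args, as described in the arguments.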