`mean`, `quantile`, `cdf`, and `pmf`.

Source: `R/linear_pool.R`
Compute ensemble model outputs as a linear pool, otherwise known as a distributional mixture, of component model outputs for each combination of model task, output type, and output type id. Supported output types include `mean`, `quantile`, `cdf`, and `pmf`.
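For orientation, the standard definitions behind a linear pool are summarized below. For $M$ component distributions with cdfs $F_m$, pmfs (or densities) $p_m$, means $\mu_m$, and nonnegative weights $w_m$ summing to 1, the pool's cdf, pmf, and mean are weighted averages of the component quantities, and its quantile at level $\tau$ is the generalized inverse of the pooled cdf. These are textbook mixture identities, not formulas taken from the package source.

$$
\begin{aligned}
F_{\mathrm{LP}}(x) &= \sum_{m=1}^{M} w_m F_m(x), \qquad
p_{\mathrm{LP}}(x) = \sum_{m=1}^{M} w_m\, p_m(x), \qquad
\mu_{\mathrm{LP}} = \sum_{m=1}^{M} w_m \mu_m, \\
q_{\mathrm{LP}}(\tau) &= \inf\{\, x : F_{\mathrm{LP}}(x) \ge \tau \,\}.
\end{aligned}
$$

Note that, unlike the cdf, the quantile of a pool is generally not the weighted mean of the component quantiles.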
```r
linear_pool(
  model_outputs,
  weights = NULL,
  weights_col_name = "weight",
  model_id = "hub-ensemble",
  task_id_cols = NULL,
  n_samples = 10000,
  ...
)
```
- `model_outputs`: an object of class `model_output_df` with component model outputs (e.g., predictions).
- `weights`: an optional `data.frame` with component model weights. If provided, it should have a column named `model_id` and a column containing model weights. Optionally, it may contain additional columns corresponding to task id variables, `output_type`, or `output_type_id`, if weights are specific to values of those variables. The default is `NULL`, in which case an equally weighted ensemble is calculated.
- `weights_col_name`: `character` string naming the column in `weights` with model weights. Defaults to `"weight"`.
- `model_id`: `character` string with the identifier to use for the ensemble model.
- `task_id_cols`: `character` vector with names of columns in `model_outputs` that specify modeling tasks. Defaults to `NULL`, in which case all columns in `model_outputs` other than `"model_id"`, `"output_type"`, `"output_type_id"`, and `"value"` are used as task ids.
- `n_samples`: `numeric` that specifies the number of samples to use when calculating quantiles from an estimated quantile function. Defaults to `1e4`.
- `...`: parameters that are passed to `distfromq::make_q_fn`, specifying details of how to estimate a quantile function from the provided quantile levels and quantile values when `output_type` is `"quantile"`.
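To make the shapes of `model_outputs`, `weights`, and the default `task_id_cols` concrete, here is a small base R sketch on toy data. The helper `default_task_id_cols()` is hypothetical (it is not part of hubEnsembles); it only mirrors the documented default rule. The final weighted-mean step illustrates what a weighted ensemble of `mean` outputs means per task; it is not the package's `simple_ensemble` code.

```r
# Hypothetical helper (not part of hubEnsembles) mirroring the documented
# default for `task_id_cols`: every column of `model_outputs` except
# "model_id", "output_type", "output_type_id", and "value".
default_task_id_cols <- function(model_outputs) {
  setdiff(colnames(model_outputs),
          c("model_id", "output_type", "output_type_id", "value"))
}

# Toy component outputs: two models, two locations, one target.
model_outputs <- data.frame(
  model_id = rep(c("a", "b"), each = 2),
  location = rep(c("US", "CA"), times = 2),
  target = "inc death",
  output_type = "mean",
  output_type_id = NA,
  value = c(10, 12, 20, 22)
)

# Weights in the documented shape: a `model_id` column plus a column named
# "weight" (the default `weights_col_name`). The optional `location` column
# makes the weights location-specific; weights sum to 1 within each location.
weights <- data.frame(
  model_id = rep(c("a", "b"), each = 2),
  location = rep(c("US", "CA"), times = 2),
  weight = c(0.75, 0.5, 0.25, 0.5)
)

task_ids <- default_task_id_cols(model_outputs)  # "location" "target"

# Weighted mean of component values within each task (sketch only).
joined <- merge(model_outputs, weights, by = c("model_id", "location"))
joined$weighted <- joined$value * joined$weight
ens <- aggregate(weighted ~ location + target, data = joined, FUN = sum)
ens
```

With these weights, the US ensemble mean is 0.75 * 10 + 0.25 * 20 = 12.5 and the CA ensemble mean is 0.5 * 12 + 0.5 * 22 = 17.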
Returns a `model_out_tbl` object of ensemble predictions. Note that any additional columns in the input `model_outputs` are dropped.
The underlying computation depends on the `output_type`. When the `output_type` is `cdf`, `pmf`, or `mean`, this function calls `simple_ensemble` to calculate a (weighted) mean of the component model outputs. For `cdf` and `pmf` outputs, this is exactly the definition of a linear pool: its cdf (or pmf) at each point is the weighted mean of the component cdfs (or pmfs) at that point. For the `mean` output type, it is justified by the fact that the mean of a linear pool is the weighted mean of the means of the component distributions.
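Both justifications can be checked numerically in base R. The sketch below uses a hypothetical three-component pool of normal distributions (standard deviation 1) and compares (a) the weighted mean of component cdf values with the integral of the pooled density, and (b) the weighted mean of component means with the mean of the pooled density. It uses only `stats` functions and does not call hubEnsembles.

```r
# Hypothetical three-component pool of normal distributions (sd = 1).
w  <- c(0.25, 0.5, 0.25)  # component weights, summing to 1
mu <- c(-1, 2, 5)         # component means

# Density of the linear pool: a weighted sum of component densities.
pool_density <- function(x) {
  dens <- numeric(length(x))
  for (m in seq_along(w)) dens <- dens + w[m] * dnorm(x, mean = mu[m])
  dens
}

# cdf: averaging component cdf values gives the pool's cdf, i.e. the
# integral of the pooled density up to x0.
x0 <- 1
cdf_by_averaging   <- sum(w * pnorm(x0, mean = mu))
cdf_by_integration <- integrate(pool_density, -Inf, x0)$value

# mean: the pool's mean equals the weighted mean of component means.
mean_by_averaging   <- sum(w * mu)  # -0.25 + 1 + 1.25 = 2
mean_by_integration <- integrate(function(x) x * pool_density(x),
                                 -Inf, Inf)$value

c(cdf_by_averaging, cdf_by_integration)
c(mean_by_averaging, mean_by_integration)
```

The same weighted averaging does not carry over to quantiles, which is why the `quantile` output type needs the separate procedure described next.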
When the `output_type` is `quantile`, we obtain the quantiles of a linear pool in three steps:

1. Interpolate and extrapolate from the provided quantiles for each component model to obtain an estimate of the cdf of that distribution.
2. Draw samples from the distribution for each component model. To reduce Monte Carlo variability, we use pseudo-random samples corresponding to quantiles of the estimated distribution.
3. Collect the samples from all component models and extract the desired quantiles.

Steps 1 and 2 in this process are performed by `distfromq::make_q_fn`.
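The three steps can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: exact normal quantile functions (`qnorm`) stand in for the estimate that `distfromq::make_q_fn` builds in step 1, and each component's weight is honoured by giving it a share of the `n_samples` draws proportional to that weight, which is one possible weighting scheme and may differ from what hubEnsembles does.

```r
w  <- c(0.25, 0.5, 0.25)  # component weights
mu <- c(-3, 0, 3)         # component means (sd = 1), as in the example below
n_samples <- 1e4

# Step 1 (stand-in): exact normal quantile functions replace the
# interpolated/extrapolated estimate built from reported quantiles.
q_fns <- lapply(mu, function(m) function(p) qnorm(p, mean = m))

# Step 2: deterministic "samples" at evenly spaced quantile levels, which
# reduces Monte Carlo noise; draws per component are proportional to its
# weight (an assumption of this sketch).
samples <- unlist(lapply(seq_along(w), function(m) {
  n_m <- round(w[m] * n_samples)
  q_fns[[m]]((seq_len(n_m) - 0.5) / n_m)
}))

# Step 3: pool all samples and read off the requested quantile levels.
probs <- c(0.1, 0.5, 0.9)
pooled_qs <- unname(quantile(samples, probs = probs))
pooled_qs
```

Plugging `pooled_qs` back into the exact pooled cdf, `sum(w * pnorm(x, mean = mu))`, should return values close to `probs`; this is the same kind of check that the example below performs with `all.equal()`.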
```r
# We illustrate the calculation of a linear pool when we have quantiles from the
# component models. We take the components to be normal distributions with
# means -3, 0, and 3, each with standard deviation 1, and weights 0.25, 0.5,
# and 0.25.
library(hubEnsembles)
library(purrr)
component_ids <- letters[1:3]
component_weights <- c(0.25, 0.5, 0.25)
component_means <- c(-3, 0, 3)

# Linear pool quantiles: these are the expected outputs.
lp_qs <- seq(from = -5, to = 5, by = 0.25)

# Quantile levels at which the pool takes the values lp_qs; the pool cdf is
# the weighted sum of the component cdfs.
ps <- rep(0, length(lp_qs))
for (m in seq_len(3)) {
  ps <- ps + component_weights[m] * pnorm(lp_qs, mean = component_means[m])
}

# Component quantiles at those levels, stacked model by model.
component_qs <- purrr::map(component_means, ~ qnorm(ps, mean = .x)) %>% unlist()
component_outputs <- data.frame(
  stringsAsFactors = FALSE,
  model_id = rep(component_ids, each = length(lp_qs)),
  target = "inc death",
  output_type = "quantile",
  output_type_id = ps,
  value = component_qs)

lp_from_component_qs <- linear_pool(
  component_outputs,
  weights = data.frame(model_id = component_ids, weight = component_weights))
head(lp_from_component_qs)
#> # A tibble: 6 × 5
#> model_id target output_type output_type_id value
#> <chr> <chr> <chr> <dbl> <dbl>
#> 1 hub-ensemble inc death quantile 0.00569 -5.00
#> 2 hub-ensemble inc death quantile 0.0100 -4.75
#> 3 hub-ensemble inc death quantile 0.0167 -4.50
#> 4 hub-ensemble inc death quantile 0.0264 -4.25
#> 5 hub-ensemble inc death quantile 0.0397 -4.00
#> 6 hub-ensemble inc death quantile 0.0567 -3.75

# The ensemble quantiles recover the known linear pool quantiles.
all.equal(lp_from_component_qs$value, lp_qs, tolerance = 1e-3,
          check.attributes = FALSE)
#> [1] TRUE
```