Use Pattern Oriented Modeling to fit unknown parameters

This is very much in alpha condition. It has been tested on simple problems, as shown in the examples, with up to 2 parameters. It appears that DEoptim is the superior package for the stochastic problems. This should be used with caution as with all optimization routines. This function can nevertheless take optim as optimizer, using stats::optim. However, these latter approaches do not seem appropriate for stochastic problems, and have not been widely tested and are not supported within POM.

POM(
  sim,
  params,
  objects = NULL,
  objFn,
  cl,
  optimizer = "DEoptim",
  sterr = FALSE,
  ...,
  objFnCompare = "MAD",
  optimControl = NULL,
  NaNRetries = NA,
  logObjFnVals = FALSE,
  weights,
  useLog = FALSE
)

# S4 method for simList,character
POM(
  sim,
  params,
  objects = NULL,
  objFn,
  cl,
  optimizer = "DEoptim",
  sterr = FALSE,
  ...,
  objFnCompare = "MAD",
  optimControl = NULL,
  NaNRetries = NA,
  logObjFnVals = FALSE,
  weights,
  useLog = FALSE
)

Arguments

sim	A `simList` simulation object, generally produced by `simInit`.
params	Character vector of parameter names that can be changed by the optimizer. These must be accessible with `params(sim)` internally.
objects	A optional named list (must be specified if objFn is not). The names of each list element must correspond to an object in the `.GlobalEnv` and the list elements must be objects or functions of objects that can be accessed in the ls(sim) internally. These will be used to create the objective function passed to the optimizer. See details and examples.
objFn	An optional objective function to be passed into `optimizer`. If missing, then `POM` will use `objFnCompare` and `objects` instead. If using `POM` with a SpaDES simulation, this objFn must contain a spades call internally, followed by a derivation of a value that can be minimized but the `optimizer`. It must have, as first argument, the values for the parameters. See example.
cl	A cluster object. Optional. This would generally be created using parallel::makeCluster or equivalent. This is an alternative way, instead of `beginCluster()`, to use parallelism for this function, allowing for more control over cluster use.
optimizer	The function to use to optimize. Default is `"DEoptim"`. Currently it can also be `"optim"`, which uses `stats::optim`. The latter two do not seem optimal for stochastic problems and have not been widely tested.
sterr	Logical. If using `optimizer = "optim"`, the hessian can be calculated. If this is TRUE, then the standard errors can be estimated using that hessian, assuming normality.
...	All objects needed in objFn
objFnCompare	Character string. Either, `"MAD"` (default) or `"RMSE"` indicating that inside the objective function, data and prediction will be compared by Mean Absolute Deviation or Root Mean Squared Error.
optimControl	List of control arguments passed into the control of each optimization routine. Currently, only passed to `DEoptim.control` when `optimizer` is `"DEoptim"`
NaNRetries	Numeric. If greater than 1, then the function will retry the objective function for a total of that number of times if it results in an `NaN`. In general this should not be used as the objective function should be made so that it doesn't produce `NaN`. But, sometimes it is difficult to diagnose stochastic results.
logObjFnVals	Logical or Character string indicating a filename to log the outputs. Ignored if `objFn` is supplied. If TRUE (and there is no `objFn` supplied), then the value of the individual patterns will be output the console if being run interactively or to a tab delimited text file named `ObjectiveFnValues.txt` (or that passed by the user here) at each evaluation of the POM created objective function. See details.
weights	Numeric. If provided, this vector will be multiplied by the standardized deviations (possibly MAD or RMSE) as described in `objects`. This has the effect of weighing each standardized deviation (pattern--data pair) to a user specified amount in the objective function.
useLog	Logical. Should the data patterns and output patterns be logged (`log`) before calculating the `objFnCompare`. I.e., `mean(abs(log(output) - log(data)))`. This should be length 1 or length `objects`. It will be recycled if length >1, less than `objects`.

Value

A list with at least 2 elements. The first (or first several) will be the returned object from the optimizer. The second (or last if there are more than 2), named args is the set of arguments that were passed into the control of the optimizer.

Details

There are two ways to use this function: via 1) objFn or 2) objects.

The user can pass the entire objective function to the objFn argument that will be passed directly to the optimizer. For this, the user will likely need to pass named objects as part of the ....
The slightly simpler approach is to pass a list of 'actual data--simulated data' pairs as a named list in objects and specify how these objects should be compared via objFnCompare (whose default is Mean Absolute Deviation or "MAD").

Option 1 offers more control to the user, but may require more knowledge. Option 1 should likely contain a call to simInit(Copy(simList)) and spades internally. See examples that show simple examples of each type, option 1 and option 2. In both cases, params is required to indicate which parameters can be varied in order to achieve the fit.

Currently, option 1 only exists when optimizer is "DEoptim", the default.

The upper and lower limits for parameter values are taken from the metadata in the module. Thus, if the module metadata does not define the upper and lower limits, or these are very wide, then the optimization may have troubles. Currently, there is no way to override these upper and lower limits; the module metadata should be changed if there needs to be different parameter limits for optimization.

objects is a named list of data--pattern pairs. Each of these pairs will be assessed against one another using the objFnCompare, after standardizing each independently. The standardization, which only occurs if the abs(data value < 1), is: mean(abs(derived value - data value))/mean(data value). If the data value is between -1 and 1, then there is no standardization. If there is more than one data--pattern pair, then they will simply be added together in the objective function. This gives equal weight to each pair. If the user wishes to put different weight on each pattern, a weights vector can be provided. This will be used to multiply the standardized values described above. Alternatively, the user may wish to weight them differently, in which case, their relative scales can be adjusted.

There are many options that can be passed to DEoptim, (the details of which are in the help), using optimControl. The defaults sent from POM to DEoptim are: steptol = 3 (meaning it will start assessing convergence after 3 iterations (WHICH MAY NOT BE SUFFICIENT FOR YOUR PROBLEM), NP = 10 * length(params) (meaning the population size is 10 x the number of parameters) and itermax = 200 (meaning it won't go past 200 iterations). These and others may need to be adjusted to obtain good values. NOTE: DEoptim does not provide a direct estimate of confidence intervals. Also, convergence may be unreliable, and may occur because itermax is reached. Even when convergence is indicated, the estimates are not guaranteed to be global optima. This is different than other optimizers that will normally indicate if convergence was not achieved at termination of the optimization.

Using this function with a parallel cluster currently requires that you pass optimControl = list(parallelType = 1), and possibly package and variable names (and does not yet accept the cl argument). See examples. This setting will use all available threads on your computer. Future versions of this will allow passing of a custom cluster object via cl argument. POM will automatically determine packages to load in the spawned cluster (via packages) and it will load all objects in the cluster that are necessary, by sending names(objects) to parVar in DEoptim.control.

Setting logObjFnVals to TRUE may help diagnosing some problems. Using the POM derived objective function, essentially all patterns are treated equally. This may not give the correct behaviour for the objective function. Because POM weighs the patterns equally, it may be useful to use the log files to examine the behaviour of the pattern--data pairs. The first file, ObjectiveFnValues.txt, shows the result of each of the (possibly logged), pattern--data deviations, standardized, and weighted. The second file, ObjectiveFnValues_RawPatterns.txt, shows the actual value of the pattern (unstandardized, unweighted, unlogged). If weights is passed, then these weighted values will be reflected in the ObjectiveFnValues.txt file.

Examples

if (interactive()) {
  set.seed(89462)
  library(parallel)
  library(raster)
  mySim <- simInit(
    times = list(start = 0.0, end = 2.0, timeunit = "year"),
    params = list(
      .globals = list(stackName = "landscape", burnStats = "nPixelsBurned"),
      fireSpread = list(nfires = 5),
      randomLandscapes = list(nx = 300, ny = 300)
    ),
    modules = list("randomLandscapes", "fireSpread", "caribouMovement"),
    paths = list(modulePath = system.file("sampleModules", package = "SpaDES.core"))
  )

  # Since this is a made up example, we don't have real data
  #  to run POM against. Instead, we will run the model once,
  #  take the values at the end of the simulation as if they
  #  are real data, then rerun the POM function next,
  #  comparing these "data" with the simulated values
  #  using Mean Absolute Deviation
  outData <- spades(reproducible::Copy(mySim), .plotInitialTime = NA)

  # Extract the "true" data, in this case, the "proportion of cells burned"
  # Function defined that will use landscape$Fires map from simList,
  #  i.e., sim$landscape$Fires
  #  the return value being compared via MAD with propCellBurnedData
  propCellBurnedFn <- function(landscape) {
    sum(getValues(landscape$Fires) > 0) / ncell(landscape$Fires)
  }
  # visualize the burned maps of true "data"
  propCellBurnedData <- propCellBurnedFn(outData$landscape)
  clearPlot()
  if (interactive()) {
    library(quickPlot)

    fires <- outData$landscape$Fires # Plot doesn't do well with many nested layers
    Plot(fires)
  }

  # Example 1 - 1 parameter
  # In words, this says, "find the best value of spreadprob such that
  #  the proportion of the area burned in the simulation
  #  is as close as possible to the proportion area burned in
  #  the "data", using DEoptim().

  # Can use cluster if computer is multi-threaded.
  # This example can use parallelType = 1 in DEoptim. For this, you must manually
  #  pass all packages and variables as character strings.
  # cl <- makeCluster(detectCores() - 1) # not implemented yet in DEoptim
  out1 <- POM(mySim, "spreadprob",
              list(propCellBurnedData = propCellBurnedFn), # data = pattern pair
              #optimControl = list(parallelType = 1),
              logObjFnVals = TRUE)

  ## Once cl arg is available from DEoptim, this will work:
  # out1 <- POM(mySim, "spreadprob", cl = cl,
  #            list(propCellBurnedData = propCellBurnedFn)) # data = pattern pair

  # Example 2 - 2 parameters
  # Function defined that will use caribou from sim$caribou, with
  #  the return value being compared via MAD with nPattern
  #  module, parameter N, is from 10 to 1000)
  caribouFn <- function(caribou) length(caribou)

  # Extract "data" from simList object (normally, this would be actual data)
  nPattern <- caribouFn(outData$caribou)

  aTime <- Sys.time()
  parsToVary <- c("spreadprob", "N")
  out2 <- POM(mySim, parsToVary,
              list(propCellBurnedData = propCellBurnedFn,
                   nPattern = caribouFn), logObjFnVals = TRUE)
                   #optimControl = list(parallelType = 1))
                   #cl = cl) # not yet implemented, waiting for DEoptim
  bTime <- Sys.time()
  # check that population overlaps known values (0.225 and 100)
  apply(out2$member$pop, 2, quantile, c(0.025, 0.975))
  hists <- apply(out2$member$pop, 2, hist, plot = FALSE)
  clearPlot()
  for (i in seq_along(hists)) Plot(hists[[i]], addTo = parsToVary[i],
                                   title = parsToVary[i], axes = TRUE)

  print(paste("DEoptim", format(bTime - aTime)))
  #stopCluster(cl) # not yet implemented, waiting for DEoptim

  # Example 3 - using objFn instead of objects

  # list all the parameters in the simList, from these, we select to vary
  params(mySim)

  # Objective Function Example:
  #   objective function must have several elements
  #   - first argument must be parameter vector, passed to and used by DEoptim
  #   - likely needs to take sim object, likely needs a copy
  #      because of pass-by-reference semantics of sim objects
  #   - pass data that will be used internally for objective function
  objFnEx <- function(pars, # param values
                      sim, # simList object
                      nPattern, propCellBurnedData, caribouFn, propCellBurnedFn) {
    ### data

    # make a copy of simList because it will possibly be altered by spades call
    sim1 <- reproducible::Copy(sim)

    # take the parameters and assign them to simList
    params(sim1)$fireSpread$spreadprob <- pars[1]
    params(sim1)$caribouMovement$N <- pars[2]

    # run spades, without plotting
    out <- spades(sim1, .plotInitialTime = NA)

    # calculate outputs
    propCellBurnedOut <- propCellBurnedFn(out$landscape)
    nPattern_Out <- caribouFn(out$caribou)

    minimizeFn <- abs(nPattern_Out - nPattern) +
                  abs(propCellBurnedOut - propCellBurnedData)

    # have more info reported to console, if desired
    # cat(minimizeFn)
    # cat(" ")
    # cat(pars)

    return(minimizeFn)
  }

  # Run DEoptim with custom objFn, identifying 2 parameters to allow
  #  to vary, and pass all necessary objects required for the
  #  objFn

  # choose 2 of them to vary. Need to identify them in params & inside objFn
  # Change optimization parameters to alter how convergence is achieved
  out5 <- POM(mySim, params = c("spreadprob", "N"),
              objFn = objFnEx,
              nPattern = nPattern,
              propCellBurnedData = propCellBurnedData,
              caribouFn = caribouFn,
              propCellBurnedFn = propCellBurnedFn,
              #cl = cl, # uncomment for cluster # not yet implemented, waiting for DEoptim
              # see ?DEoptim.control for explanation of these options
              optimControl = list(
                NP = 100, # run 100 populations, allowing quantiles to be calculated
                initialpop = matrix(c(runif(100, 0.2, 0.24), runif(100, 80, 120)), ncol = 2),
                parallelType = 1
              )
            )

  # Can also use an optimizer directly -- miss automatic parameter bounds,
  #  and automatic objective function using option 2
  library(DEoptim)
  out7 <- DEoptim(fn = objFnEx,
                 sim = mySim,
                 nPattern = nPattern,
                 propCellBurnedData = propCellBurnedData,
                 caribouFn = caribouFn,
                 propCellBurnedFn = propCellBurnedFn,
                 # cl = cl, # uncomment for cluster
                 # see ?DEoptim.control for explanation of these options
                 control = DEoptim.control(
                   steptol = 3,
                   parallelType = 1, # parallelType = 3,
                   packages = list("raster", "SpaDES.core", "RColorBrewer"),
                   parVar = list("objFnEx"),
                   initialpop = matrix(c(runif(40, 0.2, 0.24),
                                         runif(40, 80, 120)), ncol = 2)),
                 lower = c(0.2, 80), upper = c(0.24, 120))
}

Use Pattern Oriented Modeling to fit unknown parameters

Arguments

Value

Details

See also

Examples

Contents

Author