This is very much in alpha condition. It has been tested on simple problems, as shown in the examples, with up to 2 parameters. It appears that DEoptim is the superior package for the stochastic problems. This should be used with caution as with all optimization routines. This function can nevertheless take optim as optimizer, using stats::optim. However, these latter approaches do not seem appropriate for stochastic problems, and have not been widely tested and are not supported within POM.

POM(
  sim,
  params,
  objects = NULL,
  objFn,
  cl,
  optimizer = "DEoptim",
  sterr = FALSE,
  ...,
  objFnCompare = "MAD",
  optimControl = NULL,
  NaNRetries = NA,
  logObjFnVals = FALSE,
  weights,
  useLog = FALSE
)

# S4 method for simList,character
POM(
  sim,
  params,
  objects = NULL,
  objFn,
  cl,
  optimizer = "DEoptim",
  sterr = FALSE,
  ...,
  objFnCompare = "MAD",
  optimControl = NULL,
  NaNRetries = NA,
  logObjFnVals = FALSE,
  weights,
  useLog = FALSE
)

Arguments

sim

A simList simulation object, generally produced by simInit.

params

Character vector of parameter names that can be changed by the optimizer. These must be accessible with params(sim) internally.

objects

A optional named list (must be specified if objFn is not). The names of each list element must correspond to an object in the .GlobalEnv and the list elements must be objects or functions of objects that can be accessed in the ls(sim) internally. These will be used to create the objective function passed to the optimizer. See details and examples.

objFn

An optional objective function to be passed into optimizer. If missing, then POM will use objFnCompare and objects instead. If using POM with a SpaDES simulation, this objFn must contain a spades call internally, followed by a derivation of a value that can be minimized but the optimizer. It must have, as first argument, the values for the parameters. See example.

cl

A cluster object. Optional. This would generally be created using parallel::makeCluster or equivalent. This is an alternative way, instead of beginCluster(), to use parallelism for this function, allowing for more control over cluster use.

optimizer

The function to use to optimize. Default is "DEoptim". Currently it can also be "optim", which uses stats::optim. The latter two do not seem optimal for stochastic problems and have not been widely tested.

sterr

Logical. If using optimizer = "optim", the hessian can be calculated. If this is TRUE, then the standard errors can be estimated using that hessian, assuming normality.

...

All objects needed in objFn

objFnCompare

Character string. Either, "MAD" (default) or "RMSE" indicating that inside the objective function, data and prediction will be compared by Mean Absolute Deviation or Root Mean Squared Error.

optimControl

List of control arguments passed into the control of each optimization routine. Currently, only passed to DEoptim.control when optimizer is "DEoptim"

NaNRetries

Numeric. If greater than 1, then the function will retry the objective function for a total of that number of times if it results in an NaN. In general this should not be used as the objective function should be made so that it doesn't produce NaN. But, sometimes it is difficult to diagnose stochastic results.

logObjFnVals

Logical or Character string indicating a filename to log the outputs. Ignored if objFn is supplied. If TRUE (and there is no objFn supplied), then the value of the individual patterns will be output the console if being run interactively or to a tab delimited text file named ObjectiveFnValues.txt (or that passed by the user here) at each evaluation of the POM created objective function. See details.

weights

Numeric. If provided, this vector will be multiplied by the standardized deviations (possibly MAD or RMSE) as described in objects. This has the effect of weighing each standardized deviation (pattern--data pair) to a user specified amount in the objective function.

useLog

Logical. Should the data patterns and output patterns be logged (log) before calculating the objFnCompare. I.e., mean(abs(log(output) - log(data))). This should be length 1 or length objects. It will be recycled if length >1, less than objects.

Value

A list with at least 2 elements. The first (or first several) will be the returned object from the optimizer. The second (or last if there are more than 2), named args is the set of arguments that were passed into the control of the optimizer.

Details

There are two ways to use this function: via 1) objFn or 2) objects.

  1. The user can pass the entire objective function to the objFn argument that will be passed directly to the optimizer. For this, the user will likely need to pass named objects as part of the ....

  2. The slightly simpler approach is to pass a list of 'actual data--simulated data' pairs as a named list in objects and specify how these objects should be compared via objFnCompare (whose default is Mean Absolute Deviation or "MAD").

Option 1 offers more control to the user, but may require more knowledge. Option 1 should likely contain a call to simInit(Copy(simList)) and spades internally. See examples that show simple examples of each type, option 1 and option 2. In both cases, params is required to indicate which parameters can be varied in order to achieve the fit.

Currently, option 1 only exists when optimizer is "DEoptim", the default.

The upper and lower limits for parameter values are taken from the metadata in the module. Thus, if the module metadata does not define the upper and lower limits, or these are very wide, then the optimization may have troubles. Currently, there is no way to override these upper and lower limits; the module metadata should be changed if there needs to be different parameter limits for optimization.

objects is a named list of data--pattern pairs. Each of these pairs will be assessed against one another using the objFnCompare, after standardizing each independently. The standardization, which only occurs if the abs(data value < 1), is: mean(abs(derived value - data value))/mean(data value). If the data value is between -1 and 1, then there is no standardization. If there is more than one data--pattern pair, then they will simply be added together in the objective function. This gives equal weight to each pair. If the user wishes to put different weight on each pattern, a weights vector can be provided. This will be used to multiply the standardized values described above. Alternatively, the user may wish to weight them differently, in which case, their relative scales can be adjusted.

There are many options that can be passed to DEoptim, (the details of which are in the help), using optimControl. The defaults sent from POM to DEoptim are: steptol = 3 (meaning it will start assessing convergence after 3 iterations (WHICH MAY NOT BE SUFFICIENT FOR YOUR PROBLEM), NP = 10 * length(params) (meaning the population size is 10 x the number of parameters) and itermax = 200 (meaning it won't go past 200 iterations). These and others may need to be adjusted to obtain good values. NOTE: DEoptim does not provide a direct estimate of confidence intervals. Also, convergence may be unreliable, and may occur because itermax is reached. Even when convergence is indicated, the estimates are not guaranteed to be global optima. This is different than other optimizers that will normally indicate if convergence was not achieved at termination of the optimization.

Using this function with a parallel cluster currently requires that you pass optimControl = list(parallelType = 1), and possibly package and variable names (and does not yet accept the cl argument). See examples. This setting will use all available threads on your computer. Future versions of this will allow passing of a custom cluster object via cl argument. POM will automatically determine packages to load in the spawned cluster (via packages) and it will load all objects in the cluster that are necessary, by sending names(objects) to parVar in DEoptim.control.

Setting logObjFnVals to TRUE may help diagnosing some problems. Using the POM derived objective function, essentially all patterns are treated equally. This may not give the correct behaviour for the objective function. Because POM weighs the patterns equally, it may be useful to use the log files to examine the behaviour of the pattern--data pairs. The first file, ObjectiveFnValues.txt, shows the result of each of the (possibly logged), pattern--data deviations, standardized, and weighted. The second file, ObjectiveFnValues_RawPatterns.txt, shows the actual value of the pattern (unstandardized, unweighted, unlogged). If weights is passed, then these weighted values will be reflected in the ObjectiveFnValues.txt file.

See also

Examples

if (interactive()) { set.seed(89462) library(parallel) library(raster) mySim <- simInit( times = list(start = 0.0, end = 2.0, timeunit = "year"), params = list( .globals = list(stackName = "landscape", burnStats = "nPixelsBurned"), fireSpread = list(nfires = 5), randomLandscapes = list(nx = 300, ny = 300) ), modules = list("randomLandscapes", "fireSpread", "caribouMovement"), paths = list(modulePath = system.file("sampleModules", package = "SpaDES.core")) ) # Since this is a made up example, we don't have real data # to run POM against. Instead, we will run the model once, # take the values at the end of the simulation as if they # are real data, then rerun the POM function next, # comparing these "data" with the simulated values # using Mean Absolute Deviation outData <- spades(reproducible::Copy(mySim), .plotInitialTime = NA) # Extract the "true" data, in this case, the "proportion of cells burned" # Function defined that will use landscape$Fires map from simList, # i.e., sim$landscape$Fires # the return value being compared via MAD with propCellBurnedData propCellBurnedFn <- function(landscape) { sum(getValues(landscape$Fires) > 0) / ncell(landscape$Fires) } # visualize the burned maps of true "data" propCellBurnedData <- propCellBurnedFn(outData$landscape) clearPlot() if (interactive()) { library(quickPlot) fires <- outData$landscape$Fires # Plot doesn't do well with many nested layers Plot(fires) } # Example 1 - 1 parameter # In words, this says, "find the best value of spreadprob such that # the proportion of the area burned in the simulation # is as close as possible to the proportion area burned in # the "data", using DEoptim(). # Can use cluster if computer is multi-threaded. # This example can use parallelType = 1 in DEoptim. For this, you must manually # pass all packages and variables as character strings. # cl <- makeCluster(detectCores() - 1) # not implemented yet in DEoptim out1 <- POM(mySim, "spreadprob", list(propCellBurnedData = propCellBurnedFn), # data = pattern pair #optimControl = list(parallelType = 1), logObjFnVals = TRUE) ## Once cl arg is available from DEoptim, this will work: # out1 <- POM(mySim, "spreadprob", cl = cl, # list(propCellBurnedData = propCellBurnedFn)) # data = pattern pair # Example 2 - 2 parameters # Function defined that will use caribou from sim$caribou, with # the return value being compared via MAD with nPattern # module, parameter N, is from 10 to 1000) caribouFn <- function(caribou) length(caribou) # Extract "data" from simList object (normally, this would be actual data) nPattern <- caribouFn(outData$caribou) aTime <- Sys.time() parsToVary <- c("spreadprob", "N") out2 <- POM(mySim, parsToVary, list(propCellBurnedData = propCellBurnedFn, nPattern = caribouFn), logObjFnVals = TRUE) #optimControl = list(parallelType = 1)) #cl = cl) # not yet implemented, waiting for DEoptim bTime <- Sys.time() # check that population overlaps known values (0.225 and 100) apply(out2$member$pop, 2, quantile, c(0.025, 0.975)) hists <- apply(out2$member$pop, 2, hist, plot = FALSE) clearPlot() for (i in seq_along(hists)) Plot(hists[[i]], addTo = parsToVary[i], title = parsToVary[i], axes = TRUE) print(paste("DEoptim", format(bTime - aTime))) #stopCluster(cl) # not yet implemented, waiting for DEoptim # Example 3 - using objFn instead of objects # list all the parameters in the simList, from these, we select to vary params(mySim) # Objective Function Example: # objective function must have several elements # - first argument must be parameter vector, passed to and used by DEoptim # - likely needs to take sim object, likely needs a copy # because of pass-by-reference semantics of sim objects # - pass data that will be used internally for objective function objFnEx <- function(pars, # param values sim, # simList object nPattern, propCellBurnedData, caribouFn, propCellBurnedFn) { ### data # make a copy of simList because it will possibly be altered by spades call sim1 <- reproducible::Copy(sim) # take the parameters and assign them to simList params(sim1)$fireSpread$spreadprob <- pars[1] params(sim1)$caribouMovement$N <- pars[2] # run spades, without plotting out <- spades(sim1, .plotInitialTime = NA) # calculate outputs propCellBurnedOut <- propCellBurnedFn(out$landscape) nPattern_Out <- caribouFn(out$caribou) minimizeFn <- abs(nPattern_Out - nPattern) + abs(propCellBurnedOut - propCellBurnedData) # have more info reported to console, if desired # cat(minimizeFn) # cat(" ") # cat(pars) return(minimizeFn) } # Run DEoptim with custom objFn, identifying 2 parameters to allow # to vary, and pass all necessary objects required for the # objFn # choose 2 of them to vary. Need to identify them in params & inside objFn # Change optimization parameters to alter how convergence is achieved out5 <- POM(mySim, params = c("spreadprob", "N"), objFn = objFnEx, nPattern = nPattern, propCellBurnedData = propCellBurnedData, caribouFn = caribouFn, propCellBurnedFn = propCellBurnedFn, #cl = cl, # uncomment for cluster # not yet implemented, waiting for DEoptim # see ?DEoptim.control for explanation of these options optimControl = list( NP = 100, # run 100 populations, allowing quantiles to be calculated initialpop = matrix(c(runif(100, 0.2, 0.24), runif(100, 80, 120)), ncol = 2), parallelType = 1 ) ) # Can also use an optimizer directly -- miss automatic parameter bounds, # and automatic objective function using option 2 library(DEoptim) out7 <- DEoptim(fn = objFnEx, sim = mySim, nPattern = nPattern, propCellBurnedData = propCellBurnedData, caribouFn = caribouFn, propCellBurnedFn = propCellBurnedFn, # cl = cl, # uncomment for cluster # see ?DEoptim.control for explanation of these options control = DEoptim.control( steptol = 3, parallelType = 1, # parallelType = 3, packages = list("raster", "SpaDES.core", "RColorBrewer"), parVar = list("objFnEx"), initialpop = matrix(c(runif(40, 0.2, 0.24), runif(40, 80, 120)), ncol = 2)), lower = c(0.2, 80), upper = c(0.24, 120)) }