This is very much in alpha condition.
It has been tested on simple problems, as shown in the examples, with up to 2 parameters.
It appears that DEoptim
is the superior package for the stochastic problems.
This should be used with caution as with all optimization routines. This function
can nevertheless take optim
as optimizer, using stats::optim
.
However, these latter approaches do not seem appropriate for stochastic problems,
and have not been widely tested and are not supported within POM
.
POM( sim, params, objects = NULL, objFn, cl, optimizer = "DEoptim", sterr = FALSE, ..., objFnCompare = "MAD", optimControl = NULL, NaNRetries = NA, logObjFnVals = FALSE, weights, useLog = FALSE ) # S4 method for simList,character POM( sim, params, objects = NULL, objFn, cl, optimizer = "DEoptim", sterr = FALSE, ..., objFnCompare = "MAD", optimControl = NULL, NaNRetries = NA, logObjFnVals = FALSE, weights, useLog = FALSE )
sim | A |
---|---|
params | Character vector of parameter names that can be changed by the optimizer.
These must be accessible with |
objects | A optional named list (must be specified if objFn is not).
The names of each list element must correspond to an object in the
|
objFn | An optional objective function to be passed into |
cl | A cluster object. Optional. This would generally be created using
parallel::makeCluster or equivalent. This is an alternative way, instead
of |
optimizer | The function to use to optimize. Default is |
sterr | Logical. If using |
... | All objects needed in objFn |
objFnCompare | Character string. Either, |
optimControl | List of control arguments passed into the control of each
optimization routine. Currently, only passed to
|
NaNRetries | Numeric. If greater than 1, then the function will retry the objective
function for a total of that number of times if it results in an |
logObjFnVals | Logical or Character string indicating a filename to log the outputs.
Ignored if |
weights | Numeric. If provided, this vector will be multiplied by the standardized
deviations (possibly MAD or RMSE) as described in |
useLog | Logical. Should the data patterns and output patterns be logged ( |
A list with at least 2 elements. The first (or first several) will
be the returned object from the optimizer. The second (or last if there are
more than 2), named args
is the set of arguments that were passed
into the control of the optimizer.
There are two ways to use this function: via 1) objFn
or 2) objects
.
The user can pass the entire objective function to the objFn
argument that will be passed directly to the optimizer
.
For this, the user will likely need to pass named objects as part of the ...
.
The slightly simpler approach is to pass a list of 'actual data--simulated data'
pairs as a named list in objects
and specify how these objects should be
compared via objFnCompare
(whose default is Mean Absolute Deviation or "MAD").
Option 1 offers more control to the user, but may require more knowledge.
Option 1 should likely contain a call to simInit(Copy(simList))
and
spades
internally.
See examples that show simple examples of each type, option 1 and option 2.
In both cases, params
is required to indicate which parameters can be
varied in order to achieve the fit.
Currently, option 1 only exists when optimizer is "DEoptim"
, the default.
The upper and lower limits for parameter values are taken from the metadata in the module. Thus, if the module metadata does not define the upper and lower limits, or these are very wide, then the optimization may have troubles. Currently, there is no way to override these upper and lower limits; the module metadata should be changed if there needs to be different parameter limits for optimization.
objects
is a named list of data--pattern pairs.
Each of these pairs will be assessed against one another using
the objFnCompare
, after standardizing each independently. The
standardization, which only occurs if the abs(data value < 1),
is: mean(abs(derived value - data value))/mean(data value)
. If
the data value is between -1 and 1, then there is no standardization.
If there is more than one data--pattern
pair, then they will simply be added together in the objective
function. This gives equal weight to each pair. If the user wishes to
put different weight on each pattern, a weights
vector can be
provided. This will be used to multiply the standardized values described above.
Alternatively, the user
may wish to weight them differently, in which case, their relative
scales can be adjusted.
There are many options that can be passed to DEoptim
,
(the details of which are in the help), using optimControl
. The defaults
sent from POM
to DEoptim
are: steptol = 3 (meaning it will start
assessing convergence after 3 iterations (WHICH MAY NOT BE SUFFICIENT FOR YOUR PROBLEM),
NP = 10 * length(params)
(meaning the population size is 10 x the number of parameters) and itermax =
200 (meaning it won't go past 200 iterations). These and others may need to be adjusted to
obtain good values.
NOTE: DEoptim
does not provide a direct estimate of confidence intervals.
Also, convergence may be unreliable, and may occur because itermax
is reached.
Even when convergence is indicated, the estimates are not guaranteed to be global
optima. This is different than other optimizers that will normally indicate
if convergence was not achieved at termination of the optimization.
Using this function with a parallel cluster currently requires that you pass
optimControl = list(parallelType = 1)
, and possibly package and variable names
(and does not yet accept the cl
argument). See examples.
This setting will use all available threads on your computer.
Future versions of this will allow passing of a custom cluster object via cl
argument.
POM
will automatically determine packages to load in the spawned cluster
(via packages
) and it will load all objects in the cluster that are
necessary, by sending names(objects)
to parVar
in DEoptim.control
.
Setting logObjFnVals
to TRUE
may help diagnosing some problems.
Using the POM derived objective function, essentially all patterns are treated equally.
This may not give the correct behaviour for the objective function.
Because POM
weighs the patterns equally, it may be useful to use the
log files to examine the behaviour of the pattern--data pairs.
The first file, ObjectiveFnValues.txt, shows the result of each of the
(possibly logged), pattern--data deviations, standardized, and weighted.
The second file, ObjectiveFnValues_RawPatterns.txt
, shows the actual
value of the pattern (unstandardized, unweighted, unlogged).
If weights
is passed, then these weighted values will be reflected
in the ObjectiveFnValues.txt
file.
spades
, makeCluster
,
simInit
if (interactive()) { set.seed(89462) library(parallel) library(raster) mySim <- simInit( times = list(start = 0.0, end = 2.0, timeunit = "year"), params = list( .globals = list(stackName = "landscape", burnStats = "nPixelsBurned"), fireSpread = list(nfires = 5), randomLandscapes = list(nx = 300, ny = 300) ), modules = list("randomLandscapes", "fireSpread", "caribouMovement"), paths = list(modulePath = system.file("sampleModules", package = "SpaDES.core")) ) # Since this is a made up example, we don't have real data # to run POM against. Instead, we will run the model once, # take the values at the end of the simulation as if they # are real data, then rerun the POM function next, # comparing these "data" with the simulated values # using Mean Absolute Deviation outData <- spades(reproducible::Copy(mySim), .plotInitialTime = NA) # Extract the "true" data, in this case, the "proportion of cells burned" # Function defined that will use landscape$Fires map from simList, # i.e., sim$landscape$Fires # the return value being compared via MAD with propCellBurnedData propCellBurnedFn <- function(landscape) { sum(getValues(landscape$Fires) > 0) / ncell(landscape$Fires) } # visualize the burned maps of true "data" propCellBurnedData <- propCellBurnedFn(outData$landscape) clearPlot() if (interactive()) { library(quickPlot) fires <- outData$landscape$Fires # Plot doesn't do well with many nested layers Plot(fires) } # Example 1 - 1 parameter # In words, this says, "find the best value of spreadprob such that # the proportion of the area burned in the simulation # is as close as possible to the proportion area burned in # the "data", using DEoptim(). # Can use cluster if computer is multi-threaded. # This example can use parallelType = 1 in DEoptim. For this, you must manually # pass all packages and variables as character strings. # cl <- makeCluster(detectCores() - 1) # not implemented yet in DEoptim out1 <- POM(mySim, "spreadprob", list(propCellBurnedData = propCellBurnedFn), # data = pattern pair #optimControl = list(parallelType = 1), logObjFnVals = TRUE) ## Once cl arg is available from DEoptim, this will work: # out1 <- POM(mySim, "spreadprob", cl = cl, # list(propCellBurnedData = propCellBurnedFn)) # data = pattern pair # Example 2 - 2 parameters # Function defined that will use caribou from sim$caribou, with # the return value being compared via MAD with nPattern # module, parameter N, is from 10 to 1000) caribouFn <- function(caribou) length(caribou) # Extract "data" from simList object (normally, this would be actual data) nPattern <- caribouFn(outData$caribou) aTime <- Sys.time() parsToVary <- c("spreadprob", "N") out2 <- POM(mySim, parsToVary, list(propCellBurnedData = propCellBurnedFn, nPattern = caribouFn), logObjFnVals = TRUE) #optimControl = list(parallelType = 1)) #cl = cl) # not yet implemented, waiting for DEoptim bTime <- Sys.time() # check that population overlaps known values (0.225 and 100) apply(out2$member$pop, 2, quantile, c(0.025, 0.975)) hists <- apply(out2$member$pop, 2, hist, plot = FALSE) clearPlot() for (i in seq_along(hists)) Plot(hists[[i]], addTo = parsToVary[i], title = parsToVary[i], axes = TRUE) print(paste("DEoptim", format(bTime - aTime))) #stopCluster(cl) # not yet implemented, waiting for DEoptim # Example 3 - using objFn instead of objects # list all the parameters in the simList, from these, we select to vary params(mySim) # Objective Function Example: # objective function must have several elements # - first argument must be parameter vector, passed to and used by DEoptim # - likely needs to take sim object, likely needs a copy # because of pass-by-reference semantics of sim objects # - pass data that will be used internally for objective function objFnEx <- function(pars, # param values sim, # simList object nPattern, propCellBurnedData, caribouFn, propCellBurnedFn) { ### data # make a copy of simList because it will possibly be altered by spades call sim1 <- reproducible::Copy(sim) # take the parameters and assign them to simList params(sim1)$fireSpread$spreadprob <- pars[1] params(sim1)$caribouMovement$N <- pars[2] # run spades, without plotting out <- spades(sim1, .plotInitialTime = NA) # calculate outputs propCellBurnedOut <- propCellBurnedFn(out$landscape) nPattern_Out <- caribouFn(out$caribou) minimizeFn <- abs(nPattern_Out - nPattern) + abs(propCellBurnedOut - propCellBurnedData) # have more info reported to console, if desired # cat(minimizeFn) # cat(" ") # cat(pars) return(minimizeFn) } # Run DEoptim with custom objFn, identifying 2 parameters to allow # to vary, and pass all necessary objects required for the # objFn # choose 2 of them to vary. Need to identify them in params & inside objFn # Change optimization parameters to alter how convergence is achieved out5 <- POM(mySim, params = c("spreadprob", "N"), objFn = objFnEx, nPattern = nPattern, propCellBurnedData = propCellBurnedData, caribouFn = caribouFn, propCellBurnedFn = propCellBurnedFn, #cl = cl, # uncomment for cluster # not yet implemented, waiting for DEoptim # see ?DEoptim.control for explanation of these options optimControl = list( NP = 100, # run 100 populations, allowing quantiles to be calculated initialpop = matrix(c(runif(100, 0.2, 0.24), runif(100, 80, 120)), ncol = 2), parallelType = 1 ) ) # Can also use an optimizer directly -- miss automatic parameter bounds, # and automatic objective function using option 2 library(DEoptim) out7 <- DEoptim(fn = objFnEx, sim = mySim, nPattern = nPattern, propCellBurnedData = propCellBurnedData, caribouFn = caribouFn, propCellBurnedFn = propCellBurnedFn, # cl = cl, # uncomment for cluster # see ?DEoptim.control for explanation of these options control = DEoptim.control( steptol = 3, parallelType = 1, # parallelType = 3, packages = list("raster", "SpaDES.core", "RColorBrewer"), parVar = list("objFnEx"), initialpop = matrix(c(runif(40, 0.2, 0.24), runif(40, 80, 120)), ncol = 2)), lower = c(0.2, 80), upper = c(0.24, 120)) }