AssimBatchVignette.Rmd
Only needs to be done the first time :
library(devtools)
install_github("PecanProject/pecan", subdir="all")
Currently, there are three ways of doing Parameter Data Assimilation (PDA) in PEcAn :
Note that, there used to be Metropolis-Hastings Markov Chain
Monte Carlo (MCMC) algorithms implemented natively in PEcAn, namely
/modules/assim.batch/R/pda.mcmc.R
and
pda.mcmc.bs.R
. These are deprecated and they will no longer
be maintained.
bayesian.tools : The
pda.bayesian.tools.R
is the main script of this PDA
workflow that performs a bruteforce (MCMC methods that require running
your models at least couple of thousand of times) calibration. This
script uses the BayesianTools
R-package (Hartig, Minuno and
Paul, 2019) that includes MCMC and SMC samplers and other tools for
Bayesian parameter calibration with external models. If you choose
bayesian.tools option, PEcAn framework will hand the PDA
calculations over to the BayesianTools
package. Although
this package includes algorithms that are designed to explore the
parameter space more efficiently than the traditional MH-MCMC, you would
still need a relatively faster model to use these algorithms. Note that
the BayesianTools
R-package itself is constantly under
development, and its integration with PEcAn may not be able to leverage
all BayesianTools
functionality.
emulator : The pda.emulator.R
is the
main script of this PDA workflow. When a model is slow, it is
practically not possible to run it hundreds of thousands of times
sequentially to explore the parameter space and draw enough samples to
converge the target distribution with bruteforce algorithms. Instead, we
can run the model for a relatively smaller number of times with
parameter combinations that have been carefully chosen to give good
coverage of parameter space. Then we can interpolate the likelihood
calculated for each of those runs to get a surface that “emulates” the
true likelihood and perform regular MCMC (just like the “bruteforce”
approach), except instead of actually running the model on every
iteration to obtain a likelihood value, this time we will just get an
approximation from the likelihood emulator. A lot more details about the
emulator approach can be found in Fer et al. 2018.
Multi-site HPDA : The pda.emulator.ms.R
is the main script of this PDA workflow which performs a multi-site
hierarchical Bayesian calibration using the emulator approach. In the
multi-site hierarhical calibration, global posteriors are fitted
simultaneously using an additional Gibbs update step, and used as the
new prior hyperparameters in the next iteration of MCMC to obtain
site-level hierarchical posteriors. This PDA workflow has its own
vignette (see
/modules/assim.batch/vignettes/MultiSitePDAVignette.Rmd
)
and will not be one of the examples here. For more information also see
Fer et al. 2021
bioRxiv preprint.
The easiest way to use PEcAn’s parameter data assimilation module is
to add an <assim.batch>
block to pecan.xml, load the
file with read.settings
, and pass the resulting settings
object to pda.bayesian.tools()
or
pda.emulator()
. There are some differences in the settings
for using different PDA methods (we will go through test cases, at this
point just familiarize yourself with the general pattern), but here is
an example <assim.batch>
block :
<assim.batch>
<iter>100000</iter>
<method>emulator</method> <--- WHICH PDA APPROACH
<chain>3</chain>
<param.names>
<temperate.coniferous> <--- YOUR PFT(S)
<param>PARAM1</param> <--- PARAMETER(S) YOU TARGET IN PDA
<param>PARAM2</param>
</temperate.coniferous>
</param.names>
<jump>
<adapt>250</adapt>
<adj.min>0.1</adj.min>
<ar.target>0.3</ar.target>
</jump>
<inputs>
<file>
<input.id> ... </input.id> <--- YOUR CONSTRAINT ID FROM BETY DB
<path>
<path> ... </path> <--- YOUR CONSTRAINT FILE PATH(S)
<path> ... </path>
</path>
<likelihood>Laplace</likelihood> <--- LIKELIHOOD FORM FOR THIS CONSTRAINT
<variable.id>298</variable.id> <--- (CONSTRAINT) VARIABLE ID FROM BETY DB
<variable.name>
<variable.name>LE</variable.name> <--- (CONSTRAINT) VARIABLE
<variable.name>UST</variable.name> <--- HELPER VARIABLE SPECIFIC TO FLUX CASE
</variable.name>
</file>
<file>
<input.id>...</input.id>
<path>
<path>...</path>
</path>
<likelihood>multipGauss</likelihood>
<hyper.pars>
<parama>0.001</parama> <--- HYPERPARAMETERS SPECIFIC TO GAUSSIAN FORM
<paramb>0.001</paramb>
</hyper.pars>
<ss.positive>TRUE</ss.positive>
<variable.id>...</variable.id>
<variable.name>
<variable.name>...</variable.name>
</variable.name>
</file>
</inputs>
</assim.batch>
Here are some more details about the PDA tags:
<iter>
Specifies the number of MCMC iterations
to run. If continuing a previous MCMC, this is the number of additional
iterations, which will be added to the previous total. Defaults to 100
if missing.
<chain>
Specifies the number of MCMC chains to
be run. Defaults to 2. Note that some of the post-processing functions
will look for at least 2 chains for diagnostics.
<prior>
Identifies the prior to be used for
PDA. Can be one of either:
+ `<posterior.id>` A posterior ID in BETY specifying the posterior from a previous PEcAn analysis (e.g., meta-analysis or previous PDA) to be used as the prior for PDA. Defaults to the most recent relevant posterior in the database if omitted (and no `<path>` specified instead; see below).
+ `<path>` As an alternative to using a posterior ID, can specify a file path to either a `prior.distns.Rdata` or `post.distns.Rdata` file generated from an earlier analysis. Conceptually, using a posterior distribution as the prior for PDA is preferred, as this allows the multiple analyses to work together to iteratively constrain parameters. In practice, previous analyses may have over-constrained parameters to ranges that do not actually optimize model outputs, so using a less informative prior for PDA might yield better results.
NOTE: It is recommended to leave the `<prior>` tag empty if you are doing this for the first time. PEcAn workflow will handle it for you.
<param.names>
The names of parameters to be
constrained by assimilation, listed in individual
<param>
tags. These must be the standard names
given in BETY DB (e.g. wood_density), not your model specific parameter
names (e.g. wooddens), i.e. check out the id
column of
the trait.dictionary
, i.e. :
## id figid
## 1 plant_min_temp Plant Min Temp
## 2 GDD Growing Degree Days
## 3 leafC Leaf C conc.
## 4 c2n_leaf Leaf C:N
## 5 leaf_respiration_rate_m2 Leaf Respiration Rate
## 6 dark_respiration_factor Dark Respiration Factor
You can inspect your write.config.MODEL.R
script for
standard parameter names and their mappings to model parameter
names.
NOTE : `<param.names>` chunk should always be on your xml file regardless of the method you use in the PDA.
<inputs>
Observation (constraint) data to be
compared to the model. In principle, can be one or more datasets,
specified in a variety of ways. In practice, the code is tested for
assimilating flux datasets numerous times using the NEE/FC or LE
variables.
<file>
Denotes a set of tags for a single input
(constraint data). Would be repeated for multiple datasets/variables,
e.g. in this case note the differences in
<variable.id>
and <variable.name>
:...
<inputs>
<file>
<input.id>15000000028</input.id>
<path>
<path>/data/dbfiles/ICOS_site_1-266/FLX_FI-Var_FLUXNET2015_FULLSET_HH_2016-2018_beta-3.csv</path>
</path>
<likelihood>Laplace</likelihood>
<variable.id>297</variable.id>
<variable.name>
<variable.name>NEE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
<file>
<input.id>15000000028</input.id>
<path>
<path>/data/dbfiles/ICOS_site_1-266/FLX_FI-Var_FLUXNET2015_FULLSET_HH_2016-2018_beta-3.csv</path>
</path>
<likelihood>Laplace</likelihood>
<variable.id>298</variable.id>
<variable.name>
<variable.name>LE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
</inputs>
...
<input.id>
BETY input ID for looking up the
input.
<path>
File path to the input. Both
<id>
and <path>
of the observation
data should be supplied for the PDA.
<source>
A standardized source of input data
(e.g., Ameriflux). Not implemented yet, but the idea would be similar to
the met workflow, PEcAn would be able to use standard data sources
automatically where available. Only used if no <id>
or <path>
is given.
<likelihood>
Identifier for the likelihood to
use. E.g., the Ameriflux NEE/FC and LE data use a Laplacian
likelihood.
<hyper.pars>
Optional. Hyperparameters for
your likelihood. E.g. Prior parameters of the precision for Gaussian
likelihood. Defaults to scaled values if not provided.
<ss.positive>
When using the emulator, it is
important to let the algorithm know that you have a likelihood whose
sufficient statistics is zero bound.
<variable.id>
The BETY variable ID associated
with this dataset. The idea is that specific preprocessing steps (e.g.,
estimating heteroskedastic error for tower NEE) would be associated with
particular IDs. Could automate further by assigning default
<likelihood>
to variable.id values (allowing
<likelihood>
to be omitted from pecan.xml).
NOTE : `<inputs>` chunk should always be on your xml file regardless of the method you use in the PDA.
<jump>
Settings for the specifics of the
proposal schema and adaptation functionality. It is used in the
emulator
approach, it’s possible to set adaptation options
using bayesian.tools
specific tags as well (see below for
<bt.settings>
).
<ar.target>
Target acceptance rate for the
adaptive jump algorithm. Defaults to 0.5 if missing.<adapt>
Number of iterations between jump
variance adaptations. Defaults to floor(iter/10)
if
missing. If set equal to the number of <iter>
,
basically turns off the adaptation functionality, it is not recommended
to turn-off this functionality for the emulator approach.<adj.min>
Minimum factor by which to reduce jump
variance when adapting. Prevents jump variances from degenerating to 0.
Defaults to 0.1 if missing.Now we will go through 4 examples of how you could arrange your
<assim.batch>
tags and run a PDA in PEcAn:
Assuming you already familiarized yourself with Demo 1 and Demo 2,
and ran SIPNET at Niwot Ridge (US-NR1), in this section we will go
through the pecan.xml
tags for calibrating SIPNET with
Ameriflux NEE data with the emulator approach at the US-NR1 site. The
SIPNET PFT for this site was temperate.coniferous
, let’s
assume the top three parameters contributing to the model NEE output
uncertainty (Demo 2) were
psnTOpt
,half_saturation_PAR
, and
Vm_low_temp
, so we decided to constrain these in the PDA.
Go to your workflow directory where you performed the uncertainty
analysis, open up your pecan.CONFIGS.xml
, insert the
following tags at the top of your xml after the
<pecan>
tag (don’t forget to remove
<---
comments), modify the paths according to
your setup, and save the file as
pecan.PDA.xml
:
<assim.batch>
<method>emulator</method>
<n.knot>60</n.knot> <--- 20 x p (= no. of paramaters you target)
<iter>100000</iter>
<chain>3</chain>
<param.names>
<temperate.coniferous>
<param>psnTOpt</param>
<param>half_saturation_PAR</param>
<param>Vm_low_temp</param>
</temperate.coniferous>
</param.names>
<jump>
<adapt>250</adapt>
<adj.min>0.1</adj.min>
<ar.target>0.3</ar.target>
</jump>
<inputs>
<file>
<input.id>1000011238</input.id> <--- your input ID (of the DB record of where the AMF csv file is)
<path>
<path>/.../AmerifluxLBL_site_0-772/AMF_US-NR1_BASE_HH_12-5.csv</path> <--- your path
</path>
<likelihood>Laplace</likelihood>
<variable.id>1000000042</variable.id>
<variable.name>
<variable.name>FC</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
</inputs>
</assim.batch>
For reference, these are the tags for a similar workflow for SIPNET at ICOS-NEE (FI-Var site) combination. Note the differences in PFT name, input ID and path, variable ID and name:
<assim.batch>
<method>emulator</method>
<n.knot>60</n.knot>
<iter>100000</iter>
<chain>3</chain>
<param.names>
<boreal.coniferous>
<param>psnTOpt</param>
<param>half_saturation_PAR</param>
<param>Vm_low_temp</param>
</boreal.coniferous>
</param.names>
<jump>
<adapt>250</adapt>
<adj.min>0.1</adj.min>
<ar.target>0.3</ar.target>
</jump>
<inputs>
<file>
<input.id>15000000028</input.id>
<path>
<path>/.../ICOS_site_1-266/FLX_FI-Var_FLUXNET2015_FULLSET_HH_2016-2018_beta-3.csv</path>
</path>
<likelihood>Laplace</likelihood>
<variable.id>297</variable.id>
<variable.name>
<variable.name>NEE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
</inputs>
</assim.batch>
Now you can read in your pecan.PDA.xml
and run PDA (This
will take some time, go grab a drink. The first round especially takes
time because this is when the constraint data is loaded and its
effective sample size is calculated):
settings <- read.settings("pecan.PDA.xml")
settings_PDA_round1 <- pda.emulator(settings)
The PDA workflow has populated you settings
list object
with couple of relevant tags. Now settings_PDA_round1
has
these additional tags (note that this post-PDA settings
list is also automatically written to your workflow directory as
pecan.pdaR1_ENS_ID.xml
). These tags look like this:
<prior>
<prior.id>
<boreal.coniferous>INITIAL_PDA_PRIOR_ID</boreal.coniferous>
</prior.id>
</prior>
<ensemble.id>
<id>R1_ENS_ID</id>
</ensemble.id>
<emulator.path>/.../PEcAn_WORKFLOWID/emulator.pdaR1_ENS_ID.Rdata</emulator.path>
<ss.path>/.../PEcAn_WORKFLOWID/ss.pdaR1_ENS_ID.Rdata</ss.path>
<mcmc.path>/.../PEcAn_WORKFLOWID/mcmc.list.pdaR1_ENS_ID.Rdata</mcmc.path>
<resume.path>/.../PEcAn_WORKFLOWID/resume.pdaR1_ENS_ID.Rdata</resume.path>
<round_counter>1</round_counter>
<extension>round</extension>
<prior.id>
This is the initial prior that went
into your first PDA round. Even though the PDA workflow updates the
<posteriorid>
under your PFT block, if you do another
emulator round only the initial prior will be used in the MCMC
(otherwise you use data twice!). Hence, we keep track of its ID in the
xml. But it is still perfectly fine to propose the new training points
for the next emulator round from the posterior of the previous round, so
that you can refine your emulator surface, that’s when the PDA workflow
uses <posteriorid>
under the hood.
<ensemble.id>
Each emulator round is assigned
a new ensemble ID. You can see this ID in every file recorded for the
associated emulator round, e.g.:
history.pdaR1_ENS_ID.Rdata
,
pecan.pdaR1_ENS_ID.xml
,
post.distns.pda.YOURPFT_R1_ENS_ID.Rdata
etc.
Rest of the tags are for the emulator workflow to be able to find
previous rounds’ information and understand what to do in the next
round. The emulator workflow automatically adds
<extension>round</extension>
because if you
forget to add this tag manually after the first round, you will just be
repeating the first round while you almost certainly intend to do an
iterative round if you are using the settings of your previous round.
Forgetting to add this tag has happened enough times that we now add
this tag automatically. But note that in PDA we have a second type of
extension which is
<extension>longer</extension>
. This is to run
your MCMC chain longer in case it didn’t converge. This will be more of
an issue for the bruteforce calibration (see below), but it is also
possible to run your MCMC chains longer if your emulator-based MCMC also
didn’t converge:
<extension>
Specifies the extension type of
additional emulator runs, should be skipped if this a first emulator run
of its own. Otherwise it is possible to extend emulator PDA runs in two
ways:
<extension>longer</extension>
using the
same emulator, this extension run takes the MCMC sampling from where it
was left in the previous emulator run and runs a longer MCMC
chain.
<extension>round</extension>
this
extension run proposes new points in the parameter space in addition to
the previous ones, and builds a new emulator including these additional
points for a new MCMC sampling. These new points can come from both your
inital PDA prior and the posterior of your first round of emulator run.
You can determine the percentage of new knots coming from the posterior
of your previous run in the <knot.par>
tag. If you
leave it empty, 90% of your new points will be drawn from the posterior
of your previous run by default.
Now you can either pass the post PDA settings in your environment to
the pda.emulator()
function again (yes you can put this in
a loop) or you can re-read your pecan.pdaR1_ENS_ID.xml
at
any time to perform an iterative emulator round.
### settings_PDA_round1 <- read.settings("pecan.pdaR1_ENS_ID.xml")
settings_PDA_round2 <- pda.emulator(settings_PDA_round1)
Before moving on to the BayesianTools
case, here is what
your pecan.PDA.xml
could look like for a case where you
constrain parameters of more than one model PFT with more than one
Ameriflux data stream, note the ordering in
<assim.batch><param.names>
and
<pfts>
:
...
<assim.batch>
...
<param.names>
<PFT1>
<param>PFT1_PDA_param1</param>
<param>PFT1_PDA_param2</param>
</PFT1>
<PFT2>
<param>PFT2_PDA_param1</param>
<param>PFT2_PDA_param2</param>
<param>PFT2_PDA_param3</param>
</PFT2>
</param.names>
...
<inputs>
<file>
<input.id>YOURINPUTID</input.id>
<path>
<path>/YOURPATH/AmerifluxLBL_site_***/AMF_***_BASE_HH_***.csv</path>
</path>
<likelihood>Laplace</likelihood>
<variable.id>297</variable.id>
<variable.name>
<variable.name>NEE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
<file>
<input.id>YOURINPUTID</input.id>
<path>
<path>/YOURPATH/AmerifluxLBL_site_***/AMF_***_BASE_HH_***.csv</path>
</path>
<likelihood>Laplace</likelihood>
<variable.id>298</variable.id>
<variable.name>
<variable.name>LE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
</inputs>
...
</assim.batch>
...
<pfts>
<pft>
<name>PFT1</name>
<outdir>/YOUR/WORKFLOWS/PATH/PEcAn_WORKFLOWID/pft/PFT1</outdir>
<posteriorid>PFT1_POSTID</posteriorid>
</pft>
<pft>
<name>PFT2</name>
<outdir>/YOUR/WORKFLOWS/PATH/PEcAn_WORKFLOWID/pft/PFT2</outdir>
<posteriorid>PFT2_POSTID</posteriorid>
</pft>
</pfts>
...
Using the bruteforce PDA approach with one of the MCMC-flavors as
implemented in the BayesianTools R-package requires slightly different
tags to configure its setup. pda.bayesian.tools() would
look for sampler specific settings that can be passed through the
pecan.xml as a block under the <bt.settings>
tag.
Currently, the available samplers in the BayesianTools package are:
Note that it is not possible to use some of the samplers in this package for univariate cases. The ones that you can use for univariate cases are: “Metropolis”, “DE”, “DEzs” and “Twalk”. Some of the samplers are also restartable : “Metropolis”, “DE”, “DEzs”, “DREAM”, “DREAMzs”, “Twalk”. For further details please see the BayesianTools documentation which itself has a helpful vignette.
The name of the chosen sampler would be passed under
<sampler>
tag within the
<bt.settings>
block. E.g. for Metropolis
variations:
...
<method>bayesian.tools</method>
<bt.settings>
<iter>10000</iter>
<sampler>Metropolis</sampler>
<DRlevels>1</DRlevels> <-- set 2 for Delayed Rejection
<optimize>FALSE</optimize> <-- set TRUE for pre-optimization
<adapt>FALSE</adapt> <-- set TRUE for Adaptive Metropolis
<adaptationNotBefore>500</adaptationNotBefore> <-- set for Adaptive Metropolis
</bt.settings>
...
The rest of the <assim.batch>
block is the same
(except you can now omit the <jump>
tag as it takes
no effect in this case). While all the algorithms can be accessed by the
“Metropolis” sampler in BayesianTools package by specifying the
sampler’s settings, it could be easier to directly pass the sampler name
explicitly. Below is an example for DREAMzs (recommended) algorithm:
<assim.batch>
<method>bayesian.tools</method>
<bt.settings>
<sampler>DREAMzs</sampler>
<iter>10000</iter>
<chain>3</chain>
</bt.settings>
<param.names>
<soil>
<param>som_respiration_rate</param>
</soil>
<boreal.coniferous>
<param>psnTOpt</param>
<param>half_saturation_PAR</param>
<param>Vm_low_temp</param>
</boreal.coniferous>
</param.names>
<inputs>
<file>
<input.id>15000000028</input.id>
<path>
<path>/data/dbfiles/ICOS_site_1-266/FLX_FI-Var_FLUXNET2015_FULLSET_HH_2016-2018_beta-3.csv</path>
</path>
<likelihood>Laplace</likelihood>
<variable.id>297</variable.id>
<variable.name>
<variable.name>NEE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
</inputs>
</assim.batch>
Assuming you save these tags under a pecan.PDA.xml
settings <- read.settings("pecan.PDA.xml")
settings_postPDA <- pda.bayesian.tools(settings)
In case of no convergence and/or if you would like to run the same chains longer, you can add the extension option:
settings_postPDA$assim.batch$extension <- 'longer'
settings_postPDA_longer <- pda.bayesian.tools(settings_postPDA)
Sometimes you might want to use your own local data (e.g. from a
tower that is not necessarily part of any flux tower network PEcAn has
access to, or some chamber measurements etc.) for constraining your
model. Although we still do recommend using formal PEcAn data
registration and processing pipelines, sometimes you may not have access
to the DB and/or may want to bypass other data processing steps to
perform a quick PDA analysis to plan ahead. In that case, you can use
external.*
arguments of the pda.emulator()
function. You can prepare the external.*
arguments
manually, but the easiest way to generate them is to use
pda.generate.externals()
function that you can find under
the script with the same name. pda.emulator()
function
accepts the following external.*
arguments:
external.data
This is a list containing your
constraining variables in the format expected by the PDA functions (see
also pda.generate.externals()
function). The length of this
list is the number of your data streams (e.g. if you are using NEE and
LE in calibration, it will have two sublists). Hint: You may want to
perform case #1 and load the history.pda
file to check the
format of the inputs
object out, because if
external.data
is passed to the pda.emulator()
it will simply be assined to this variable,
inputs <- external.data
. To be able to prepare
external.data
with the
pda.generate.externals()
function you need your data as a(n
ordered) list where each sublist corresponds to a data frame of your
constraining variable with two columns, variable name - posix.
external.formats
This is a list of variable names in
the format expected by the PDA functions (see also
pda.generate.externals()
function). The length of this list
is the number of your data streams (e.g. if you are using NEE and LE in
calibration, it will have two sublists). Depending on expressions
provided in this object, PDA functions are capable of performing simple
derivations on data. Hence, this list has a slightly more complex
structure than a simple list of names, but don’t worry about it. To be
able to prepare external.formats
with the
pda.generate.externals()
it is enough to pass
varn
vector that has the standard variable names (when in
doubt, check the standard variable names under
/base/utils/data/standard_vars.csv
).
external.priors
This is a list of pecan probability
distribution dataframes to be used as PDA priors, one sublist per PFT,
e.g.:
[[1]]
distn parama paramb n
frozenSoilEff beta 1.000 1.00 NA
soilRespMoistEffect unif 0.000 2.00 NA
...
[[2]]
distn parama paramb n
growth_resp_factor beta 2.630 6.5e+00 0
root_respiration_rate norm 8.700 9.5e-01 NA
...
external.knots
This is a list of dataframes containing
emulator training points. Only pass it if you want to build the emulator
on specific knots over which you want to have control, otherwise knots
are generated from the PDA priors using Latin Hypercube Sampling. Each
sublist should be a nknot x nPFTparam
dataframe, e.g. if
you are building an emulator with 250 knots and your PFT has 30
parameters, this will be a 250 x 30 dataframe where each row is a
parameter vector. Therefore only the columns of parameters that you are
targeting in PDA should have varying parameter values. E.g. if you are
calibrating the psnTOpt
parameter, it would look like
this:> external.knots
$temperate.deciduous
... half_saturation_PAR fine_root_respiration_Q10 psnTOpt AmaxFrac ...
[1,] ... 15.5 3.2 24.83626 0.8605507 ...
[2,] ... 15.5 3.2 25.18056 0.8605507 ...
[3,] ... 15.5 3.2 26.10204 0.8605507 ...
[4,] ... 15.5 3.2 25.03353 0.8605507 ...
... ... ... ... ... ... ...
Once you prepare external.data
and
external.format
, now you can omit the
<input.id></input.id>
and
<path></path>
tags in the
<assim.batch>
block (they will be ignored even if you
pass them when external.data
is not NULL:
<assim.batch>
<method>emulator</method>
<n.knot>60</n.knot>
<iter>100000</iter>
<chain>3</chain>
<param.names>
<boreal.coniferous>
<param>psnTOpt</param>
<param>half_saturation_PAR</param>
<param>Vm_low_temp</param>
</boreal.coniferous>
</param.names>
<jump>
<adapt>250</adapt>
<adj.min>0.1</adj.min>
<ar.target>0.3</ar.target>
</jump>
<inputs>
<file>
<likelihood>Laplace</likelihood>
<variable.id>297</variable.id>
<variable.name>
<variable.name>NEE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
</inputs>
</assim.batch>
Save and read your pecan.PDA.xml
accordingly. Next, pass
the external data and formats you prepared explicitly to the
pda.emulator()
function:
settings <- read.settings("pecan.PDA.xml")
pda.externals <- pda.generate.externals(external.data = TRUE, obs = obs, varn = "NEE", varid = 297, external.formats = TRUE)
settings_postPDA <- pda.emulator(settings, external.data = pda.externals$external.data, external.formats = pda.externals$external.formats)
Note that this way we provided external constraint data and its
format information to the PDA functions but we are still using database
connection. If you want to use PDA functions with no database
connection, then you need to provide prior list to the function as an
external object as well and run it in remote = TRUE
mode.
In that case, it is also recommended to pass an ensemble.id
as an identifier for your PDA run (normally this is given by the PDA
workflow), it can even be a string:
settings <- read.settings("pecan.PDA.xml")
prior.list <- list(data.frame(distn = c("unif", "unif", "norm"),
parama = c(5, 4, 0),
paramb = c(40, 27, 3),
n = rep(NA, 3),
row.names = c("psnTOpt", "half_saturation_PAR", "Vm_low_temp")))
pda.externals <- pda.generate.externals(external.data = TRUE, obs = obs, varn = "NEE", varid = 297, external.formats = TRUE, external.priors = TRUE, prior.list = prior.list)
settings_postPDA <- pda.emulator(settings, external.data = pda.externals$external.data,
external.formats = pda.externals$external.formats,
external.priors = pda.externals$external.priors,
remote = TRUE, ensemble.id = "PDA_VIGNETTE")
Two more notes here before moving onto the next case. Above, in the
prior.list
object, we only passed prior distribution for
the three parameters we target. In this case rest of the model
parameters will be kept as their defaults (whichever way it is
implemented in the model package), they won’t be kept at the medians
of their prior range as it is normally the case in PDA because we
haven’t passed a prior (range) for them. If you don’t want that to
happen, pass a more comprehensive list or try to use prior (or meta
analysis posterior) objects generated by PEcAn workflows. Finally, we
haven’t used the external.knots
argument yet. We only use
it when we need to overwrite and enforce the training points for the
emulator and don’t necessarily need to populate this argument when doing
PDA without database connection. We leave testing with
external.knots
as an exercise.
It’s all very much the same as the cases above, in combination of #2
(prepare BayesianTools
specific xml tags) and #3 (prepare
external arguments).
<assim.batch>
<method>bayesian.tools</method>
<bt.settings>
<sampler>DREAMzs</sampler>
<iter>10000</iter>
<chain>3</chain>
</bt.settings>
<param.names>
<boreal.coniferous>
<param>psnTOpt</param>
<param>half_saturation_PAR</param>
<param>Vm_low_temp</param>
</boreal.coniferous>
</param.names>
<inputs>
<file>
<likelihood>Laplace</likelihood>
<variable.id>297</variable.id>
<variable.name>
<variable.name>NEE</variable.name>
<variable.name>UST</variable.name>
</variable.name>
</file>
</inputs>
</assim.batch>
Save and read your pecan.PDA.xml
accordingly. Next, pass
the external data and formats you prepared explicitly to the
pda.bayesian.tools()
function:
settings <- read.settings("pecan.PDA.xml")
pda.externals <- pda.generate.externals(external.data = TRUE, obs = obs, varn = "NEE", varid = 297, external.formats = TRUE, external.priors = TRUE, prior.list = prior.list)
settings_postPDA <- pda.bayesian.tools(settings, external.data = pda.externals$external.data,
external.formats = pda.externals$external.formats,
external.priors = pda.externals$external.priors,
remote = TRUE, ensemble.id = "PDA_VIGNETTE")