fitvario {RandomFields}R Documentation

LSQ and Maximum Likelihood Estimation of Random Field Parameters

Description

The function estimates arbitrary parameters of a random field specification with various methods.

Usage

fitvario(x, y=NULL, z=NULL, T=NULL, data, model, param,
         lower=NULL, upper=NULL, sill=NA, grid=!missing(gridtriple),
          gridtriple, ...)

fitvario.default(x, y=NULL, z=NULL, T=NULL, data, model, param,
         grid=!missing(gridtriple), gridtriple=FALSE,
         trend = NULL,
         BC.lambda, ## if missing then no BoxCox-Trafo
         BC.lambdaLB=-10, BC.lambdaUB=10,  
         lower=NULL, upper=NULL, sill=NA, 
         use.naturalscaling=FALSE, PrintLevel,
         optim.control=NULL, bins=20, nphi=1, ntheta=1, ntime=20,
         distance.factor=0.5,
         upperbound.scale.factor=3, lowerbound.scale.factor=3,
         lowerbound.scale.LS.factor=5,
         upperbound.var.factor=10, lowerbound.var.factor=100,
         lowerbound.sill=1E-10, scale.max.relative.factor=1000,
         minbounddistance=0.001, minboundreldist=0.02,
         approximate.functioncalls=50, refine.onborder=TRUE,
         minmixedvar=1/1000, maxmixedvar=1000,
         pch=RFparameters()$pch,
         transform=NULL, standard.style=NULL,
         var.name="X", time.name="T",
         lsq.methods=c("self", "plain", "sqrt.nr", "sd.inv", "internal"),
         mle.methods=c("ml"),
         cross.methods=NULL,

         users.guess=NULL, only.users = FALSE,
         Distances=NULL, truedim,
         solvesigma = NA, # if NA then use algorithm -- ToDo
         allowdistanceZero = FALSE,
         na.rm = TRUE)

Arguments

x

(n x 2)-matrix of coordinates, or vector of x-coordinates. All locations must be given explicitely and cannot be passed via a grid definition as in GaussRF

y

vector of y coordinates

z

vector of z coordinates

T

vector of T coordinates; these coordinates are given in triple notation, see GaussRF

data

vector or matrix of values measured at coord; If a matrix is given then the columns are interpreted as independent realisations.
If also a time component is given, then in the data the indices for the spatial components run the fastest.

If an n-variate model is used, then each realisation is given as n consecutive columns of data.

model

string or list; covariance model, see CovarianceFct and Covariance, or type PrintModelList() to get all options.

If model is a list, then the parameters with value NA are estimated. Parameters that have value NaN should be explicitely be defined by the function transform. An alternative to define NaN values and the function transform, is to replace the NaN by a real-valued function with solely parameter a list defining a covariance model. In case of the anisotropy matrix, the matrix must be replaced by a list if functions are introduced. Only the list elements variance, scale or anisotropy, and kappas can be used, and not the mean or the trend. Further, the mean or the trend cannot be set by such a function. See also transform below.

param

vector or matrix or NULL. If vector then param=c(mean, variance, nugget, scale,...); the parameters must be given in this order. Further parameters are to be added in case of a parametrised class of covariance functions, see CovarianceFct and Covariance. Any components set to NA are estimated; the others are kept fix. See also model above.

lower

list or vector. Lower bounds for the parameters. If param is a vector, lower has to be a vector as well and its length must equal the number of parameters to be estimated. The order of param has to be maintained. A component being NA means that no manual lower bound for the corresponding parameter is set.

If param is a list, lower has to be of (exactly) the same structure.

upper

list or vector. Upper bounds for the parameters. See also lower.

sill

If not NA the sill is kept fix. Only used if the standard format for the covariance model is given. See Details.

grid

boolean. Weather coordinates give a grid

gridtriple

boolean. Format, see GaussRF

BC.lambda

a vector of at most two numerical components (just one component corresponds to two identical ones) which are the parameters of the box-cox-transformation: (x^λ_1-1)/λ_1+λ_2 If the model is univariate, the first parameter can be estimated by using NA.

BC.lambdaLB

lower bound for the first box-cox-parameter

BC.lambdaUB

upper bound for the first box-cox-parameter

trend

If a univariate model is used, the following trend types are possible:
number: the constant mean (not to be estimated any more)
NA: there is a constant mean to be estimated
formula : uses X1, X2,... and T as internal
parameters for the coordinates; all parameters are estimated
list of matrices: length of the list must be the number of realisations; each matrix must have the same number of rows as x
list of matrices and formula: trend is a list of matrices (see above) and one additional entry which is a formula

In an n-variate model trend can be either a list of n trends for univariate models or a list of n*d matrices (d: number of independent realisations) where each entry of trend corresponds to a column of data.

...

arguments as given in fitvario.default and listed in the following.

use.naturalscaling

logical. Only used if model is given in standard (simple) way. If TRUE then internally, rescaled covariance functions will be used for which cov(1)~=0.05. use.naturalscaling has the advantage that scale and the form parameters of the model get ‘orthogonal’, but use.naturalscaling does not work for all models. See Details.

PrintLevel

level to which messages are shown. See Details.

optim.control

control list for optim, which uses 'L-BFGS-B'. However 'parscale' may not be given.

bins

number of bins of the empirical variogram. See Details.

nphi

scalar or vector of 2 components. If it is a vector then the first component gives the first angle of the xy plane and the second one gives the number of directions on the half circle. If scalar then the first angle is assumed to be zero. Note that a good estimation of the variogramm by LSQ with a anisotropic model a large value for ntheta might be needed (about 20).

ntheta

scalar or vector of 2 components. If it is a vector then the first component gives the first angle in the third direction and the second one gives the number of directions on the half circle. If scalar then the first angle is assumed to be zero.

Note that a good estimation of the variogramm by LSQ with a anisotropic model a large value for ntheta might be needed (about 20).

ntime

scalar or vector of 2 components. if ntimes is a vector, then the first component are the maximum time distance (in units of the grid length T[3]) and the second component gives the step size (in units of the grid length T[3]). If scalar then the step size is assumed to 1 (in units of the grid length T[3]).

distance.factor

relative right bound for the bins. See Details.

upperbound.scale.factor

relative upper bound for scale in LSQ and MLE. See Details.

lowerbound.scale.factor

relative lower bound for scale in MLE. See Details.

lowerbound.scale.LS.factor

relative lower bound for scale in LSQ. See Details.

upperbound.var.factor

relative upper bound for variance and nugget. See Details.

lowerbound.var.factor

relative lower bound for variance. See Details.

lowerbound.sill

absolute lower bound for variance and nugget. See Details.

scale.max.relative.factor

relative lower bound for scale below which an additional nugget effect is detected. See Details.

minbounddistance

absolute distance to the bounds below which a part of the algorithm is considered as having failed. See Details.

minboundreldist

relative distance to the bounds below which a part of the algorithm is considered as having failed. See Details.

approximate.functioncalls

approximate evaluations of the ML target function on a grid. See Details.

refine.onborder

logical. If refine.onborder=TRUE and if the result of any maximum likelihood method or cross validation method is on a borderline, then the optimisation is redone in a modified way (which takes about double extra time)

minmixedvar

lower bound for variance in a mixed model; so, the covariance model for mixed model part might be calibrated appropriately

maxmixedvar

upper bound for variance in a mixed model; so, the covariance model for mixed model part might be calibrated appropriately

pch

character shown before evaluating any method; if pch!="" then one or two additional steps in the MLE methods are marked by “+” and “#”. Default: "*".

var.name

basic name for the coordinates in the formula of the trend. Default: ‘X’

time.name

basic name for the time component in the formula of the trend. Default: ‘X’

transform

function. Essentially, transform allows for the definition of a parameter as a function of other estimated parameters. All the parameters are supposed to be in a vector called ‘param’ where the positions are given by parampositions. An example of transform is function(param) {param[3] <- 5 - param[1]; param}.

Note that the mean and the trend of the model can be neither set nor used in transform. See also standard.style.

Note further that many internal checks cannot be performed in case of the very flexible function transform. Hence, it is completely up to the user to get users.guess, lower and upper right. The parameter users.guess must be given; lower and upper should be given.

Default: NULL

standard.style

logical or NULL. This variable should only be set by the advanced user. If NULL then standard.style will be TRUE if the covariance model allows for a ‘standard’ definition (see CovarianceFct) and transform is NULL. If a ‘standard’ definition is given and both the variance and the nugget are either not estimated or do not appear on the right hand side of the transform, then standard.style might be set to TRUE by the user. This accelerates the MLE algorithm. The responsibility is completely left to the user, then.

lsq.methods

variants of the least squares fit of the variogram. See Details.

mle.methods

variants of the maximum likelihood fit of the covariance function. See Details.

cross.methods

Not implemented yet.

users.guess

User's guess of the parameters. All the parameters must be given using the same rules as for either param (except that no NA's should be contained) or model.

only.users

boolean. If true then only users.guess is used as a starting point for the fitting algorithms

Distances

alternatively to coordinates x, y, and z the distances themselves can be given. Then truedim must be indicated.

truedim

see Distances

solvesigma

Boolean – experimental stage! If a mixed effect part is present where the variance has to be estimated, then this variance parameter is solved iteratively within the profile likelihood function, if solvesigma=TRUE.This makes sense if the number of independent variables is very small. If solvesigma=FALSE then the variance parameter is treated as any other parameter to be estimated.

allowdistanceZero

boolean. If true, then multiple observations are allowed within a single data set. In this case, the coordinates are slightly scattered, so that the points have some tiny distances.

na.rm

boolean – experimental stage. Only the data may have missing values. If na.rm=TRUE then lines of (repeated) data are deleted if at least one missing value appears. If na.rm=FALSE then the repetitions are treated sepeartely.

Details

The optimisations are performed using optimize if one parameter has to be estimated only and optim, otherwise.

First, by means of various control parameters, see below, the algorithm first tries to estimate the bounds for the parameters to be estimated, if the bounds for the parameters are not given.
Independently whether users.guess is given, the algorithm guesses initial values for the parameters. The automatic guess and the user's guess will be called primitive methods in the following.

Second, the variogram model is fitted by various least squares methods (according to the value of lsq.methods) using the best parameter set among the primitive methods as initial value if the effective number of parameters is greater than 1.

[Remarks: (i) “best” with respect to the target value of the respective lsq method; (ii) the effective number of parameters in the optimisation algorithm can be smaller than the number of estimated parameters, since in some cases, some parameters can be calculated explicitely; relevant for the choice between optimize and optim is the effective number of parameters; (iii) optim needs]

Third, the model is fitted by various maximum likelihood methods (according to the value of mle.methods) using the best parameter set among the primitive methods and the lsq methods as initial value (if the effective number of parameters is greater than 1).

Comments on specific parameters:

Value

The function returns a list with the following elements

ev

list returned by EmpiricalVariogram

table

Matrix. The first rows contain the estimated parameters, followed by the target values of all methods for the given set of parameters; the last rows give the lower and upper bounds used in the estimations. The columns correspond to the various estimation methods for the parameters.

lowerbounds

lower bounds

lowerbounds

upper bounds

transform

transformation function

vario

obsolete

self

list containing

  • modelthe variogram or covariance model

  • residualsNULL

  • ml.valuethe likelihood value for the model

plain, sqrt.nr, sd.inv, internal, ml, reml

see self; excepetion is ml, where the residuals are given instead of NULL.

Acknowledgement

Thanks to Paulo Ribeiro for hints and comparing the perliminary versions of fitvario in RandomFields V1.0 to likfit of the package geoR whose homepage is at http://www.est.ufpr.br/geoR/.

Note

This function does not depend on the value of RFparameters()$PracticalRange. The function fitvario always uses the standard specification of the covariance model as given in CovarianceFct.

Further, the function has implemented accelerations if the model is simple. E.g., if there is a common variance to estimated and the definition by lists is used, then the leading model should be ‘$’ with var=NA.

Author(s)

Martin Schlather, martin.schlather@math.uni-goettingen.de http://www.stochastik.math.uni-goettingen.de/~schlather

References

See Also

Covariance, CovarianceFct, GetPracticalRange, parampositions RandomFields, weather.

Examples




 

 model <-"gencauchy"
 param <- c(0, 1, 0, 1, 1, 2)
 estparam <- c(0, NA, 0, NA, NA, 2) ## NA means: "to be estimated"
 ## sequence in `estparam' is
 ## mean, variance, nugget, scale, (+ further model parameters)
 ## So, mean, variance, and scale will be estimated here.
 ## Nugget is fixed and equals zero.
 points <- 100
 x <- runif(points,0,3)
 y <- runif(points,0,3) ## 300 random points in square [0, 3]^2
 ## simulate data according to the model:
 d <- GaussRF(x=x, y=y, grid=FALSE, model=model, param=param, n=1000) #1000
 ## fit the data:

Print(fitvario(x=cbind(x,y), data=d, model=model, param=estparam,
    lower=c(0.1, 0.1, 0.1), upper=c(1.9, 5, 2)))


#########################################################
## The next two estimations give about the same result.
## For the first the sill is fixed to 1.5. For the second the sill
## is reached if the estimated variance is smaller than 1.5
estparam <- c(0, NA, NA, NA, NA, NA) 
## Not run: 
Print(v <- fitvario(x=cbind(x,y), data=d, model=model, param=estparam,
    sill=1, use.nat=FALSE)) ## gencauchy works better with use.nat=FALSE

## End(Not run)

estmodel <-  list("+",
                  list("$", var=NA, scale=NA,
                       list("gencauchy", alpha=NA, beta=NA)
                       ),
                  list("$", var=NA, list("nugget"))
                 )
parampositions(model=estmodel, dim=2)
f <- function(variab) c(variab, max(0, 1.0 - variab[1]))
## Not run: 
Print(v2 <- fitvario(x=cbind(x,y), data=d, model=estmodel, 
                  lower = c(TRUE, TRUE, TRUE, TRUE, FALSE),
                  transform=f, use.nat=FALSE))

## End(Not run)

#########################################################
## estimation of coupled parameters (alpha = beta, here)
# source("RandomFields/tests/source.R")
f <- function(param) param[c(1:3,3,4)]
## Not run: 
Print(fitvario(x=cbind(x,y), data=d, model=estmodel,
             lower=c(TRUE, TRUE, TRUE, FALSE, TRUE),
             transform=f))

## End(Not run)



#########################################################
## estimation in a anisotropic framework

x <- y <- (1:6)/4
model <- list("$", aniso=matrix(nc=2, c(4,2,-2,1)), var=1.5,
              list("exp"))
z <- GaussRF(x=x, y=y, grid=TRUE, model=model, n=10)
estmodel <-list("$", aniso=matrix(nc=2, c(NA,NA,-2,1)), var=NA,
                list("exp"))
Print(fitvario(as.matrix(expand.grid(x, y)), data=z,
             model=estmodel, nphi=20))





#########################################################
## estimation with trend (formula)
model <- list("$", var=1, scale=2, list("gauss"))
estmodel <- list("$", var=NA, scale=NA, list("gauss"))
x <- seq(-pi,pi,pi/2)
n <- 5
data <- GaussRF(x, x, gridtri=FALSE, model=model,
       trend=function(X1,X2) sin(X1) + 2*cos(X2),n=n)
Print(v <- fitvario(x, x, data=data, gridtrip=FALSE,
                  model=estmodel,
                  trend=~sin(X1)+cos(X1)+sin(X2)+cos(X2)))



########################################################
## estimation of anisotropy matrix with two identical ##
## diagonal elements                                  ##
## Not run: 
x <- c(0, 5, 0.4)
model <- list("$", var=1, scale=1, list("exponential"))
z <- GaussRF(x, x, x, model=model, gridtriple=TRUE, n=10, Print=2)

est.model <- list("+",
                 list("$", var=NA, aniso=diag(c(NA, NA, NA)), list("exponen")),
                 list("$", var=NA, list("nugget")))
parampositions(est.model, dim=3)
trafo <- function(variab) {variab[c(1:2, 2:4)]}
lower <- c(TRUE, TRUE, FALSE, TRUE, TRUE) # which parameter to be estimated
fitlog <- fitvario(x, x, x, gridtriple=TRUE, data=z, model=est.model,
                   transform=trafo, lower=lower)
str(fitlog$ml)

## End(Not run)


[Package RandomFields version 2.0.54 Index]