Types of bootstrap scheme (from "Bootstrapping (statistics)")

1 Types of bootstrap scheme
1.1 Case resampling
1.1.1 Estimating the distribution of the sample mean
1.1.2 Regression
1.2 Bayesian bootstrap
1.3 Smooth bootstrap
1.4 Parametric bootstrap
1.5 Resampling residuals
1.6 Gaussian process regression bootstrap
1.7 Wild bootstrap
1.8 Block bootstrap
1.8.1 Time series: simple block bootstrap
1.8.2 Time series: moving block bootstrap
1.8.3 Cluster data: block bootstrap
Types of bootstrap scheme
In univariate problems, it is usually acceptable to resample the individual observations with replacement ("case resampling" below), unlike subsampling, in which resampling is without replacement and which is valid under much weaker conditions than the bootstrap. In small samples, a parametric bootstrap approach might be preferred. For other problems, a smooth bootstrap will likely be preferred.
For regression problems, various other alternatives are available.
Case resampling
The bootstrap is useful for estimating the distribution of a statistic (e.g. mean, variance) without using normal theory (e.g. z-statistic, t-statistic). The bootstrap comes in handy when there is no analytical form or normal theory to help estimate the distribution of the statistic of interest, since bootstrap methods can apply to most random quantities, e.g., the ratio of variance and mean. There are at least two ways of performing case resampling.
Estimating the distribution of the sample mean
Consider a coin-flipping experiment. We flip the coin and record whether it lands heads or tails. Let X = x1, x2, ..., x10 be 10 observations from the experiment, with xi = 1 if the i-th flip lands heads and 0 otherwise. Using normal theory, we can use the t-statistic to estimate the distribution of the sample mean,
x̄ = (1/10)(x1 + x2 + ... + x10).
Instead, we use the bootstrap, specifically case resampling, to derive the distribution of x̄. We first resample the data to obtain a bootstrap resample. An example of the first resample might look like this: X1* = x2, x1, x10, x10, x3, x4, x6, x7, x1, x9. Note that there are some duplicates, since a bootstrap resample comes from sampling with replacement from the data. Note also that the number of data points in a bootstrap resample is equal to the number of data points in our original observations. Then we compute the mean of this resample and obtain the first bootstrap mean: μ1*. We repeat this process to obtain the second resample X2* and compute the second bootstrap mean μ2*. If we repeat this 100 times, then we have μ1*, μ2*, ..., μ100*. This represents an empirical bootstrap distribution of the sample mean. From this empirical distribution, one can derive a bootstrap confidence interval for the purpose of hypothesis testing.
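The procedure above can be sketched in a few lines of NumPy. The coin-flip outcomes below are hypothetical example data, not taken from any real experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten coin flips: 1 = heads, 0 = tails (hypothetical example data).
x = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])

n_boot = 100
boot_means = np.empty(n_boot)
for j in range(n_boot):
    # Resample with replacement; each resample has the same size as the data.
    resample = rng.choice(x, size=x.size, replace=True)
    boot_means[j] = resample.mean()

# boot_means is the empirical bootstrap distribution of the sample mean;
# its percentiles give a simple (percentile) bootstrap confidence interval.
ci = np.percentile(boot_means, [2.5, 97.5])
```

Each bootstrap mean is the mean of 10 values in {0, 1}, so every entry of `boot_means` lies in [0, 1], as does the resulting interval.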
Regression
In regression problems, case resampling refers to the simple scheme of resampling individual cases, often the rows of a data set. For regression problems, as long as the data set is fairly large, this simple scheme is often acceptable. However, the method is open to criticism.
In regression problems, the explanatory variables are often fixed, or at least observed with more control than the response variable. Also, the range of the explanatory variables defines the information available from them. Therefore, to resample cases means that each bootstrap sample will lose some information. As such, alternative bootstrap procedures should be considered.
Bayesian bootstrap
Bootstrapping can be interpreted in a Bayesian framework using a scheme that creates new datasets through reweighting the initial data. Given a set of n data points, the weighting assigned to data point i in a new dataset D^j is

w_i^j = x_i^j − x_{i−1}^j,

where x^j is a low-to-high ordered list of n − 1 uniformly distributed random numbers on [0, 1], preceded by 0 and succeeded by 1. The distributions of a parameter inferred from considering many such datasets D^j are then interpretable as posterior distributions on that parameter.
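The weight construction just described, gaps between sorted uniforms padded with 0 and 1, can be sketched as follows (n = 5 is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # number of data points (arbitrary example value)

# Draw n - 1 uniforms on [0, 1], sort them low-to-high,
# then precede with 0 and succeed with 1.
u = np.sort(rng.uniform(0.0, 1.0, size=n - 1))
x = np.concatenate(([0.0], u, [1.0]))

# The weight on data point i is the gap between consecutive ordered values.
w = np.diff(x)
```

By construction `w` is a valid probability vector: the n gaps are non-negative and sum to 1, so each Bayesian-bootstrap dataset is a reweighting of the original data.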
Smooth bootstrap
Under this scheme, a small amount of (usually normally distributed) zero-centered random noise is added onto each resampled observation. This is equivalent to sampling from a kernel density estimate of the data.
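A minimal sketch of this scheme, assuming Gaussian noise and Silverman's rule of thumb for the noise scale (one common bandwidth choice, not mandated by the method):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=50)  # hypothetical sample

# Bandwidth h controls the kernel width; Silverman's rule of thumb
# is one common choice for a Gaussian kernel.
h = 1.06 * data.std() * data.size ** (-1 / 5)

# Smooth bootstrap: resample with replacement, then add
# zero-centered Gaussian noise to each resampled observation.
resample = rng.choice(data, size=data.size, replace=True)
smooth_resample = resample + rng.normal(0.0, h, size=data.size)
```

Adding N(0, h²) noise to a uniformly chosen data point is exactly a draw from the Gaussian kernel density estimate with bandwidth h.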
Parametric bootstrap
In this case, a parametric model is fitted to the data, often by maximum likelihood, and samples of random numbers are drawn from this fitted model. Usually the sample drawn has the same sample size as the original data. Then the quantity, or estimate, of interest is calculated from these data. This sampling process is repeated many times as for other bootstrap methods. The use of a parametric model at the sampling stage of the bootstrap methodology leads to procedures which are different from those obtained by applying basic statistical theory to inference for the same model.
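A sketch of the parametric bootstrap, assuming for illustration an exponential model (chosen because its maximum-likelihood fit is simply the sample mean) and the median as the statistic of interest:

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(scale=3.0, size=40)  # hypothetical observed sample

# Fit the parametric model by maximum likelihood; for the exponential
# distribution the MLE of the scale parameter is the sample mean.
scale_hat = data.mean()

n_boot = 200
boot_medians = np.empty(n_boot)
for j in range(n_boot):
    # Each bootstrap sample is drawn from the *fitted* model,
    # with the same sample size as the original data.
    sim = rng.exponential(scale=scale_hat, size=data.size)
    boot_medians[j] = np.median(sim)

# boot_medians approximates the sampling distribution of the median
# under the fitted exponential model.
```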
Resampling residuals
Another approach to bootstrapping in regression problems is to resample residuals. The method proceeds as follows.

1. Fit the model and retain the fitted values ŷ_i and the residuals ε̂_i = y_i − ŷ_i.
2. For each pair (x_i, y_i), in which x_i is the (possibly multivariate) explanatory variable, add a randomly resampled residual ε̂_j to the fitted value ŷ_i. In other words, create synthetic response variables y_i* = ŷ_i + ε̂_j, where j is selected randomly from the list (1, ..., n) for every i.
3. Refit the model using the fictitious response variables y_i*, and retain the quantities of interest.
4. Repeat steps 2 and 3 many times.

This scheme has the advantage that it retains the information in the explanatory variables. However, a question arises as to which residuals to resample. Raw residuals are one option; another is studentized residuals (in linear regression). Although there are arguments in favour of using studentized residuals, in practice it often makes little difference, and it is easy to run both schemes and compare the results against each other.
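The steps above can be sketched for simple linear regression using raw residuals (the toy data and the choice of the slope as the quantity of interest are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical linear regression data.
x = np.linspace(0, 10, 30)
y = 2.0 + 0.5 * x + rng.normal(0, 1, size=x.size)

# Step 1: fit the model, retain fitted values and raw residuals.
slope, intercept = np.polyfit(x, y, 1)  # coefficients, highest degree first
fitted = intercept + slope * x
resid = y - fitted

# Steps 2-4: add randomly resampled residuals to the fitted values,
# refit, and keep the quantity of interest (here: the slope).
n_boot = 500
boot_slopes = np.empty(n_boot)
for j in range(n_boot):
    y_star = fitted + rng.choice(resid, size=resid.size, replace=True)
    boot_slopes[j], _ = np.polyfit(x, y_star, 1)
```

Because only the residuals are shuffled, every replicate uses the original design points x, which is exactly how the scheme retains the information in the explanatory variables.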
Gaussian process regression bootstrap
When data are temporally correlated, straightforward bootstrapping destroys the inherent correlations. This method uses Gaussian process regression to fit a probabilistic model from which replicates may then be drawn. Gaussian processes are methods from Bayesian non-parametric statistics, but are here used to construct a parametric bootstrap approach, which implicitly allows the time-dependence of the data to be taken into account.
Wild bootstrap
The wild bootstrap, proposed by Wu (1986), is suited when the model exhibits heteroskedasticity. The idea is, as in the residual bootstrap, to leave the regressors at their sample value, but to resample the response variable based on the residual values. That is, for each replicate, one computes a new y based on

y_i* = ŷ_i + ε̂_i v_i

so the residuals are randomly multiplied by a random variable v_i with mean 0 and variance 1. This method assumes that the true residual distribution is symmetric and can offer advantages over simple residual sampling for smaller sample sizes. Different forms are used for the random variable v_i, such as:

The standard normal distribution.
A distribution suggested by Mammen (1993):
v_i = −(√5 − 1)/2 with probability (√5 + 1)/(2√5),
v_i = (√5 + 1)/2 with probability (√5 − 1)/(2√5).
Approximately, Mammen's distribution is:

v_i = −0.6180 with probability 0.7236,
v_i = 1.6180 with probability 0.2764.
Or the simpler distribution, linked to the Rademacher distribution:

v_i = −1 with probability 1/2,
v_i = +1 with probability 1/2.
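A sketch of the wild bootstrap with Rademacher multipliers, on hypothetical heteroskedastic data where the noise scale grows with x:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical heteroskedastic regression data: noise grows with x.
x = np.linspace(1, 10, 40)
y = 1.0 + 0.8 * x + rng.normal(0.0, 0.3 * x)

slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

# Wild bootstrap: each residual stays attached to its own observation
# and is multiplied by a random v_i with mean 0 and variance 1.
# Here v_i is Rademacher: -1 or +1 with probability 1/2 each.
n_boot = 500
boot_slopes = np.empty(n_boot)
for j in range(n_boot):
    v = rng.choice([-1.0, 1.0], size=x.size)
    y_star = fitted + resid * v
    boot_slopes[j], _ = np.polyfit(x, y_star, 1)
```

Unlike plain residual resampling, residual i is never moved to observation k, so the larger residuals stay where the noise variance is actually large, which is what makes the scheme robust to heteroskedasticity.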
Block bootstrap
The block bootstrap is used when the data, or the errors in a model, are correlated. In this case, simple case or residual resampling will fail, as it is not able to replicate the correlation in the data. The block bootstrap tries to replicate the correlation by resampling blocks of data instead. The block bootstrap has been used mainly with data correlated in time (i.e. time series) but can also be used with data correlated in space, or among groups (so-called cluster data).
Time series: simple block bootstrap
In the (simple) block bootstrap, the variable of interest is split into non-overlapping blocks.
Time series: moving block bootstrap
In the moving block bootstrap, introduced by Künsch (1989), data is split into n − b + 1 overlapping blocks of length b: observations 1 to b form block 1, observations 2 to b + 1 form block 2, etc. Then from these n − b + 1 blocks, n/b blocks are drawn at random with replacement. Aligning these n/b blocks in the order they were picked gives the bootstrap observations.
This bootstrap works with dependent data; however, the bootstrapped observations are no longer stationary by construction. It was shown, though, that randomly varying the block length can avoid this problem; this method is known as the stationary bootstrap. Other related modifications of the moving block bootstrap are the Markovian bootstrap and a stationary bootstrap method that matches subsequent blocks based on standard deviation matching.
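The moving block scheme can be sketched as follows (the series and the block length b = 10 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
series = rng.normal(size=100)  # hypothetical time series
n, b = series.size, 10         # series length n, block length b

# Form the n - b + 1 overlapping blocks:
# block i consists of observations i .. i + b - 1.
blocks = np.array([series[i:i + b] for i in range(n - b + 1)])

# Draw n/b blocks at random with replacement and
# concatenate them in the order they were picked.
k = n // b
picks = rng.integers(0, n - b + 1, size=k)
boot_series = np.concatenate([blocks[i] for i in picks])
```

Each drawn block preserves the within-block dependence of the original series; only the joins between consecutive blocks break the correlation structure.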
Cluster data: block bootstrap
Cluster data describes data where many observations per unit are observed. This could be observing many firms in many states, or observing students in many classes. In such cases, the correlation structure is simplified, and one usually makes the assumption that data is correlated within a group/cluster, but independent between groups/clusters. The structure of the block bootstrap is easily obtained (where the block just corresponds to the group), and usually only the groups are resampled, while the observations within the groups are left unchanged. Cameron et al. (2008) discusses this for clustered errors in linear regression.
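A minimal sketch of the cluster (group) bootstrap, assuming four hypothetical clusters of five observations each:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cluster data: 4 groups, 5 observations per group.
groups = {g: rng.normal(loc=g, size=5) for g in range(4)}

# Cluster bootstrap: resample whole groups with replacement;
# the observations *within* each picked group are left unchanged.
ids = list(groups)
picked = rng.choice(ids, size=len(ids), replace=True)
boot_sample = np.concatenate([groups[g] for g in picked])
```

Because entire groups are drawn as units, any within-group correlation is carried into the bootstrap sample intact, mirroring the assumed correlation structure.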