| Title: | Assessing a multivariate goodness of fit |
|---|---|
| Description: | Functions for the computation of several goodness of fit statistics. Currently impemented are multivariate versions of well known EDF statistics: The Anderson-Darling and Cramer-von Mises statistics for multivariate uniformly(0,1) distributed data and the bivariate Kolmogorov- Smirnov statistic for (0,1) distributed data. Further Rosenblatt's transformation for multivariate normally distributed data is implemented, such that these tests can be applied as a test for normality. |
| Authors: | Andreas Maendle [aut, cre] |
| Maintainer: | Andreas Maendle <[email protected]> |
| License: | GPL-3 |
| Version: | 0.0.0.9000 |
| Built: | 2026-06-01 09:44:16 UTC |
| Source: | https://github.com/amaendle/mvGoF |
KS.cval returns the critical values of for the Anderson-Darling test statistic.
AD.cval(n, prob)AD.cval(n, prob)
n |
Sample size. |
prob |
Quantile. |
KS.quantiles returns quantiles for the Anderson-Darling test.
AD.quantiles( alphas = c(0.25, 0.2, 0.15, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001) )AD.quantiles( alphas = c(0.25, 0.2, 0.15, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001) )
alphas |
Alpha levels for which the quantiles should be returned. |
rtrafo for Rosenblatt's transformation on elliptical and Archimedean copulas.
x <- matrix(rnorm(50), nrow=25, ncol=2)x <- matrix(rnorm(50), nrow=25, ncol=2)
AD.stat returns the multivariate Anderson-Darling test statistic for uniformly distributed data.
AD.stat(x, nsum = 1000)AD.stat(x, nsum = 1000)
x |
The sample. |
nsum |
See |
AD.stat uses polylog from the copula package.
Other "test statistics":
CvM.stat(),
KSa.stat(),
biKS.stat()
set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) AD.stat(u) #[1] 1.570951 x <- matrix( seq(0.01, 0.99, length=100), ncol=2) plot(x) AD.stat(x) x <- matrix(c(0.5,0.1), ncol=2) plot(x) AD.stat(x) x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) AD.stat(u) # true situation: Data from Gumbel copula with known corellation. # false assumption: Gauss copula # Rosenblatt#s transformation under wrong assumptions and computation of AD.stat set.seed(5) library("gumbel") gdata <- qlnorm(rgumbel(50,alpha=2,dim=5)) pairs(gdata) sigma2 <- matrix((1/sqrt(2)),nrow=5,ncol=5)-diag(((1/sqrt(2))-1),5) sigma2 ndata <- qnorm(plnorm(gdata)) pairs(ndata) rbdata <- rosenblatt.norm(ndata, rep(0,5),sigma2) pairs(rbdata) AD.stat(rbdata)set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) AD.stat(u) #[1] 1.570951 x <- matrix( seq(0.01, 0.99, length=100), ncol=2) plot(x) AD.stat(x) x <- matrix(c(0.5,0.1), ncol=2) plot(x) AD.stat(x) x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) AD.stat(u) # true situation: Data from Gumbel copula with known corellation. # false assumption: Gauss copula # Rosenblatt#s transformation under wrong assumptions and computation of AD.stat set.seed(5) library("gumbel") gdata <- qlnorm(rgumbel(50,alpha=2,dim=5)) pairs(gdata) sigma2 <- matrix((1/sqrt(2)),nrow=5,ncol=5)-diag(((1/sqrt(2))-1),5) sigma2 ndata <- qnorm(plnorm(gdata)) pairs(ndata) rbdata <- rosenblatt.norm(ndata, rep(0,5),sigma2) pairs(rbdata) AD.stat(rbdata)
CvM.stat returns the value of the bivariate Kolmogorov-Smirnov test statistic
biKS.stat(u)biKS.stat(u)
u |
Data set. Coloumn size equals the dimension of the samples and the line number equals the number of observations. |
The Kolmogorov-Smirnov test statistic is in fact the same as the squared L_infinity star discrepancy.
Justel, A, D. Pe~na and R. Zamar, 1997. A multivariate Kolmogorov-Smirnov test of goodness of fit. Statistics & Probability Letters 35, 251-259.
Chiu, S.N. and K.I. Liu. Generalized Cramér–von Mises goodness-of-fit tests for multivariate distributions. Computational Statistics & Data Analysis, 53(11):3817-3834, 2009.
Other "test statistics":
AD.stat(),
CvM.stat(),
KSa.stat()
set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) CvM.stat(u) #[1] 0.006029274 biKS.stat(u) #[1] 0.252292 x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) CvM.stat(u) #[1] 0.002172406 biKS.stat(u) #[1] 0.1058533 u <- 1/array(9:4, c(6,2)) CvM.stat(u) biKS.stat(u)set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) CvM.stat(u) #[1] 0.006029274 biKS.stat(u) #[1] 0.252292 x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) CvM.stat(u) #[1] 0.002172406 biKS.stat(u) #[1] 0.1058533 u <- 1/array(9:4, c(6,2)) CvM.stat(u) biKS.stat(u)
KS.cval returns the critical values of for the Cramér-von Mises test statistic.
CvM.cval(n, prob)CvM.cval(n, prob)
n |
Sample size. |
prob |
Quantile. |
KS.quantiles returns quantiles for the Cramér-von Mises test.
CvM.quantiles( alphas = c(0.25, 0.2, 0.15, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001) )CvM.quantiles( alphas = c(0.25, 0.2, 0.15, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001) )
alphas |
Alpha levels for which the quantiles should be returned. |
rtrafo for Rosenblatt's transformation on elliptical and Archimedean copulas.
x <- matrix(rnorm(50), nrow=25, ncol=2)x <- matrix(rnorm(50), nrow=25, ncol=2)
CvM.stat returns the value of the Cramér-von Mises test statistic for testing uniformity.
CvM.stat(u)CvM.stat(u)
u |
Data set. Coloumn size equals the dimension of the samples and the line number equals the number of observations. |
The CvM test statistic is in fact the same as the squared L2 star discrepancy from Warnock (1972).
u has couloumn size equalto the dimension of the samples and line number equal to the number of observations.
Warnock, T.T., 1972. Computational investigations of low discrepancy point sets. In: Zaremba, S.K. (Ed.), Applications of Number Theory to Numerical Analysis. Academic Press, pp. 319-343.
Chiu, S.N. and K.I. Liu. Generalized Cramér–von Mises goodness-of-fit tests for multivariate distributions. Computational Statistics & Data Analysis, 53(11):3817-3834, 2009.
Other "test statistics":
AD.stat(),
KSa.stat(),
biKS.stat()
set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) CvM.stat(u) #[1] 0.1061887 x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) CvM.stat(u) u <- 1/array(9:4, c(6,2)) CvM.stat(u)set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) CvM.stat(u) #[1] 0.1061887 x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) CvM.stat(u) u <- 1/array(9:4, c(6,2)) CvM.stat(u)
KS.cval returns the critical values of for the Kolmogorov–Smirnov test statistic.
KS.cval(n, prob)KS.cval(n, prob)
n |
Sample size. |
prob |
Quantile. |
KS.quantiles returns quantiles for the Kolmogorov–Smirnov test.
KS.quantiles( alphas = c(0.25, 0.2, 0.15, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001) )KS.quantiles( alphas = c(0.25, 0.2, 0.15, 0.1, 0.05, 0.025, 0.01, 0.005, 0.0025, 0.001) )
alphas |
Alpha levels for which the quantiles should be returned. |
rtrafo for Rosenblatt's transformation on elliptical and Archimedean copulas.
x <- matrix(rnorm(50), nrow=25, ncol=2)x <- matrix(rnorm(50), nrow=25, ncol=2)
KSa.stat returns the value of the approximative Kolmogorov-Smirnov test statistic
KSa.stat(u)KSa.stat(u)
u |
Data set. Coloumn size equals the dimension of the samples and the line number equals the number of observations. |
The statistic used by this function from
Justel, A.; Peña, D.; Zamar, R. (1997). "A multivariate Kolmogorov–Smirnov test of goodness of fit". Statistics & Probability Letters 35 (3): 251–259. doi:10.1016/S0167-7152(97)00020-5.
as an approximation of the Kolmogorov-Smirnov test statistic.
Justel, A, D. Pe~na and R. Zamar, 1997. A multivariate Kolmogorov-Smirnov test of goodness of fit. Statistics & Probability Letters 35, 251-259.
Chiu, S.N. and K.I. Liu. Generalized Cramér–von Mises goodness-of-fit tests for multivariate distributions. Computational Statistics & Data Analysis, 53(11):3817-3834, 2009.
Other "test statistics":
AD.stat(),
CvM.stat(),
biKS.stat()
set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) CvM.stat(u) #[1] 0.006029274 KSa.stat(u) #[1] 0.252292 x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) CvM.stat(u) #[1] 0.002172406 biKS.stat(u) #[1] 0.1058533 u <- 1/array(9:4, c(6,2)) CvM.stat(u) KSa.stat(u)set.seed(102015) u <- matrix(runif(50), ncol=2) plot(u) CvM.stat(u) #[1] 0.006029274 KSa.stat(u) #[1] 0.252292 x <- MASS::mvrnorm(n = 200, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) CvM.stat(u) #[1] 0.002172406 biKS.stat(u) #[1] 0.1058533 u <- 1/array(9:4, c(6,2)) CvM.stat(u) KSa.stat(u)
Multivariate versions of the popular EDF statistics by Cramér-von Mises, Anderson-Darling and Kolmogorov-Smirnov are supported.
AD.stat: Anderson-Darling
CvM.stat: Cramér-von Mises
biKS.stat: bivariate Kolmogorov-Smirnov
rosenblatt.norm Rosenblatt's transformation for normally distributed data
Maintainer: Andreas Maendle [email protected]
# Evaluate percentiles of the distribution by Monte-Carlo simulation # sampling from independent uniforms 0-1 for F_0 specified completely by H_0 n <- 10 # e.g. sample size 10 set.seed(14072016) sims <- stat.MC(n,100,biKS.stat) # 100 MC simulations (usually larger) # now quantiles can be estimated quantile(sims) # critival value for sample size 48 and alpha=0.05 KS.cval(48,0.05) # for the Lilliefors statistic: n <- 10 # e.g. sample size 10 set.seed(14072016) sims <- stat.MC(n,100,biKS.stat) # 100 MC simulations (usually larger) #evaluate empirical power of statistics #set parameters of true distribution (mu0 <- rep(0,2)) (Sigma0 <- diag(0.5,2,2)+matrix(0.5,2,2)) #set parameters for alternative distribution nMC <- 1000 # number of MC simulations n <- 15 # sample size eps <- 0.4 #c(0.1,0.2,0.4) mu <- c(3,3) #mu <- c(3,-1) tmp.KS.n15.sims <- rep(NA, nMC) tmp.AD.n15.sims <- rep(NA, nMC) tmp.CvM.n15.sims <- rep(NA, nMC) set.seed(15072016) for (i in 1:nMC) { rbts <- rosenblatt.norm( ((1-eps)*MASS::mvrnorm(n, mu0, Sigma0) + eps*MASS::mvrnorm(n, mu, Sigma0)) ,mu0,Sigma0) tmp.KS.n15.sims[i] <- biKS.stat( rbts ) tmp.AD.n15.sims[i] <- AD.stat( rbts ) tmp.CvM.n15.sims[i] <- CvM.stat( rbts ) } #length(t1.KS.n15.sims[t1.KS.n15.sims>KS.cval(15,0.05)])/length(t1.KS.n15.sims) #length(t2.KS.n15.sims[t2.KS.n15.sims>KS.cval(15,0.05)])/length(t2.KS.n15.sims) #length(t3.KS.n15.sims[t3.KS.n15.sims>KS.cval(15,0.05)])/length(t3.KS.n15.sims) #t1: mu <- c(3,3), eps <- 0.1 #t2: mu <- c(3,3), eps <- 0.2 #t3: mu <- c(3,3), eps <- 0.4# Evaluate percentiles of the distribution by Monte-Carlo simulation # sampling from independent uniforms 0-1 for F_0 specified completely by H_0 n <- 10 # e.g. sample size 10 set.seed(14072016) sims <- stat.MC(n,100,biKS.stat) # 100 MC simulations (usually larger) # now quantiles can be estimated quantile(sims) # critival value for sample size 48 and alpha=0.05 KS.cval(48,0.05) # for the Lilliefors statistic: n <- 10 # e.g. sample size 10 set.seed(14072016) sims <- stat.MC(n,100,biKS.stat) # 100 MC simulations (usually larger) #evaluate empirical power of statistics #set parameters of true distribution (mu0 <- rep(0,2)) (Sigma0 <- diag(0.5,2,2)+matrix(0.5,2,2)) #set parameters for alternative distribution nMC <- 1000 # number of MC simulations n <- 15 # sample size eps <- 0.4 #c(0.1,0.2,0.4) mu <- c(3,3) #mu <- c(3,-1) tmp.KS.n15.sims <- rep(NA, nMC) tmp.AD.n15.sims <- rep(NA, nMC) tmp.CvM.n15.sims <- rep(NA, nMC) set.seed(15072016) for (i in 1:nMC) { rbts <- rosenblatt.norm( ((1-eps)*MASS::mvrnorm(n, mu0, Sigma0) + eps*MASS::mvrnorm(n, mu, Sigma0)) ,mu0,Sigma0) tmp.KS.n15.sims[i] <- biKS.stat( rbts ) tmp.AD.n15.sims[i] <- AD.stat( rbts ) tmp.CvM.n15.sims[i] <- CvM.stat( rbts ) } #length(t1.KS.n15.sims[t1.KS.n15.sims>KS.cval(15,0.05)])/length(t1.KS.n15.sims) #length(t2.KS.n15.sims[t2.KS.n15.sims>KS.cval(15,0.05)])/length(t2.KS.n15.sims) #length(t3.KS.n15.sims[t3.KS.n15.sims>KS.cval(15,0.05)])/length(t3.KS.n15.sims) #t1: mu <- c(3,3), eps <- 0.1 #t2: mu <- c(3,3), eps <- 0.2 #t3: mu <- c(3,3), eps <- 0.4
rosenblatt.norm returns Rosenblatt's transformation of normally distributed vectors which are the lines of the array x, with mean vector Mu and correlation matrix Sigma.
rosenblatt.norm(x, Mu, Sigma)rosenblatt.norm(x, Mu, Sigma)
x |
Array containing the data which have to be transformed. Each line represents a data vector with the coloumn number as dimension. |
Mu |
Mean vector. |
Sigma |
Correlation matrix. |
rtrafo for Rosenblatt's transformation on elliptical and Archimedean copulas.
x <- matrix(rnorm(50), nrow=25, ncol=2) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) x <- MASS::mvrnorm(n = 500, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) # this example requires the gumbel package: if (requireNamespace("gumbel", quietly = TRUE)) { set.seed(5) library("gumbel") gdata <- qlnorm(rgumbel(12,alpha=2,dim=5)) pairs(gdata) sigma2 <- matrix((1/sqrt(2)),nrow=5,ncol=5)-diag(((1/sqrt(2))-1),5) sigma2 ndata <- qnorm(plnorm(gdata)) pairs(ndata) rbdata <- rosenblatt.norm(ndata, rep(0,5),sigma2) pairs(rbdata) AD.stat(rbdata) }x <- matrix(rnorm(50), nrow=25, ncol=2) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) x <- MASS::mvrnorm(n = 500, rep(0, 2), diag(1,2,2)) plot(x) u <- rosenblatt.norm(x, rep(0,2), diag(1,2,2)) plot(u) # this example requires the gumbel package: if (requireNamespace("gumbel", quietly = TRUE)) { set.seed(5) library("gumbel") gdata <- qlnorm(rgumbel(12,alpha=2,dim=5)) pairs(gdata) sigma2 <- matrix((1/sqrt(2)),nrow=5,ncol=5)-diag(((1/sqrt(2))-1),5) sigma2 ndata <- qnorm(plnorm(gdata)) pairs(ndata) rbdata <- rosenblatt.norm(ndata, rep(0,5),sigma2) pairs(rbdata) AD.stat(rbdata) }
func
stat.MC returns the simulated values of function func.
stat.MC(n, rp, func)stat.MC(n, rp, func)
n |
Sample size. |
rp |
Number of Monte Carlo simulations. |
func |
Function which evaluates the test statistic. |
KS.n12test.sims <- stat.MC(12,100,biKS.stat) # now quantiles can be estimated quantile(KS.n12test.sims,probs=(1-c(0.01,0.05,0.1,0.25)))KS.n12test.sims <- stat.MC(12,100,biKS.stat) # now quantiles can be estimated quantile(KS.n12test.sims,probs=(1-c(0.01,0.05,0.1,0.25)))