---
title: "Introduction to 'rg.test'"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to 'rg.test'}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r setup}
library(rgTest)
```

## Working with rgTest

### Get example data

```{r}
set.seed(100)
d = 200
vmu = rep(1.1/sqrt(d), d)              # mean vector of sample 2
vsd = c(rep(1.1, d/5), rep(1, d-d/5))  # standard deviations of sample 2
num1 = 100
num2 = 100
s1 = matrix(0, num1, d) # sample 1
s2 = matrix(0, num2, d) # sample 2
for (i in 1:num1) {
  s1[i,] = rnorm(d)
}
for (i in 1:num2) {
  s2[i,] = rnorm(d, mean = vmu, sd = vsd)
}
num1 = nrow(s1) # number of observations in sample 1
num2 = nrow(s2) # number of observations in sample 2
```

#### Get an overview of the data

Both samples have 200 variables. We take a look at the scatterplot matrix of the first five variables for the two samples.

```{r, fig.width = 8, fig.height = 8}
plot_dat = cbind(as.data.frame(rbind(s1[,1:5], s2[,1:5])),
                 label = rep(c('sample 1', 'sample 2'), each = 100))
my_cols = c("#00AFBB", "#E7B800")
pairs(plot_dat[, 1:5], col = my_cols[as.factor(plot_dat$label)])
```

Even though we know the observations in the two samples are generated from different distributions, it is hard to tell the difference by looking at the scatterplots.

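
To see why the scatterplots look so similar, it may help to quantify the differences built into the simulation. The sketch below uses only base R and restates the settings of the data-generating chunk; the names `d_dim`, `mu_shift` and `sd_vec` are introduced here for illustration and are not part of rgTest.

```{r}
# Illustrative sketch: restate the simulation settings under new names
# (these variables are assumptions for this example, not rgTest objects).
d_dim = 200
mu_shift = rep(1.1 / sqrt(d_dim), d_dim)                     # mean of sample 2
sd_vec = c(rep(1.1, d_dim / 5), rep(1, d_dim - d_dim / 5))   # sd of sample 2

per_coord_shift = mu_shift[1]          # mean shift in any single variable
total_shift = sqrt(sum(mu_shift^2))    # Euclidean norm of the full mean shift
n_scaled = sum(sd_vec != 1)            # number of variables with inflated sd

c(per_coord_shift = per_coord_shift,
  total_shift = total_shift,
  n_scaled = n_scaled)
```

Each variable is shifted by only about 0.078 in mean, and 40 of the 200 variables have a standard deviation of 1.1 instead of 1. These marginal differences are too small to see in pairwise plots, although the mean shift taken over all 200 variables has Euclidean norm 1.1.
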
### Graph-based two-sample test

#### Use data matrices

```{r}
res1 = rg.test(data.X = s1, data.Y = s2, n1 = num1, n2 = num2, k = 5,
               weigh.fun = weiMax, perm.num = 1000, progress_bar = FALSE)
```

```{r, echo=FALSE}
type = c('robust generalized(asymptotic)', 'robust max-type(asymptotic)',
         'robust generalized(permutation)', 'robust max-type(permutation)')
test.statistic = c(res1$asy.gen.statistic, res1$asy.max.statistic, NA, NA)
p.value = c(res1$asy.gen.pval, res1$asy.max.pval, res1$perm.gen.pval, res1$perm.max.pval)
res_tbl = as.data.frame(cbind(type, test.statistic, p.value))
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
```

#### Use the distance matrix

```{r}
data = rbind(s1, s2)           # pooled observations, sample 1 first
dist = dist(as.matrix(data))   # pairwise Euclidean distances
res2 = rg.test(dis = dist, n1 = num1, n2 = num2, k = 5,
               weigh.fun = weiMax, perm.num = 1000)
```

```{r, echo=FALSE}
type = c('robust generalized(asymptotic)', 'robust max-type(asymptotic)',
         'robust generalized(permutation)', 'robust max-type(permutation)')
test.statistic = c(res2$asy.gen.statistic, res2$asy.max.statistic, NA, NA)
p.value = c(res2$asy.gen.pval, res2$asy.max.pval, res2$perm.gen.pval, res2$perm.max.pval)
res_tbl = as.data.frame(cbind(type, test.statistic, p.value))
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
```

#### Use the edge matrix

```{r}
E = kmst(dis = dist, k = 5)    # edge matrix of the 5-MST
res3 = rg.test(E = E, n1 = num1, n2 = num2,
               weigh.fun = weiMax, perm.num = 1000)
```

```{r, echo=FALSE}
type = c('robust generalized(asymptotic)', 'robust max-type(asymptotic)',
         'robust generalized(permutation)', 'robust max-type(permutation)')
test.statistic = c(res3$asy.gen.statistic, res3$asy.max.statistic, NA, NA)
p.value = c(res3$asy.gen.pval, res3$asy.max.pval, res3$perm.gen.pval, res3$perm.max.pval)
res_tbl = as.data.frame(cbind(type, test.statistic, p.value))
knitr::kable(res_tbl, col.names = gsub("[.]", " ", names(res_tbl)))
```

The two-sample test is done.

The asymptotic results are the same whether we use the data matrices, the distance matrix, or the edge matrix generated by the 5-MST. The p-values based on the permutation method are similar to those based on the asymptotic method.
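
The three result tables above are built by the same hidden code, once per result object. As a possible simplification, that code could be wrapped in a small helper. The function name `summarize_rg` is hypothetical and not part of rgTest, and the sketch assumes the result list holds the six fields used in this vignette. The `mock_res` list contains made-up placeholder numbers, only to show the shape of the output; it is not the result of any test.

```{r}
# Hypothetical helper (not part of rgTest): collect the statistics and
# p-values of one result list into a four-row summary table.
summarize_rg = function(res) {
  data.frame(
    type = c('robust generalized(asymptotic)', 'robust max-type(asymptotic)',
             'robust generalized(permutation)', 'robust max-type(permutation)'),
    # The vignette reports no statistic for the permutation rows, hence NA.
    test.statistic = c(res$asy.gen.statistic, res$asy.max.statistic, NA, NA),
    p.value = c(res$asy.gen.pval, res$asy.max.pval,
                res$perm.gen.pval, res$perm.max.pval)
  )
}

# Placeholder input with made-up values (NOT real test output).
mock_res = list(asy.gen.statistic = 10, asy.max.statistic = 3,
                asy.gen.pval = 0.01, asy.max.pval = 0.02,
                perm.gen.pval = 0.015, perm.max.pval = 0.025)
tbl = summarize_rg(mock_res)
tbl
```

With such a helper, each table chunk would reduce to a single call on `res1`, `res2` or `res3`, whose output can then be passed to `knitr::kable()`. Unlike the `cbind()` version, `data.frame()` keeps the numeric columns numeric. The helper would fail with a length mismatch if any of the six fields were missing from the result list.
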