| Title: | Variable Selection using Shrinkage Priors |
|---|---|
| Description: | Bayesian variable selection using shrinkage priors to identify significant variables in high-dimensional datasets. The package includes methods for determining the number of significant variables through innovative clustering techniques of posterior distributions, specifically utilizing the 2-Means and Sequential 2-Means (S2M) approaches. The package aims to simplify the variable selection process with minimal tuning required in statistical analysis. |
| Authors: | Nilson Chapagain [aut, cre] |
| Maintainer: | Nilson Chapagain <[email protected]> |
| License: | GPL (>= 3) |
| Version: | 1.0.0 |
| Built: | 2025-02-23 05:02:35 UTC |
| Source: | https://github.com/nilson01/vsusp-variable-selection-using-shrinkage-priors |
The numNoiseCoeff function will take as input Beta.i: an N by p matrix consisting of N posterior samples of p variables, and b.i_r: a tuning parameter value from the Sequential 2-means (S2M) variable selection algorithm. It will return the number of noise coefficients.
numNoiseCoeff(Beta.i, b.i_r)
| Argument | Description |
|---|---|
| Beta.i | N by p matrix consisting of N posterior samples of p variables |
| b.i_r | Tuning parameter value from the Sequential 2-means (S2M) variable selection algorithm |
Returns the number of noise coefficients (numeric).
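No example accompanies this entry, so the following is a minimal usage sketch, assuming simulated draws stand in for posterior samples from a fitted model:

set.seed(42)
N <- 100 # number of posterior samples
p <- 5 # number of variables
# Simulated stand-in for an N by p matrix of posterior samples
Beta.i <- matrix(rnorm(N * p), N, p)
# Count the coefficients classified as noise at tuning value 0.5
numNoiseCoeff(Beta.i, b.i_r = 0.5)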
The OptimalHbi function will take as input b.i and H.b.i, which come from the result of the Sequential2Means (or Sequential2MeansBeta) function. It will return a plot from which you can infer the optimal value of the tuning parameter and the associated H: the estimated number of signals.
OptimalHbi(bi, Hbi)
| Argument | Description |
|---|---|
| bi | A vector holding the values of the tuning parameter specified by the user |
| Hbi | The estimated number of signals (numeric) corresponding to each b.i |
Returns the optimal value (numeric) of the tuning parameter and the associated H value.
Makalic, E. & Schmidt, D. F. (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
Li, H. & Pati, D. (2017). Variable selection using shrinkage priors. Computational Statistics & Data Analysis, 107, 107-119.
n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
df <- data.frame(X, Y)
# Posterior samples from a Gaussian model with the horseshoe+ prior
# (recommended n.samples is 5000 and burnin is 2000; small values keep the example fast)
rv.hs <- bayesreg::bayesreg(Y ~ ., df, "gaussian", "horseshoe+", 110, 100)
Beta <- t(rv.hs$beta) # transpose to N by p
lower <- 0
upper <- 1
l <- 5
S2Mbeta <- Sequential2MeansBeta(Beta, lower, upper, l)
bi <- S2Mbeta$b.i
Hbi <- S2Mbeta$H.b.i
OptimalHbi(bi, Hbi)
The S2MVarSelection function will take Beta: an N by p matrix of posterior samples, and H: the estimated number of signals obtained from the OptimalHbi function. It will return the important subset of variables for the Gaussian linear model.
S2MVarSelection(Beta, H = 5)
| Argument | Description |
|---|---|
| Beta | N by p matrix consisting of N posterior samples of p variables, known to the user or obtained from the Sequential2Means function |
| H | Estimated number of signals (numeric), obtained from the OptimalHbi function |
Returns a vector of dimension H x 1 containing the indices of the important subset of variables.
Makalic, E. & Schmidt, D. F. (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
Li, H. & Pati, D. (2017). Variable selection using shrinkage priors. Computational Statistics & Data Analysis, 107, 107-119.
n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
df <- data.frame(X, Y)
# Fit a Gaussian model with the horseshoe+ prior
# (recommended n.samples is 5000 and burnin is 2000; small values keep the example fast)
rv.hs <- bayesreg::bayesreg(Y ~ ., df, "gaussian", "horseshoe+", 110, 100)
Beta <- t(rv.hs$beta) # transpose to the N by p matrix S2MVarSelection expects
H <- 3
impVariablesGLM <- S2MVarSelection(Beta, H)
impVariablesGLM
The S2MVarSelectionV1 function will take S2M: a list obtained from the Sequential2Means function, and H: the estimated number of signals obtained from the OptimalHbi function. It will return the important subset of variables for the Gaussian linear model, as shown in the sketch below.
S2MVarSelectionV1(S2M, H = 5)
| Argument | Description |
|---|---|
| S2M | List obtained from the Sequential2Means function |
| H | Estimated number of signals (numeric), obtained from the OptimalHbi function (default = 5) |
Returns a vector of dimension H x 1 containing the indices of the important subset of variables for the Gaussian linear model.
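No example ships with this entry, so here is a minimal usage sketch that feeds the list returned by Sequential2Means directly into S2MVarSelectionV1 (the small sampler settings are for illustration only):

n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
b.i <- seq(0, 1, 0.05)
# Recommended n.samples is 5000 and burnin is 2000; small values keep the sketch fast
S2M <- Sequential2Means(X, Y, b.i, "horseshoe+", 110, 100)
# Indices of the important variables, assuming H = 3 signals
impVariablesGLM <- S2MVarSelectionV1(S2M, H = 3)
impVariablesGLM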
The Sequential2Means function will take as input X: the design matrix, Y: the response vector, and b.i: a vector of tuning parameter values for the Sequential 2-means (S2M) variable selection algorithm. The function will return a list S2M holding Beta: the N by p matrix of posterior samples, b.i: the values of the tuning parameter, and H.b.i: the estimated number of signals corresponding to each b.i.
Sequential2Means(X, Y, b.i, prior = "horseshoe+", n.samples = 5000, burnin = 2000)
| Argument | Description |
|---|---|
| X | Design matrix of dimension n x p, where n = total data points and p = total number of features |
| Y | Response vector of dimension n x 1 |
| b.i | Vector of tuning parameter values for the Sequential 2-means (S2M) variable selection algorithm, of dimension specified by the user |
| prior | Shrinkage prior distribution over Beta (string). Available options: ridge regression (prior="rr" or prior="ridge"), lasso regression (prior="lasso"), horseshoe regression (prior="hs" or prior="horseshoe"), and horseshoe+ regression (prior="hs+" or prior="horseshoe+") |
| n.samples | Number of posterior samples to generate (numeric) |
| burnin | Number of burn-in samples (numeric) |
Returns a list S2M holding:

| Component | Description |
|---|---|
| Beta | N by p matrix consisting of N posterior samples of p variables |
| b.i | The user-specified vector holding the tuning parameter values |
| H.b.i | The estimated number of signals (numeric) corresponding to each b.i |
Makalic, E. & Schmidt, D. F. (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
Li, H. & Pati, D. (2017). Variable selection using shrinkage priors. Computational Statistics & Data Analysis, 107, 107-119.
# -----------------------------------------------------------------
# Example 1: Gaussian model and horseshoe+ prior
n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
b.i <- seq(0, 1, 0.05)
# Sequential2Means with horseshoe+ using Gibbs sampling
# (recommended n.samples is 5000 and burnin is 2000)
S2M <- Sequential2Means(X, Y, b.i, "horseshoe+", 110, 100)
Beta <- S2M$Beta
H.b.i <- S2M$H.b.i

# -----------------------------------------------------------------
# Example 2: Gaussian model and ridge prior
n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
b.i <- seq(0, 1, 0.05)
# Sequential2Means with ridge regression using Gibbs sampling
# (recommended n.samples is 5000 and burnin is 2000)
S2M <- Sequential2Means(X, Y, b.i, "ridge", 110, 100)
Beta <- S2M$Beta
H.b.i <- S2M$H.b.i
The Sequential2MeansBeta function will take as input Beta: an N by p matrix consisting of N posterior samples of p variables, lower: the lower bound of the chosen values of the tuning parameter, upper: the upper bound of those values, and l: the number of chosen values of the tuning parameter. The function will return a list S2M holding p: the total number of variables, b.i: the values of the tuning parameter, and H.b.i: the estimated number of signals corresponding to each b.i.
Sequential2MeansBeta(Beta, lower, upper, l)
| Argument | Description |
|---|---|
| Beta | N by p matrix consisting of N posterior samples of p variables |
| lower | The lower bound of the chosen values of the tuning parameter (numeric) |
| upper | The upper bound of the chosen values of the tuning parameter (numeric) |
| l | The number of chosen values of the tuning parameter (numeric) |
Returns a list S2M holding:

| Component | Description |
|---|---|
| p | Total number of variables in the model |
| b.i | The vector of tuning parameter values specified by the user |
| H.b.i | The estimated number of signals (numeric) corresponding to each b.i |
Makalic, E. & Schmidt, D. F. (2016). High-Dimensional Bayesian Regularised Regression with the BayesReg Package. arXiv:1611.06649.
Li, H. & Pati, D. (2017). Variable selection using shrinkage priors. Computational Statistics & Data Analysis, 107, 107-119.
# -----------------------------------------------------------------
# Example 1: Gaussian model and horseshoe+ prior
n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
df <- data.frame(X, Y)
# Beta samples for the Gaussian model using the horseshoe+ prior and Gibbs sampling
rv.hs <- bayesreg::bayesreg(Y ~ ., df, "gaussian", "horseshoe+", 110, 100)
Beta <- t(rv.hs$beta)
lower <- 0
upper <- 1
l <- 20
S2Mbeta <- Sequential2MeansBeta(Beta, lower, upper, l)
H.b.i <- S2Mbeta$H.b.i

# -----------------------------------------------------------------
# Example 2: Normal model and lasso prior
n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
df <- data.frame(X, Y)
rv.hs <- bayesreg::bayesreg(Y ~ ., df, "normal", "lasso", 150, 100)
Beta <- t(rv.hs$beta)
lower <- 0
upper <- 1
l <- 15
S2Mbeta <- Sequential2MeansBeta(Beta, lower, upper, l)
H.b.i <- S2Mbeta$H.b.i
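The functions above form a single workflow: Sequential2Means (or Sequential2MeansBeta) tabulates H.b.i over a grid of tuning parameter values, OptimalHbi plots them so the optimal H can be read off, and S2MVarSelection extracts the selected variables. A condensed end-to-end sketch follows; choosing H = 2 after inspecting the plot is an assumption for illustration:

n <- 10
p <- 5
X <- matrix(rnorm(n * p), n, p)
beta <- exp(rnorm(p))
Y <- as.vector(X %*% beta + rnorm(n, 0, 1))
# Step 1: posterior samples and signal counts over a grid of tuning values
b.i <- seq(0, 1, 0.05)
S2M <- Sequential2Means(X, Y, b.i, "horseshoe+", 110, 100)
# Step 2: plot H.b.i against b.i to choose H, the estimated number of signals
OptimalHbi(S2M$b.i, S2M$H.b.i)
# Step 3: indices of the selected variables (H = 2 is illustrative)
S2MVarSelection(S2M$Beta, H = 2)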