Title: | High-Dimensional Metrics |
---|---|
Description: | Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals are provided for various low-dimensional causal/structural parameters that appear in high-dimensional approximately sparse models. The package includes functions for fitting heteroscedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. Chernozhukov, Hansen, Spindler (2016) <arXiv:1603.01700>. |
Authors: | Martin Spindler [cre, aut], Victor Chernozhukov [aut], Christian Hansen [aut], Philipp Bach [ctb] |
Maintainer: | Martin Spindler <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.1 |
Built: | 2025-02-13 04:29:06 UTC |
Source: | https://github.com/martinspindler/hdm |
This package implements methods for estimation and inference in a high-dimensional setting.
Package: | hdm |
Type: | Package |
Version: | 0.1 |
Date: | 2015-05-25 |
License: | GPL-3 |
This package provides efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters appearing in high-dimensional approximately sparse models. The package includes functions for fitting heteroskedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. The methods enable valid post-selection inference, and a theoretically grounded, data-driven choice of the penalty level is provided.
Victor Chernozhukov, Christian Hansen, Martin Spindler
Maintainer: Martin Spindler <[email protected]>
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369-2429.
A. Belloni, V. Chernozhukov and C. Hansen (2013). Inference for high-dimensional sparse econometric models. In Advances in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge, 245-295.
A. Belloni, V. Chernozhukov, C. Hansen (2014). Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies 81(2), 608-650.
Dataset on settler mortality.
Settler mortality
logarithm of Mort
Latitude
Latitude^2
Africa
Asia
North America
South America
Neo-Europes
GDP
Average protection against expropriation risk
Data set was analysed in Acemoglu et al. (2001). A detailed description of the data can be found at http://economics.mit.edu/faculty/acemoglu/data/ajr2001
D. Acemoglu, S. Johnson, J. A. Robinson (2001). Colonial origins of comparative development: an empirical investigation. American Economic Review, 91, 1369–1401.
data(AJR)
Automobile data set from the US.
model name
model id
firm id
cdid
id
log price
miles per gallon
miles per dollar
horse power per weight
air conditioning (binary variable)
size of the car
market share
share s0
outcome variable defined as log(share) - log(outshr)
time trend
Data set was analysed in Berry, Levinsohn and Pakes (1995). The data stem from annual issues of the Automotive News Market Data Book.
The data set includes information on all models marketed during the period beginning 1971 and ending in 1990, containing 2217 model/years from 997 distinct models.
A detailed description is given in BLP (1995, 868–871). The internal function constructIV constructs instrumental variables along the lines described and used in BLP (1995).
S. Berry, J. Levinsohn, A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica, 63(4), 841–890.
data(BLP)
rlassoEffects
Method to extract coefficients from objects of class rlassoEffects
## S3 method for class 'rlassoEffects'
coef(
  object,
  complete = TRUE,
  selection.matrix = FALSE,
  include.targets = FALSE,
  ...
)
object | an object of class rlassoEffects |
complete | general option of the function |
selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each auxiliary regression. Default is set to FALSE. |
include.targets | if FALSE (by default) only the selected control variables are listed in the selection matrix. |
... | further arguments passed to functions coef or print. |
Printing coefficients and selection matrix for S3 object rlassoEffects. Interpretation of entries in the selection matrix:
"-" indicates a target variable,
"x" indicates that a variable has been selected with rlassoEffects (coefficient is different from zero),
"." indicates that a variable has been de-selected with rlassoEffects (coefficient is zero).
library(hdm)
set.seed(1)
n = 100 # sample size
p = 100 # number of variables
s = 7 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50), method = "double selection")
coef(lasso.effect) # standard use of coef() - without selection matrix
# with selection matrix
coef(lasso.effect, selection.matrix = TRUE)
# prettier output with print_coef (identical options as coef())
print_coef(lasso.effect, selection.matrix = TRUE)
rlassoIV
Method to extract coefficients from objects of class rlassoIV.
## S3 method for class 'rlassoIV'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)
object | an object of class rlassoIV |
complete | general option of the function |
selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each first stage regression. Default is set to FALSE. See section on details for more information. |
... | further arguments passed to function coef. |
Printing coefficients and selection matrix for S3 object rlassoIV. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero. The very last column collects all variables that have been selected in at least one of the lasso regressions represented in the selection.matrix.
rlassoIV performs three lasso regression steps: a first stage lasso regression of the endogenous treatment variable d on the instruments z and exogenous covariates x, a lasso regression of y on the exogenous variables x, and a lasso regression of the instrumented treatment variable, i.e., a regression of the predicted values of d, on controls x.
Coefficients obtained from rlassoIV by default. If option selection.matrix is TRUE, a list is returned with final coefficients, a matrix selection.matrix, and a matrix selection.matrixZ:
selection.matrix contains the selection index for the lasso regression of y on x (first column) and the lasso regression of the predicted values of d on x, together with the union of these indices.
selection.matrixZ contains the selection index from the first-stage lasso regression of d on z and x.
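The column layout of such a selection matrix, including the final union column, can be mimicked in base R. This is only a sketch with made-up selection indicators: the names `sel_y`, `sel_d` and `sel_chr` are hypothetical, nothing here is output of rlassoIV, and the "." marker for non-selected variables is borrowed from the rlassoEffects convention.

```r
# Hypothetical selection indicators for five controls (not produced by rlassoIV).
controls <- paste0("x", 1:5)
sel_y <- c(TRUE, FALSE, TRUE, FALSE, FALSE)   # lasso of y on x
sel_d <- c(FALSE, FALSE, TRUE, TRUE, FALSE)   # lasso of predicted d on x

# Union column: a control is marked if it was selected in at least one regression.
sel_union <- sel_y | sel_d

# Character matrix using "x" for selected and "." for not selected.
sel_mat <- cbind(y = sel_y, d.hat = sel_d, union = sel_union)
sel_chr <- ifelse(sel_mat, "x", ".")
rownames(sel_chr) <- controls
print(sel_chr, quote = FALSE)
```

The union column is what matters when deciding which controls enter a final regression, since a variable dropped in one step may still be needed because of another.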
## Not run:
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV = rlassoIV(x=x, d=d, y=y, z=z, select.X=TRUE, select.Z=TRUE)
coef(lasso.IV) # default behavior
coef(lasso.IV, selection.matrix = TRUE) # print selection matrix
## End(Not run)
rlassoIVselectX
Method to extract coefficients and selection matrix from objects of class rlassoIVselectX.
## S3 method for class 'rlassoIVselectX'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)
object | an object of class rlassoIVselectX |
complete | general option of the function |
selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each regression. Default is set to FALSE. See section on details for more information. |
... | further arguments passed to functions coef. |
Printing coefficients and selection matrix for S3 object rlassoIVselectX. The first column of the selection matrix reports the selection index for the lasso regression of y on x in the specified rlassoIVselectX command. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero. The second column contains the selection index for the lasso regression of d on x, and the remaining columns contain the index of selected variables x for the instruments z. The very last column collects all variables that have been selected in at least one of the lasso regressions.
## Not run:
library(hdm)
data(AJR); y = AJR$GDP; d = AJR$Exprop; z = AJR$logMort
x = model.matrix(~ -1 + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2, data=AJR)
AJR.Xselect = rlassoIV(GDP ~ Exprop + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2 |
                       logMort + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2,
                       data=AJR, select.X=TRUE, select.Z=FALSE)
coef(AJR.Xselect) # Default behavior
coef(AJR.Xselect, selection.matrix = TRUE) # print selection matrix
## End(Not run)
rlassoIVselectZ
Method to extract coefficients from objects of class rlassoIVselectZ.
## S3 method for class 'rlassoIVselectZ'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)
object | an object of class rlassoIVselectZ |
complete | general option of the function |
selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each first stage regression. Default is set to FALSE. See section on details for more information. |
... | further arguments passed to functions coef. |
Printing coefficients and selection matrix for S3 object rlassoIVselectZ. The columns of the selection matrix report the selection index for the first stage lasso regressions as specified in the rlassoIVselectZ command, i.e., the selected variables for each of the endogenous variables. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero. The very last column collects all variables that have been selected in at least one of the lasso regressions.
## Not run:
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV.Z = rlassoIVselectZ(x=x, d=d, y=y, z=z)
coef(lasso.IV.Z) # Default behavior
coef(lasso.IV.Z, selection.matrix = TRUE)
## End(Not run)
Census data from the US for the year 2012.
log of hourly wage (annual earnings / annual hours)
female indicator
five indicators: widowed, divorced, separated, nevermarried, and married (omitted)
six indicators: hsd08, hsd911, hsg, cg, ad, and sc (omitted)
four indicators: mw, so, we, and ne (omitted)
(max[0, age - years of education - 7]): exp1, exp2 (divided by 100), exp3 (divided by 1000), exp4 (divided by 10000)
March Supplement sampling weight
CPS year
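The experience terms in the list above can be illustrated with a short base R sketch. It assumes that exp2, exp3 and exp4 are the second, third and fourth powers of exp1 with the stated scaling, which the variable list suggests but does not spell out; the ages and education values are made up and not taken from the data set.

```r
# Hypothetical ages and years of education (not taken from the data set).
age  <- c(25, 40, 54, 30)
educ <- c(12, 16, 18, 25)

# Potential experience, truncated at zero: max[0, age - years of education - 7].
exp1 <- pmax(0, age - educ - 7)

# Assumed higher-order terms with the scaling given in the variable list.
exp2 <- exp1^2 / 100
exp3 <- exp1^3 / 1000
exp4 <- exp1^4 / 10000

data.frame(age, educ, exp1, exp2, exp3, exp4)
```

The scaling keeps the polynomial terms on a comparable numerical range, which is convenient when they enter a penalized regression together.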
The CPS is a monthly U.S. household survey conducted jointly by the U.S. Census Bureau and the Bureau of Labor Statistics. The data comprise the year 2012. This data set was used in Mulligan and Rubinstein (2008). The sample comprises white non-Hispanic individuals, ages 25-54, working full time full year (35+ hours per week at least 50 weeks). It excludes those living in group quarters, the self-employed, the military, agricultural, and private household sectors, and observations with allocated earnings, inconsistent reports on earnings and employment, or missing data.
C. B. Mulligan and Y. Rubinstein (2008). Selection, investment, and women's relative wages over time. The Quarterly Journal of Economics, 1061–1110.
data(cps2012)
Dataset on judicial eminent domain decisions.
economic outcome variable
set of exogenous variables
eminent domain decisions
set of potential instruments
Data set was analyzed in Belloni et al. (2012). They estimate the effect of judicial eminent domain decisions on economic outcomes with instrumental variables (IV) in a setting with a large set of potential IVs. A detailed description of the data can be found at https://www.econometricsociety.org/publications/econometrica/2012/11/01/sparse-models-and-methods-optimal-instruments-application
The data set contains four "sub-data sets" which differ mainly in the dependent variables: repeat-sales FHFA/OFHEO house price index for metro (FHFA) and non-metro (NM) area, the Case-Shiller home price index (CS), and state-level GDP from the Bureau of Economic Analysis - all transformed with the logarithm. The structure of each sub-data set is given above.
In the data set the following variables and naming conventions are used: "numpanelskx_..." is the number of panels with at least k members with the characteristic following the "_". The probability controls (names start with "F_prob_") follow a similar naming convention and give the probability of observing a panel with the characteristic given after the second "_", given the characteristics of the pool of judges available to be assigned to the case.
Characteristics in the data for the control variables or instruments:
judge reports no religious affiliation
judge's law degree is from a public university
judge reports being a democrat
judge is female
judge is nonwhite (and not black)
judge is black
judge is Jewish
judge is Catholic
baseline religion
belongs to a protestant church
belongs to an evangelical church
judge's undergraduate degree was obtained within state
judge's undergraduate degree was obtained at a public university
judge was elevated from a district court
year dummy (reference category is one year before the earliest year in the data set (excluded))
dummy for the circuit level (reference category excluded)
a dummy for whether there were no cases in that circuit-year
the number of takings appellate decisions
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369–2429.
data(EminentDomain)
Data set of growth compiled by Barro Lee.
Dataframe with the following variables:
dependent variable: national growth rates in GDP per capita for the periods 1965-1975 and 1975-1985
covariates which might influence growth
The data set contains growth data of Barro-Lee. The Barro-Lee data consist of a panel of 138 countries for the period 1960 to 1985. The dependent variable is national growth rates in GDP per capita for the periods 1965-1975 and 1975-1985. The growth rate in GDP over a period from $t_1$ to $t_2$ is commonly defined as $\log(GDP_{t_2}/GDP_{t_1})$. The number of covariates is p=62. The number of complete observations is 90.
The full data set and further details can be found at http://www.nber.org/pub/barro.lee, http://www.barrolee.com, and, http://www.bristol.ac.uk//Depts//Economics//Growth//barlee.htm.
R.J. Barro, J.W. Lee (1994). Data set for a panel of 139 countries. NBER.
R.J. Barro, X. Sala-i-Martin (1995). Economic Growth. McGraw-Hill, New York.
data(GrowthData)
This function implements different methods for calculation of the penalization parameter $\lambda$. Further details can be found under rlasso.
lambdaCalculation(
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE,
                 lambda.start = NULL, c = 1.1, gamma = 0.1),
  y = NULL,
  x = NULL
)
penalty | list with options for the calculation of the penalty. |
y | residual which is used for calculation of the variance or the data-dependent loadings |
x | matrix of regressor variables |
The function returns a list with the penalty lambda, which is the product of lambda0 and Ups0. Ups0 denotes either the variance (independent case) or the data-dependent loadings for the regressors. method gives the selected method for the calculation.
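To make the product of lambda0 and Ups0 concrete, the following base R sketch computes a penalty for the simplest case. It assumes the homoscedastic, X-independent rule lambda0 = 2 c sqrt(n) qnorm(1 - gamma / (2p)) and takes Ups0 as the standard deviation of the residual; both are assumptions made for illustration and may not match what lambdaCalculation does internally. The constants c = 1.1 and gamma = 0.1 are the defaults shown in the usage.

```r
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), ncol = p)   # regressors
y <- rnorm(n)                         # plays the role of the residual

c_const <- 1.1                        # penalty constant c (default in the usage)
gamma   <- 0.1                        # probability level gamma (default in the usage)

# Assumed homoscedastic, X-independent rule for lambda0.
lambda0 <- 2 * c_const * sqrt(n) * qnorm(1 - gamma / (2 * p))

# Assumed scale estimate of the residual (its standard deviation).
Ups0 <- sqrt(var(y))

# One penalty value per regressor, as the product of lambda0 and Ups0.
lambda <- rep(lambda0 * Ups0, p)
head(lambda)
```

With heteroscedastic errors the single scale factor would be replaced by regressor-specific loadings, so the entries of lambda would generally differ across columns of x.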
Implementation of the Shooting Lasso (Fu, 1998) with variable dependent penalization weights.
LassoShooting.fit(
  x,
  y,
  lambda,
  control = list(maxIter = 1000, optTol = 10^(-5), zeroThreshold = 10^(-6)),
  XX = NULL,
  Xy = NULL,
  beta.start = NULL
)
x | matrix of regressor variables |
y | dependent variable (vector or matrix) |
lambda | vector of length |
control | list with control parameters: maxIter, optTol, and zeroThreshold |
XX | optional, precalculated matrix |
Xy | optional, precalculated matrix |
beta.start | start value for beta |
The function implements the Shooting Lasso (Fu, 1998) with variable dependent penalization. The arguments XX and Xy are optional and allow the use of precalculated matrices, which might improve performance.
coefficients | estimated coefficients by the Shooting Lasso Algorithm |
coef.list | matrix of coefficients from each iteration |
num.it | number of iterations run |
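The idea behind a shooting (coordinate-wise) lasso with one penalty per regressor can be sketched in a few lines of base R. The sketch assumes the objective sum((y - x %*% beta)^2) + sum(lambda * abs(beta)); `shooting_sketch` is a hypothetical helper written for illustration and is not the code of LassoShooting.fit.

```r
# Minimal coordinate-wise ("shooting") lasso for the assumed objective
#   sum((y - x %*% beta)^2) + sum(lambda * abs(beta)).
# shooting_sketch is a hypothetical helper, not part of hdm.
shooting_sketch <- function(x, y, lambda, max_iter = 1000, tol = 1e-5) {
  p <- ncol(x)
  XX2 <- 2 * crossprod(x)        # 2 * X'X, computed once
  Xy2 <- 2 * crossprod(x, y)     # 2 * X'y, computed once
  beta <- rep(0, p)
  for (it in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      # Gradient of the squared-error part with beta[j] set to zero.
      s0 <- sum(XX2[j, ] * beta) - XX2[j, j] * beta[j] - Xy2[j]
      if (s0 > lambda[j]) {
        beta[j] <- (lambda[j] - s0) / XX2[j, j]
      } else if (s0 < -lambda[j]) {
        beta[j] <- (-lambda[j] - s0) / XX2[j, j]
      } else {
        beta[j] <- 0               # soft-thresholded to exactly zero
      }
    }
    # Stop once a full sweep changes the coefficients very little.
    if (sum(abs(beta - beta_old)) < tol) break
  }
  list(coefficients = beta, num.it = it)
}

set.seed(1)
n <- 50
p <- 5
x <- matrix(rnorm(n * p), ncol = p)
y <- as.vector(x %*% c(2, 0, 0, -1, 0) + rnorm(n))
fit_zero <- shooting_sketch(x, y, lambda = rep(0, p))     # no penalty
fit_big  <- shooting_sketch(x, y, lambda = rep(1e6, p))   # very large penalty
```

Without a penalty the sweeps converge to the least-squares solution, and with a very large penalty every coefficient is set to zero; intermediate values of lambda interpolate between the two. Precomputing the cross-products once is the same motivation as for the optional XX and Xy arguments.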
Fu, W. (1998). Penalized regressions: the bridge vs the lasso. Journal of Computational and Graphical Statistics 7, 397-416.
rlassoEffects and lm
Multiple hypotheses testing adjustment of p-values from a high-dimensional linear model.
p_adjust(x, ...)
## S3 method for class 'rlassoEffects'
p_adjust(x, method = "RW", B = 1000, ...)
## S3 method for class 'lm'
p_adjust(x, method = "RW", B = 1000, test.index = NULL, ...)
x | an object of S3 class rlassoEffects or lm |
... | further arguments passed on to methods. |
method | the method of p-value adjustment for multiple testing. Romano-Wolf stepdown ('RW') is chosen by default. |
B | number of bootstrap repetitions (default 1000). |
test.index | vector of integers, logicals or variable names indicating the position of coefficients (integer case), logical vector of length of the coefficients (TRUE or FALSE) or the coefficient names of x which should be tested simultaneously (only for S3 class lm). |
Multiple testing adjustment is performed for S3 objects of class rlassoEffects and lm. Implemented methods for multiple testing adjustment are Romano-Wolf stepdown 'RW' (default) and the adjustment methods available in the p.adjust function of the stats package, including the Bonferroni, Bonferroni-Holm, and Benjamini-Hochberg corrections, see p.adjust.methods.
Objects of class rlassoEffects are constructed by rlassoEffects.
A matrix with the estimated coefficients and the p-values that are adjusted according to the specified method.
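The non-bootstrap adjustments mentioned above come from p.adjust in the stats package, so their effect can be previewed without fitting a model. The sketch below uses made-up p-values and only base R; it does not reproduce the Romano-Wolf stepdown procedure, which requires the bootstrap.

```r
# Hypothetical unadjusted p-values for five coefficients.
pvals <- c(X1 = 0.001, X2 = 0.012, X3 = 0.03, X4 = 0.2, X5 = 0.6)

# Side-by-side comparison of three corrections available in stats::p.adjust.
adjusted <- cbind(
  raw        = pvals,
  bonferroni = p.adjust(pvals, method = "bonferroni"),
  holm       = p.adjust(pvals, method = "holm"),
  BH         = p.adjust(pvals, method = "BH")
)
round(adjusted, 3)
```

Bonferroni multiplies each p-value by the number of tests (capped at 1), Holm is never more conservative than Bonferroni, and Benjamini-Hochberg controls the false discovery rate rather than the family-wise error rate.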
rlassoEffects: rlassoEffects.
lm: lm.
J.P. Romano, M. Wolf (2005). Exact and approximate stepdown methods for multiple hypothesis testing. Journal of the American Statistical Association, 100(469), 94-108.
J.P. Romano, M. Wolf (2016). Efficient computation of adjusted p-values for resampling-based stepdown multiple testing. Statistics and Probability Letters, (113), 38-40.
A. Belloni, V. Chernozhukov, K. Kato (2015). Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika, 102(1), 77-94.
library(hdm);
set.seed(1)
n = 100 # sample size
p = 25 # number of variables
s = 3 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1:20))
pvals.lasso.effect = p_adjust(lasso.effect, method = "RW", B = 1000)
ols = lm(y ~ -1 + X, data)
pvals.ols = p_adjust(ols, method = "RW", B = 1000)
pvals.ols = p_adjust(ols, method = "RW", B = 1000, test.index = c(1,2,5))
pvals.ols = p_adjust(ols, method = "RW", B = 1000, test.index = c(rep(TRUE, 5), rep(FALSE, p-5)))
Data set on financial wealth and 401(k) plan participation
Dataframe with the following variables (amongst others):
participation in 401(k)
eligibility for 401(k)
401(k) assets
total wealth (in US $)
financial assets (in US $)
net financial assets (in US $)
non-401k financial assets (in US $)
net non-401k financial assets
net non-401(k) assets (in US $)
individual retirement account (IRA)
income (in US $)
age
family size
married
participation in IRA
defined benefit pension
home owner
education (in years)
male
two earners
dummies for education: no high-school, high-school, some college, college
home mortgage (in US $)
home equity (in US $)
home value (in US $)
The sample is drawn from the 1991 Survey of Income and Program Participation (SIPP) and consists of 9,915 observations. The observational units are household reference persons aged 25-64 and spouse if present. Households are included in the sample if at least one person is employed and no one is self-employed. The data set was analysed in Chernozhukov and Hansen (2004) and Belloni et al. (2014) where further details can be found. They examine the effects of 401(k) plans on wealth using data from the Survey of Income and Program Participation using 401(k) eligibility as an instrument for 401(k) participation.
V. Chernozhukov, C. Hansen (2004). The impact of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis. The Review of Economics and Statistics 86 (3), 735–751.
A. Belloni, V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2014). Program evaluation with high-dimensional data. Working Paper.
data(pension)
rlassologit
Objects of class rlassologit are constructed by rlassologit. print.rlassologit prints and displays some information about fitted rlassologit objects. summary.rlassologit summarizes information of a fitted rlassologit object. predict.rlassologit predicts values based on a rlassologit object. model.matrix.rlassologit constructs the model matrix of a lasso object.
## S3 method for class 'rlassologit'
predict(object, newdata = NULL, type = "response", ...)
## S3 method for class 'rlassologit'
model.matrix(object, ...)
## S3 method for class 'rlassologit'
print(x, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassologit'
summary(object, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
object | an object of class rlassologit |
newdata | new data set for prediction |
type | type of prediction required. The default ('response') is on the scale of the response variable; the alternative 'link' is on the scale of the linear predictors. |
... | arguments passed to the print function and other methods |
x | an object of class rlassologit |
all | logical, indicates if coefficients of all variables (TRUE) should be displayed or only the non-zero ones (FALSE) |
digits | significant digits in printout |
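The difference between the 'response' and 'link' scales for a logit model can be shown with base R alone. The linear predictor values below are made up and no rlassologit fit is involved; the sketch only illustrates the inverse-logit mapping between the two scales.

```r
# Hypothetical linear predictor values (the 'link' scale).
eta <- c(-2, 0, 1.5)

# The inverse logit maps them to probabilities (the 'response' scale).
prob <- plogis(eta)          # same as 1 / (1 + exp(-eta))

# qlogis maps probabilities back to the link scale.
back <- qlogis(prob)

cbind(link = eta, response = prob)
```

Predictions on the response scale always lie strictly between 0 and 1, whereas values on the link scale are unbounded and additive in the regressors.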
rlassoEffects
Printing coefficients for class rlassoEffects
print_coef(x, ...)
## S3 method for class 'rlassoEffects'
print_coef(
  x,
  complete = TRUE,
  selection.matrix = FALSE,
  include.targets = TRUE,
  ...
)
x | an object of class rlassoEffects |
... | further arguments passed to functions coef or print. |
complete | general option of the function |
selection.matrix | if TRUE, a selection matrix is returned that indicates the selected variables from each auxiliary regression. Default is set to FALSE. |
include.targets | if FALSE, only the selected control variables are listed in the selection matrix. The usage above sets the default to TRUE. |
Printing coefficients and selection matrix for S3 object rlassoEffects
library(hdm)
set.seed(1)
n = 100 # sample size
p = 100 # number of variables
s = 7 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50), method = "double selection")
# without target coefficient estimates
print_coef(lasso.effect, selection.matrix = TRUE, include.targets = FALSE)
# with target coefficient estimates
print_coef(lasso.effect, selection.matrix = TRUE, include.targets = TRUE)
rlasso
Objects of class rlasso are constructed by rlasso. print.rlasso prints and displays some information about fitted rlasso objects. summary.rlasso summarizes information of a fitted rlasso object. predict.rlasso predicts values based on a rlasso object. model.matrix.rlasso constructs the model matrix of a rlasso object.
## S3 method for class 'rlasso'
print(x, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlasso'
summary(object, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlasso'
model.matrix(object, ...)
## S3 method for class 'rlasso'
predict(object, newdata = NULL, ...)
x | an object of class rlasso |
all | logical, indicates if coefficients of all variables (TRUE) should be displayed or only the non-zero ones (FALSE) |
digits | significant digits in printout |
... | arguments passed to the print function and other methods |
object | an object of class rlasso |
newdata | new data set for prediction. An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are returned. |
rlassoEffects
Objects of class rlassoEffects are constructed by rlassoEffects. print.rlassoEffects prints and displays some information about fitted rlassoEffects objects. summary.rlassoEffects summarizes information of a fitted rlassoEffects object and is described at summary.rlassoEffects. confint.rlassoEffects extracts the confidence intervals. plot.rlassoEffects plots the estimates with confidence intervals.
## S3 method for class 'rlassoEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoEffects'
confint(object, parm, level = 0.95, joint = FALSE, ...)
## S3 method for class 'rlassoEffects'
plot(
  x,
  joint = FALSE,
  level = 0.95,
  main = "",
  xlab = "coef",
  ylab = "",
  xlim = NULL,
  ...
)
x | an object of class rlassoEffects |
digits | significant digits in printout |
... | arguments passed to the print function and other methods. |
object | an object of class rlassoEffects |
parm | a specification of which parameters are to be given confidence intervals among the variables for which inference was done, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level | confidence level required |
joint | logical, if TRUE, joint confidence intervals are calculated. |
main | an overall title for the plot |
xlab | a title for the x axis |
ylab | a title for the y axis |
xlim | vector of length two giving lower and upper bound of x axis |
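For intuition about the level argument, a pointwise interval under a normal approximation can be computed by hand. The estimates and standard errors below are made up, and the sketch assumes the usual estimate plus or minus z times standard error form; it does not show how joint = TRUE obtains its critical value.

```r
# Hypothetical point estimates and standard errors for three target coefficients.
est <- c(X1 = 2.9, X2 = 3.1, X3 = 0.1)
se  <- c(X1 = 0.10, X2 = 0.12, X3 = 0.11)
level <- 0.95

# Two-sided normal critical value for the requested confidence level.
z <- qnorm(1 - (1 - level) / 2)

# Pointwise interval: estimate +/- z * standard error.
ci <- cbind(lower = est - z * se, upper = est + z * se)
ci
```

Joint intervals for several coefficients typically use a larger critical value than z, so they are wider than the pointwise ones at the same level.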
rlassoIV
Objects of class rlassoIV are constructed by rlassoIV. print.rlassoIV prints and displays some information about fitted rlassoIV objects. summary.rlassoIV summarizes information of a fitted rlassoIV object. confint.rlassoIV extracts the confidence intervals.
## S3 method for class 'rlassoIV'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIV'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
## S3 method for class 'rlassoIV'
confint(object, parm, level = 0.95, ...)
x | an object of class rlassoIV |
digits | significant digits in printout |
... | arguments passed to the print function and other methods |
object | an object of class rlassoIV |
parm | a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level | confidence level required. |
rlassoIVselectX
Objects of class rlassoIVselectX
are constructed by rlassoIVselectX
.
print.rlassoIVselectX
prints and displays some information about fitted rlassoIVselectX
objects.
summary.rlassoIVselectX
summarizes information of a fitted rlassoIVselectX
object.
confint.rlassoIVselectX
extracts the confidence intervals.
## S3 method for class 'rlassoIVselectX'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectX'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectX'
confint(object, parm, level = 0.95, ...)
x |
an object of class |
digits |
significant digits in printout |
... |
arguments passed to the print function and other methods |
object |
an object of class |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
the confidence level required. |
rlassoIVselectZ
Objects of class rlassoIVselectZ
are constructed by rlassoIVselectZ
.
print.rlassoIVselectZ
prints and displays some information about fitted rlassoIVselectZ
objects.
summary.rlassoIVselectZ
summarizes information of a fitted rlassoIVselectZ
object.
confint.rlassoIVselectZ
extracts the confidence intervals.
## S3 method for class 'rlassoIVselectZ'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectZ'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectZ'
confint(object, parm, level = 0.95, ...)
x |
an object of class |
digits |
significant digits in printout |
... |
arguments passed to the print function and other methods |
object |
an object of class |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
confidence level required. |
rlassologitEffects
Objects of class rlassologitEffects
are constructed by rlassologitEffects
or rlassologitEffect
.
print.rlassologitEffects
prints and displays some information about fitted rlassologitEffects
objects.
summary.rlassologitEffects
summarizes information of a fitted rlassologitEffects
object.
confint.rlassologitEffects
extracts the confidence intervals.
## S3 method for class 'rlassologitEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassologitEffects'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassologitEffects'
confint(object, parm, level = 0.95, joint = FALSE, ...)
x |
an object of class |
digits |
number of significant digits in printout |
... |
arguments passed to the print function and other methods |
object |
an object of class |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
confidence level required. |
joint |
logical, if joint confidence intervals should be calculated |
rlassoTE
Objects of class rlassoTE
are constructed by rlassoATE
, rlassoATET
, rlassoLATE
, rlassoLATET
.
print.rlassoTE
prints and displays some information about fitted rlassoTE
objects.
summary.rlassoTE
summarizes information of a fitted rlassoTE
object.
confint.rlassoTE
extracts the confidence intervals.
## S3 method for class 'rlassoTE'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoTE'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoTE'
confint(object, parm, level = 0.95, ...)
x |
an object of class |
digits |
number of significant digits in printout |
... |
arguments passed to the print function and other methods |
object |
an object of class |
parm |
a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered. |
level |
confidence level required. |
tsls
Objects of class tsls
are constructed by tsls
.
print.tsls
prints and displays some information about fitted tsls
objects.
summary.tsls
summarizes information of a fitted tsls
object.
## S3 method for class 'tsls'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'tsls'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)
x |
an object of class |
digits |
significant digits in printout |
... |
arguments passed to the print function and other methods |
object |
an object of class |
The function estimates the coefficients of a Lasso regression with
data-driven penalty under homoscedasticity and heteroscedasticity with non-Gaussian noise and X-dependent or X-independent design. The
method of the data-driven penalty can be chosen. The object which is
returned is of the S3 class rlasso
.
rlasso(x, ...)

## S3 method for class 'formula'
rlasso(
  formula,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE,
    lambda.start = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)

## S3 method for class 'character'
rlasso(
  x,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE,
    lambda.start = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)

## Default S3 method:
rlasso(
  x,
  y,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE,
    lambda.start = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)
x |
regressors (vector, matrix or object that can be coerced to a matrix) |
... |
further arguments (only for consistent definition of methods) |
formula |
an object of class "formula" (or one that can be coerced to
that class): a symbolic description of the model to be fitted in the form
|
data |
an optional data frame, list or environment (or object coercible
by as.data.frame to a data frame) containing the variables in the model. If
not found in data, the variables are taken from environment(formula),
typically the environment from which |
post |
logical. If |
intercept |
logical. If |
model |
logical. If |
penalty |
list with options for the calculation of the penalty.
|
control |
list with control values.
|
y |
dependent variable (vector, matrix or object that can be coerced to a matrix) |
The function estimates the coefficients of a Lasso regression with
data-driven penalty under homoscedasticity / heteroscedasticity and non-Gaussian noise. The option homoscedastic is a logical with default FALSE.
The penalty parameter can either depend on the design matrix (X.dependent.lambda=TRUE) or be independent of it (default, X.dependent.lambda=FALSE).
The default value of the constant c
is 1.1
in the post-Lasso case and 0.5
in the Lasso case.
A special option is to set homoscedastic to none and to supply a value for lambda.start. This value is then used as the penalty parameter, with independent design and heteroscedastic errors used to weight the regressors.
For details of the implementation of the algorithm for estimating the data-driven penalty, in particular the regressor-independent loadings, we refer to Appendix A in Belloni et al. (2012). When the option "none" is chosen for homoscedastic (together with lambda.start), lambda is set to lambda.start and the regressor-independent loadings under heteroscedasticity are used. The options "X-dependent" and "X-independent" under homoscedasticity are described in Belloni et al. (2013).
The option post=TRUE
conducts post-lasso estimation, i.e. a refit of
the model with the selected variables.
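The options described above can be tried out with a small sketch on simulated data. The data-generating process and the object names (fit_post, fit_lasso, fit_homo) are assumptions made for illustration only. The penalty list is written out in full, mirroring the defaults in the usage section, with only homoscedastic changed.

```r
library(hdm)

set.seed(1)
n <- 100   # sample size (illustrative)
p <- 50    # number of regressors (illustrative)
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste0("V", 1:p)
y <- as.vector(X[, 1:3] %*% rep(5, 3) + rnorm(n))

# post-Lasso (default): refit on the selected variables
fit_post <- rlasso(X, y, post = TRUE)

# plain Lasso: no refit after selection
fit_lasso <- rlasso(X, y, post = FALSE)

# penalty computed under homoscedasticity, design-independent lambda
fit_homo <- rlasso(
  X, y, post = TRUE,
  penalty = list(homoscedastic = TRUE, X.dependent.lambda = FALSE,
                 lambda.start = NULL, c = 1.1, gamma = 0.1 / log(n))
)

# selected variables and the corresponding estimates
which(fit_post$index)
fit_post$beta[fit_post$index]
fit_lasso$beta[fit_lasso$index]
```

Comparing fit_post$beta with fit_lasso$beta on the selected variables shows the effect of the refit: the plain Lasso estimates are shrunken, the post-Lasso estimates are not.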
rlasso
returns an object of class rlasso
. An object of
class "rlasso" is a list containing at least the following components:
coefficients |
parameter estimates |
beta |
parameter estimates (named vector of coefficients without intercept) |
intercept |
value of the intercept |
index |
index of selected variables (logical vector) |
lambda |
data-driven penalty term for each variable, product of lambda0 (the penalization parameter) and the loadings |
lambda0 |
penalty term |
loadings |
loading for each regressor |
residuals |
residuals, response minus fitted values |
sigma |
root of the variance of the residuals |
iter |
number of iterations |
call |
function call |
options |
options |
model |
model matrix (if |
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369-2429.
A. Belloni, V. Chernozhukov and C. Hansen (2013). Inference for high-dimensional sparse econometric models. In Advances in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge, 245-295.
set.seed(1)
n = 100 # sample size
p = 100 # number of variables
s = 3   # number of variables with non-zero coefficients
X = Xnames = matrix(rnorm(n*p), ncol=p)
colnames(Xnames) <- paste("V", 1:p, sep="")
beta = c(rep(5,s), rep(0,p-s))
Y = X%*%beta + rnorm(n)
reg.lasso <- rlasso(Y~Xnames)
Xnew = matrix(rnorm(n*p), ncol=p) # new X
colnames(Xnew) <- paste("V", 1:p, sep="")
Ynew = Xnew%*%beta + rnorm(n) # new Y
yhat = predict(reg.lasso, newdata = Xnew)
This class of functions estimates the average treatment effect (ATE), the ATE of the treated (ATET), the local average treatment effect (LATE) and the LATE of the treated (LATET). The estimation methods rely on immunized / orthogonal moment conditions which guarantee valid post-selection inference in a high-dimensional setting. Further details can be found in Belloni et al. (2014).
rlassoATE(x, ...)

## Default S3 method:
rlassoATE(x, d, y, bootstrap = "none", nRep = 500, ...)

## S3 method for class 'formula'
rlassoATE(formula, data, bootstrap = "none", nRep = 500, ...)

rlassoATET(x, ...)

## Default S3 method:
rlassoATET(x, d, y, bootstrap = "none", nRep = 500, ...)

## S3 method for class 'formula'
rlassoATET(formula, data, bootstrap = "none", nRep = 500, ...)

rlassoLATE(x, ...)

## Default S3 method:
rlassoLATE(
  x,
  d,
  y,
  z,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  never_takers = TRUE,
  ...
)

## S3 method for class 'formula'
rlassoLATE(
  formula,
  data,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  never_takers = TRUE,
  ...
)

rlassoLATET(x, ...)

## Default S3 method:
rlassoLATET(
  x,
  d,
  y,
  z,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  ...
)

## S3 method for class 'formula'
rlassoLATET(
  formula,
  data,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  ...
)
x |
exogenous variables |
... |
arguments passed, e.g. |
d |
treatment variable (binary) |
y |
outcome variable / dependent variable |
bootstrap |
bootstrap method which should be employed: 'none', 'Bayes', 'normal', 'wild' |
nRep |
number of replications for the bootstrap |
formula |
An object of class |
data |
An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
If not found in data, the variables are taken from environment(formula), typically the environment from which |
z |
instrumental variables (binary) |
post |
logical. If |
intercept |
logical. If |
always_takers |
option to adapt to cases with (default) and without always-takers. If |
never_takers |
option to adapt to cases with (default) and without never-takers. If |
Details can be found in Belloni et al. (2014).
Functions return an object of class rlassoTE
with estimated effects, standard errors and
individual effects in the form of a list
.
A. Belloni, V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2014). Program evaluation with high-dimensional data. Working Paper.
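A minimal sketch of how these functions might be called on simulated data is given below. The data-generating process (a binary treatment whose probability depends on the first control, and a constant treatment effect of 1) and all object names are assumptions made for illustration. No bootstrap is used.

```r
library(hdm)

set.seed(1)
n <- 500   # sample size (illustrative)
p <- 20    # number of controls (illustrative)
x <- matrix(rnorm(n * p), ncol = p)
colnames(x) <- paste0("x", 1:p)

# binary treatment; its probability depends on the first control
d <- rbinom(n, size = 1, prob = plogis(0.5 * x[, 1]))

# outcome with a constant treatment effect of 1 by construction
y <- as.vector(1 * d + x[, 1] + rnorm(n))

ate  <- rlassoATE(x, d, y)    # average treatment effect
atet <- rlassoATET(x, d, y)   # average treatment effect on the treated

summary(ate)
confint(ate)
summary(atet)
```

For rlassoLATE and rlassoLATET a binary instrument z has to be supplied in addition, as shown in the usage section.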
Estimation and inference of (low-dimensional) target coefficients in a high-dimensional linear model.
rlassoEffects(x, ...)

## Default S3 method:
rlassoEffects(
  x,
  y,
  index = c(1:ncol(x)),
  method = "partialling out",
  I3 = NULL,
  post = TRUE,
  ...
)

## S3 method for class 'formula'
rlassoEffects(
  formula,
  data,
  I,
  method = "partialling out",
  included = NULL,
  post = TRUE,
  ...
)

rlassoEffect(x, y, d, method = "double selection", I3 = NULL, post = TRUE, ...)
x |
matrix of regressor variables serving as controls and potential
treatments. For |
... |
parameters passed to the |
y |
outcome variable (vector or matrix) |
index |
vector of integers, logicals or variables names indicating the position (column) of
variables (integer case), logical vector of length of the variables (TRUE or FALSE) or the variable names of |
method |
method for inference, either 'partialling out' (default) or 'double selection'. |
I3 |
For the 'double selection'-method the logical vector |
post |
logical, if post Lasso is conducted with default |
formula |
An element of class |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. |
I |
A one-sided formula specifying the variables for which inference is conducted. |
included |
One-sided formula of variables which should be included in any case (only for method="double selection"). |
d |
variable for which inference is conducted (treatment variable) |
The functions estimate (low-dimensional) target coefficients in a high-dimensional linear model.
An application is, e.g., the estimation of a treatment effect in a setting with high-dimensional controls. The user can choose between the so-called post-double-selection method and partialling-out.
The idea of the double selection method is to select variables by Lasso regression of the outcome variable on the control variables and of the treatment variable on the control variables. The final estimation is done by a regression of the outcome on the treatment variable and the union of the variables selected in the first two steps. In partialling-out, the effect of the regressors on the outcome and on the treatment variable is first taken out by Lasso, and then a regression of the residuals is conducted. The resulting estimator of the target coefficient is asymptotically normally distributed, which allows inference on the treatment effect. rlassoEffects is a wrapper function for rlassoEffect, which does inference for a single variable.
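The partialling-out idea described above can be sketched by hand with rlasso and lm and compared with the packaged estimators. This is an illustration under an assumed data-generating process (true effect of d equal to 1), not the package's internal code, and the names res_y, res_d and alpha_manual are hypothetical.

```r
library(hdm)

set.seed(1)
n <- 200   # sample size (illustrative)
p <- 50    # number of controls (illustrative)
x <- matrix(rnorm(n * p), ncol = p)
colnames(x) <- paste0("x", 1:p)

# treatment depends on the first control; true effect of d on y is 1
d <- as.vector(x[, 1] + rnorm(n))
y <- as.vector(1 * d + 2 * x[, 1] + rnorm(n))

# step 1: take the effect of the controls out of y and d by Lasso
res_y <- as.vector(rlasso(x, y)$residuals)
res_d <- as.vector(rlasso(x, d)$residuals)

# step 2: regress the outcome residuals on the treatment residuals
alpha_manual <- coef(lm(res_y ~ res_d))[["res_d"]]

# packaged estimators for comparison
fit_po <- rlassoEffect(x = x, y = y, d = d, method = "partialling out")
fit_ds <- rlassoEffect(x = x, y = y, d = d, method = "double selection")

alpha_manual
print(fit_po)
print(fit_ds)
```

The hand-rolled estimate should be close to the partialling-out result. Small differences can arise from how the intercept and the penalty options are handled inside the package.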
The function returns an object of class rlassoEffects
with the following entries:
coefficients |
vector with estimated values of the coefficients for each selected variable |
se |
standard error (vector) |
t |
t-statistic |
pval |
p-value |
samplesize |
sample size of the data set |
index |
index of the variables for which inference is performed |
A. Belloni, V. Chernozhukov, C. Hansen (2014). Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies 81(2), 608-650.
library(hdm); library(ggplot2)
set.seed(1)
n = 100 # sample size
p = 100 # number of variables
s = 3   # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
fm = paste("y ~", paste(colnames(X), collapse="+"))
fm = as.formula(fm)
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50))
lasso.effect = rlassoEffects(fm, I = ~ X1 + X2 + X3 + X50, data=data)
print(lasso.effect)
summary(lasso.effect)
confint(lasso.effect)
plot(lasso.effect)
The function estimates a treatment effect in a setting with very many controls and very many instruments (their number may even exceed the sample size).
rlassoIV(x, ...)

## Default S3 method:
rlassoIV(x, d, y, z, select.Z = TRUE, select.X = TRUE, post = TRUE, ...)

## S3 method for class 'formula'
rlassoIV(formula, data, select.Z = TRUE, select.X = TRUE, post = TRUE, ...)

rlassoIVmult(x, d, y, z, select.Z = TRUE, select.X = TRUE, ...)
x |
matrix of exogenous variables |
... |
arguments passed to the function |
d |
endogenous variable |
y |
outcome / dependent variable (vector or matrix) |
z |
matrix of instrumental variables |
select.Z |
logical, indicating selection on the instruments. |
select.X |
logical, indicating selection on the exogenous variables. |
post |
logical, whether post-Lasso should be conducted (default= |
formula |
An object of class |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
If not found in data, the variables are taken from environment(formula), typically the environment from which |
The implementation for selection on x and z follows the procedure described in Chernozhukov et al.
(2015) and is built on 'triple selection' to achieve an orthogonal moment
function. The function returns an object of S3 class rlassoIV
.
Moreover, it is a wrapper function for the case that selection should be done only with the instruments Z (rlassoIVselectZ
) or with
the control variables X (rlassoIVselectX
) or without selection (tsls
). Exogenous variables
x
are automatically used as instruments and added to the
instrument set z
.
an object of class rlassoIV
containing at least the following
components:
coefficients |
estimated parameter value |
se |
standard error |
V. Chernozhukov, C. Hansen, M. Spindler (2015). Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review: Papers and Proceedings 105(5), 486–490.
## Not run:
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV.Z = rlassoIV(x=x, d=d, y=y, z=z, select.X=FALSE, select.Z=TRUE)
summary(lasso.IV.Z)
confint(lasso.IV.Z)
## End(Not run)
This function estimates the coefficient of an endogenous variable by employing instrumental variables in a setting where the exogenous variables are high-dimensional and hence selection on the exogenous variables is required.
The function returns an element of class rlassoIVselectX.
rlassoIVselectX(x, ...)

## Default S3 method:
rlassoIVselectX(x, d, y, z, post = TRUE, ...)

## S3 method for class 'formula'
rlassoIVselectX(formula, data, post = TRUE, ...)
x |
exogenous variables in the structural equation (matrix) |
... |
arguments passed to the function |
d |
endogenous variables in the structural equation (vector or matrix) |
y |
outcome or dependent variable in the structural equation (vector or matrix) |
z |
set of potential instruments for the endogenous variables. |
post |
logical. If |
formula |
An object of class |
data |
An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
If not found in data, the variables are taken from environment(formula), typically the environment from which |
The implementation is a special case of Chernozhukov et al. (2015).
The option post=TRUE
conducts post-lasso estimation for the Lasso estimations, i.e. a refit of the
model with the selected variables. Exogenous variables
x
are automatically used as instruments and added to the
instrument set z
.
An object of class rlassoIVselectX
containing at least the following
components:
coefficients |
estimated parameter vector |
vcov |
variance-covariance matrix |
residuals |
residuals |
samplesize |
sample size |
Chernozhukov, V., Hansen, C. and M. Spindler (2015). Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments. American Economic Review, Papers and Proceedings 105(5), 486–490.
library(hdm)
data(AJR); y = AJR$GDP; d = AJR$Exprop; z = AJR$logMort
x = model.matrix(~ -1 + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2, data=AJR)
dim(x)
#AJR.Xselect = rlassoIV(x=x, d=d, y=y, z=z, select.X=TRUE, select.Z=FALSE)
AJR.Xselect = rlassoIV(GDP ~ Exprop + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2 |
  logMort + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2,
  data=AJR, select.X=TRUE, select.Z=FALSE)
summary(AJR.Xselect)
confint(AJR.Xselect)
This function selects the instrumental variables in the first stage by
Lasso. First stage predictions are then used in the second stage as optimal
instruments to estimate the parameter vector. The function returns an element of class rlassoIVselectZ.
rlassoIVselectZ(x, ...)

## Default S3 method:
rlassoIVselectZ(x, d, y, z, post = TRUE, intercept = TRUE, ...)

## S3 method for class 'formula'
rlassoIVselectZ(formula, data, post = TRUE, intercept = TRUE, ...)
x |
exogenous variables in the structural equation (matrix) |
... |
arguments passed to the function |
d |
endogenous variables in the structural equation (vector or matrix) |
y |
outcome or dependent variable in the structural equation (vector or matrix) |
z |
set of potential instruments for the endogenous variables. Exogenous variables serve as their own instruments. |
post |
logical. If |
intercept |
logical. If |
formula |
An object of class |
data |
An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
If not found in data, the variables are taken from environment(formula), typically the environment from which |
The implementation follows the procedure described in Belloni et al. (2012).
Option post=TRUE
conducts post-lasso estimation, i.e. a refit of the
model with the selected variables, to estimate the optimal instruments. The
parameter vector of the structural equation is then fitted by two-stage
least squares (tsls) estimation.
An object of class rlassoIVselectZ
containing at least the following
components:
coefficients |
estimated parameter vector |
vcov |
variance-covariance matrix |
residuals |
residuals |
samplesize |
sample size |
selection.matrix |
matrix of selected variables in the first stage for each endogenous variable |
A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369–2429.
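As a hedged illustration of instrument selection, the sketch below simulates a setting with many candidate instruments, of which only two are relevant, and calls rlassoIVselectZ directly. The data-generating process and variable names are assumptions made for this example. The true coefficient of d is 1 by construction.

```r
library(hdm)

set.seed(1)
n  <- 200   # sample size (illustrative)
pz <- 30    # number of candidate instruments (illustrative)
px <- 3     # number of exogenous controls (illustrative)
z <- matrix(rnorm(n * pz), ncol = pz)
x <- matrix(rnorm(n * px), ncol = px)
colnames(z) <- paste0("z", 1:pz)
colnames(x) <- paste0("x", 1:px)

# endogeneity: u enters both the treatment and the outcome equation
u <- rnorm(n)
d <- as.vector(z[, 1] + z[, 2] + x[, 1] + u + rnorm(n))
y <- as.vector(1 * d + x[, 1] + u + rnorm(n))

# Lasso selects among z (and x) in the first stage
fit <- rlassoIVselectZ(x = x, d = d, y = y, z = z)

summary(fit)
confint(fit)
```

The same fit can be requested through the wrapper via rlassoIV(x=x, d=d, y=y, z=z, select.X=FALSE, select.Z=TRUE).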
The function estimates the coefficients of a logistic Lasso regression with
data-driven penalty. The method of the data-driven penalty can be chosen.
The object which is returned is of the S3 class rlassologit.
rlassologit(x, ...)

## S3 method for class 'formula'
rlassologit(
  formula,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)

## S3 method for class 'character'
rlassologit(
  x,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)

## Default S3 method:
rlassologit(
  x,
  y,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)
x |
regressors (matrix) |
... |
further parameters passed to glmnet |
formula |
an object of class 'formula' (or one that can be coerced to
that class): a symbolic description of the model to be fitted in the form
|
data |
an optional data frame, list or environment. |
post |
logical. If |
intercept |
logical. If |
model |
logical. If |
penalty |
list with options for the calculation of the penalty. |
control |
list with control values.
|
y |
dependent variable (vector or matrix) |
The function estimates the coefficients of a Logistic Lasso regression with
data-driven penalty. The
option post=TRUE
conducts post-lasso estimation, i.e. a refit of the
model with the selected variables.
rlassologit
returns an object of class
rlassologit
. An object of class rlassologit
is a list
containing at least the following components:
coefficients |
parameter estimates |
beta |
parameter estimates (without intercept) |
intercept |
value of intercept |
index |
index of selected variables (logicals) |
lambda |
penalty term |
residuals |
residuals |
sigma |
root of the variance of the residuals |
call |
function call |
options |
options |
Belloni, A., Chernozhukov, V. and Y. Wei (2013). Honest confidence regions for logistic regression with a large number of controls. arXiv preprint arXiv:1304.3969.
## Not run:
library(hdm)
## DGP
set.seed(2)
n <- 250
p <- 100
px <- 10
X <- matrix(rnorm(n*p), ncol=p)
beta <- c(rep(2,px), rep(0,p-px))
intercept <- 1
P <- exp(intercept + X %*% beta)/(1+exp(intercept + X %*% beta))
y <- rbinom(n, size=1, prob=P) # one draw per observation
## fit rlassologit object
rlassologit.reg <- rlassologit(y~X)
## methods
summary(rlassologit.reg, all=FALSE)
print(rlassologit.reg)
predict(rlassologit.reg, type='response')
X3 <- matrix(rnorm(n*p), ncol=p)
predict(rlassologit.reg, newdata=X3)
## End(Not run)
The function estimates (low-dimensional) target coefficients in a high-dimensional logistic model.

    rlassologitEffects(x, ...)

    ## Default S3 method:
    rlassologitEffects(x, y, index = c(1:ncol(x)), I3 = NULL, post = TRUE, ...)

    ## S3 method for class 'formula'
    rlassologitEffects(formula, data, I, included = NULL, post = TRUE, ...)

    rlassologitEffect(x, y, d, I3 = NULL, post = TRUE)

x |
matrix of regressor variables serving as controls and potential
treatments. For |
... |
additional parameters |
y |
outcome variable |
index |
vector of integers, logical or names indicating the position (column) or name of variables of x which should be used as treatment variables. |
I3 |
logical vector with same length as the number of controls; indicates if variables (TRUE) should be included in any case. |
post |
logical. If |
formula |
An element of class |
data |
an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. |
I |
A one-sided formula specifying the variables for which inference is conducted. |
included |
One-sided formula of variables which should be included in any case. |
d |
variable for which inference is conducted (treatment variable) |
The function estimates (low-dimensional) target coefficients in a high-dimensional logistic model. One application is the estimation of a treatment effect in a setting with high-dimensional controls. The function is a wrapper for rlassologitEffect, which conducts inference for a single variable (d).
The function returns an object of class rlassologitEffects with the following entries:
coefficients |
estimated value of the coefficients |
se |
standard errors |
t |
t-statistics |
pval |
p-values |
samplesize |
sample size of the data set |
I |
index of variables of the union of the lasso regressions |
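The entries relate to each other in the usual way: a t-statistic is a coefficient divided by its standard error, and a two-sided p-value follows from a normal approximation. A small sketch with made-up numbers (the values, the names and the use of pnorm are assumptions for illustration and are not taken from the package source):

```r
# Made-up estimates and standard errors for three target variables
coefficients <- c(V1 = 1.8, V2 = 2.1, V40 = 0.1)
se <- c(V1 = 0.4, V2 = 0.5, V40 = 0.3)

# t-statistic: estimate relative to its standard error
tstat <- coefficients / se

# Two-sided p-value under a standard normal approximation
pval <- 2 * pnorm(-abs(tstat))

round(cbind(coefficients, se, tstat, pval), 4)
```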
A. Belloni, V. Chernozhukov and Y. Wei (2013). Honest confidence regions for a regression parameter in logistic regression with a large number of controls. cemmap working paper CWP67/13.

    ## Not run:
    library(hdm)
    ## DGP
    set.seed(2)
    n <- 250
    p <- 100
    px <- 10
    X <- matrix(rnorm(n*p), ncol=p)
    colnames(X) <- paste("V", 1:p, sep="")
    beta <- c(rep(2,px), rep(0,p-px))
    intercept <- 1
    P <- exp(intercept + X %*% beta)/(1+exp(intercept + X %*% beta))
    y <- rbinom(n, size=1, prob=P)
    xd <- X[,2:50]
    d <- X[,1]
    logit.effect <- rlassologitEffect(x=xd, d=d, y=y)
    logit.effects <- rlassologitEffects(X,y, index=c(1,2,40))
    logit.effects.f <- rlassologitEffects(y ~ X, I = ~ V1 + V2)
    ## End(Not run)

Summary method for class rlassoEffects

    ## S3 method for class 'rlassoEffects'
    summary(object, ...)

    ## S3 method for class 'summary.rlassoEffects'
    print(x, digits = max(3L, getOption("digits") - 3L), ...)

object |
an object of class |
... |
further arguments passed to or from other methods. |
x |
an object of class |
digits |
the number of significant digits to use when printing. |
Summary of objects of class rlassoEffects
The function performs two-stage least squares (TSLS) estimation.

    tsls(x, ...)

    ## Default S3 method:
    tsls(x, d, y, z, intercept = TRUE, homoscedastic = TRUE, ...)

    ## S3 method for class 'formula'
    tsls(formula, data, intercept = TRUE, homoscedastic = TRUE, ...)

x |
exogenous variables |
... |
further arguments (only for consistent definition of methods) |
d |
endogenous variables |
y |
outcome variable |
z |
instruments |
intercept |
logical, if intercept should be included |
homoscedastic |
logical, if homoscedastic ( |
formula |
An object of class |
data |
An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model.
If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called. |
The function computes the TSLS estimate (coefficients) and the variance-covariance matrix, assuming homoskedasticity, for the outcome variable y, where d are the endogenous variables in the structural equation, x are the exogenous variables in the structural equation and z are the instruments. It returns an object of class tsls for which the methods print and summary are provided.
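The estimator described above can be sketched in base R. This is a minimal illustration on simulated data, not the package's implementation: all variable names, the data-generating process and the degrees-of-freedom correction n - ncol(W) are assumptions.

```r
set.seed(42)
n <- 1000
z <- rnorm(n)                      # instrument
x <- rnorm(n)                      # exogenous control
u <- rnorm(n)                      # structural error
d <- 0.8 * z + 0.5 * u + rnorm(n)  # endogenous regressor, correlated with u
y <- 1 + 2 * d + 0.5 * x + u       # structural equation, true effect of d is 2

# Regressors of the structural equation and the full instrument set
W <- cbind(intercept = 1, d = d, x = x)
Z <- cbind(intercept = 1, z = z, x = x)

# First stage: project W on Z; second stage: regress y on the fitted values
W_hat <- Z %*% solve(crossprod(Z), crossprod(Z, W))
coefs <- drop(solve(crossprod(W_hat, W), crossprod(W_hat, y)))

# Homoskedastic variance estimate, residuals use the original regressors
res <- drop(y - W %*% coefs)
sigma2 <- sum(res^2) / (n - ncol(W))
vcov_tsls <- sigma2 * solve(crossprod(W_hat))
se <- sqrt(diag(vcov_tsls))

cbind(coefs, se)
```

Computing the residuals from W rather than from the first-stage fitted values matters: residuals based on the fitted values would give an incorrect variance estimate.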
The function returns a list with the following elements
coefficients |
coefficients |
vcov |
variance-covariance matrix |
residuals |
outcome minus predicted values |
call |
function call |
samplesize |
sample size |
se |
standard error |