Package 'hdm'

Title:	High-Dimensional Metrics
Description:	Implementation of selected high-dimensional statistical and econometric methods for estimation and inference. Efficient estimators and uniformly valid confidence intervals are provided for various low-dimensional causal/structural parameters that appear in high-dimensional approximately sparse models. The package includes functions for fitting heteroscedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. Moreover, the methods enable valid post-selection inference and rely on a theoretically grounded, data-driven choice of the penalty. Chernozhukov, Hansen, Spindler (2016) <arXiv:1603.01700>.
Authors:	Martin Spindler [cre, aut], Victor Chernozhukov [aut], Christian Hansen [aut], Philipp Bach [ctb]
Maintainer:	Martin Spindler <[email protected]>
License:	MIT + file LICENSE
Version:	0.3.1
Built:	2025-02-13 04:29:06 UTC
Source:	https://github.com/martinspindler/hdm

Help Index

hdm: High-Dimensional Metrics
AJR data set
BLP data set
Coefficients from S3 objects rlassoEffects
Coefficients from S3 objects rlassoIV
Coefficients from S3 objects rlassoIVselectX
Coefficients from S3 objects rlassoIVselectZ
cps2012 data set
Eminent Domain data set
Growth data set
Function for Calculation of the penalty parameter
Shooting Lasso
Multiple Testing Adjustment of p-values for S3 objects rlassoEffects and lm
Pension 401(k) data set
Methods for S3 object rlassologit
Printing coefficients from S3 objects rlassoEffects
Methods for S3 object rlasso
Methods for S3 object rlassoEffects
Methods for S3 object rlassoIV
Methods for S3 object rlassoIVselectX
Methods for S3 object rlassoIVselectZ
Methods for S3 object rlassologitEffects
Methods for S3 object rlassoTE
Methods for S3 object tsls
rlasso: Function for Lasso estimation under homoscedastic and heteroscedastic non-Gaussian disturbances
Functions for estimation of treatment effects
rigorous Lasso for Linear Models: Inference
Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments
Instrumental Variable Estimation with Selection on the exogenous Variables by Lasso
Instrumental Variable Estimation with Lasso
rlassologit: Function for logistic Lasso estimation
rigorous Lasso for Logistic Models: Inference
Summarizing rlassoEffects fits
Two-Stage Least Squares Estimation (TSLS)

hdm: High-Dimensional Metrics

Description

This package implements methods for estimation and inference in a high-dimensional setting.

Details

Package:	hdm
Type:	Package
Version:	0.1
Date:	2015-05-25
License:	GPL-3

This package provides efficient estimators and uniformly valid confidence intervals for various low-dimensional causal/structural parameters appearing in high-dimensional approximately sparse models. The package includes functions for fitting heteroskedastic robust Lasso regressions with non-Gaussian errors and for instrumental variable (IV) and treatment effect estimation in a high-dimensional setting. The methods enable valid post-selection inference, and a theoretically grounded, data-driven choice of the penalty level is provided.

Author(s)

Victor Chernozhukov, Christian Hansen, Martin Spindler

Maintainer: Martin Spindler <[email protected]>

References

A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369-2429.

A. Belloni, V. Chernozhukov and C. Hansen (2013). Inference for high-dimensional sparse econometric models. In Advances in Economics and Econometrics: 10th World Congress, Vol. 3: Econometrics, Cambridge University Press: Cambridge, 245-295.

A. Belloni, V. Chernozhukov, C. Hansen (2014). Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies 81(2), 608-650.

AJR data set

Description

Dataset on settler mortality.

Format

Mort: Settler mortality
logMort: logarithm of Mort
Latitude: Latitude
Latitude2: Latitude^2
Africa: Africa
Asia: Asia
Namer: North America
Samer: South America
Neo: Neo-Europes
GDP: GDP
Exprop: Average protection against expropriation risk

Details

Data set was analysed in Acemoglu et al. (2001). A detailed description of the data can be found at http://economics.mit.edu/faculty/acemoglu/data/ajr2001

References

D. Acemoglu, S. Johnson, J. A. Robinson (2001). Colonial origins of comparative development: an empirical investigation. American Economic Review, 91, 1369–1401.

Examples

data(AJR)

BLP data set

Description

Automobile data set from the US.

Format

model.name: model name
model.id: model id
firm.id: firm id
cdid: cdid
id: id
price: log price
mpg: miles per gallon
mpd: miles per dollar
hpwt: horse power per weight
air: air conditioning (binary variable)
space: size of the car
share: market share
outshr: share s0
y: outcome variable defined as log(share) - log(outshr)
trend: time trend

Details

Data set was analysed in Berry, Levinsohn and Pakes (1995). The data stem from annual issues of the Automotive News Market Data Book. The data set includes information on all models marketed during the period beginning in 1971 and ending in 1990, containing 2217 model/years from 997 distinct models. A detailed description is given in BLP (1995, 868–871). The internal function constructIV constructs instrumental variables along the lines described and used in BLP (1995).

References

S. Berry, J. Levinsohn, A. Pakes (1995). Automobile Prices in Market Equilibrium. Econometrica, 63(4), 841–890.

Examples

data(BLP)

Coefficients from S3 objects `rlassoEffects`

Description

Method to extract coefficients from objects of class rlassoEffects

Usage

## S3 method for class 'rlassoEffects'
coef(
  object,
  complete = TRUE,
  selection.matrix = FALSE,
  include.targets = FALSE,
  ...
)

Arguments

`object`	an object of class `rlassoEffects`, usually a result of a call to `rlassoEffect` or `rlassoEffects`.
`complete`	general option of the function `coef`.
`selection.matrix`	if TRUE, a selection matrix is returned that indicates the selected variables from each auxiliary regression. Default is set to FALSE.
`include.targets`	if FALSE (by default) only the selected control variables are listed in the `selection.matrix`. If set to TRUE, the selection matrix will also indicate the selection of the target coefficients that are specified in the `rlassoEffects` call.
`...`	further arguments passed to functions coef or print.

Details

Printing coefficients and selection matrix for S3 object rlassoEffects. Interpretation of entries in the selection matrix:

"-" indicates a target variable,
"x" indicates that a variable has been selected with rlassoEffects (coefficient is different from zero),
"." indicates that a variable has been de-selected with rlassoEffects (coefficient is zero).

Examples

library(hdm)
set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 7 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50), 
                             method = "double selection")
coef(lasso.effect) # standard use of coef() - without selection matrix
# with selection matrix
coef(lasso.effect, selection.matrix = TRUE)
# prettier output with print_coef (identical options as coef())
print_coef(lasso.effect, selection.matrix = TRUE) 

Coefficients from S3 objects `rlassoIV`

Description

Method to extract coefficients from objects of class rlassoIV.

Usage

## S3 method for class 'rlassoIV'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)

Arguments

`object`	an object of class `rlassoIV`, usually a result of a call `rlassoIV` with options `select.X=TRUE` and `select.Z=TRUE`.
`complete`	general option of the function `coef`.
`selection.matrix`	if TRUE, a selection matrix is returned that indicates the selected variables from each first stage regression. Default is set to FALSE. See section on details for more information.
`...`	further arguments passed to function coef.

Details

Printing coefficients and selection matrix for S3 object rlassoIV. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero. The very last column collects all variables that have been selected in at least one of the lasso regressions represented in the selection.matrix. rlassoIV performs three lasso regression steps. A first stage lasso regression of the endogenous treatment variable d on the instruments z and exogenous covariates x, a lasso regression of y on the exogenous variables x, and a lasso regression of the instrumented treatment variable, i.e., a regression of the predicted values of d, on controls x.

Value

Coefficients obtained from rlassoIV by default. If option selection.matrix is TRUE, a list is returned with final coefficients, a matrix selection.matrix, and a matrix selection.matrixZ: selection.matrix contains the selection index for the lasso regression of y on x (first column) and the lasso regression of the predicted values of d on x, together with the union of these indices. selection.matrixZ contains the selection index from the first-stage lasso regression of d on z and x.

Examples

## Not run: 
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV = rlassoIV(x=x, d=d, y=y, z=z, select.X=TRUE, select.Z=TRUE)
coef(lasso.IV) # default behavior
coef(lasso.IV, selection.matrix = TRUE) # print selection matrix

## End(Not run)

Coefficients from S3 objects `rlassoIVselectX`

Description

Method to extract coefficients and selection matrix from objects of class rlassoIVselectX.

Usage

## S3 method for class 'rlassoIVselectX'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)

Arguments

`object`	an object of class `rlassoIVselectX`, usually a result of a call `rlassoIVselectX` or `rlassoIV` with options `select.X=TRUE` and `select.Z=FALSE`.
`complete`	general option of the function `coef`.
`selection.matrix`	if TRUE, a selection matrix is returned that indicates the selected variables from each regression. Default is set to FALSE. See section on details for more information.
`...`	further arguments passed to function coef.

Details

Printing coefficients and selection matrix for S3 object rlassoIVselectX. The first column of the selection matrix reports the selection index for the lasso regression of y on x in the specified rlassoIVselectX command. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero. The second column contains the selection index for the lasso regression of d on x and the remaining columns the index of selected variables x for the instruments z. The very last column collects all variables that have been selected in at least one of the lasso regressions.

Examples

## Not run: 
library(hdm)
data(AJR); y = AJR$GDP; d = AJR$Exprop; z = AJR$logMort
x = model.matrix(~ -1 + (Latitude + Latitude2 + Africa + 
                           Asia + Namer + Samer)^2, data=AJR)
AJR.Xselect = rlassoIV(GDP ~ Exprop +  (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2 |
                         logMort +  (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2,
                       data=AJR, select.X=TRUE, select.Z=FALSE)
coef(AJR.Xselect) # Default behavior
coef(AJR.Xselect, selection.matrix = TRUE) # print selection matrix

## End(Not run)

Coefficients from S3 objects `rlassoIVselectZ`

Description

Method to extract coefficients from objects of class rlassoIVselectZ.

Usage

## S3 method for class 'rlassoIVselectZ'
coef(object, complete = TRUE, selection.matrix = FALSE, ...)

Arguments

`object`	an object of class `rlassoIVselectZ`, usually a result of a call `rlassoIVselectZ` or `rlassoIV` with options `select.X=FALSE` and `select.Z=TRUE`.
`complete`	general option of the function `coef`.
`selection.matrix`	if TRUE, a selection matrix is returned that indicates the selected variables from each first stage regression. Default is set to FALSE. See section on details for more information.
`...`	further arguments passed to function coef.

Details

Printing coefficients and selection matrix for S3 object rlassoIVselectZ. The columns of the selection matrix report the selection index for the first stage lasso regressions as specified in the rlassoIVselectZ command, i.e., the selected variables for each of the endogenous variables. "x" indicates that a variable has been selected, i.e., the corresponding estimated coefficient is different from zero. The very last column collects all variables that have been selected in at least one of the lasso regressions.

Examples

## Not run: 
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV.Z = rlassoIVselectZ(x=x, d=d, y=y, z=z)
coef(lasso.IV.Z) # Default behavior
coef(lasso.IV.Z, selection.matrix = TRUE) # print selection matrix

## End(Not run)

cps2012 data set

Description

Census data from the US for the year 2012.

Format

lnw: log of hourly wage (annual earnings / annual hours)
female: female indicator
marital status: six indicators: widowed, divorced, separated, nevermarried, and married (omitted)
education attainment: six indicators: hsd08, hsd911, hsg, cg, ad, and sc (omitted)
region indicators: four indicators: mw, so, we, and ne (omitted)
potential experience: (max[0, age - years of education - 7]): exp1, exp2 (divided by 100), exp3 (divided by 1000), exp4 (divided by 10000)
weight: March Supplement sampling weight
year: CPS year
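
The potential experience terms listed above can be sketched in base R as follows. This is only an illustration: the toy ages and years of education are made up, and reading exp2, exp3 and exp4 as the second, third and fourth powers of exp1 (each rescaled as stated) is an assumption, not something the Format description spells out.

```r
# Made-up toy inputs (not taken from cps2012).
age        <- c(25, 40, 54)
educ_years <- c(20, 12, 16)

# Potential experience: max[0, age - years of education - 7]
exp1 <- pmax(0, age - educ_years - 7)

# Assumed construction of the higher-order terms (powers of exp1, rescaled).
exp2 <- exp1^2 / 100
exp3 <- exp1^3 / 1000
exp4 <- exp1^4 / 10000

print(cbind(exp1, exp2, exp3, exp4))
```

The rescaling keeps the polynomial terms on a similar numerical scale, which is convenient when they enter a penalized regression together.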

Details

The CPS is a monthly U.S. household survey conducted jointly by the U.S. Census Bureau and the Bureau of Labor Statistics. The data comprise the year 2012. This data set was used in Mulligan and Rubinstein (2008). The sample comprises white non-Hispanic individuals aged 25-54 who work full time, full year (35+ hours per week for at least 50 weeks). Excluded are individuals living in group quarters, the self-employed, those in the military, agricultural, and private household sectors, and observations with allocated earnings, inconsistent reports on earnings and employment, or missing data.

References

C. B. Mulligan and Y. Rubinstein (2008). Selection, investment, and women's relative wages over time. The Quarterly Journal of Economics, 1061–1110.

Examples

data(cps2012)

Eminent Domain data set

Description

Dataset on judicial eminent domain decisions.

Format

y: economic outcome variable
x: set of exogenous variables
d: eminent domain decisions
z: set of potential instruments

Details

Data set was analyzed in Belloni et al. (2012). They estimate the effect of judicial eminent domain decisions on economic outcomes with instrumental variables (IV) in a setting with a large set of potential IVs. A detailed description of the data can be found at https://www.econometricsociety.org/publications/econometrica/2012/11/01/sparse-models-and-methods-optimal-instruments-application.

The data set contains four "sub-data sets" which differ mainly in the dependent variables: repeat-sales FHFA/OFHEO house price index for metro (FHFA) and non-metro (NM) area, the Case-Shiller home price index (CS), and state-level GDP from the Bureau of Economic Analysis, all transformed with the logarithm. The structure of each sub-data set is given above. In the data set the following variables and naming conventions are used: "numpanelskx_..." is the number of panels with at least k members with the characteristic following the "_". The probability controls (names start with "F_prob_") follow a similar naming convention and give the probability of observing a panel with the characteristic following the second "_", given the characteristics of the pool of judges available to be assigned to the case.

Characteristics in the data for the control variables or instruments:

noreligion: judge reports no religious affiliation
jd_public: judge's law degree is from a public university
dem: judge reports being a democrat
female: judge is female
nonwhite: judge is nonwhite (and not black)
black: judge is black
jewish: judge is Jewish
catholic: judge is Catholic
mainline: baseline religion
protestant: belongs to a protestant church
evangelical: belongs to an evangelical church
instate_ba: judge's undergraduate degree was obtained within state
ba_public: judge's undergraduate degree was obtained at a public university
elev: judge was elevated from a district court
year: year dummy (reference category is one year before the earliest year in the data set (excluded))
circuit: dummy for the circuit level (reference category excluded)
missing_cy_12: a dummy for whether there were no cases in that circuit-year
numcasecat_12: the number of takings appellate decisions

References

A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369–2429.

Examples

data(EminentDomain)

Growth data set

Description

Growth data set compiled by Barro and Lee.

Format

Dataframe with the following variables:

outcome: dependent variable: national growth rates in GDP per capita for the periods 1965-1975 and 1975-1985
x: covariates which might influence growth

Details

The data set contains the growth data of Barro and Lee. The Barro-Lee data consist of a panel of 138 countries for the period 1960 to 1985. The dependent variable is national growth rates in GDP per capita for the periods 1965-1975 and 1975-1985. The growth rate in GDP over a period from $t_1$ to $t_2$ is commonly defined as $\log(GDP_{t_2}/GDP_{t_1})$. The number of covariates is p=62. The number of complete observations is 90.
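
As a small worked illustration of a log growth rate, the following base R sketch uses made-up GDP levels (not values from the data set). It assumes the sign convention that a positive value means growth, i.e. the later level is divided by the earlier one.

```r
# Toy GDP per capita levels (made-up numbers, not taken from the Growth data).
gdp_start <- c(A = 1000, B = 2500)  # level at t_1
gdp_end   <- c(A = 1500, B = 2500)  # level at t_2

# Log growth rate over the period: positive when GDP increased, zero when unchanged.
growth <- log(gdp_end / gdp_start)
print(round(growth, 4))
```

For small changes the log growth rate is close to the percentage change, which is one reason it is a common choice in growth regressions.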

Source

The full data set and further details can be found at http://www.nber.org/pub/barro.lee, http://www.barrolee.com, and http://www.bristol.ac.uk/Depts/Economics/Growth/barlee.htm.

References

R.J. Barro, J.W. Lee (1994). Data set for a panel of 139 countries. NBER.

R.J. Barro, X. Sala-i-Martin (1995). Economic Growth. McGraw-Hill, New York.

Examples

data(GrowthData)

Function for Calculation of the penalty parameter

Description

This function implements different methods for calculation of the penalization parameter $\lambda$. Further details can be found under rlasso.

Usage

lambdaCalculation(
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start =
    NULL, c = 1.1, gamma = 0.1),
  y = NULL,
  x = NULL
)

Arguments

penalty

list with options for the calculation of the penalty.

c and gamma constants for the penalty with default c=1.1 and gamma=0.1
homoscedastic logical, if homoscedastic errors are considered (default FALSE). Option none is described below.
X.dependent.lambda if independent or dependent design matrix X is assumed for calculation of the parameter $\lambda$
numSim number of simulations for the X-dependent methods
lambda.start initial penalization value, compulsory for method "none"

y

residual which is used for calculation of the variance or the data-dependent loadings

x

matrix of regressor variables

Value

The function returns a list with the penalty lambda, which is the product of lambda0 and Ups0. Ups0 denotes either the variance (independent case) or the data-dependent loadings for the regressors. The element method gives the selected method for the calculation.
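
The product structure of the penalty (lambda0 times Ups0) can be sketched in base R as follows. This is a hedged illustration, not the package code: it assumes the X-independent rule lambda0 = 2 * c * sqrt(n) * qnorm(1 - gamma / (2 * p)) described in Belloni et al. (2012), uses the standard deviation of y as the homoscedastic scale, and all variable names are hypothetical. Whether lambdaCalculation uses exactly these expressions should be checked against the package source.

```r
set.seed(1)
n <- 100; p <- 20
x <- matrix(rnorm(n * p), ncol = p)  # toy regressors
y <- rnorm(n)                        # toy residual / outcome

c_const <- 1.1  # constant c (default in the usage above)
gamma   <- 0.1  # constant gamma (default in the usage above)

# Assumed X-independent penalty level.
lambda0 <- 2 * c_const * sqrt(n) * qnorm(1 - gamma / (2 * p))

# Homoscedastic case (assumption): one common scale for all regressors.
ups0_hom   <- sd(y)
lambda_hom <- rep(lambda0 * ups0_hom, p)

# Heteroscedastic case (assumption): one data-dependent loading per regressor,
# sqrt of the column means of (x_ij * y_i)^2.
ups0_het   <- sqrt(colMeans((x * y)^2))
lambda_het <- lambda0 * ups0_het

print(round(c(lambda0 = lambda0, hom = lambda_hom[1]), 3))
print(round(head(lambda_het), 3))
```

In the heteroscedastic case each regressor receives its own penalty, which is why the Shooting Lasso documented in this manual accepts a vector of penalization parameters of length p.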

Shooting Lasso

Description

Implementation of the Shooting Lasso (Fu, 1998) with variable dependent penalization weights.

Usage

LassoShooting.fit(
  x,
  y,
  lambda,
  control = list(maxIter = 1000, optTol = 10^(-5), zeroThreshold = 10^(-6)),
  XX = NULL,
  Xy = NULL,
  beta.start = NULL
)

Arguments

`x`	matrix of regressor variables (`n` times `p` where `n` denotes the number of observations and `p` the number of regressors)
`y`	dependent variable (vector or matrix)
`lambda`	vector of length `p` of penalization parameters for each regressor
`control`	list with control parameters: `maxIter` maximal number of iterations, `optTol` tolerance for parameter precision, `zeroThreshold` threshold applied to the estimated coefficients for numerical issues.
`XX`	optional, precalculated matrix $t(X)*X$
`Xy`	optional, precalculated matrix $t(X)*y$
`beta.start`	start value for beta

Details

The function implements the Shooting Lasso (Fu, 1998) with variable dependent penalization. The arguments XX and Xy are optional and allow the use of precalculated matrices, which might improve performance.
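
The idea of a shooting (coordinate-wise) Lasso with one penalty per regressor can be sketched in base R as below. This is a minimal illustration, not the implementation in LassoShooting.fit: the function name shooting_lasso_sketch is hypothetical, and the objective is assumed to be sum((y - x %*% beta)^2) + sum(lambda * abs(beta)).

```r
# Hypothetical sketch of a shooting Lasso with variable-specific penalties.
# Assumed objective: sum((y - x %*% beta)^2) + sum(lambda * abs(beta)).
shooting_lasso_sketch <- function(x, y, lambda, max_iter = 1000, tol = 1e-5) {
  p    <- ncol(x)
  XX2  <- 2 * crossprod(x)           # 2 * t(X) %*% X
  Xy2  <- 2 * drop(crossprod(x, y))  # 2 * t(X) %*% y
  beta <- rep(0, p)
  for (it in seq_len(max_iter)) {
    beta_old <- beta
    for (j in seq_len(p)) {
      # Gradient of the squared-error part at beta, with coordinate j set to zero.
      s0 <- sum(XX2[j, ] * beta) - XX2[j, j] * beta[j] - Xy2[j]
      if (s0 > lambda[j]) {
        beta[j] <- (lambda[j] - s0) / XX2[j, j]
      } else if (s0 < -lambda[j]) {
        beta[j] <- (-lambda[j] - s0) / XX2[j, j]
      } else {
        beta[j] <- 0  # penalty dominates: coefficient is set exactly to zero
      }
    }
    # Stop once a full sweep no longer changes the coefficients.
    if (sum(abs(beta - beta_old)) < tol) break
  }
  list(coefficients = beta, num.it = it)
}

# With an orthonormal design the solution is soft-thresholding of y at lambda / 2.
fit <- shooting_lasso_sketch(diag(3), c(3, -2, 0.2), lambda = rep(2, 3))
print(fit$coefficients)
```

Precomputing the cross-products once, as in this sketch, is the same idea as passing XX and Xy to avoid repeated matrix multiplications.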

Value

`coefficients`	estimated coefficients by the Shooting Lasso Algorithm
`coef.list`	matrix of coefficients from each iteration
`num.it`	number of iterations run

References

Fu, W. (1998). Penalized regressions: the bridge vs the lasso. Journal of Computational and Graphical Statistics 7, 397-416.

Multiple Testing Adjustment of p-values for S3 objects `rlassoEffects` and `lm`

Description

Multiple hypotheses testing adjustment of p-values from a high-dimensional linear model.

Usage

p_adjust(x, ...)

## S3 method for class 'rlassoEffects'
p_adjust(x, method = "RW", B = 1000, ...)

## S3 method for class 'lm'
p_adjust(x, method = "RW", B = 1000, test.index = NULL, ...)

Arguments

`x`	an object of S3 class `rlassoEffects` or `lm`.
`...`	further arguments passed on to methods.
`method`	the method of p-value adjustment for multiple testing. Romano-Wolf stepdown ('`RW`') is chosen by default.
`B`	number of bootstrap repetitions (default 1000).
`test.index`	vector of integers, logicals or variable names indicating the position of coefficients (integer case), logical vector of length of the coefficients (TRUE or FALSE) or the coefficient names of x which should be tested simultaneously (only for S3 class `lm`). If missing, all coefficients are considered.

Details

Multiple testing adjustment is performed for S3 objects of class rlassoEffects and lm. Implemented methods for multiple testing adjustment are Romano-Wolf stepdown 'RW' (default) and the adjustment methods available in the p.adjust function of the stats package, including the Bonferroni, Bonferroni-Holm, and Benjamini-Hochberg corrections, see p.adjust.methods.

Objects of class rlassoEffects are constructed by rlassoEffects.
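
The classical corrections mentioned above are available through p.adjust in the stats package. The following base R sketch applies three of them to made-up p-values. It does not call hdm, and the Romano-Wolf stepdown is not shown because it is not part of p.adjust.

```r
# Made-up raw p-values from four hypothetical tests.
p_raw <- c(0.01, 0.02, 0.03, 0.04)

# Apply Bonferroni, Bonferroni-Holm and Benjamini-Hochberg adjustments.
adj <- sapply(c("bonferroni", "holm", "BH"),
              function(m) p.adjust(p_raw, method = m))
print(adj)
```

Each column holds the adjusted p-values for one method. Bonferroni is the most conservative of the three on this input.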

Value

A matrix with the estimated coefficients and the p-values that are adjusted according to the specified method.

Methods (by class)

rlassoEffects: rlassoEffects.
lm: lm.

References

J.P. Romano, M. Wolf (2005). Exact and approximate stepdown methods for multiple hypothesis testing. Journal of the American Statistical Association, 100(469), 94-108.

J.P. Romano, M. Wolf (2016). Efficient computation of adjusted p-values for resampling-based stepdown multiple testing. Statistics and Probability Letters, (113), 38-40.

A. Belloni, V. Chernozhukov, K. Kato (2015). Uniform post-selection inference for least absolute deviation regression and other Z-estimation problems. Biometrika, 102(1), 77-94.

Examples

library(hdm)
set.seed(1)
n = 100 # sample size
p = 25 # number of variables
s = 3 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1:20))
pvals.lasso.effect = p_adjust(lasso.effect, method = "RW", B = 1000)
ols = lm(y ~ -1 + X, data)
pvals.ols = p_adjust(ols, method = "RW", B = 1000)
pvals.ols = p_adjust(ols, method = "RW", B = 1000, test.index = c(1,2,5))
pvals.ols = p_adjust(ols, method = "RW", B = 1000, test.index = c(rep(TRUE, 5), rep(FALSE, p-5)))

Pension 401(k) data set

Description

Data set on financial wealth and 401(k) plan participation

Format

Dataframe with the following variables (amongst others):

p401: participation in 401(k)
e401: eligibility for 401(k)
a401: 401(k) assets
tw: total wealth (in US $)
tfa: financial assets (in US $)
net_tfa: net financial assets (in US $)
nifa: non-401k financial assets (in US $)
net_nifa: net non-401k financial assets
net_n401: net non-401(k) assets (in US $)
ira: individual retirement account (IRA)
inc: income (in US $)
age: age
fsize: family size
marr: married
pira: participation in IRA
db: defined benefit pension
hown: home owner
educ: education (in years)
male: male
twoearn: two earners
nohs, hs, smcol, col: dummies for education: no high-school, high-school, some college, college
hmort: home mortgage (in US $)
hequity: home equity (in US $)
hval: home value (in US $)

Details

The sample is drawn from the 1991 Survey of Income and Program Participation (SIPP) and consists of 9,915 observations. The observational units are household reference persons aged 25-64 and spouse if present. Households are included in the sample if at least one person is employed and no one is self-employed. The data set was analysed in Chernozhukov and Hansen (2004) and Belloni et al. (2014) where further details can be found. They examine the effects of 401(k) plans on wealth using data from the Survey of Income and Program Participation using 401(k) eligibility as an instrument for 401(k) participation.

References

V. Chernozhukov, C. Hansen (2004). The impact of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis. The Review of Economics and Statistics 86 (3), 735–751.

A. Belloni, V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2014). Program evaluation with high-dimensional data. Working Paper.

Examples

data(pension)

Methods for S3 object `rlassologit`

Description

Objects of class rlassologit are constructed by rlassologit. print.rlassologit prints and displays some information about fitted rlassologit objects. summary.rlassologit summarizes information of a fitted rlassologit object. predict.rlassologit predicts values based on a rlassologit object. model.matrix.rlassologit constructs the model matrix of a lasso object.

Usage

## S3 method for class 'rlassologit'
predict(object, newdata = NULL, type = "response", ...)

## S3 method for class 'rlassologit'
model.matrix(object, ...)

## S3 method for class 'rlassologit'
print(x, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassologit'
summary(object, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`object`	an object of class `rlassologit`
`newdata`	new data set for prediction
`type`	type of prediction required. The default ('response') is on the scale of the response variable; the alternative 'link' is on the scale of the linear predictors.
`...`	arguments passed to the print function and other methods
`x`	an object of class `rlassologit`
`all`	logical, indicates if coefficients of all variables (TRUE) should be displayed or only the non-zero ones (FALSE)
`digits`	significant digits in printout

Printing coefficients from S3 objects `rlassoEffects`

Description

Printing coefficients for class rlassoEffects

Usage

print_coef(x, ...)

## S3 method for class 'rlassoEffects'
print_coef(
  x,
  complete = TRUE,
  selection.matrix = FALSE,
  include.targets = TRUE,
  ...
)

Arguments

`x`	an object of class `rlassoEffects`, usually a result of a call to `rlassoEffect` or `rlassoEffects`.
`...`	further arguments passed to functions coef or print.
`complete`	general option of the function `coef`.
`selection.matrix`	if TRUE, a selection matrix is returned that indicates the selected variables from each auxiliary regression. Default is set to FALSE.
`include.targets`	if `FALSE`, only the selected control variables are listed in the `selection.matrix`. If `TRUE` (the default in the usage above), the selection matrix also indicates the selection of the target coefficients that are specified in the `rlassoEffects` call.

Details

Printing coefficients and selection matrix for S3 object rlassoEffects

Examples

library(hdm)
set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 7 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50), 
                             method = "double selection")
# without target coefficient estimates
print_coef(lasso.effect, selection.matrix = TRUE, include.targets = FALSE)
# with target coefficient estimates
print_coef(lasso.effect, selection.matrix = TRUE, include.targets = TRUE)

Methods for S3 object `rlasso`

Description

Objects of class rlasso are constructed by rlasso. print.rlasso prints and displays some information about fitted rlasso objects. summary.rlasso summarizes information of a fitted rlasso object. predict.rlasso predicts values based on a rlasso object. model.matrix.rlasso constructs the model matrix of a rlasso object.

Usage

## S3 method for class 'rlasso'
print(x, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlasso'
summary(object, all = TRUE, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlasso'
model.matrix(object, ...)

## S3 method for class 'rlasso'
predict(object, newdata = NULL, ...)

Arguments

`x`	an object of class `rlasso`
`all`	logical, indicates if coefficients of all variables (TRUE) should be displayed or only the non-zero ones (FALSE)
`digits`	significant digits in printout
`...`	arguments passed to the print function and other methods
`object`	an object of class `rlasso`
`newdata`	new data set for prediction. An optional data frame in which to look for variables with which to predict. If omitted, the fitted values are returned.

Methods for S3 object `rlassoEffects`

Description

Objects of class rlassoEffects are constructed by rlassoEffects. print.rlassoEffects prints and displays some information about fitted rlassoEffects objects. summary.rlassoEffects summarizes information of a fitted rlassoEffects object and is described at summary.rlassoEffects. confint.rlassoEffects extracts the confidence intervals. plot.rlassoEffects plots the estimates with confidence intervals.

Usage

## S3 method for class 'rlassoEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoEffects'
confint(object, parm, level = 0.95, joint = FALSE, ...)

## S3 method for class 'rlassoEffects'
plot(
  x,
  joint = FALSE,
  level = 0.95,
  main = "",
  xlab = "coef",
  ylab = "",
  xlim = NULL,
  ...
)

Arguments

`x`	an object of class `rlassoEffects`
`digits`	significant digits in printout
`...`	arguments passed to the print function and other methods.
`object`	an object of class `rlassoEffects`
`parm`	a specification of which parameters are to be given confidence intervals among the variables for which inference was done, either a vector of numbers or a vector of names. If missing, all parameters are considered.
`level`	confidence level required
`joint`	logical, if `TRUE` joint confidence intervals are calculated.
`main`	an overall title for the plot
`xlab`	a title for the x axis
`ylab`	a title for the y axis
`xlim`	vector of length two giving lower and upper bound of x axis
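
The difference between pointwise and joint confidence intervals can be seen in a short sketch. The simulated design and the object names below are assumptions for illustration. The calls use only the `confint` arguments listed in the usage above.

```r
library(hdm)
set.seed(1)
n <- 100                                  # sample size (illustrative)
p <- 20                                   # number of regressors (illustrative)
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste("X", 1:p, sep = "")
y <- 1 + 3 * X[, 1] + 3 * X[, 2] + rnorm(n)

# inference on three target coefficients, default method
effects.fit <- rlassoEffects(X, y, index = c(1, 2, 10))

ci.pointwise <- confint(effects.fit, level = 0.95)                # one interval per coefficient
ci.joint     <- confint(effects.fit, level = 0.95, joint = TRUE)  # simultaneous intervals
print(ci.pointwise)
print(ci.joint)
```

The plot method accepts the same `joint` and `level` arguments, so the two kinds of intervals can also be compared graphically.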

Methods for S3 object `rlassoIV`

Description

Objects of class rlassoIV are constructed by rlassoIV. print.rlassoIV prints and displays some information about fitted rlassoIV objects. summary.rlassoIV summarizes information of a fitted rlassoIV object. confint.rlassoIV extracts the confidence intervals.

Usage

## S3 method for class 'rlassoIV'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIV'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIV'
confint(object, parm, level = 0.95, ...)

Arguments

`x`	an object of class `rlassoIV`
`digits`	significant digits in printout
`...`	arguments passed to the print function and other methods
`object`	an object of class `rlassoIV`
`parm`	a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
`level`	confidence level required.

Methods for S3 object `rlassoIVselectX`

Description

Objects of class rlassoIVselectX are constructed by rlassoIVselectX. print.rlassoIVselectX prints and displays some information about fitted rlassoIVselectX objects. summary.rlassoIVselectX summarizes information of a fitted rlassoIVselectX object. confint.rlassoIVselectX extracts the confidence intervals.

Usage

## S3 method for class 'rlassoIVselectX'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectX'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectX'
confint(object, parm, level = 0.95, ...)

Arguments

`x`	an object of class `rlassoIVselectX`
`digits`	significant digits in printout
`...`	arguments passed to the print function and other methods
`object`	an object of class `rlassoIVselectX`
`parm`	a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
`level`	the confidence level required.

Methods for S3 object `rlassoIVselectZ`

Description

Objects of class rlassoIVselectZ are constructed by rlassoIVselectZ. print.rlassoIVselectZ prints and displays some information about fitted rlassoIVselectZ objects. summary.rlassoIVselectZ summarizes information of a fitted rlassoIVselectZ object. confint.rlassoIVselectZ extracts the confidence intervals.

Usage

## S3 method for class 'rlassoIVselectZ'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectZ'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoIVselectZ'
confint(object, parm, level = 0.95, ...)

Arguments

`x`	an object of class `rlassoIVselectZ`
`digits`	significant digits in printout
`...`	arguments passed to the print function and other methods
`object`	an object of class `rlassoIVselectZ`
`parm`	a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
`level`	confidence level required.

Methods for S3 object `rlassologitEffects`

Description

Objects of class rlassologitEffects are constructed by rlassologitEffects or rlassologitEffect. print.rlassologitEffects prints and displays some information about fitted rlassologitEffects objects. summary.rlassologitEffects summarizes information of a fitted rlassologitEffects object. confint.rlassologitEffects extracts the confidence intervals.

Usage

## S3 method for class 'rlassologitEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassologitEffects'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassologitEffects'
confint(object, parm, level = 0.95, joint = FALSE, ...)

Arguments

`x`	an object of class `rlassologitEffects`
`digits`	number of significant digits in printout
`...`	arguments passed to the print function and other methods
`object`	an object of class `rlassologitEffects`
`parm`	a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
`level`	confidence level required.
`joint`	logical, if joint confidence intervals should be calculated

Methods for S3 object `rlassoTE`

Description

Objects of class rlassoTE are constructed by rlassoATE, rlassoATET, rlassoLATE, rlassoLATET. print.rlassoTE prints and displays some information about fitted rlassoTE objects. summary.rlassoTE summarizes information of a fitted rlassoTE object. confint.rlassoTE extracts the confidence intervals.

Usage

## S3 method for class 'rlassoTE'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoTE'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'rlassoTE'
confint(object, parm, level = 0.95, ...)

Arguments

`x`	an object of class `rlassoTE`
`digits`	number of significant digits in printout
`...`	arguments passed to the print function and other methods
`object`	an object of class `rlassoTE`
`parm`	a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.
`level`	confidence level required.

Methods for S3 object `tsls`

Description

Objects of class tsls are constructed by tsls. print.tsls prints and displays some information about fitted tsls objects. summary.tsls summarizes information of a fitted tsls object.

Usage

## S3 method for class 'tsls'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

## S3 method for class 'tsls'
summary(object, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`x`	an object of class `tsls`
`digits`	significant digits in printout
`...`	arguments passed to the print function and other methods
`object`	an object of class `tsls`

rlasso: Function for Lasso estimation under homoscedastic and heteroscedastic non-Gaussian disturbances

Description

The function estimates the coefficients of a Lasso regression with data-driven penalty under homoscedasticity and heteroscedasticity with non-Gaussian noise and X-dependent or X-independent design. The method of the data-driven penalty can be chosen. The object which is returned is of the S3 class rlasso.

Usage

rlasso(x, ...)

## S3 method for class 'formula'
rlasso(
  formula,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start =
    NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)

## S3 method for class 'character'
rlasso(
  x,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start =
    NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)

## Default S3 method:
rlasso(
  x,
  y,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(homoscedastic = FALSE, X.dependent.lambda = FALSE, lambda.start =
    NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(numIter = 15, tol = 10^-5, threshold = NULL),
  ...
)

Arguments

`x`	regressors (vector, matrix or object that can be coerced to a matrix)
`...`	further arguments (only for consistent definition of methods)
`formula`	an object of class "formula" (or one that can be coerced to that class): a symbolic description of the model to be fitted in the form `y~x`
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which `rlasso` is called.
`post`	logical. If `TRUE`, post-Lasso estimation is conducted.
`intercept`	logical. If `TRUE`, intercept is included which is not penalized.
`model`	logical. If `TRUE` (default), model matrix is returned.
`penalty`	list with options for the calculation of the penalty. `c` and `gamma`: constants for the penalty, with defaults `c = 1.1` and `gamma = 0.1/log(n)` (as in the usage above). `homoscedastic`: logical, whether homoscedastic errors are assumed (default `FALSE`); the option "none" is described below. `X.dependent.lambda`: logical, `TRUE` if the penalization parameter depends on the design matrix `x`, `FALSE` if it is independent of the design matrix (default). `numSim`: number of simulations for the X-dependent methods (default 5000). `lambda.start`: initial penalization value, compulsory for the option "none".
`control`	list with control values. `numIter`: number of iterations for the algorithm that estimates the variance and the data-driven penalty, i.e. the loadings. `tol`: tolerance for improvement of the estimated variances. `threshold`: applied to the final estimated Lasso coefficients; absolute values below the threshold are set to zero.
`y`	dependent variable (vector, matrix or object that can be coerced to a matrix)

Details

The function estimates the coefficients of a Lasso regression with data-driven penalty under homoscedasticity / heteroscedasticity and non-Gaussian noise. The option homoscedastic is a logical with FALSE by default. For the calculation of the penalty parameter, the user can choose whether the penalization parameter depends on the design matrix (X.dependent.lambda=TRUE) or is independent of it (default, X.dependent.lambda=FALSE). The default value of the constant c is 1.1 in the post-Lasso case and 0.5 in the Lasso case. A special option is to set homoscedastic to "none" and to supply a value lambda.start. In that case lambda is set to lambda.start, and the regressor-independent loadings under heteroscedasticity are used to weight the regressors. For details of the implementation of the algorithm for estimation of the data-driven penalty, in particular the regressor-independent loadings, we refer to Appendix A in Belloni et al. (2012). The options "X-dependent" and "X-independent" under homoscedasticity are described in Belloni et al. (2013).

The option post=TRUE conducts post-lasso estimation, i.e. a refit of the model with the selected variables.
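
The effect of `post` and of the `penalty` list can be explored with a short sketch. The simulated data, the object names and the hand-made `lm` comparison are assumptions added for illustration and are not part of the package. If the refit works as described above, the post-Lasso coefficients should agree with an OLS fit on the selected columns up to numerical error.

```r
library(hdm)
set.seed(1)
n <- 100                                  # sample size (illustrative)
p <- 50                                   # number of regressors (illustrative)
s <- 3                                    # number of non-zero coefficients
X <- matrix(rnorm(n * p), ncol = p)
colnames(X) <- paste("V", 1:p, sep = "")
beta <- c(rep(2, s), rep(0, p - s))
Y <- as.numeric(X %*% beta + rnorm(n))

fit.post  <- rlasso(X, Y, post = TRUE)    # Lasso selection, then refit on the selected variables
fit.lasso <- rlasso(X, Y, post = FALSE)   # Lasso coefficients without the refit

selected <- which(fit.post$index)         # positions of the selected regressors
print(selected)

# hand-made OLS refit on the selected columns, for comparison with fit.post
ols.refit <- lm(Y ~ X[, selected, drop = FALSE])
print(cbind(post.lasso = fit.post$coefficients[c(1, selected + 1)],
            ols.refit  = coef(ols.refit)))

# homoscedastic penalty; every entry of the list is given explicitly
fit.homo <- rlasso(X, Y, post = TRUE,
                   penalty = list(homoscedastic = TRUE, X.dependent.lambda = FALSE,
                                  lambda.start = NULL, c = 1.1, gamma = 0.1 / log(n)))
print(which(fit.homo$index))
```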

Value

rlasso returns an object of class rlasso. An object of class "rlasso" is a list containing at least the following components:

`coefficients`	parameter estimates
`beta`	parameter estimates (named vector of coefficients without intercept)
`intercept`	value of the intercept
`index`	index of selected variables (logical vector)
`lambda`	data-driven penalty term for each variable, product of lambda0 (the penalization parameter) and the loadings
`lambda0`	penalty term
`loadings`	loading for each regressor
`residuals`	residuals, response minus fitted values
`sigma`	root of the variance of the residuals
`iter`	number of iterations
`call`	function call
`options`	options
`model`	model matrix (if `model = TRUE` in function call)

References

A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369-2429.

Examples

set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 3 # number of variables with non-zero coefficients
X = Xnames = matrix(rnorm(n*p), ncol=p)
colnames(Xnames) <- paste("V", 1:p, sep="")
beta = c(rep(5,s), rep(0,p-s))
Y = X%*%beta + rnorm(n)
reg.lasso <- rlasso(Y~Xnames)
Xnew = matrix(rnorm(n*p), ncol=p)  # new X
colnames(Xnew) <- paste("V", 1:p, sep="")
Ynew =  Xnew%*%beta + rnorm(n)  #new Y
yhat = predict(reg.lasso, newdata = Xnew)

Functions for estimation of treatment effects

Description

This class of functions estimates the average treatment effect (ATE), the ATE of the treated (ATET), the local average treatment effects (LATE) and the LATE of the treated (LATET). The estimation methods rely on immunized / orthogonal moment conditions which guarantee valid post-selection inference in a high-dimensional setting. Further details can be found in Belloni et al. (2014).

Usage

rlassoATE(x, ...)

## Default S3 method:
rlassoATE(x, d, y, bootstrap = "none", nRep = 500, ...)

## S3 method for class 'formula'
rlassoATE(formula, data, bootstrap = "none", nRep = 500, ...)

rlassoATET(x, ...)

## Default S3 method:
rlassoATET(x, d, y, bootstrap = "none", nRep = 500, ...)

## S3 method for class 'formula'
rlassoATET(formula, data, bootstrap = "none", nRep = 500, ...)

rlassoLATE(x, ...)

## Default S3 method:
rlassoLATE(
  x,
  d,
  y,
  z,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  never_takers = TRUE,
  ...
)

## S3 method for class 'formula'
rlassoLATE(
  formula,
  data,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  never_takers = TRUE,
  ...
)

rlassoLATET(x, ...)

## Default S3 method:
rlassoLATET(
  x,
  d,
  y,
  z,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  ...
)

## S3 method for class 'formula'
rlassoLATET(
  formula,
  data,
  bootstrap = "none",
  nRep = 500,
  post = TRUE,
  intercept = TRUE,
  always_takers = TRUE,
  ...
)

Arguments

`x`	exogenous variables
`...`	arguments passed, e.g. `intercept` and `post`
`d`	treatment variable (binary)
`y`	outcome variable / dependent variable
`bootstrap`	bootstrap method which should be employed: 'none', 'Bayes', 'normal', 'wild'
`nRep`	number of replications for the bootstrap
`formula`	An object of class `Formula` of the form " y ~ x + d | x" with y the outcome variable, d treatment variable, and x exogenous variables.
`data`	An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which `rlassoATE` is called.
`z`	instrumental variables (binary)
`post`	logical. If `TRUE`, post-lasso estimation is conducted.
`intercept`	logical. If `TRUE`, intercept is included which is not penalized.
`always_takers`	option to adapt to cases with (default) and without always-takers. If `FALSE`, the estimator is adapted to a setting without always-takers.
`never_takers`	option to adapt to cases with (default) and without never-takers. If `FALSE`, the estimator is adapted to a setting without never-takers.

Details

Details can be found in Belloni et al. (2014).

Value

Functions return an object of class rlassoTE with estimated effects, standard errors and individual effects in the form of a list.

References

A. Belloni, V. Chernozhukov, I. Fernandez-Val, and C. Hansen (2014). Program evaluation with high-dimensional data. Working Paper.
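
A minimal usage sketch on simulated data, using the default methods shown in the usage above. The data-generating process (a binary treatment whose assignment depends on the first control, and a constant treatment effect of 1) and all object names are assumptions made for this illustration.

```r
library(hdm)
set.seed(1)
n <- 500                                  # sample size (illustrative)
p <- 10                                   # number of controls (illustrative)
x <- matrix(rnorm(n * p), ncol = p)
colnames(x) <- paste("x", 1:p, sep = "")
# treatment assignment depends on x1, so x1 is a confounder
d <- rbinom(n, size = 1, prob = plogis(0.5 * x[, 1]))
# outcome with a constant treatment effect of 1 (by construction)
y <- 1 * d + x[, 1] + 0.5 * x[, 2] + rnorm(n)

ate.fit  <- rlassoATE(x, d, y)            # average treatment effect
atet.fit <- rlassoATET(x, d, y)           # average treatment effect of the treated
summary(ate.fit)
ate.ci <- confint(ate.fit)                # confidence interval for the ATE
print(ate.ci)
summary(atet.fit)
```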

rigorous Lasso for Linear Models: Inference

Description

Estimation and inference of (low-dimensional) target coefficients in a high-dimensional linear model.

Usage

rlassoEffects(x, ...)

## Default S3 method:
rlassoEffects(
  x,
  y,
  index = c(1:ncol(x)),
  method = "partialling out",
  I3 = NULL,
  post = TRUE,
  ...
)

## S3 method for class 'formula'
rlassoEffects(
  formula,
  data,
  I,
  method = "partialling out",
  included = NULL,
  post = TRUE,
  ...
)

rlassoEffect(x, y, d, method = "double selection", I3 = NULL, post = TRUE, ...)

Arguments

`x`	matrix of regressor variables serving as controls and potential treatments. For `rlassoEffect` it contains only controls, for `rlassoEffects` both controls and potential treatments. For `rlassoEffects` it must have at least two columns.
`...`	parameters passed to the `rlasso` function.
`y`	outcome variable (vector or matrix)
`index`	specifies the variables of `x` used for inference / as treatment variables: a vector of integers giving the positions (columns) of the variables, a logical vector with one entry per variable (`TRUE` or `FALSE`), or a vector of variable names of `x`.
`method`	method for inference, either 'partialling out' (default) or 'double selection'.
`I3`	For the 'double selection'-method the logical vector `I3` has same length as the number of variables in `x`; indicates if variables (TRUE) should be included in any case to the model and they are exempt from selection. These variables should not be included in the `index`; hence the intersection with `index` must be the empty set. In the case of partialling out it is ignored.
`post`	logical, if post Lasso is conducted with default `TRUE`.
`formula`	An element of class `formula` specifying the linear model.
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.
`I`	A one-sided formula specifying the variables for which inference is conducted.
`included`	One-sided formula of variables which should be included in any case (only for method="double selection").
`d`	variable for which inference is conducted (treatment variable)

Details

The function estimates (low-dimensional) target coefficients in a high-dimensional linear model. An application is e.g. estimation of a treatment effect $\alpha_0$ in a setting of high-dimensional controls. The user can choose between the so-called post-double-selection method and partialling-out. The idea of the double selection method is to select variables by Lasso regression of the outcome variable on the control variables and of the treatment variable on the control variables. The final estimation is done by a regression of the outcome on the treatment variable and the union of the variables selected in the first two steps. In partialling-out, the effect of the regressors on the outcome and on the treatment variable is first taken out by Lasso, and then a regression of the residuals is conducted. The resulting estimator for $\alpha_0$ is asymptotically normally distributed, which allows inference on the treatment effect. rlassoEffects is a wrapper function for rlassoEffect, which does inference for a single variable.
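
The three steps of the double selection method can be written out by hand with `rlasso` and `lm`. This is a sketch of the idea described above on simulated data, not the package's internal code; the design, the true effect `alpha0` and all object names are assumptions for illustration. The call to `rlassoEffect` at the end is included only for comparison.

```r
library(hdm)
set.seed(1)
n <- 200                                  # sample size (illustrative)
p <- 50                                   # number of controls (illustrative)
x <- matrix(rnorm(n * p), ncol = p)
colnames(x) <- paste("x", 1:p, sep = "")
d <- as.numeric(x[, 1] + 0.5 * x[, 2] + rnorm(n))          # treatment depends on controls
alpha0 <- 1                                                 # true effect (by construction)
y <- as.numeric(alpha0 * d + x[, 1] + x[, 3] + rnorm(n))

# Step 1: Lasso of the outcome on the controls
sel.y <- rlasso(x, y)$index
# Step 2: Lasso of the treatment on the controls
sel.d <- rlasso(x, d)$index
# Step 3: OLS of the outcome on the treatment and the union of the selected controls
sel.union <- sel.y | sel.d
final.fit <- lm(y ~ d + x[, sel.union, drop = FALSE])
alpha.manual <- unname(coef(final.fit)["d"])
print(alpha.manual)

# package routine for a single treatment variable, for comparison
ds.fit <- rlassoEffect(x = x, y = y, d = d, method = "double selection")
print(ds.fit$coefficients)
```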

Value

The function returns an object of class rlassoEffects with the following entries:

`coefficients`	vector with estimated values of the coefficients for each selected variable
`se`	standard error (vector)
`t`	t-statistic
`pval`	p-value
`samplesize`	sample size of the data set
`index`	index of the variables for which inference is performed

References

A. Belloni, V. Chernozhukov, C. Hansen (2014). Inference on treatment effects after selection among high-dimensional controls. The Review of Economic Studies 81(2), 608-650.

Examples

library(hdm); library(ggplot2)
set.seed(1)
n = 100 #sample size
p = 100 # number of variables
s = 3 # number of non-zero variables
X = matrix(rnorm(n*p), ncol=p)
colnames(X) <- paste("X", 1:p, sep="")
beta = c(rep(3,s), rep(0,p-s))
y = 1 + X%*%beta + rnorm(n)
data = data.frame(cbind(y,X))
colnames(data)[1] <- "y"
fm = paste("y ~", paste(colnames(X), collapse="+"))
fm = as.formula(fm)                 
lasso.effect = rlassoEffects(X, y, index=c(1,2,3,50))
lasso.effect = rlassoEffects(fm, I = ~ X1 + X2 + X3 + X50, data=data)
print(lasso.effect)
summary(lasso.effect)
confint(lasso.effect)
plot(lasso.effect)

Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments

Description

The function estimates a treatment effect in a setting with very many controls and very many instruments (even larger than the sample size).

Usage

rlassoIV(x, ...)

## Default S3 method:
rlassoIV(x, d, y, z, select.Z = TRUE, select.X = TRUE, post = TRUE, ...)

## S3 method for class 'formula'
rlassoIV(formula, data, select.Z = TRUE, select.X = TRUE, post = TRUE, ...)

rlassoIVmult(x, d, y, z, select.Z = TRUE, select.X = TRUE, ...)

Arguments

`x`	matrix of exogenous variables
`...`	arguments passed to the function `rlasso`
`d`	endogenous variable
`y`	outcome / dependent variable (vector or matrix)
`z`	matrix of instrumental variables
`select.Z`	logical, indicating selection on the instruments.
`select.X`	logical, indicating selection on the exogenous variables.
`post`	logical, whether post-Lasso should be conducted (default=`TRUE`)
`formula`	An object of class `Formula` of the form " y ~ x + d | x + z" with y the outcome variable, d endogenous variable, z instrumental variables, and x exogenous variables.
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which `rlassoIV` is called.

Details

The implementation for selection on x and z follows the procedure described in Chernozhukov et al. (2015) and is built on 'triple selection' to achieve an orthogonal moment function. The function returns an object of S3 class rlassoIV. Moreover, it is a wrapper function for the cases in which selection is done only on the instruments Z (rlassoIVselectZ), only on the control variables X (rlassoIVselectX), or not at all (tsls). Exogenous variables x are automatically used as instruments and added to the instrument set z.
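
How the `select.Z` and `select.X` switches route the call can be tried on simulated data. The design below (two relevant instruments out of ten, an unobserved confounder `u`, and a true coefficient of 1 on `d`) and the object names are assumptions for illustration. The sketch prints the class of each result rather than asserting it.

```r
library(hdm)
set.seed(1)
n <- 300                                  # sample size (illustrative)
x <- matrix(rnorm(n * 5), ncol = 5)       # exogenous controls
colnames(x) <- paste("x", 1:5, sep = "")
z <- matrix(rnorm(n * 10), ncol = 10)     # candidate instruments
colnames(z) <- paste("z", 1:10, sep = "")
u <- rnorm(n)                             # unobserved confounder
d <- as.numeric(z[, 1] + z[, 2] + x[, 1] + u + rnorm(n))   # endogenous variable
y <- as.numeric(1 * d + x[, 1] - u + rnorm(n))             # true coefficient on d is 1

# no selection at all
fit.none <- rlassoIV(x = x, d = d, y = y, z = z, select.X = FALSE, select.Z = FALSE)
# selection on the instruments only
fit.Z <- rlassoIV(x = x, d = d, y = y, z = z, select.X = FALSE, select.Z = TRUE)

print(class(fit.none))
print(class(fit.Z))
summary(fit.Z)
confint(fit.Z)
```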

Value

an object of class rlassoIV containing at least the following components:

`coefficients`	estimated parameter value
`se`	variance-covariance matrix

References

V. Chernozhukov, C. Hansen, M. Spindler (2015). Post-selection and post-regularization inference in linear models with many controls and instruments. American Economic Review: Papers and Proceedings 105(5), 486–490.

Examples

## Not run: 
data(EminentDomain)
z <- EminentDomain$logGDP$z # instruments
x <- EminentDomain$logGDP$x # exogenous variables
y <- EminentDomain$logGDP$y # outcome variable
d <- EminentDomain$logGDP$d # treatment / endogenous variable
lasso.IV.Z = rlassoIV(x=x, d=d, y=y, z=z, select.X=FALSE, select.Z=TRUE) 
summary(lasso.IV.Z)
confint(lasso.IV.Z)

## End(Not run)

Instrumental Variable Estimation with Selection on the exogenous Variables by Lasso

Description

This function estimates the coefficient of an endogenous variable by instrumental variables in a setting where the exogenous variables are high-dimensional, so that selection on the exogenous variables is required. The function returns an object of class rlassoIVselectX.

Usage

rlassoIVselectX(x, ...)

## Default S3 method:
rlassoIVselectX(x, d, y, z, post = TRUE, ...)

## S3 method for class 'formula'
rlassoIVselectX(formula, data, post = TRUE, ...)

Arguments

`x`	exogenous variables in the structural equation (matrix)
`...`	arguments passed to the function `rlasso`
`d`	endogenous variables in the structural equation (vector or matrix)
`y`	outcome or dependent variable in the structural equation (vector or matrix)
`z`	set of potential instruments for the endogenous variables.
`post`	logical. If `TRUE`, post-lasso estimation is conducted.
`formula`	An object of class `Formula` of the form `y ~ x + d | x + z` with y the outcome variable, d the endogenous variable, z the instrumental variables, and x the exogenous variables.
`data`	An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which `rlassoIVselectX` is called.

Details

The implementation is a special case of Chernozhukov et al. (2015). The option post=TRUE conducts post-lasso estimation for the Lasso estimations, i.e. a refit of the model with the selected variables. Exogenous variables x are automatically used as instruments and added to the instrument set z.

Value

An object of class rlassoIVselectX containing at least the following components:

`coefficients`	estimated parameter vector
`vcov`	variance-covariance matrix
`residuals`	residuals
`samplesize`	sample size

References

Chernozhukov, V., Hansen, C. and M. Spindler (2015). Post-Selection and Post-Regularization Inference in Linear Models with Many Controls and Instruments. American Economic Review, Papers and Proceedings 105(5), 486–490.

Examples

library(hdm)
data(AJR); y = AJR$GDP; d = AJR$Exprop; z = AJR$logMort
x = model.matrix(~ -1 + (Latitude + Latitude2 + Africa + 
                           Asia + Namer + Samer)^2, data=AJR)
dim(x)
# AJR.Xselect = rlassoIV(x=x, d=d, y=y, z=z, select.X=TRUE, select.Z=FALSE)
AJR.Xselect = rlassoIV(GDP ~ Exprop + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2 |
                         logMort + (Latitude + Latitude2 + Africa + Asia + Namer + Samer)^2,
                       data=AJR, select.X=TRUE, select.Z=FALSE)
summary(AJR.Xselect)
confint(AJR.Xselect)

Instrumental Variable Estimation with Lasso

Description

This function selects the instrumental variables in the first stage by Lasso. The first-stage predictions are then used in the second stage as optimal instruments to estimate the parameter vector. The function returns an object of class rlassoIVselectZ.

Usage

rlassoIVselectZ(x, ...)

## Default S3 method:
rlassoIVselectZ(x, d, y, z, post = TRUE, intercept = TRUE, ...)

## S3 method for class 'formula'
rlassoIVselectZ(formula, data, post = TRUE, intercept = TRUE, ...)

Arguments

`x`	exogenous variables in the structural equation (matrix)
`...`	arguments passed to the function `rlasso`.
`d`	endogenous variables in the structural equation (vector or matrix)
`y`	outcome or dependent variable in the structural equation (vector or matrix)
`z`	set of potential instruments for the endogenous variables. Exogenous variables serve as their own instruments.
`post`	logical. If `TRUE`, post-lasso estimation is conducted.
`intercept`	logical. If `TRUE`, intercept is included in the second stage equation.
`formula`	An object of class `Formula` of the form `y ~ x + d | x + z` with y the outcome variable, d the endogenous variable, z the instrumental variables, and x the exogenous variables.
`data`	An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which `rlassoIVselectZ` is called.

Details

The implementation follows the procedure described in Belloni et al. (2012). The option post=TRUE conducts post-lasso estimation, i.e. a refit of the model with the selected variables, to estimate the optimal instruments. The parameter vector of the structural equation is then fitted by two-stage least squares (tsls) estimation.
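
The two steps described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: the data are simulated, there are no exogenous controls, and the set of selected instruments `sel` is fixed by hand instead of being chosen by a Lasso fit.

```r
set.seed(42)
n <- 500; pz <- 30
z <- matrix(rnorm(n * pz), ncol = pz)        # many candidate instruments
u <- rnorm(n)                                # unobserved confounder
d <- z[, 1] + 0.5 * z[, 2] + u + rnorm(n)    # endogenous regressor
y <- 1.0 * d + 2 * u + rnorm(n)              # true coefficient on d is 1

# Assumption: a first-stage Lasso kept instruments 1 and 2
# (in hdm this selection is data-driven; here it is fixed by hand).
sel <- c(1, 2)

# Post-Lasso first stage: least squares of d on the selected instruments
d_hat <- fitted(lm(d ~ z[, sel]))

# Second stage: use d_hat as the instrument for d (variables are demeaned
# so that the intercept is partialled out)
yc <- y - mean(y); dc <- d - mean(d); ic <- d_hat - mean(d_hat)
alpha_iv  <- sum(ic * yc) / sum(ic * dc)
alpha_ols <- unname(coef(lm(y ~ d))["d"])    # biased by the confounder
c(iv = alpha_iv, ols = alpha_ols)
```

With a confounder of this strength, the least squares estimate is pulled away from the true value while the instrumented estimate is not, which is the motivation for the first-stage selection.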

Value

An object of class rlassoIVselectZ containing at least the following components:

`coefficients`	estimated parameter vector
`vcov`	variance-covariance matrix
`residuals`	residuals
`samplesize`	sample size
`selection.matrix`	matrix of selected variables in the first stage for each endogenous variable

References

A. Belloni, D. Chen, V. Chernozhukov and C. Hansen (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80 (6), 2369–2429.

rlassologit: Function for logistic Lasso estimation

Description

The function estimates the coefficients of a logistic Lasso regression with data-driven penalty. The method for choosing the data-driven penalty can be selected. The returned object is of S3 class rlassologit.

Usage

rlassologit(x, ...)

## S3 method for class 'formula'
rlassologit(
  formula,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)

## S3 method for class 'character'
rlassologit(
  x,
  data = NULL,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)

## Default S3 method:
rlassologit(
  x,
  y,
  post = TRUE,
  intercept = TRUE,
  model = TRUE,
  penalty = list(lambda = NULL, c = 1.1, gamma = 0.1/log(n)),
  control = list(threshold = NULL),
  ...
)

Arguments

`x`	regressors (matrix)
`...`	further parameters passed to glmnet
`formula`	an object of class 'formula' (or one that can be coerced to that class): a symbolic description of the model to be fitted in the form `y~x`.
`data`	an optional data frame, list or environment.
`post`	logical. If `TRUE`, post-lasso estimation is conducted.
`intercept`	logical. If `TRUE`, intercept is included which is not penalized.
`model`	logical. If `TRUE` (default), model matrix is returned.
`penalty`	list with options for the calculation of the penalty. `c` and `gamma` constants for the penalty.
`control`	list with control values. `threshold` is applied to the final estimated lasso coefficients. Absolute values below the threshold are set to zero.
`y`	dependent variable (vector or matrix)
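
The effect of `threshold` in `control` amounts to hard-thresholding of the final coefficients. A minimal sketch with a made-up coefficient vector and threshold value (both are hypothetical and not produced by `rlassologit`):

```r
# Hypothetical coefficient vector and threshold (illustration only)
beta <- c(1.8, -0.004, 0.0, 0.02, -2.1)
threshold <- 0.01

beta[abs(beta) < threshold] <- 0   # absolute values below the threshold become zero
index <- beta != 0                 # logical index of the variables that remain
beta
index
```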

Details

The function estimates the coefficients of a Logistic Lasso regression with data-driven penalty. The option post=TRUE conducts post-lasso estimation, i.e. a refit of the model with the selected variables.
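
The "post" step mentioned above is an unpenalized refit on the selected variables. The base R sketch below illustrates only that refit; the selection `index` is fixed by hand as an assumption, whereas in the package it results from the penalized logistic fit.

```r
set.seed(3)
n <- 300; p <- 15
X <- matrix(rnorm(n * p), ncol = p)
eta <- 0.5 + 1.5 * X[, 1] - 1 * X[, 2]
y <- rbinom(n, size = 1, prob = plogis(eta))

# Assumption: a penalized logistic fit selected columns 1 and 2
index <- c(TRUE, TRUE, rep(FALSE, p - 2))

# Post step: unpenalized logistic regression on the selected columns only
Xs <- X[, index, drop = FALSE]
refit <- glm(y ~ Xs, family = binomial())

# Map the refitted coefficients back to a length-p vector
beta <- numeric(p)
beta[index] <- coef(refit)[-1]       # drop the intercept
intercept <- unname(coef(refit)[1])
round(c(intercept = intercept, beta[1:4]), 2)
```

Coefficients of variables that were not selected stay exactly zero, while the selected ones are no longer shrunk by the penalty.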

Value

rlassologit returns an object of class rlassologit. An object of class rlassologit is a list containing at least the following components:

`coefficients`	parameter estimates
`beta`	parameter estimates (without intercept)
`intercept`	value of intercept
`index`	index of selected variables (logicals)
`lambda`	penalty term
`residuals`	residuals
`sigma`	root of the variance of the residuals
`call`	function call
`options`	options

References

Belloni, A., Chernozhukov, V. and Y. Wei (2013). Honest confidence regions for logistic regression with a large number of controls. arXiv preprint arXiv:1304.3969.

Examples

## Not run: 
library(hdm)
## DGP
set.seed(2)
n <- 250
p <- 100
px <- 10
X <- matrix(rnorm(n*p), ncol=p)
beta <- c(rep(2,px), rep(0,p-px))
intercept <- 1
P <- exp(intercept + X %*% beta)/(1+exp(intercept + X %*% beta))
y <- rbinom(n, size=1, prob=P)
## fit rlassologit object
rlassologit.reg <- rlassologit(y~X)
## methods
summary(rlassologit.reg, all=FALSE)
print(rlassologit.reg)
predict(rlassologit.reg, type='response')
X3 <- matrix(rnorm(n*p), ncol=p)
predict(rlassologit.reg, newdata=X3)

## End(Not run)

rigorous Lasso for Logistic Models: Inference

Description

The function estimates (low-dimensional) target coefficients in a high-dimensional logistic model.

Usage

rlassologitEffects(x, ...)

## Default S3 method:
rlassologitEffects(x, y, index = c(1:ncol(x)), I3 = NULL, post = TRUE, ...)

## S3 method for class 'formula'
rlassologitEffects(formula, data, I, included = NULL, post = TRUE, ...)

rlassologitEffect(x, y, d, I3 = NULL, post = TRUE)

Arguments

`x`	matrix of regressor variables serving as controls and potential treatments. For `rlassologitEffect` it contains only controls, for `rlassologitEffects` both controls and potential treatments. For `rlassologitEffects` it must have at least two columns.
`...`	additional parameters
`y`	outcome variable
`index`	vector of integers, logical or names indicating the position (column) or name of variables of x which should be used as treatment variables.
`I3`	logical vector with same length as the number of controls; indicates if variables (TRUE) should be included in any case.
`post`	logical. If `TRUE`, post-Lasso estimation is conducted.
`formula`	An element of class `formula` specifying the linear model.
`data`	an optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which the function is called.
`I`	A one-sided formula specifying the variables for which inference is conducted.
`included`	One-sided formula of variables which should be included in any case.
`d`	variable for which inference is conducted (treatment variable)

Details

The function estimates (low-dimensional) target coefficients in a high-dimensional logistic model. An application is, e.g., the estimation of a treatment effect $\alpha_0$ in a setting with high-dimensional controls. The function is a wrapper for rlassologitEffect, which does inference for only one variable (d).

Value

The function returns an object of class rlassologitEffects with the following entries:

`coefficients`	estimated value of the coefficients
`se`	standard errors
`t`	t-statistics
`pval`	p-values
`samplesize`	sample size of the data set
`I`	index of variables of the union of the lasso regressions
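
The entries `coefficients`, `se`, `t` and `pval` are related in the usual way. The sketch below assumes two-sided tests based on a normal approximation; whether the package computes the values in exactly this form is an assumption, and the numbers are made up for illustration.

```r
# Illustrative numbers only (not output of rlassologitEffects)
coefficients <- c(V1 = 1.90, V2 = 2.10, V40 = 0.05)
se           <- c(V1 = 0.30, V2 = 0.35, V40 = 0.20)

tstat <- coefficients / se            # t-statistics
pval  <- 2 * pnorm(-abs(tstat))       # two-sided p-values, normal approximation

# Pointwise 95% confidence intervals under the same approximation
ci <- cbind(lower = coefficients - qnorm(0.975) * se,
            upper = coefficients + qnorm(0.975) * se)
round(cbind(coefficients, se, t = tstat, pval, ci), 3)
```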

References

A. Belloni, V. Chernozhukov, Y. Wei (2013). Honest confidence regions for a regression parameter in logistic regression with a large number of controls. cemmap working paper CWP67/13.

Examples

## Not run: 
library(hdm)
## DGP
set.seed(2)
n <- 250
p <- 100
px <- 10
X <- matrix(rnorm(n*p), ncol=p)
colnames(X) = paste("V", 1:p, sep="")
beta <- c(rep(2,px), rep(0,p-px))
intercept <- 1
P <- exp(intercept + X %*% beta)/(1+exp(intercept + X %*% beta))
y <- rbinom(n, size=1, prob=P)
xd <- X[,2:50]
d <- X[,1]
logit.effect <- rlassologitEffect(x=xd, d=d, y=y)
logit.effects <- rlassologitEffects(X,y, index=c(1,2,40))
logit.effects.f <- rlassologitEffects(y ~ X, I = ~ V1 + V2)

## End(Not run)

Summarizing rlassoEffects fits

Description

Summary method for class rlassoEffects

Usage

## S3 method for class 'rlassoEffects'
summary(object, ...)

## S3 method for class 'summary.rlassoEffects'
print(x, digits = max(3L, getOption("digits") - 3L), ...)

Arguments

`object`	an object of class `rlassoEffects`, usually a result of a call to `rlassoEffects`
`...`	further arguments passed to or from other methods.
`x`	an object of class `summary.rlassoEffects`, usually a result of a call to `summary.rlassoEffects`
`digits`	the number of significant digits to use when printing.

Details

Summary of objects of class rlassoEffects

Two-Stage Least Squares Estimation (TSLS)

Description

The function does Two-Stage Least Squares Estimation (TSLS).

Usage

tsls(x, ...)

## Default S3 method:
tsls(x, d, y, z, intercept = TRUE, homoscedastic = TRUE, ...)

## S3 method for class 'formula'
tsls(formula, data, intercept = TRUE, homoscedastic = TRUE, ...)

Arguments

`x`	exogenous variables
`...`	further arguments (only for consistent definition of methods)
`d`	endogenous variables
`y`	outcome variable
`z`	instruments
`intercept`	logical, if intercept should be included
`homoscedastic`	logical, whether standard errors are calculated under homoscedasticity (`TRUE`, default) or heteroscedasticity (`FALSE`).
`formula`	An object of class `Formula` of the form `y ~ x + d | x + z` with y the outcome variable, d the endogenous variable, z the instrumental variables, and x the exogenous variables.
`data`	An optional data frame, list or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which `tsls` is called.

Details

The function computes the tsls estimate (coefficients) and the variance-covariance matrix (by default assuming homoscedasticity) for the outcome variable y, where d are the endogenous variables in the structural equation, x are the exogenous variables in the structural equation and z are the instruments. It returns an object of class tsls for which the methods print and summary are provided.
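
As a rough illustration of the computation described above, the following base R sketch carries out two-stage least squares by hand. It is not the package's implementation: the simulated data, the names `A`, `Z` and `A_hat`, and the degrees-of-freedom correction `n - ncol(A)` are assumptions made for the example.

```r
set.seed(7)
n <- 400
x <- cbind(rnorm(n))                     # exogenous control
z <- cbind(rnorm(n), rnorm(n))           # excluded instruments
u <- rnorm(n)                            # unobserved confounder
d <- z %*% c(1, -1) + 0.5 * x[, 1] + u + rnorm(n)   # endogenous variable
y <- 2 * d + 1 * x[, 1] + 2 * u + rnorm(n)          # true coefficient on d is 2

A <- cbind(1, d, x)                      # structural regressors
Z <- cbind(1, z, x)                      # instrument set: z plus exogenous x
colnames(A) <- c("(Intercept)", "d", "x")

# First stage: project the regressors on the instruments.
# Second stage: solve the normal equations with the projected regressors.
A_hat <- Z %*% solve(crossprod(Z), crossprod(Z, A))
b     <- solve(crossprod(A_hat, A), crossprod(A_hat, y))
res   <- drop(y - A %*% b)               # outcome minus predicted values

# Variance-covariance matrix and standard errors under homoscedasticity
sigma2 <- sum(res^2) / (n - ncol(A))
V      <- sigma2 * solve(crossprod(A_hat))
se     <- sqrt(diag(V))
cbind(estimate = drop(b), se = se)
```

The residuals are formed with the original regressors `A`, not the projected ones, which is what makes the variance estimate valid for the structural equation.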

Value

The function returns a list with the following elements

`coefficients`	coefficients
`vcov`	variance-covariance matrix
`residuals`	outcome minus predicted values
`call`	function call
`samplesize`	sample size
`se`	standard error
