TFRE package

TFRE.TFRE module

A Tuning-free Robust and Efficient (TFRE) Approach to High-dimensional Regression

class TFRE.TFRE.TFRE

A class used to perform TFRE regressions.

Returns:

model (TFRE.model class) – The class used to record the regression details.
TFRE_Lasso (TFRE.Lasso class) – The class used to record the results of the TFRE regression with Lasso penalty.
TFRE_scad (TFRE.SCAD class) – The class used to record the results of the TFRE regression with SCAD penalty. None if second_stage is not "scad".
TFRE_mcp (TFRE.MCP class) – The class used to record the results of the TFRE regression with MCP penalty. None if second_stage is not "mcp".

class Lasso(beta_TFRE_Lasso, tfre_lambda)

A class used to record the results of the TFRE regression with Lasso penalty.

Returns:

beta_TFRE_Lasso (np.ndarray([p+1,])) – The estimated coefficient vector of the TFRE Lasso regression. The first element is the estimated intercept.
tfre_lambda (np.ndarray([1,])) – The estimated tuning parameter of the TFRE Lasso regression.

class MCP(Beta_TFRE_mcp, df_TFRE_mcp, eta_list, hbic, min_ind)

A class used to record the results of the TFRE regression with MCP penalty. None if second_stage is not "mcp".

Returns:

Beta_TFRE_mcp (np.ndarray([k,p+1])) – The estimated coefficient matrix of the TFRE MCP regression. The dimension is k x (p+1), with the first column holding the intercepts, where k is the length of the eta_list vector.
df_TFRE_mcp (np.ndarray([k,])) – The number of nonzero coefficients (intercept excluded) for each value in eta_list.
eta_list (np.ndarray([k,])) – The tuning parameter vector used in the TFRE MCP regressions.
hbic (np.ndarray([k,])) – A numerical vector of HBIC values for the TFRE MCP model corresponding to each value in eta_list.
eta_min (float) – The eta value which yields the smallest HBIC value in the TFRE MCP regression.
beta_TFRE_mcp_min (np.ndarray([p+1,])) – The estimated coefficient vector which employs eta_min as the eta value in the TFRE MCP regression.

class SCAD(Beta_TFRE_scad, df_TFRE_scad, eta_list, hbic, min_ind)

A class used to record the results of the TFRE regression with SCAD penalty. None if second_stage is not "scad".

Returns:

Beta_TFRE_scad (np.ndarray([k,p+1])) – The estimated coefficient matrix of the TFRE SCAD regression. The dimension is k x (p+1), with the first column holding the intercepts, where k is the length of the eta_list vector.
df_TFRE_scad (np.ndarray([k,])) – The number of nonzero coefficients (intercept excluded) for each value in eta_list.
eta_list (np.ndarray([k,])) – The tuning parameter vector used in the TFRE SCAD regressions.
hbic (np.ndarray([k,])) – A numerical vector of HBIC values for the TFRE SCAD model corresponding to each value in eta_list.
eta_min (float) – The eta value which yields the smallest HBIC value in the TFRE SCAD regression.
beta_TFRE_scad_min (np.ndarray([p+1,])) – The estimated coefficient vector which employs eta_min as the eta value in the TFRE SCAD regression.
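
The MCP and SCAD records share the same layout, so the relationship between hbic, eta_list, the coefficient matrix, and the selected row can be sketched once. This is a minimal illustration with made-up numbers; the array values are hypothetical, and the assumption that min_ind is simply the argmin of hbic is inferred from the attribute descriptions rather than confirmed from the package source.

```python
import numpy as np

# Hypothetical stand-ins for the recorded attributes (k = 4 eta values, p = 3).
eta_list = np.array([0.09, 0.12, 0.15, 0.18])
hbic = np.array([2.31, 1.87, 1.92, 2.40])
Beta = np.array([
    [0.1, 1.4, -1.2, 0.3],   # first column is the intercept
    [0.1, 1.5, -1.2, 0.0],
    [0.0, 1.3, -1.0, 0.0],
    [0.0, 1.1,  0.0, 0.0],
])

# Number of nonzero coefficients per eta value, intercept column excluded.
df = np.count_nonzero(Beta[:, 1:], axis=1)

# Assumed selection rule: pick the row with the smallest HBIC value.
min_ind = int(np.argmin(hbic))
eta_min = float(eta_list[min_ind])
beta_min = Beta[min_ind]      # coefficient vector of length p + 1
```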

coef(s)

Extract coefficients from a fitted TFRE class.

Parameters: s (str, optional) – Regression model to use for coefficient extraction. Should be one of "1st" and "2nd".
Returns: The coefficient vector from the fitted TFRE class, with the first element as the intercept.
Return type: np.ndarray([p+1,])

Notes

If second_stage = None, there is no second-stage model, so s should not be "2nd"; if s = "2nd" is passed anyway, the function returns the coefficient vector from the TFRE Lasso regression. If second_stage = "scad" or "mcp" and s = "2nd", the function returns the coefficient vector from the TFRE SCAD or MCP regression with the smallest HBIC.

Examples

>>> import numpy as np
>>> from TFRE import TFRE
>>> n = 100
>>> p = 400
>>> X = np.random.normal(0,1,size=(n,p))
>>> beta =  np.append([1.5,-1.25,1,-0.75,0.5],np.zeros(p-5))
>>> y = X.dot(beta) + np.random.normal(0,1,n)
>>>
>>> obj = TFRE()
>>> obj.fit(X,y,eta_list=np.arange(0.09,0.51,0.03))
>>>
>>> obj.coef("1st")[:10]
array([-0.12943468,  1.21390299, -0.82102807,  0.56632981, -0.20740154,
        0.        ,  0.        ,  0.        ,  0.        ,  0.        ])
>>> obj.coef("2nd")[:10]
array([-0.13552865,  1.63426996, -1.13200778,  1.1699545 , -0.47397631,
        0.17350995,  0.        ,  0.        ,  0.        ,  0.        ])
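
The selection rule described in the Notes can be written out as a small standalone helper. This is only a sketch of the documented behaviour: pick_coef and its arguments are hypothetical names, not part of the TFRE package, and raising ValueError for an unrecognised s is an assumption.

```python
import numpy as np

def pick_coef(beta_lasso, beta_second_min, s):
    """Hypothetical helper mirroring the Notes; not part of the TFRE package.

    beta_lasso      -- first-stage (Lasso) coefficient vector, length p + 1
    beta_second_min -- second-stage vector with the smallest HBIC, or None
                       when no second-stage model was fitted
    """
    if s not in ("1st", "2nd"):
        # Assumption: the documentation only says s should be "1st" or "2nd".
        raise ValueError('s should be one of "1st" and "2nd"')
    if s == "2nd" and beta_second_min is not None:
        return beta_second_min
    # "1st", or "2nd" without a second stage: fall back to the Lasso fit.
    return beta_lasso
```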

est_lambda(X=None, alpha0=0.1, const_lambda=1.01, times=500)

Estimate the tuning parameter for a TFRE Lasso regression given the covariate matrix X.

Parameters:

X (np.ndarray([n,p])) – Input matrix of the regression.
alpha0 (float, optional, default = 0.1) – The level to estimate the tuning parameter.
const_lambda (float, optional, default = 1.01) – The constant to estimate the tuning parameter, should be greater than 1.
times (int, optional, default = 500) – The size of simulated samples to estimate the tuning parameter.

Returns:

The estimated tuning parameter of the TFRE Lasso regression given X.

Return type:

float

Examples

>>> import numpy as np
>>> from TFRE import TFRE
>>> n = 100
>>> p = 400
>>> X = np.random.normal(0,1,size=(n,p))
>>> obj = TFRE()
>>> obj.est_lambda(X)
[0.43150559039112646]
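
The parameter descriptions suggest a simulation-based rule: draw `times` samples of a statistic that depends only on X, take its (1 - alpha0) quantile, and inflate it by const_lambda. The sketch below shows one way this could work, assuming the statistic is the sup-norm of a rank-score gradient. The function name simulate_lambda, the seed argument, and the exact form of the statistic are assumptions, not the package's confirmed implementation.

```python
import numpy as np

def simulate_lambda(X, alpha0=0.1, const_lambda=1.01, times=500, seed=None):
    """Sketch of a simulated tuning parameter; not the TFRE source code."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Each row is a random permutation of the ranks 1..n.
    ranks = np.array([rng.permutation(n) + 1 for _ in range(times)])
    xi = 2 * ranks - (n + 1)                  # centred rank scores, (times, n)
    # Assumed gradient-type statistic, one row per simulated sample: (times, p).
    S = (-2.0 / (n * (n - 1))) * (xi @ X)
    sup_norm = np.abs(S).max(axis=1)          # largest absolute entry per sample
    # Upper quantile at level alpha0, inflated by a constant greater than 1.
    return float(const_lambda * np.quantile(sup_norm, 1 - alpha0))

X_demo = np.random.default_rng(0).normal(size=(50, 20))
lam_demo = simulate_lambda(X_demo, seed=1)
```

Because the statistic uses only X and random ranks, the result does not depend on the response y, which matches the signature of est_lambda.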

fit(X=None, y=None, alpha0=0.1, const_lambda=1.01, times=500, incomplete=True, const_incomplete=10, thresh=1e-06, maxin=100, maxout=20, second_stage='scad', a=3.7, eta_list=None, const_hbic=6)

Fit a TFRE regression model with Lasso, SCAD or MCP regularization.

Parameters:

X (np.ndarray([n,p])) – Input matrix of the regression.
y (np.ndarray([n,])) – Response vector of the regression.
alpha0 (float, optional, default = 0.1) – The level to estimate the tuning parameter.
const_lambda (float, optional, default = 1.01) – The constant to estimate the tuning parameter, should be greater than 1.
times (int, optional, default = 500) – The size of simulated samples to estimate the tuning parameter.
incomplete (bool, optional, default = True) – If True, the incomplete U-statistics resampling technique is applied in computation. If False, the complete U-statistics are used in computation.
const_incomplete (int, optional, default = 10) – The constant for the incomplete U-statistics technique. If incomplete = True, const_incomplete x n samples will be randomly selected in the coefficient estimation.
thresh (float, optional, default = 1e-6) – Convergence threshold for QICD algorithm.
maxin (int, optional, default = 100) – Maximum number of inner coordinate descent iterations in QICD algorithm.
maxout (int, optional, default = 20) – Maximum number of outer Majorization Minimization (MM) step iterations in QICD algorithm.
second_stage (str, optional, default = "scad") – Penalty function for the second stage model. One of "scad", "mcp" and "none".
a (float, optional, default = 3.7, suggested by Fan and Li (2001)) – A parameter of the SCAD and MCP penalty functions.
eta_list (np.ndarray([k,]), optional, default = None) – A numerical vector of tuning parameters to be used in the TFRE SCAD or MCP regression. Cannot be None if second_stage = "scad" or "mcp".
const_hbic (int, optional, default = 6) – The constant to be used in calculating HBIC in the TFRE SCAD regression.
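
To make the roles of a and the eta values concrete, the following sketch evaluates the standard SCAD and MCP penalty functions from the literature. It assumes that each entry of eta_list plays the role of the penalty level in these formulas; the function names scad_penalty and mcp_penalty are illustrative and not part of the TFRE package.

```python
import numpy as np

def scad_penalty(t, eta, a=3.7):
    """Standard SCAD penalty evaluated at |t| with penalty level eta."""
    t = np.abs(np.asarray(t, dtype=float))
    small = eta * t                                        # |t| <= eta
    middle = (2 * a * eta * t - t**2 - eta**2) / (2 * (a - 1))
    large = eta**2 * (a + 1) / 2                           # |t| > a * eta
    return np.where(t <= eta, small, np.where(t <= a * eta, middle, large))

def mcp_penalty(t, eta, a=3.7):
    """Standard MCP penalty evaluated at |t| with penalty level eta."""
    t = np.abs(np.asarray(t, dtype=float))
    inner = eta * t - t**2 / (2 * a)                       # |t| <= a * eta
    return np.where(t <= a * eta, inner, a * eta**2 / 2)

# Both penalties behave like the Lasso penalty near zero and flatten out
# for large coefficients, which reduces the shrinkage of strong signals.
grid = np.linspace(-2.0, 2.0, 9)
scad_values = scad_penalty(grid, eta=0.3)
mcp_values = mcp_penalty(grid, eta=0.3)
```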

Returns:

self – A fitted TFRE class with attributes "model", "TFRE_Lasso", "TFRE_scad" (if second_stage = "scad"), and "TFRE_mcp" (if second_stage = "mcp").

Return type:

object
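
The incomplete and const_incomplete parameters of fit trade accuracy for speed when a loss is built from all pairs of observations. The sketch below illustrates the idea under two assumptions that this page does not state: that the loss averages absolute differences of residual pairs, and that pairs are drawn uniformly at random with replacement. The function pairwise_loss and its seed argument are hypothetical.

```python
import numpy as np

def pairwise_loss(X, y, beta, incomplete=True, const_incomplete=10, seed=None):
    """Average |r_i - r_j| over residual pairs; beta has length p (no intercept)."""
    resid = y - X @ beta
    n = resid.shape[0]
    if incomplete:
        # Incomplete U-statistic: use const_incomplete * n random pairs
        # instead of all n * (n - 1) ordered pairs.
        rng = np.random.default_rng(seed)
        m = const_incomplete * n
        i = rng.integers(0, n, size=m)
        j = rng.integers(0, n, size=m)
        keep = i != j                         # drop degenerate pairs
        return float(np.mean(np.abs(resid[i[keep]] - resid[j[keep]])))
    # Complete U-statistic: every ordered pair with i != j.
    diff = np.abs(resid[:, None] - resid[None, :])
    return float(diff.sum() / (n * (n - 1)))
```

The complete version needs O(n^2) pairs, while the incomplete version uses a number of pairs that grows linearly in n.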

class model(X, y, incomplete, second_stage)

A class used to record the regression details.

Returns:

X (np.ndarray([n,p])) – Input matrix of the regression.
y (np.ndarray([n,])) – Response vector of the regression.
incomplete (bool) – If True, the incomplete U-statistics resampling technique is applied in computation. If False, the complete U-statistics are used in computation.
second_stage (str) – Penalty function for the second stage model. One of "scad", "mcp" and "none".

plot()

Plot the second stage model curve for a fitted TFRE class.

Returns: This function plots the HBIC curve and the model size curve as a function of the eta values used, from a fitted TFRE SCAD or MCP model.
Return type: Figure

Notes

In the output plot, the red line represents the HBIC curve as a function of eta values, the blue line represents the number of nonzero coefficients as a function of eta values, and the purple vertical dashed line denotes the model selected with the smallest HBIC.

This function cannot plot the object if second_stage = None.

Examples

>>> import numpy as np
>>> from TFRE import TFRE
>>> n = 100
>>> p = 400
>>> X = np.random.normal(0,1,size=(n,p))
>>> beta =  np.append([1.5,-1.25,1,-0.75,0.5],np.zeros(p-5))
>>> y = X.dot(beta) + np.random.normal(0,1,n)
>>>
>>> obj = TFRE()
>>> obj.fit(X,y,eta_list=np.arange(0.09,0.51,0.03))
>>> obj.plot()

predict(newX, s)

Make predictions from a fitted TFRE class for new X values.

Parameters:

newX (np.ndarray([\(n_0\),p])) – Matrix of new values for X at which predictions are to be made.
s (str, optional) – Regression model to use for prediction. Should be one of "1st" and "2nd".

Returns:

The vector of predictions for the new X values given the fitted TFRE class.

Return type:

np.ndarray([\(n_0\),])

Notes

If second_stage = None, there is no second-stage model, so s should not be "2nd"; if s = "2nd" is passed anyway, the function returns the predictions based on the TFRE Lasso regression. If second_stage = "scad" or "mcp" and s = "2nd", the function returns the predictions based on the TFRE SCAD or MCP regression with the smallest HBIC.

Examples

>>> import numpy as np
>>> from TFRE import TFRE
>>> n = 100
>>> p = 400
>>> X = np.random.normal(0,1,size=(n,p))
>>> beta =  np.append([1.5,-1.25,1,-0.75,0.5],np.zeros(p-5))
>>> y = X.dot(beta) + np.random.normal(0,1,n)
>>>
>>> obj = TFRE()
>>> obj.fit(X,y,eta_list=np.arange(0.09,0.51,0.03))
>>>
>>> newX = np.random.normal(0,1,size=(10,p))
>>> obj.predict(newX,"2nd")
array([ 2.61684897,  2.66548778, -0.13456993, -0.67466848,  3.92941648,
        1.21428428, -1.66033086, -2.13238483,  0.95340816, -2.32122001])
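
Given the coefficient layout used throughout this page (first element is the intercept, followed by p slopes), a prediction presumably reduces to an intercept plus a matrix-vector product. The helper below is a hypothetical sketch of that computation, not the package's own predict method.

```python
import numpy as np

def predict_linear(newX, beta):
    """Hypothetical helper: intercept plus linear combination of columns."""
    newX = np.asarray(newX, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if newX.shape[1] + 1 != beta.shape[0]:
        raise ValueError("beta must have p + 1 entries for newX with p columns")
    # beta[0] is the intercept; beta[1:] are the p slope coefficients.
    return beta[0] + newX @ beta[1:]
```

With a fitted object, the vector returned by coef("1st") or coef("2nd") would be the natural choice for beta.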
