This tutorial provides the main code for implementing the effective and robust Individualized Treatment Rule (ITR) learning method presented in the paper: Joowon Lee, Jared Davis Huling, and Guanhua Chen, "An Effective Framework for Estimating Individualized Treatment Rules" (to appear in NeurIPS 2024).

It guides users on how to implement the ITR-Learning approach introduced in the paper, specifically designed for continuous outcomes.

To begin, load the necessary source files.

source("methods.R")
source("itr_simu_data.R")

Next, as an example, we generate a simulated dataset using the generate_data function. The case argument selects one of eight scenarios: odd values correspond to randomized trials, while even values correspond to observational studies.

The generated dat object includes training and test datasets. Each dataset contains a covariate data matrix (x), treatment vector (trt), and outcome vector (y). In the test dataset, true_opt_trt_test is an additional vector specifying the true optimal treatment.

dat <- generate_data(case = 6, n_train = 500, p = 60, no.seed = 1)
names(dat)
## [1] "x_training"        "trt_training"      "y_training"       
## [4] "x_test"            "trt_test"          "y_test"           
## [7] "true_opt_trt_test"

In this example, we set p = 60 and n_train = 500, resulting in an x_training matrix of dimensions 500 × 60. Below are the first five rows and columns of x_training:

dim(dat$x_training)
## [1] 500  60
dat$x_training[1:5, 1:5]
##             [,1]        [,2]       [,3]       [,4]       [,5]
## [1,]  0.32950777 -0.34506926 -1.4859739  1.4417764  0.3307054
## [2,] -0.01619026 -1.62648398  1.2820115 -0.3886282  1.0861696
## [3,] -0.47815006 -1.34932438  1.8458675  0.2080957  0.7687478
## [4,] -0.70749516  0.06611383  0.3059743 -0.9781807 -1.9243468
## [5,] -1.04413463  0.41581373 -0.9744211  0.1348427 -1.3489353

Here are the first six elements of the treatment and outcome vectors in the training set. The treatment vector must be coded as integers from 1 to K, where K is the number of treatments; in this example it takes four unique values (K = 4).

head(dat$trt_training)
## [1] 4 4 1 4 3 1
head(dat$y_training)
## [1] -2.8702273 -2.8372866 -1.6412037 -1.5760288 -0.7790825 -5.4711601
unique(dat$trt_training)
## [1] 4 1 3 2
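If your treatment labels are not already coded as consecutive integers 1 to K (for example, character labels or non-consecutive integers), they can be recoded with base R. The trt_raw vector below is a hypothetical example, not part of the simulated dataset:

trt_raw <- c("B", "B", "A", "D", "C", "A")  # hypothetical raw labels
trt <- as.integer(factor(trt_raw))          # recodes A -> 1, B -> 2, C -> 3, D -> 4
trt
## [1] 2 2 1 4 3 1
sort(unique(trt))
## [1] 1 2 3 4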

The all_evaluation function estimates the ITR using the proposed effective ITR-Learning method along with other benchmark approaches. This function can handle both real and simulated data. For real datasets, where the true optimal treatment is unknown, only estimated ITRs are provided. When simulated data is used and true_opt_trt_test is available, accuracy is also reported. For more information on the benchmark methods, please refer to our paper.

An example call to all_evaluation is shown below. Set randomized_trial = TRUE for randomized-trial data and FALSE for observational data (here FALSE, since case = 6 is an observational scenario):

results <- all_evaluation(dat$x_training, dat$trt_training, dat$y_training,
               dat$x_test, dat$trt_test, dat$y_test, dat$true_opt_trt_test,
               randomized_trial = FALSE)
results$accuracy
##         AD       SABD   Proposed policytree  causalDML 
##     0.5152     0.6471     0.7617     0.4432     0.4637

The estimated ITR is available under estimated_ITR. For example, to view the estimated ITR for our proposed method:

results$estimated_ITR$Proposed[1:50]
##  [1] 2 2 4 3 2 3 3 1 3 3 3 4 4 3 3 2 3 3 3 2 3 4 3 1 2 4 2 2 3 1 2 4 4 4 2 3 3 2
## [39] 2 1 3 2 2 3 1 3 3 3 1 4
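Assuming the reported accuracy is the proportion of test subjects whose estimated treatment agrees with the true optimal treatment, it can be reproduced directly from the estimated ITR:

# Assumed definition: fraction of test subjects where the estimated
# treatment matches the true optimal treatment
mean(results$estimated_ITR$Proposed == dat$true_opt_trt_test)
# should match results$accuracy["Proposed"] (0.7617 above)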

To obtain the estimated ITR and accuracy from our proposed method, implemented via the Projected Gradient Descent (PGD) algorithm, use the fit_PGD function. For best results, we recommend tuning the hyperparameters through cross-validation. The adjustable hyperparameters appear in the call below:

results0 <- fit_PGD(dat$x_training, dat$trt_training, dat$y_training,
                    dat$x_test, dat$true_opt_trt_test,
                    iter0 = 1000, alpha = 1, pre_lambda1 = 10, lambda1 = 1, lambda2 = 0)
results0$accuracy
## Proposed_Constrained 
##               0.8175
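A full cross-validation routine is not shown here, but a simple grid search over lambda1 can be sketched as follows. The grid values and the use of test-set accuracy as the selection criterion are illustrative assumptions; in practice, replace the test-set accuracy with a cross-validated value criterion computed on the training data:

# Hypothetical grid search over lambda1 (illustrative values)
lambda1_grid <- c(0.1, 0.5, 1, 5, 10)
acc <- sapply(lambda1_grid, function(l1) {
  fit <- fit_PGD(dat$x_training, dat$trt_training, dat$y_training,
                 dat$x_test, dat$true_opt_trt_test,
                 iter0 = 1000, alpha = 1, pre_lambda1 = 10,
                 lambda1 = l1, lambda2 = 0)
  fit$accuracy
})
best_lambda1 <- lambda1_grid[which.max(acc)]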

If you are using real data, where the true optimal treatment is unknown, leave true_opt_trt_test unspecified, as below. In this case, the function returns only the estimated ITRs, without accuracy.

results_real <- all_evaluation(dat$x_training, dat$trt_training, dat$y_training,
               dat$x_test, dat$trt_test, dat$y_test,
               randomized_trial = FALSE)
names(results_real)
## [1] "AD"         "SABD"       "Proposed"   "policytree" "causalDML"
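In this case, each named element of the returned list appears to be the corresponding method's estimated ITR on the test set (an assumption based on the element names above); for example:

# Estimated ITR from the proposed method on the test set (assumed structure)
head(results_real$Proposed)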

To directly fit the proposed method and obtain the estimated ITR, use the following:

proposed_fit <- ITR_Learning(dat$x_training, dat$trt_training, dat$y_training, binary = FALSE)
estimated_ITR2 <- predict_ITR(proposed_fit, dat$x_test)
estimated_ITR2[1:50]
##  [1] 2 2 4 3 2 3 3 1 3 3 3 4 4 3 3 2 3 3 3 2 3 4 3 1 2 4 2 2 3 1 2 4 4 4 2 3 3 2
## [39] 2 1 3 2 2 3 1 3 3 3 1 4

By following this guide, you’ve learned to implement our proposed ITR-Learning framework for effective and robust ITR estimation. If you have any questions, please feel free to contact me at ljw9510@gmail.com.

The projected gradient descent algorithm is based on the algorithm in "Efficient Projections onto the L1-Ball for Learning in High Dimensions." The code for AD- and SABD-Learning is based on the work in "Stabilized Direct Learning for Efficient Estimation of Individualized Treatment Rules."