Publications

Please feel free to email me at c.shi7@lse.ac.uk if you have any comments.

Some Preprints

* indicates equal contribution

Wang, J., Wen, Q., Zhang, Y., Yan, X. and Shi, C. A Two-armed Bandit Framework for A/B Testing.

Xu, E*., Ye, K*., Zhou, H*., Zhu, L., Quinzan, F. and Shi, C. Doubly Robust Alignment for Large Language Models Python module DRPO4LLM
slides video presented at Tsinghua Statistics + AI Frontier Summit.

Ye, K*., Zhou, H*., Zhu, J*., Quinzan, F. and Shi, C. Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning Python module VRPO

Zhu, J., Zhou, X., Yao, J., Aminian, G., Rivasplata, O., Little, S., Li, L. and Shi, C. Semi-pessimistic Reinforcement Learning.

Shen, G., Dai, R., Wu, G., Luo, S., Shi, C. and Zhu, H. Deep Distributional Learning with Non-crossing Quantile Network

Shi, C. Statistical Inference in Reinforcement Learning: A Selective Survey

Wang, J., Shi, C., Piette, J., Loftus, J., Zeng, D. and Wu, Z. Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing

Liu, P., Shi, C. and Sun, W. Dual Active Learning for Reinforcement Learning from Human Feedback

Sun, K., Kong, L., Zhu, H. and Shi, C. ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments Python module ARMAdesign
slides

Hao, M*., Su, P*., Hu, L., Szabó, Z., Zhao, Q. and Shi, C. Off-policy Evaluation with Deeply-abstracted States. Python module state-abstraction

Dai, R*., Wang, J*., Zhou, F*., Luo, S., Qin, Q., Shi, C., and Zhu, H. Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

Yang, Y., Shi, C., Yao, F., Wang, S. and Zhu, H. Spatially Randomized Designs Can Enhance Policy Evaluation

Wang, D., Shi, C., Luo, S. and Sun, W. Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data

Ma, T*., Zhu, J*., Cai, H., Qi, Z., Chen, Y., Shi, C. and Laber, E. Sequential Knockoffs for Variable Selection in Reinforcement Learning (SEEK)

Yang, X., Shi, C., Luo, S., Wang, L. and Song, R. Quantile Off-Policy Evaluation via Deep Conditional Generative Learning
2023 JSM Student Paper Award

Hu, L*., Li, M*., Shi, C., Wu, Z. and Fryzlewicz, P. Doubly Inhomogeneous Reinforcement Learning. Python module DIRL
slides presented at CMStatistics 2022.

Wang, J., Qi, Z. and Shi, C. Blessing from Experts: Super Reinforcement Learning in Confounded Environments

Publications/accepted manuscripts

Zhu, J*., Li, J*., Zhou, H., Lin, Y., Lin, Z., Shi, C. (2025). Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach, ICML. Python module CausalGraphCut

Wen, Q*., Shi, C*., Yang, Y., Tang, N., Zhu, H. (2025). Unraveling the Interplay between Carryover Effects and Reward Autocorrelations in Switchback Experiments, ICML. Python module SwitchMDP.

Zhou, H., Hanna, J., Zhu, J., Yang, Y., Shi, C. (2025). Demystifying the Paradox of Importance Sampling with an Estimated History-Dependent Behavior Policy in Off-Policy Evaluation, ICML.

Behnamnia, A., Aminian, G., Aghaei, A., Shi, C., Tan, V.Y., Rabiee, H. (2025). Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning, ICML (spotlight, top 2.6% of submissions).

StatsUpAI Interest Group (2025). Statistics and AI: A Fireside Conversation, Harvard Data Science Review.

Uehara, M., Shi, C. and Kallus, N. (2025+). A Review of Off-Policy Evaluation in Reinforcement Learning, Statistical Science, accepted.

Li, M., Shi, C., Wu, Z. and Fryzlewicz, P. (2025+). Testing Stationarity and Change Point Detection in Reinforcement Learning (CUSUM-RL), Annals of Statistics, accepted. Python module CUSUM-RL
slides video presented at JSM 2022.

Luo, L*., Shi, C*., Wang, J*., Wu, Z. and Li, L. (2024+). Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework, Annals of Statistics, accepted. Python module MedtimeRL

Yu, S., Fang, S., Peng, R., Qi, Z., Zhou, F. and Shi, C. (2024). Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning, NeurIPS. Python module Two-way-deconfounder

Bian, Z., Shi, C., Qi, Z. and Wang, L. (2024+). Off-policy Evaluation in Doubly Inhomogeneous Environments, Journal of the American Statistical Association, accepted. Python module 2FEOPE

Li, T*., Shi, C*., Wen, Q., Sui, Y., Qin, Y., Lai, C. and Zhu, H. (2024). Combining Experimental and Historical Data for Policy Evaluation, ICML. Python module Data_Combination

Li, J., Shi, C., Li, L. and Collins, A. (2024). Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making, Journal of Mathematical Psychology, 119, 102842. Python module dynamic_noise_estimation

Shi, C*., Qi, Z*., Wang, J. and Zhou, F. (2024). Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization, Journal of the American Statistical Association, 119, 2011-2025. Python module VEPO

Shi, C., Zhou, Y. and Li, L. (2024). Testing Directed Acyclic Graph via Structural, Supervised and Generative Adversarial Learning (SUGAR), Journal of the American Statistical Association, 119, 1833-1846. Python module SUGAR
slides presented at JSM 2021.

Li, T*., Shi, C*., Lu, Z., Li, Y. and Zhu, H. (2024). Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing, Journal of the American Statistical Association, 119, 1736-1750. Python module CQSTVCM

Shi, C., Zhu, J., Shen, Y., Luo, S., Zhu, H. and Song, R. (2024). Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process (COPE), Journal of the American Statistical Association, 119, 273-284. Python module COPE

Shi, C., Luo, S., Le, Y., Zhu, H. and Song, R. (2024). Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons (SEAL), Journal of the American Statistical Association, 119, 232-245. Python module SEAL

Luo, S*., Yang, Y*., Shi, C*., Yao, F., Ye, J. and Zhu, H. (2024). Policy Evaluation for Temporal and/or Spatial Dependent Experiments, Journal of the Royal Statistical Society, Series B, 86, 623–649. Python module STVCM.

Zhu, J*., Wan, R*., Qi, Z., Luo, S. and Shi, C. (2024). Robust Offline Reinforcement Learning with Heavy-Tailed Rewards, AISTATS. Python module ROOM.

Uehara, M., Kiyohara, H., Bennett, A., Chernozhukov, V., Jiang, N., Kallus, N., Shi, C. and Sun, W. (2023). Future-Dependent Value-Based Off-Policy Evaluation in POMDPs, NeurIPS (spotlight).

Li, T*., Shi, C*., Wang, J., Zhou, F. and Zhu, H. (2023). Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making, NeurIPS. Python module MDPdesign.

Zhou, Y., Shi, C., Li, L. and Yao, Q. (2023). Testing for the Markov Property in Time Series via Deep Conditional Generative Learning, Journal of the Royal Statistical Society, Series B, 85, 1204–1222. Python module markov_test

Shi, C., Wan, R., Song, G., Luo, S., Zhu, H. and Song, R. (2023). A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets, Annals of Applied Statistics, 17, 2701-2722. Python module CausalMARL

Shi, C*., Wang, X*., Luo, S., Zhu, H., Ye, J. and Song, R. (2023). Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework. Journal of the American Statistical Association, 118, 2059-2071. Python module CausalRL
slides video presented at Online Causal Inference Seminar.

Wu, G., Song, G., Lv, X., Luo, S., Shi, C. and Zhu, H. (2023). DNet: Distributional Network for Distributional Individualized Treatment Effects, KDD.

Ge, L., Wang, J., Shi, C., Wu, Z. and Song, R. (2023). A Reinforcement Learning Framework for Dynamic Mediation Analysis, ICML. Python module MediationRL
2023 ICSA Student Paper Award

Yang, X., Zhu, J., Shi, C., Luo, S. and Song, R. (2023). An Instrumental Variable Approach to Confounded Off-Policy Evaluation, ICML. Python module IVMDP

Wang, J., Shi, C. and Wu, Z. (2023). A Robust Test for the Stationarity Assumption in Sequential Decision Making, ICML. Python module Double-CUSUM-RL

Li, J., Shi, C., Li, L. and Collins, A. (2023). A Generalized Method for Dynamic Noise Inference in Modeling Sequential Decision-making, CogSci.

Shi, C. (2023). The Impact of David Cox’s Work and Leadership on My Research, Harvard Data Science Review.

Gao, Y., Shi, C. and Song, R. (2023). Deep Spectral Q-learning with Application to Mobile Health, Stat, 12, e564.
2022 JSM Student Paper Award

Cai, H*., Shi, C*., Song, R. and Lu, W. (2023). Jump Interval-Learning for Individualized Decision Making with Continuous Treatments, Journal of Machine Learning Research, 24, 1–92. R package JQL

Zhou, Y., Qi, Z., Shi, C. and Li, L. (2023). Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach, AISTATS. Python module PBL

Zhang, Y., Shi, C. and Luo, S. (2023). Conformal Off-Policy Prediction (COPP), AISTATS. R code COPP

Shi, C. and Li, L. (2022). Testing Mediation Effects Using Logic of Boolean Matrices (LOGAN), Journal of the American Statistical Association, 117, 2014-2027. Python module LOGAN
slides presented at JSM 2021.

Shi, C., Zhang, S., Song, R. and Lu, W. (2022). Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings, Journal of the Royal Statistical Society, Series B, 84, 765-793. Python module SAVE
slides presented at ICSA 2019.

Shi, C*., Uehara, M*., Huang, J. and Jiang, N. (2022). A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes, ICML (long talk, top 2% of submissions). Python module Confounded-POMDP-OPE
video presented at ICML.

Li, L., Shi, C., Guo, T. and Jagust, W. (2022). Sequential Pathway Inference for Multimodal Neuroimaging Analysis, Stat, 11, e433. Python module LOGAN
slides presented at JSM 2021.

Shi, C., Xu, T., Bergsma, W. and Li, L. (2021). Double Generative Adversarial Networks for Conditional Independence Testing. Journal of Machine Learning Research, 22, 1-32. Python module dgcit

Shi, C., Luo, S., Zhu, H. and Song, R. (2021). An Online Sequential Test for Qualitative Treatment Effects. Journal of Machine Learning Research, 22, 1-51.

Cai, H*., Shi, C*., Song, R. and Lu, W. (2021). Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings, NeurIPS.
2021 ENAR Distinguished Student Paper Awards Python module DJL
video presented at NeurIPS.

Wan, R*., Zhang, S*., Shi, C., Luo, S. and Song, R. (2021). Pattern Transfer Learning for Reinforcement Learning in Order Dispatching, IJCAI Reinforcement Learning for Intelligent Transportation Systems Workshop (best paper, spotlight).
video presented at the workshop.

Shi, C*., Wan, R*., Chernozhukov, V. and Song, R. (2021). Deeply-Debiased Off-Policy Interval Estimation, ICML (long talk, top 3% of submissions). Python module D2OPE
video presented at ICML.

Shi, C., Song, R., Lu, W. and Li, R. (2021). Statistical Inference for High-Dimensional Models via Recursive Online-Score Estimation (ROSE), Journal of the American Statistical Association, 116, 1307-1318. R code for linear/logistic regression

Shi, C., Song, R. and Lu, W. (2021). Concordance and Value Information Criteria for Optimal Treatment Decision (CIVIC), Annals of Statistics, 49, 49-75.

Shi, C., Lu, W. and Song, R. (2020). Breaking the Curse of Nonregularity with Subagging: Inference of the Mean Outcome under Optimal Treatment Regimes, Journal of Machine Learning Research, 21, 1-67. R and C sample code subagging2.cpp sb.r

Shi, C., Wan, R., Song, R., Lu, W. and Leng, L. (2020). Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making. ICML. Python module TestMDP
slides video presented at CMStatistics 2020, ICML 2020, JSM 2020 and EYSM 2021.

Shi, C., Lu, W. and Song, R. (2020). A Sparse Random Projection-based Test for Overall Qualitative Treatment Effects, Journal of the American Statistical Association, 115, 1201-1213.

Shi, C., Song, R., Chen, Z. and Li, R. (2019). Linear Hypothesis Testing for High Dimensional Generalized Linear Models. Annals of Statistics, 47, 2671-2703.
2018 IMS travel award
R code for linear/logistic/Poisson regression

Shi, C., Lu, W., and Song, R. (2019). On Testing Conditional Qualitative Treatment Effects. Annals of Statistics, 47, 2348-2377.
2017 IMS travel award
slides presented at JSM 2017.

Shi, C., Lu, W. and Song, R. (2019). Determining the Number of Latent Factors in Multirelational Learning, Journal of Machine Learning Research, 20, 1-38.

Shi, C., Lu, W., and Song, R. (2018). A Massive Data Framework for M-estimators with Cubic-Rate. Journal of the American Statistical Association, 113, 1698-1709.

Shi, C., Song, R., Lu, W., and Fu, B. (2018). Maximin Projection Learning for Optimal Treatment Decision with Heterogeneous Individualized Treatment Effects. Journal of the Royal Statistical Society, Series B, 80, 681-702. R package ITRLearn
slides presented at JSM 2016, poster presented at 2018 NCSU research symposium.

Shi, C., Fan, A., Song, R., and Lu, W. (2018). High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes. Annals of Statistics, 46, 925-957. R package ITRSelect
slides presented at ENAR 2016.

Shi, C., Song, R. and Lu, W. (2018). Discussion of “Optimal Treatment Allocations in Space and Time for On-Line Control of an Emerging Infectious Disease”, Journal of the Royal Statistical Society, Series C, 67, 743-789.

Shi, C., Song, R. and Lu, W. (2017). Discussion of “Random Projection Ensemble Classification”, Journal of the Royal Statistical Society, Series B, 79, 959-1035.

Shi, C., Song, R. and Lu, W. (2016). Robust Learning for Optimal Treatment Decision with NP-Dimensionality, Electronic Journal of Statistics, 10, 2894-2921.

Zhang, P., Qiu, Z. and Shi, C. (2016). simplexreg: An R Package for Regression Analysis of Proportional Data Using Simplex Distribution, Journal of Statistical Software, 71, 1-21. R package simplexreg