Please feel free to email me at c.shi7@lse.ac.uk if you have any comments.

Some Preprints

* indicates equal contribution

Wang, J., Wen, Q., Zhang, Y., Yan, X. and Shi, C. A Two-armed Bandit Framework for A/B Testing.

Ye, K*., Zhou, H*., Zhu, J*., Quinzan, F. and Shi, C. Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning. Python module VRPO

Zhu, J., Zhou, X., Yao, J., Aminian, G., Rivasplata, O., Little, S., Li, L. and Shi, C. Semi-pessimistic Reinforcement Learning.

Shen, G., Dai, R., Wu, G., Luo, S., Shi, C. and Zhu, H. Deep Distributional Learning with Non-crossing Quantile Network

Shi, C. Statistical Inference in Reinforcement Learning: A Selective Survey

Wang, J., Shi, C., Piette, J., Loftus, J., Zeng, D. and Wu, Z. Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing

Liu, P., Shi, C. and Sun, W. Dual Active Learning for Reinforcement Learning from Human Feedback

Sun, K., Kong, L., Zhu, H. and Shi, C. ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments. Python module ARMAdesign
slides

Hao, M*., Su, P*., Hu, L., Szabó, Z., Zhao, Q. and Shi, C. Off-policy Evaluation with Deeply-abstracted States. Python module state-abstraction

Dai, R*., Wang, J*., Zhou, F*., Luo, S., Qin, Q., Shi, C., and Zhu, H. Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences

Yang, Y., Shi, C., Yao, F., Wang, S. and Zhu, H. Spatially Randomized Designs Can Enhance Policy Evaluation

Wang, D., Shi, C., Luo, S. and Sun, W. Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data

Ma, T*., Zhu, J*., Cai, H., Qi, Z., Chen, Y., Shi, C. and Laber, E. Sequential Knockoffs for Variable Selection in Reinforcement Learning (SEEK)

Hu, L*., Li, M*., Shi, C., Wu, Z. and Fryzlewicz, P. Doubly Inhomogeneous Reinforcement Learning. Python module DIRL
slides presented at CMStatistics 2022.

Publications/accepted manuscripts

STAT
Hu, L., Wang, J., Wu, Z. and Shi, C. (2025+). Generalized Fitted Q-Iteration with Clustered Data.
JASA
Wang, W. and Shi, C. (2025+). From Authors to Reviewers: Leveraging Rankings to Improve Peer Review.
Discussion of "Analysis of the ICML 2023 Ranking Data: Can Authors’ Opinions of Their Own Papers Assist Peer Review in Machine Learning?"
JASA
NeurIPS
Xu, E*., Ye, K*., Zhou, H*., Zhu, L., Quinzan, F. and Shi, C. (2025). Doubly Robust Alignment for Large Language Models. Python module DRPO4LLM
slides and video presented at Tsinghua Statistics + AI Frontier Summit
NeurIPS
Zhou, H*., Zhu, J*., Su, P., Ye, K., Yang, Y., Gavioli-Akilagun, S. A. and Shi, C. (2025). AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees. Python module AdaDetectGPT
video presented at 狗熊会
NeurIPS
Wu, X*., Li, T*., Aminian, G., Behnamnia, A., Rabiee, H. and Shi, C. (2025). Pessimistic Data Integration for Policy Evaluation.
NeurIPS
Feng, J., Zhao, W., Wu, Z., Shi, C. and Yan, X. (2025). Beyond Average Value Function in Precision Medicine: Maximum Probability-Driven Reinforcement Learning for Survival Analysis.
TMLR
Yang, X., Shi, C., Luo, S., Wang, L. and Song, R. (2025). Quantile Off-Policy Evaluation via Deep Conditional Generative Learning. 2023 JSM Student Paper Award
ICML
Zhu, J*., Li, J*., Zhou, H., Lin, Y., Lin, Z. and Shi, C. (2025). Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach. Python module CausalGraphCut
ICML
ICML
ICML
Behnamnia, A., Aminian, G., Aghaei, A., Shi, C., Tan, V.Y. and Rabiee, H. (2025). Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning (spotlight, top 2.6% of submissions).
HDSR
StatsUpAI Interest Group (2025). Statistics and AI: A Fireside Conversation.
Stat Sci
Uehara, M., Shi, C. and Kallus, N. (2025+). A Review of Off-Policy Evaluation in Reinforcement Learning.
AOS
Li, M., Shi, C., Wu, Z. and Fryzlewicz, P. (2025). Testing Stationarity and Change Point Detection in Reinforcement Learning. Python module CUSUM-RL
slides and video presented at JSM 2022.
AOS
Luo, L*., Shi, C*., Wang, J*., Wu, Z. and Li, L. (2025). Multivariate Dynamic Mediation Analysis under a Reinforcement Learning Framework. Python module MedtimeRL
JASA
Bian, Z., Shi, C., Qi, Z. and Wang, L. (2025). Off-policy Evaluation in Doubly Inhomogeneous Environments. Python module 2FEOPE
NeurIPS
Yu, S., Fang, S., Peng, R., Qi, Z., Zhou, F. and Shi, C. (2024). Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning. Python module Two-way-deconfounder
ICML
Li, T*., Shi, C*., Wen, Q., Sui, Y., Qin, Y., Lai, C. and Zhu, H. (2024). Combining Experimental and Historical Data for Policy Evaluation. Python module Data_Combination
J Math Psychol
Li, J., Shi, C., Li, L. and Collins, A. (2024). Dynamic noise estimation: A generalized method for modeling noise fluctuations in decision-making, Journal of Mathematical Psychology, 119, 102842. Python module dynamic_noise_estimation
JASA
Shi, C*., Qi, Z*., Wang, J. and Zhou, F. (2024). Value Enhancement of Reinforcement Learning via Efficient and Robust Trust Region Optimization. Python module VEPO
JASA
Shi, C., Zhou, Y. and Li, L. (2024). Testing Directed Acyclic Graph via Structural, Supervised and Generative Adversarial Learning. Python module SUGAR
slides presented at JSM 2021
JASA
Li, T*., Shi, C*., Lu, Z., Li, Y. and Zhu, H. (2024). Evaluating Dynamic Conditional Quantile Treatment Effects with Applications in Ridesharing. Python module CQSTVCM
JASA
Shi, C., Zhu, J., Shen, Y., Luo, S., Zhu, H. and Song, R. (2024). Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process. Python module COPE
JASA
Shi, C., Luo, S., Le, Y., Zhu, H. and Song, R. (2024). Statistically Efficient Advantage Learning for Offline Reinforcement Learning in Infinite Horizons. Python module SEAL
JRSS-B
Luo, S*., Yang, Y*., Shi, C*., Yao, F., Ye, J. and Zhu, H. (2024). Policy Evaluation for Temporal and/or Spatial Dependent Experiments. Python module STVCM
AISTATS
Zhu, J*., Wan, R*., Qi, Z., Luo, S. and Shi, C. (2024). Robust Offline Reinforcement Learning with Heavy-Tailed Rewards. Python module ROOM
NeurIPS
Uehara, M., Kiyohara, H., Bennett, A., Chernozhukov, V., Jiang, N., Kallus, N., Shi, C. and Sun, W. (2023). Future-Dependent Value-Based Off-Policy Evaluation in POMDPs (spotlight).
NeurIPS
Li, T*., Shi, C*., Wang, J., Zhou, F. and Zhu, H. (2023). Optimal Treatment Allocation for Efficient Policy Evaluation in Sequential Decision Making. Python module MDPdesign
JRSS-B
AOAS
Shi, C., Wan, R., Song, G., Luo, S., Zhu, H. and Song, R. (2023). A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets. Python module CausalMARL
JASA
Shi, C*., Wang, X*., Luo, S., Zhu, H., Ye, J. and Song, R. (2023). Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework. Python module CausalRL
slides and video presented at Online Causal Inference Seminar
KDD
Wu, G., Song, G., Lv, X., Luo, S., Shi, C. and Zhu, H. (2023). DNet: Distributional Network for Distributional Individualized Treatment Effects.
ICML
Ge, L., Wang, J., Shi, C., Wu, Z. and Song, R. (2023). A Reinforcement Learning Framework for Dynamic Mediation Analysis. Python module MediationRL
2023 ICSA Student Paper Award
ICML
Yang, X., Zhu, J., Shi, C., Luo, S. and Song, R. (2023). An Instrumental Variable Approach to Confounded Off-Policy Evaluation. Python module IVMDP
ICML
CogSci
HDSR
STAT
Gao, Y., Shi, C. and Song, R. (2023). Deep Spectral Q-learning with Application to Mobile Health. 2022 JSM Student Paper Award
JMLR
Cai, H*., Shi, C*., Song, R. and Lu, W. (2023). Jump Interval-Learning for Individualized Decision Making with Continuous Treatments. R Package JQL
AISTATS
Zhou, Y., Qi, Z., Shi, C. and Li, L. (2023). Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach. Python module PBL
AISTATS
Zhang, Y., Shi, C. and Luo, S. (2023). Conformal Off-Policy Prediction. R code COPP
JASA
Shi, C. and Li, L. (2022). Testing Mediation Effects Using Logic of Boolean Matrices. Python module LOGAN
slides presented at JSM 2021.
JRSS-B
Shi, C., Zhang, S., Song, R. and Lu, W. (2022). Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings. Python module SAVE
slides presented at ICSA 2019.
ICML
Shi, C*., Uehara, M*., Huang, J. and Jiang, N. (2022). A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes. (long talk, top 2%). Python module Confounded-POMDP-OPE
video presented at ICML.
STAT
Li, L., Shi, C., Guo, T. and Jagust, W. (2022). Sequential Pathway Inference for Multimodal Neuroimaging Analysis. Python module LOGAN
slides presented at JSM 2021.
JMLR
Shi, C., Xu, T., Bergsma, W. and Li, L. (2021). Double Generative Adversarial Networks for Conditional Independence Testing. Python module dgcit
JMLR
Shi, C., Luo, S., Zhu, H. and Song, R. (2021). An Online Sequential Test for Qualitative Treatment Effects.
NeurIPS
Cai, H*., Shi, C*., Song, R. and Lu, W. (2021). Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings. 2021 ENAR Distinguished Student Paper Awards
Python module DJL
video presented at NeurIPS.
IJCAI workshop
Wan, R*., Zhang, S*., Shi, C., Luo, S. and Song, R. (2021). Pattern Transfer Learning for Reinforcement Learning in Order Dispatching (best paper).
video presented at the workshop.
ICML
Shi, C*., Wan, R*., Chernozhukov, V. and Song, R. (2021). Deeply-Debiased Off-Policy Interval Estimation (long talk, top 3%). Python module D2OPE
video presented at ICML.
JASA
Shi, C., Song, R., Lu, W. and Li, R. (2021). Statistical Inference for High-Dimensional Models via Recursive Online-Score Estimation (ROSE). R code for linear/logistic regression
AOS
JMLR
ICML
Shi, C., Wan, R., Song, R., Lu, W. and Leng, L. (2020). Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making, ICML. Python module TestMDP
slides and video presented at CMStatistics 2020, ICML 2020, JSM 2020 and EYSM 2021.
JASA
AOS
Shi, C., Song, R., Chen, Z. and Li, R. (2019). Linear Hypothesis Testing for High Dimensional Generalized Linear Models.
2018 IMS travel award
R code for linear/logistic/Poisson regression
AOS
Shi, C., Lu, W., and Song, R. (2019). On Testing Conditional Qualitative Treatment Effects.
2017 IMS travel award
slides presented at JSM 2017.
JMLR
JASA
Shi, C., Lu, W., and Song, R. (2018). A Massive Data Framework for M-estimators with Cubic-Rate.
JRSS-B
Shi, C., Song, R., Lu, W., and Fu, B. (2018). Maximin Projection Learning for Optimal Treatment Decision with Heterogeneous Individualized Treatment Effects. R package ITRLearn
slides presented at JSM 2016, poster presented at 2018 NCSU research symposium.
AOS
Shi, C., Fan, A., Song, R., and Lu, W. (2018). High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes. R package ITRSelect
slides presented at ENAR 2016
JRSS-C
JRSS-B
EJS
J Stat Soft