Please feel free to email me at c.shi7@lse.ac.uk if you have any comments.
Some Preprints
* indicates equal contribution
Wang, J., Wen, Q., Zhang, Y., Yan, X. and Shi, C. A Two-armed Bandit Framework for A/B Testing.
Ye, K*., Zhou, H*., Zhu, J*., Quinzan, F. and Shi, C. Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning. Python module VRPO
Zhu, J., Zhou, X., Yao, J., Aminian, G., Rivasplata, O., Little, S., Li, L. and Shi, C. Semi-pessimistic Reinforcement Learning.
Shen, G., Dai, R., Wu, G., Luo, S., Shi, C. and Zhu, H. Deep Distributional Learning with Non-crossing Quantile Network
Shi, C. Statistical Inference in Reinforcement Learning: A Selective Survey
Wang, J., Shi, C., Piette, J., Loftus, J., Zeng, D. and Wu, Z. Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing
Liu, P., Shi, C. and Sun, W. Dual Active Learning for Reinforcement Learning from Human Feedback
Sun, K., Kong, L., Zhu, H. and Shi, C. ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments. Python module ARMAdesign slides
Hao, M*., Su, P*., Hu, L., Szabó, Z., Zhao, Q. and Shi, C. Off-policy Evaluation with Deeply-abstracted States. Python module state-abstraction
Dai, R*., Wang, J*., Zhou, F*., Luo, S., Qin, Q., Shi, C. and Zhu, H. Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences
Yang, Y., Shi, C., Yao, F., Wang, S. and Zhu, H. Spatially Randomized Designs Can Enhance Policy Evaluation
Wang, D., Shi, C., Luo, S. and Sun, W. Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data
Ma, T*., Zhu, J*., Cai, H., Qi, Z., Chen, Y., Shi, C. and Laber, E. Sequential Knockoffs for Variable Selection in Reinforcement Learning (SEEK)
Hu, L*., Li, M*., Shi, C., Wu, Z. and Fryzlewicz, P. Doubly Inhomogeneous Reinforcement Learning. Python module DIRL slides presented at CMStatistics 2022.
Publications/accepted manuscripts
STAT
Hu, L., Wang, J., Wu, Z. and Shi, C. (2025+) Generalized Fitted Q-Iteration with Clustered Data.
JASA
Wang, W. and Shi, C. (2025+) From Authors to Reviewers: Leveraging Rankings to Improve Peer Review.
Discussion of "Analysis of the ICML 2023 Ranking Data: Can Authors’ Opinions of Their Own Papers Assist Peer Review in Machine Learning?"
NeurIPS
Xu, E*., Ye, K*., Zhou, H*., Zhu, L., Quinzan, F. and Shi, C. (2025).
Doubly Robust Alignment for Large Language Models.
Python module DRPO4LLM slides video presented at Tsinghua Statistics + AI Frontier Summit
NeurIPS
Zhou, H*., Zhu, J*., Su, P., Ye, K., Yang, Y., Gavioli-Akilagun, S.A. and Shi, C. (2025).
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees.
Python module AdaDetectGPT video presented at 狗熊会
NeurIPS
Wu, X*., Li, T*., Aminian, G., Behnamnia, A., Rabiee, H. and Shi, C. (2025). Pessimistic Data Integration for Policy Evaluation.
NeurIPS
Feng, J., Zhao, W., Wu, Z., Shi, C. and Yan, X. (2025). Beyond Average Value Function in Precision Medicine: Maximum Probability-Driven Reinforcement Learning for Survival Analysis.
TMLR
Yang, X., Shi, C., Luo, S., Wang, L. and Song, R. (2025).
Quantile Off-Policy Evaluation via Deep Conditional Generative Learning.
2023 JSM Student Paper Award
ICML
Zhu, J*., Li, J*., Zhou, H., Lin, Y., Lin, Z. and Shi, C. (2025).
Balancing Interference and Correlation in Spatial Experimental Designs: A Causal Graph Cut Approach.
Python module CausalGraphCut
ICML
Behnamnia, A., Aminian, G., Aghaei, A., Shi, C., Tan, V.Y. and Rabiee, H. (2025).
Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning (spotlight, top 2.6% of submissions).
AOS
Li, M., Shi, C., Wu, Z. and Fryzlewicz, P. (2025).
Testing Stationarity and Change Point Detection in Reinforcement Learning.
Python module CUSUM-RL slides video presented at JSM 2022.
NeurIPS
Yu, S., Fang, S., Peng, R., Qi, Z., Zhou, F. and Shi, C. (2024).
Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning.
Python module Two-way-deconfounder
ICML
Li, T*., Shi, C*., Wen, Q., Sui, Y., Qin, Y., Lai, C. and Zhu, H. (2024).
Combining Experimental and Historical Data for Policy Evaluation.
Python module Data_Combination
JASA
Shi, C., Zhu, J., Shen, Y., Luo, S., Zhu, H. and Song, R. (2024).
Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process.
Python module COPE
JRSS-B
Luo, S*., Yang, Y*., Shi, C*., Yao, F., Ye, J. and Zhu, H. (2024).
Policy Evaluation for Temporal and/or Spatial Dependent Experiments.
Python module STVCM
AISTATS
Zhu, J*., Wan, R*., Qi, Z., Luo, S. and Shi, C. (2024).
Robust Offline Reinforcement Learning with Heavy-Tailed Rewards.
Python module ROOM
NeurIPS
Uehara, M., Kiyohara, H., Bennett, A., Chernozhukov, V., Jiang, N., Kallus, N., Shi, C. and Sun, W. (2023).
Future-Dependent Value-Based Off-Policy Evaluation in POMDPs (spotlight).
AOAS
Shi, C., Wan, R., Song, G., Luo, S., Zhu, H. and Song, R. (2023).
A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets.
Python module CausalMARL
JASA
Shi, C*., Wang, X*., Luo, S., Zhu, H., Ye, J. and Song, R. (2023).
Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework.
Python module CausalRL slides video presented at Online Causal Inference Seminar
KDD
Wu, G., Song, G., Lv, X., Luo, S., Shi, C. and Zhu, H. (2023).
DNet: Distributional Network for Distributional Individualized Treatment Effects.
ICML
Ge, L., Wang, J., Shi, C., Wu, Z. and Song, R. (2023).
A Reinforcement Learning Framework for Dynamic Mediation Analysis.
Python module MediationRL
2023 ICSA Student Paper Award
STAT
Gao, Y., Shi, C. and Song, R. (2023).
Deep Spectral Q-learning with Application to Mobile Health.
2022 JSM Student Paper Award
AISTATS
Zhang, Y., Shi, C. and Luo, S. (2023).
Conformal Off-Policy Prediction.
R code COPP
STAT
Li, L., Shi, C., Guo, T. and Jagust, W. (2022).
Sequential Pathway Inference for Multimodal Neuroimaging Analysis.
Python module LOGAN slides presented at JSM 2021.
NeurIPS
Cai, H*., Shi, C*., Song, R. and Lu, W. (2021).
Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings.
2021 ENAR Distinguished Student Paper Awards Python module DJL video presented at NeurIPS.
IJCAI workshop
Wan, R*., Zhang, S*., Shi, C., Luo, S. and Song, R. (2021).
Pattern Transfer Learning for Reinforcement Learning in Order Dispatching (best paper).
video presented at the workshop.
ICML
Shi, C*., Wan, R*., Chernozhukov, V. and Song, R. (2021).
Deeply-Debiased Off-Policy Interval Estimation (long talk, top 3%).
Python module D2OPE video presented at ICML.
ICML
Shi, C., Wan, R., Song, R., Lu, W. and Leng, L. (2020).
Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making.
Python module TestMDP slides video presented at CMStatistics 2020, ICML 2020, JSM 2020 and EYSM 2021.
AOS
Shi, C., Song, R., Chen, Z. and Li, R. (2019).
Linear Hypothesis Testing for High Dimensional Generalized Linear Models.
2018 IMS travel award
R code for linear/logistic/Poisson regression
AOS
Shi, C., Lu, W., and Song, R. (2019).
On Testing Conditional Qualitative Treatment Effects.
2017 IMS travel award slides presented at JSM 2017.
AOS
Shi, C., Fan, A., Song, R., and Lu, W. (2018).
High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes.
R package ITRSelect slides presented at ENAR 2016