Please feel free to email me at c.shi7@lse.ac.uk if you have any comments.

Some Preprints

* indicates equal contribution

Liu, Z., Guo, X., Yang, Z., Lou, F., Zeng, L., Li, M., Qi, Q., Liu, Z., Han, Y., Cheng, D., Feng, X., Wang, H., Shi, C. and Zhang, L. Fin-R1: A Large Language Model for Financial Reasoning through Reinforcement Learning.

Zhang, J., Wang, J., Shi, C., Piette, J., Zeng, D. and Wu, Z. PyCFRL: A Python Library for Counterfactually Fair Offline Reinforcement Learning via Sequential Data Preprocessing.

Wang, J., Wen, Q., Zhang, Y., Yan, X. and Shi, C. A Two-armed Bandit Framework for A/B Testing.

Ye, K*., Zhou, H*., Zhu, J*., Quinzan, F. and Shi, C. Robust Reinforcement Learning from Human Feedback for Large Language Model Fine-Tuning. VRPO

Zhu, J., Zhou, X., Yao, J., Aminian, G., Rivasplata, O., Little, S., Li, L. and Shi, C. Semi-pessimistic Reinforcement Learning.

Shen, G., Dai, R., Wu, G., Luo, S., Shi, C. and Zhu, H. Deep Distributional Learning with Non-crossing Quantile Network.

Shi, C. Statistical Inference in Reinforcement Learning: A Selective Survey.

Wang, J., Shi, C., Piette, J., Loftus, J., Zeng, D. and Wu, Z. Counterfactually Fair Reinforcement Learning via Sequential Data Preprocessing.

Liu, P., Shi, C. and Sun, W. Dual Active Learning for Reinforcement Learning from Human Feedback.

Sun, K., Kong, L., Zhu, H. and Shi, C. ARMA-Design: Optimal Treatment Allocation Strategies for A/B Testing in Partially Observable Time Series Experiments ARMAdesign
slides

Hao, M*., Su, P*., Hu, L., Szabó, Z., Zhao, Q. and Shi, C. Off-policy Evaluation with Deeply-abstracted States. state-abstraction

Dai, R*., Wang, J*., Zhou, F*., Luo, S., Qin, Q., Shi, C. and Zhu, H. Causal Deepsets for Off-policy Evaluation under Spatial or Spatio-temporal Interferences.

Yang, Y., Shi, C., Yao, F., Wang, S. and Zhu, H. Spatially Randomized Designs Can Enhance Policy Evaluation.

Wang, D., Shi, C., Luo, S. and Sun, W. Pessimistic Causal Reinforcement Learning with Mediators for Confounded Offline Data.

Hu, L*., Li, M*., Shi, C., Wu, Z. and Fryzlewicz, P. Doubly Inhomogeneous Reinforcement Learning. DIRL
slides presented at CMStatistics 2022.

Publications/Accepted Manuscripts

ICLR
Zhou, H*., Zhu, J*., Ye, K., Yang, Y., Xu, E. and Shi, C. (2026). Learn-to-Distance: Distance Learning for Detecting LLM-Generated Text.
ICLR
Wu, X*., Wen, Q*., Zhang, Y., Zhu, H., Li, T. and Shi, C. (2026). Designing Time Series Experiments in A/B Testing with Transformer Reinforcement Learning.
Brain
Lawrence, D., Avraham, G., Yao, J., Li, L., Shi, C., Starr, P.A. and Little, S. (2025). Cortico-basal oscillations index naturalistic movements during deep brain stimulation.
JASA
Ma, T*., Zhu, J*., Cai, H., Qi, Z., Chen, Y., Shi, C. and Laber, E. (2025+). Sequential Knockoffs for Variable Selection in Reinforcement Learning SEEK
STAT
Hu, L., Wang, J., Wu, Z. and Shi, C. (2025) Generalized Fitted Q-Iteration with Clustered Data.
JASA
Wang, W. and Shi, C. (2025+). From Authors to Reviewers: Leveraging Rankings to Improve Peer Review.
Discussion of "Analysis of the ICML 2023 Ranking Data: Can Authors’ Opinions of Their Own Papers Assist Peer Review in Machine Learning?"
NeurIPS
Xu, E*., Ye, K*., Zhou, H*., Zhu, L., Quinzan, F. and Shi, C. (2025). Doubly Robust Alignment for Large Language Models DRPO4LLM
slides video presented at Tsinghua Statistics + AI Frontier Summit.
NeurIPS
Zhou, H*., Zhu, J*., Su, P., Ye, K., Yang, Y., Gavioli-Akilagun, S.A. and Shi, C. (2025). AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees AdaDetectGPT
video presented at 狗熊会 (a Chinese data science community).
NeurIPS
Wu, X*., Li, T*., Aminian, G., Behnamnia, A., Rabiee, H. and Shi, C. (2025). Pessimistic Data Integration for Policy Evaluation.
TMLR
Yang, X., Shi, C., Luo, S., Wang, L. and Song, R. (2025). Doubly Robust Uncertainty Quantification for Quantile Treatment Effects in Sequential Decision Making. 2023 JSM Student Paper Award
ICML
Behnamnia, A., Aminian, G., Aghaei, A., Shi, C., Tan, V.Y. and Rabiee, H. (2025). Log-Sum-Exponential Estimator for Off-Policy Evaluation and Learning (spotlight, top 2.6% of submissions).
HDSR
StatsUpAI Interest Group (2025). Statistics and AI: A Fireside Conversation.
Stat Sci
Uehara, M., Shi, C. and Kallus, N. (2025+). A Review of Off-Policy Evaluation in Reinforcement Learning.
AOS
Li, M., Shi, C., Wu, Z. and Fryzlewicz, P. (2025). Testing Stationarity and Change Point Detection in Reinforcement Learning CUSUM-RL
slides video presented at JSM 2022.
JASA
Bian, Z., Shi, C., Qi, Z. and Wang, L. (2025). Off-policy Evaluation in Doubly Inhomogeneous Environments 2FEOPE
NeurIPS
Yu, S., Fang, S., Peng, R., Qi, Z., Zhou, F. and Shi, C. (2024). Two-way Deconfounder for Off-policy Evaluation in Causal Reinforcement Learning Two-way-deconfounder
ICML
Li, T*., Shi, C*., Wen, Q., Sui, Y., Qin, Y., Lai, C. and Zhu, H. (2024). Combining Experimental and Historical Data for Policy Evaluation Data_Combination
JASA
Shi, C., Zhu, J., Shen, Y., Luo, S., Zhu, H. and Song, R. (2024). Off-Policy Confidence Interval Estimation with Confounded Markov Decision Process COPE
JRSS-B
Luo, S*., Yang, Y*., Shi, C*., Yao, F., Ye, J. and Zhu, H. (2024). Policy Evaluation for Temporal and/or Spatial Dependent Experiments STVCM
AISTATS
Zhu, J*., Wan, R*., Qi, Z., Luo, S. and Shi, C. (2024). Robust Offline Reinforcement Learning with Heavy-Tailed Rewards ROOM
NeurIPS
Uehara, M., Kiyohara, H., Bennett, A., Chernozhukov, V., Jiang, N., Kallus, N., Shi, C. and Sun, W. (2023). Future-Dependent Value-Based Off-Policy Evaluation in POMDPs (spotlight) future-dependent-ope
AOAS
Shi, C., Wan, R., Song, G., Luo, S., Zhu, H. and Song, R. (2023). A Multi-Agent Reinforcement Learning Framework for Off-Policy Evaluation in Two-sided Markets CausalMARL
JASA
Shi, C*., Wang, X*., Luo, S., Zhu, H., Ye, J. and Song, R. (2023). Dynamic Causal Effects Evaluation in A/B Testing with a Reinforcement Learning Framework CausalRL
slides video presented at the Online Causal Inference Seminar.
KDD
Wu, G., Song, G., Lv, X., Luo, S., Shi, C. and Zhu, H. (2023). DNet: Distributional Network for Distributional Individualized Treatment Effects.
ICML
Ge, L., Wang, J., Shi, C., Wu, Z. and Song, R. (2023). A Reinforcement Learning Framework for Dynamic Mediation Analysis MediationRL
2023 ICSA Student Paper Award
ICML
Yang, X., Zhu, J., Shi, C., Luo, S. and Song, R. (2023). An Instrumental Variable Approach to Confounded Off-Policy Evaluation IVMDP
STAT
Gao, Y., Shi, C. and Song, R. (2023). Deep Spectral Q-learning with Application to Mobile Health. 2022 JSM Student Paper Award
AISTATS
Zhang, Y., Shi, C. and Luo, S. (2023). Conformal Off-Policy Prediction. R code COPP
JASA
Shi, C. and Li, L. (2022). Testing Mediation Effects Using Logic of Boolean Matrices. LOGAN
slides presented at JSM 2021.
JRSS-B
Shi, C., Zhang, S., Song, R. and Lu, W. (2022). Statistical Inference of the Value Function for Reinforcement Learning in Infinite Horizon Settings. SAVE
slides presented at ICSA 2019.
ICML
Shi, C*., Uehara, M*., Huang, J. and Jiang, N. (2022). A Minimax Learning Approach to Off-Policy Evaluation in Confounded Partially Observable Markov Decision Processes. (long talk, top 2%). Confounded-POMDP-OPE
video presented at ICML.
STAT
Li, L., Shi, C., Guo, T. and Jagust, W. (2022). Sequential Pathway Inference for Multimodal Neuroimaging Analysis. LOGAN
slides presented at JSM 2021.
JMLR
Shi, C., Luo, S., Zhu, H. and Song, R. (2021). An Online Sequential Test for Qualitative Treatment Effects.
NeurIPS
Cai, H*., Shi, C*., Song, R. and Lu, W. (2021). Deep Jump Learning for Off-Policy Evaluation in Continuous Treatment Settings. 2021 ENAR Distinguished Student Paper Award
DJL
video presented at NeurIPS.
IJCAI workshop
Wan, R*., Zhang, S*., Shi, C., Luo, S. and Song, R. (2021). Pattern Transfer Learning for Reinforcement Learning in Order Dispatching (best paper).
video presented at the workshop.
ICML
Shi, C*., Wan, R*., Chernozhukov, V. and Song, R. (2021). Deeply-Debiased Off-Policy Interval Estimation (long talk, top 3%). D2OPE
video presented at ICML.
JASA
Shi, C., Song, R., Lu, W. and Li, R. (2021). Statistical Inference for High-Dimensional Models via Recursive Online-Score Estimation (ROSE). R code for linear/logistic regression
ICML
Shi, C., Wan, R., Song, R., Lu, W. and Leng, L. (2020). Does the Markov Decision Process Fit the Data: Testing for the Markov Property in Sequential Decision Making. TestMDP
slides video presented at CMStatistics 2020, ICML 2020, JSM 2020 and EYSM 2021.
AOS
Shi, C., Song, R., Chen, Z. and Li, R. (2019). Linear Hypothesis Testing for High Dimensional Generalized Linear Models. 2018 IMS travel award
R code for linear/logistic/Poisson regression
AOS
Shi, C., Lu, W., and Song, R. (2019). On Testing Conditional Qualitative Treatment Effects. 2017 IMS travel award
slides presented at JSM 2017.
JASA
Shi, C., Lu, W., and Song, R. (2018). A Massive Data Framework for M-estimators with Cubic-Rate.
JRSS-B
Shi, C., Song, R., Lu, W., and Fu, B. (2018). Maximin Projection Learning for Optimal Treatment Decision with Heterogeneous Individualized Treatment Effects. ITRLearn
slides presented at JSM 2016, poster presented at 2018 NCSU research symposium.
AOS
Shi, C., Fan, A., Song, R., and Lu, W. (2018). High-Dimensional A-Learning for Optimal Dynamic Treatment Regimes. ITRSelect
slides presented at ENAR 2016.