Publications

Research Overview

My research focuses on Natural Language Processing and Machine Learning, particularly Dialogue and Structured Prediction. As AI challenges grow more complex, I’m keen on developing modular AI systems for future large-scale projects that demand Human-AI Teaming, which requires complex control, reasoning, collaboration, and adaptation.

Papers

(See the full list on Google Scholar)

2026

EACL

Zilong Li and Jie Cao. 2026. Translation via Annotation: A Computational Study of Translating Classical Chinese into Japanese. In Vera Demberg, Kentaro Inui, and Lluís Marquez, editors, Proceedings of the 19th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 6031–6045, Rabat, Morocco, March. Association for Computational Linguistics. BibTeX

@inproceedings{li2025translation,
  title = {Translation via Annotation: A Computational Study of Translating Classical {C}hinese into {J}apanese},
  author = {Li, Zilong and Cao, Jie},
  editor = {Demberg, Vera and Inui, Kentaro and Marquez, Llu{\'i}s},
  booktitle = {Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 1: Long Papers)},
  month = mar,
  year = {2026},
  address = {Rabat, Morocco},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2026.eacl-long.285/},
  pages = {6031--6045},
  isbn = {979-8-89176-380-7}
}

| URL

2025

arXiv

Zhichao Xu, Shengyao Zhuang, Xueguang Ma, Bingsen Chen, Yijun Tian, Fengran Mo, Jie Cao, and Vivek Srikumar. 2025. Rethinking On-policy Optimization for Query Augmentation. arXiv preprint arXiv:2510.17139, October. BibTeX

@article{xu2025rethinking,
  title = {Rethinking On-policy Optimization for Query Augmentation},
  author = {Xu, Zhichao and Zhuang, Shengyao and Ma, Xueguang and Chen, Bingsen and Tian, Yijun and Mo, Fengran and Cao, Jie and Srikumar, Vivek},
  journal = {arXiv preprint arXiv:2510.17139},
  year = {2025},
  month = oct,
  url = {https://arxiv.org/abs/2510.17139}
}

| URL

TSAR

Cuong Huynh and Jie Cao. 2025. OUNLP at TSAR 2025 Shared Task Multi-Round Text Simplifier via Code Generation. In Matthew Shardlow, Fernando Alva-Manchego, Kai North, Regina Stodden, Horacio Saggion, Nouran Khallaf, and Akio Hayakawa, editors, Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025), pages 223–230, Suzhou, China, November. Association for Computational Linguistics. BibTeX

@inproceedings{huynh-cao-2025-ounlp,
  title = {{OUNLP} at {TSAR} 2025 Shared Task Multi-Round Text Simplifier via Code Generation},
  author = {Huynh, Cuong and Cao, Jie},
  editor = {Shardlow, Matthew and Alva-Manchego, Fernando and North, Kai and Stodden, Regina and Saggion, Horacio and Khallaf, Nouran and Hayakawa, Akio},
  booktitle = {Proceedings of the Fourth Workshop on Text Simplification, Accessibility and Readability (TSAR 2025)},
  month = nov,
  year = {2025},
  address = {Suzhou, China},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2025.tsar-1.19/},
  pages = {223--230},
  isbn = {979-8-89176-176-6}
}

| PDF | URL

J. Hydrol.

Jiaorui Zhang, Haowen Yue, Milad Basirifard, Jie Cao, and Tiantian Yang. 2025. A Mamba-type of Deep State Space Model for Reservoir Release Simulation with a Large-Scale Verification over 441 Dams across CONUS. Journal of Hydrology:134145. BibTeX

@article{ZHANG2025134145,
  title = {A {{Mamba-type}} of Deep State Space Model for Reservoir Release Simulation with a Large-Scale Verification over 441 Dams across {{CONUS}}},
  author = {Zhang, Jiaorui and Yue, Haowen and Basirifard, Milad and Cao, Jie and Yang, Tiantian},
  year = {2025},
  journal = {Journal of Hydrology},
  pages = {134145},
  issn = {0022-1694},
  doi = {10.1016/j.jhydrol.2025.134145},
  url = {https://www.sciencedirect.com/science/article/pii/S0022169425014830},
  keywords = {Large scale,Release simulation,SHAP,Structured State Space Model,Water management}
}

| URL

EMNLP

Jayanth Krishna Chundru, Rudrashis Poddar, Jie Cao, and Tianyu Jiang. 2025. Do LLMs Encode Frame Semantics? Evidence from Frame Identification. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, November. BibTeX

@inproceedings{jayanth2025emnlp,
  title = {Do LLMs Encode Frame Semantics? Evidence from Frame Identification},
  booktitle = {Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  author = {Chundru, Jayanth Krishna and Poddar, Rudrashis and Cao, Jie and Jiang, Tianyu},
  year = {2025},
  month = nov,
  publisher = {Association for Computational Linguistics},
  venue = {Suzhou, China},
  url = {https://aclanthology.org/2025.emnlp-main.1499/}
}

| PDF | URL

ICCV Workshop

Songkun Yan, Zhi Li, Siyu Zhu, Yixin Wen, Mofan Zhang, Mengye Chen, Jie Cao, and Yang Hong. 2025. AQUAH: Automatic Quantification and Unified Agent in Hydrology. 1st Workshop on Sustainability with Earth observation and AI (co-located with ICCV), 2025 (To Appear). BibTeX

@article{songkun2025sea,
  title = {AQUAH: Automatic Quantification and Unified Agent in Hydrology},
  author = {Yan, Songkun and Li, Zhi and Zhu, Siyu and Wen, Yixin and Zhang, Mofan and Chen, Mengye and Cao, Jie and Hong, Yang},
  journal = {1st Workshop on Sustainability with Earth observation and AI (co-located with ICCV), 2025 (To Appear)},
  url = {https://arxiv.org/abs/2508.02936},
  year = {2025}
}

| PDF | URL

EDM

Jannatun Naim, Jie Cao, Fareen Tasneem, Jennifer Jacobs, Brent Milne, James Martin, and Tamara Sumner. 2025. Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue. In Proceedings of the 18th International Conference on Educational Data Mining, pages 328–341. International Educational Data Mining Society, July. BibTeX

@inproceedings{naim2025edm,
  title = {Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue},
  booktitle = {Proceedings of the 18th International Conference on Educational Data Mining},
  author = {Naim, Jannatun and Cao, Jie and Tasneem, Fareen and Jacobs, Jennifer and Milne, Brent and Martin, James and Sumner, Tamara},
  year = {2025},
  month = jul,
  pages = {328--341},
  publisher = {International Educational Data Mining Society},
  doi = {10.5281/zenodo.15870177},
  venue = {Palermo, Italy},
  url = {https://educationaldatamining.org/EDM2025/proceedings/2025.EDM.long-papers.201/index.html}
}

| PDF | URL

arXiv

Yingheng Tang, Wenbin Xu, Jie Cao, Weilu Gao, Steve Farrell, Benjamin Erichson, Michael W Mahoney, Andy Nonaka, and Zhi Yao. 2025. MatterChat: A Multi-Modal LLM for Material Science. arXiv preprint arXiv:2502.13107. BibTeX

@article{tang2025matterchat,
  title = {MatterChat: A Multi-Modal LLM for Material Science},
  author = {Tang, Yingheng and Xu, Wenbin and Cao, Jie and Gao, Weilu and Farrell, Steve and Erichson, Benjamin and Mahoney, Michael W and Nonaka, Andy and Yao, Zhi},
  journal = {arXiv preprint arXiv:2502.13107},
  year = {2025}
}

| PDF

COLING

Jie Cao, Abhijit Suresh, Jennifer Jacobs, Charis Clevenger, Amanda Howard, Chelsea Brown, Brent Milne, Tom Fischaber, Tamara Sumner, and James H. Martin. 2025. Enhancing Talk Moves Analysis in Mathematics Tutoring through Classroom Teaching Discourse. In The 31st International Conference on Computational Linguistics. BibTeX

@inproceedings{talkmove-coling-2024,
  title = {Enhancing Talk Moves Analysis in Mathematics Tutoring through Classroom
      Teaching Discourse},
  author = {Cao, Jie and Suresh, Abhijit and Jacobs, Jennifer and Clevenger, Charis and Howard, Amanda and Brown, Chelsea and Milne, Brent and Fischaber, Tom and Sumner, Tamara and Martin, James H.},
  booktitle = {The 31st International Conference on Computational Linguistics},
  year = {2025}
}

| PDF

2024

IEEE TVCG

Zhimin Li, Shusen Liu, Xin Yu, Bhavya Kailkhura, Jie Cao, James Daniel Diffenderfer, Peer-Timo Bremer, and Valerio Pascucci. 2024. "Understanding Robustness Lottery": A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches. IEEE Transactions on Visualization and Computer Graphics. BibTeX

@article{vis-pruning2024,
  title = {"Understanding Robustness Lottery": A Geometric Visual Comparative
      Analysis of Neural Network Pruning Approaches},
  author = {Li, Zhimin and Liu, Shusen and Yu, Xin and Kailkhura, Bhavya and Cao, Jie and Diffenderfer, James Daniel and Bremer, Peer-Timo and Pascucci, Valerio},
  url = {https://doi.org/10.1109/tvcg.2024.3514996},
  journal = {IEEE Transactions on Visualization and Computer Graphics},
  year = {2024}
}

| PDF | URL

L@S

Baptiste Moreau-Pernet, Yu Tian, Sandra Sawaya, Peter Foltz, Jie Cao, Brent Milne, and Thomas Christie. 2024. Classifying Tutor Discursive Moves at Scale in Mathematics Classrooms with Large Language Models. In Proceedings of the Eleventh ACM Conference on Learning @ Scale, pages 361–365. Association for Computing Machinery. BibTeX

@inproceedings{talkmove-llm-2024,
  author = {Moreau-Pernet, Baptiste and Tian, Yu and Sawaya, Sandra and Foltz, Peter and Cao, Jie and Milne, Brent and Christie, Thomas},
  title = {Classifying Tutor Discursive Moves at Scale in Mathematics Classrooms with Large Language Models},
  year = {2024},
  isbn = {9798400706332},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3657604.3664664},
  doi = {10.1145/3657604.3664664},
  booktitle = {Proceedings of the Eleventh ACM Conference on Learning @ Scale},
  pages = {361--365},
  numpages = {5},
  keywords = {discourse analysis, llm classification, math tutor training},
  location = {Atlanta, GA, USA},
  series = {L@S '24}
}

| PDF | URL

2023

BEA

E. Margaret Perkoff, Abhidip Bhattacharyya, Jon Cai, and Jie Cao. 2023. Comparing Neural Question Generation Architectures for Reading Comprehension. In Ekaterina Kochmar, Jill Burstein, Andrea Horbach, Ronja Laarmann-Quante, Nitin Madnani, Anaïs Tack, Victoria Yaneva, Zheng Yuan, and Torsten Zesch, editors, Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023), pages 556–566, Toronto, Canada, July. Association for Computational Linguistics. BibTeX

@inproceedings{qg-bea23,
  title = {Comparing Neural Question Generation Architectures for Reading Comprehension},
  author = {Perkoff, E. Margaret and Bhattacharyya, Abhidip and Cai, Jon and Cao, Jie},
  editor = {Kochmar, Ekaterina and Burstein, Jill and Horbach, Andrea and Laarmann-Quante, Ronja and Madnani, Nitin and Tack, Ana{\"i}s and Yaneva, Victoria and Yuan, Zheng and Zesch, Torsten},
  booktitle = {Proceedings of the 18th Workshop on Innovative Use of NLP for Building Educational Applications (BEA 2023)},
  month = jul,
  year = {2023},
  address = {Toronto, Canada},
  publisher = {Association for Computational Linguistics},
  url = {https://aclanthology.org/2023.bea-1.47/},
  doi = {10.18653/v1/2023.bea-1.47},
  pages = {556--566}
}

| PDF | URL

ACL

Ananya Ganesh, Jie Cao, E. Margaret Perkoff, Rosy Southwell, Martha Palmer, and Katharina Kann. 2023. Mind the Gap between the Application Track and the Real World. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023. BibTeX

@article{ananya-acl23,
  title = {Mind the Gap between the Application Track and the Real World},
  author = {Ganesh, Ananya and Cao, Jie and Perkoff, E. Margaret and Southwell, Rosy and Palmer, Martha and Kann, Katharina},
  journal = {Proceedings of the 61st Annual Meeting of the Association for
      Computational Linguistics, 2023},
  year = {2023}
}

| PDF

UMAP

Jie Cao, Ananya Ganesh, Jon Cai, Rosy Southwell, Margaret Perkoff, Michael Regan, Katharina Kann, James Martin, Martha Palmer, and Sidney D’Mello. 2023. A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse. Proceedings of the 31st ACM Conference on User Modeling Adaptation and Personalization. BibTeX

@article{cao-umap23,
  title = {A Comparative Analysis of Automatic Speech Recognition Errors in Small
    Group Classroom Discourse},
  author = {Cao, Jie and Ganesh, Ananya and Cai, Jon and Southwell, Rosy and Perkoff, Margaret and Regan, Michael and Kann, Katharina and Martin, James and Palmer, Martha and D'Mello, Sidney},
  journal = {Proceedings of the 31st ACM Conference on User Modeling Adaptation
      and Personalization},
  year = {2023}
}

| PDF

AIAIC

Jie Cao, Rachel Dickler, Marie Grace, Alessandro Roncone, Leanne Hirshfield, Marilyn Walker, and Martha Palmer. 2023. Designing an AI Partner for Jigsaw Classrooms. Workshop on Language-Based AI Character Interaction with Children. BibTeX

@article{cao-jigsaw23,
  title = {Designing an AI Partner for Jigsaw Classrooms},
  author = {Cao, Jie and Dickler, Rachel and Grace, Marie and Roncone, Alessandro and Hirshfield, Leanne and Walker, Marilyn and Palmer, Martha},
  journal = {Workshop on Language-Based AI Character Interaction with Children},
  year = {2023}
}

| PDF

2022

IWSDS

Jon Cai, Brendan D. King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James Martin, Martha Palmer, Marilyn Walker, and Jeffrey Flanigan. 2022. Dependency Dialogue Acts: Annotation Scheme and Case Study. The 13th International Workshop on Spoken Dialogue Systems Technology. BibTeX

@article{jon-dda2022,
  title = {Dependency Dialogue Acts: Annotation Scheme and Case Study},
  author = {Cai, Jon and King, Brendan D. and Perkoff, Margaret and Dudy, Shiran and Cao, Jie and Grace, Marie and Wojarnik, Natalia and Ganesh, Ananya and Martin, James and Palmer, Martha and Walker, Marilyn and Flanigan, Jeffrey},
  journal = {The 13th International Workshop on Spoken Dialogue Systems Technology},
  year = {2022}
}

| PDF

Ph.D. Dissertation

Jie Cao. 2022. Inductive Biases for Deep Linguistic Structured Prediction with Independent Factorization. Available from ProQuest Dissertations & Theses A&I; ProQuest Dissertations & Theses Global. (2777357718). BibTeX

@article{dissertation-proquest,
  title = {Inductive Biases for Deep Linguistic Structured Prediction with
      Independent Factorization},
  author = {Cao, Jie},
  journal = {Available from ProQuest Dissertations \& Theses A\&I; ProQuest
      Dissertations \& Theses Global. (2777357718)},
  year = {2022}
}

| PDF

2021

VLDB

Debjyoti Paul*, Jie Cao*, Feifei Li, and Vivek Srikumar. 2021. Database Workload Characterization with Query Plan Encoders. Proceedings of the VLDB Endowment, 15(4):923–935. BibTeX

@article{cao2021dbqencoder,
  title = {Database Workload Characterization with Query Plan Encoders},
  author = {{Debjyoti Paul}* and {Jie Cao}* and Li, Feifei and Srikumar, Vivek},
  journal = {Proceedings of the VLDB Endowment},
  volume = {15},
  number = {4},
  pages = {923--935},
  year = {2021},
  publisher = {VLDB Endowment}
}

| PDF

NAACL

Jie Cao and Yi Zhang. 2021. A Comparative Study on Schema-Guided Dialogue State Tracking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 782–796. BibTeX

@inproceedings{cao2021comparative,
  title = {A Comparative Study on Schema-Guided Dialogue State Tracking},
  author = {Cao, Jie and Zhang, Yi},
  booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages = {782--796},
  year = {2021}
}

| PDF | Poster

Earlier

CoNLL

Jie Cao, Yi Zhang, Adel Youssef, and Vivek Srikumar. 2019. Amazon at MRP 2019: Parsing Meaning Representations with Lexical and Phrasal Anchoring. In Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the Conference on Natural Language Learning, pages 138–148. BibTeX

@inproceedings{cao2019amazon,
  title = {Amazon at MRP 2019: Parsing Meaning Representations with Lexical and
      Phrasal Anchoring},
  author = {Cao, Jie and Zhang, Yi and Youssef, Adel and Srikumar, Vivek},
  booktitle = {Proceedings of the Shared Task on Cross-Framework Meaning
      Representation Parsing at the Conference on Natural Language
        Learning},
  pages = {138--148},
  year = {2019}
}

| PDF

ACL

Jie Cao, Michael Tanana, Zac Imel, Eric Poitras, David Atkins, and Vivek Srikumar. 2019. Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. BibTeX

@inproceedings{cao2019observing,
  title = {Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes},
  author = {Cao, Jie and Tanana, Michael and Imel, Zac and Poitras, Eric and Atkins, David and Srikumar, Vivek},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year = {2019},
  location = {Florence, Italy}
}

| PDF | Slides

ACL

Zhiqiang Liu, Zuohui Fu, Jie Cao, Gerard de Melo, Yik-Cheung Tam, Cheng Niu, and Jie Zhou. 2019. Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. BibTeX

@inproceedings{rhetorical-poetry2019,
  title = {Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation},
  author = {Liu, Zhiqiang and Fu, Zuohui and Cao, Jie and {de Melo}, Gerard and Tam, Yik-Cheung and Niu, Cheng and Zhou, Jie},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year = {2019},
  location = {Florence, Italy}
}

| PDF

DSTC

Shuo Sun*, Yik-Cheung Tam*, Jie Cao*, Canxiang Yan, Zuohui Fu, Cheng Niu, and Jie Zhou. 2019. End-to-end Gated Self-attentive Memory Network for Dialog Response Selection. In AAAI DSTC7 Workshop (Equal Contribution). BibTeX

@inproceedings{jie2019dstc,
  title = {End-to-end Gated Self-attentive Memory Network for Dialog Response Selection},
  author = {{Shuo Sun}* and {Yik-Cheung Tam}* and {Jie Cao}* and {Canxiang Yan} and {Zuohui Fu} and {Cheng Niu} and {Jie Zhou}},
  booktitle = {AAAI DSTC7 Workshop (Equal Contribution)},
  year = {2019},
  location = {Honolulu, USA}
}

| PDF | Poster

IEEE ICSC

Xijiang Ke, Hai Jin, Xia Xie, and Jie Cao. 2015. A Distributed SVM Method Based on the Iterative MapReduce. In Semantic Computing (ICSC), IEEE International Conference on, pages 116–119. IEEE. BibTeX

@inproceedings{ke2015distributed,
  title = {A Distributed SVM Method Based on the Iterative MapReduce},
  author = {Ke, Xijiang and Jin, Hai and Xie, Xia and Cao, Jie},
  booktitle = {Semantic Computing (ICSC), IEEE International Conference on},
  pages = {116--119},
  year = {2015},
  organization = {IEEE}
}

| PDF

IEEE APSCC

Xia Xie, Jie Cao, Hai Jin, Xijiang Ke, and Wenzhi Cao. 2012. JRBridge: A framework of large-scale statistical computing for R. In Services Computing Conference (APSCC), IEEE Asia-Pacific, pages 27–34. IEEE. BibTeX

@inproceedings{xie2012jrbridge,
  title = {JRBridge: A framework of large-scale statistical computing for R},
  author = {Xie, Xia and Cao, Jie and Jin, Hai and Ke, Xijiang and Cao, Wenzhi},
  booktitle = {Services Computing Conference (APSCC), IEEE Asia-Pacific},
  pages = {27--34},
  year = {2012},
  organization = {IEEE}
}

| PDF