Jie Cao is an Assistant Professor in the School of Computer Science at the University of Oklahoma, where he leads the OUNLP lab. He is also affiliated with the Data Science and Analytics Institute at OU. Before joining OU, he spent two years as a post-doctoral researcher at the NSF AI Institute for Student-AI Teaming (iSAT) at the University of Colorado Boulder, where he mainly worked with Dr. James Martin and Dr. Martha Palmer. He obtained his Ph.D. from the Kahlert School of Computing at the University of Utah, where he worked with Dr. Vivek Srikumar. Earlier in his academic journey, he completed his M.S. and B.S. in Computer Science at Huazhong University of Science and Technology (HUST) in China. He has also worked or interned at companies including Alibaba, Baidu, Sohu, WeChat (Palo Alto), and Amazon.

Research Interests

I work on Natural Language Processing and Machine Learning. Current research interests include:

Multi-party Multi-modal Dialogue/Discourse Analysis on Mental Health, Education, etc.
LLM/MLLM Alignment and Agents for Science, etc.
Efficient Structured Prediction and Symbolic Methods for Controlling and Augmenting Neural Networks
Robust Deployment and Evaluation of Trustworthy AI

News

07/2025: One paper on “AQUAH: Automatic Quantification and Unified Agent in Hydrology” got accepted to the ICCV Workshop on Sustainability with Earth observation and AI.
06/2025: One paper on “Adversarial Attacks on Cooperative Spectrum Sensing: A LLM-Powered Multi-Agent Approach” got accepted to IEEE SPAWC 2025.
05/2025: Congratulations to Masiko Mamba on being awarded the Undergraduate Engineering Research Fellowship for Summer 2025.
04/2025: Our paper on multi-perspective discourse analysis of teaching and tutoring dialogue got accepted to EDM 2025.
04/2025: I was awarded an Alternative Textbook Grant for practical course content on recent advances in LLMs and Agentic AI.
03/2025: Invited talk at the Graduate Student Community @ Gallogly College of Engineering: “Advances in Open LLM”.
02/2025: New preprint on MatterChat, a multimodal LLM for material science.
11/2024: Our paper “Enhancing Talk Moves Analysis in Mathematics Tutoring through Classroom Teaching Discourse” got accepted to COLING’2025.
11/2024: Our paper on visualization for network pruning got accepted to TVCG.
09/2024: Talk with students on “History of NLP” at the OU AI/ML Club.
07/2024: Our paper on dialogue classification via LLM finetuning got accepted to L@S’24.
02/2024: Invited talk on “Modularized Conversational Modeling” at Emory University and Georgia State University.
11/2023: In Fall 2023, I taught the NLP class (CSCI-LING 5832) with James Martin. I created new course materials on LLMs, In-Context Learning, Dialogue Generation, etc.
05/2023: Our paper on Question Generation got accepted to BEA’23.
05/2023: A short paper on “Mind the Gap between the Application Track and the Real World” got accepted to ACL’23
04/2023: Our paper on “A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse” got accepted to UMAP’23.
03/2023: My research on conversational simulation of small-group discussion was awarded an iSAT Trainee Grant.
02/2023: Our paper on an AI agent for Jigsaw Classrooms got accepted to AIAIC’23.
12/2022: Our paper on Dependency Dialogue Acts got accepted to IWSDS’23.
12/2022: Invited Talk on Database Workload Characterization work at Microsoft’s Gray Systems Lab. Slides.
08/2022: I joined the NSF AI Institute for Student-AI Teaming (iSAT) as a post-doctoral researcher.
06/2022: New preprint on visual analysis of neural network pruning.

Selected Publications

(See the full list on the Publication Page or Google Scholar)

Songkun Yan, Zhi Li, Siyu Zhu, Yixin Wen, Mofan Zhang, Mengye Chen, Jie Cao, and Yang Hong. 2025. AQUAH: Automatic Quantification and Unified Agent in Hydrology. 1st workshop on Sustainability with Earth observation and AI (co-located with ICCV), 2025 (To Appear). BibTeX

@article{songkun2025sea,
  title = {AQUAH: Automatic Quantification and Unified Agent in Hydrology},
  author = {Yan, Songkun and Li, Zhi and Zhu, Siyu and Wen, Yixin and Zhang, Mofan and Chen, Mengye and Cao, Jie and Hong, Yang},
  journal = {1st workshop on Sustainability with Earth observation and AI (co-located with ICCV), 2025 (To Appear)},
  year = {2025}
}

Yu Cai, Ziang He, Yanyan Luo, Jie Cao, Yuchen Liu, Zhengping Luo, and Shangqing Zhao. 2025. Adversarial Attacks on Cooperative Spectrum Sensing: A LLM-Powered Multi-Agent Approach. IEEE 26th International Workshop on Signal Processing and Artificial Intelligence for Wireless Communications (SPAWC), 2025 (To Appear). BibTeX

@article{yu2025spawc,
  title = {Adversarial Attacks on Cooperative Spectrum Sensing: A LLM-Powered Multi-Agent Approach},
  author = {Cai, Yu and He, Ziang and Luo, Yanyan and Cao, Jie and Liu, Yuchen and Luo, Zhengping and Zhao, Shangqing},
  journal = {IEEE 26th International Workshop on Signal Processing and Artificial Intelligence for Wireless Communications (SPAWC), 2025 (To Appear)},
  year = {2025}
}

Jannatun Naim, Jie Cao, Fareen Tasneem, Jennifer Jacobs, Brent Milne, James Martin, and Tamara Sumner. 2025. Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue. Proceedings of the 18th International Conference on Educational Data Mining, 2025 (To Appear). BibTeX

@article{naim2025edm,
  title = {Towards Actionable Pedagogical Feedback: A Multi-Perspective Analysis of Mathematics Teaching and Tutoring Dialogue},
  author = {Naim, Jannatun and Cao, Jie and Tasneem, Fareen and Jacobs, Jennifer and Milne, Brent and Martin, James and Sumner, Tamara},
  journal = {Proceedings of the 18th International Conference on Educational Data Mining, 2025 (To Appear)},
  year = {2025}
}

| PDF

Yingheng Tang, Wenbin Xu, Jie Cao, Weilu Gao, Steve Farrell, Benjamin Erichson, Michael W Mahoney, Andy Nonaka, and Zhi Yao. 2025. MatterChat: A Multi-Modal LLM for Material Science. arXiv preprint arXiv:2502.13107. BibTeX

@article{tang2025matterchat,
  title = {MatterChat: A Multi-Modal LLM for Material Science},
  author = {Tang, Yingheng and Xu, Wenbin and Cao, Jie and Gao, Weilu and Farrell, Steve and Erichson, Benjamin and Mahoney, Michael W and Nonaka, Andy and Yao, Zhi},
  journal = {arXiv preprint arXiv:2502.13107},
  year = {2025}
}

| PDF

Jie Cao, Abhijit Suresh, Jennifer Jacobs, Charis Clevenger, Amanda Howard, Chelsea Brown, Brent Milne, Tom Fischaber, Tamara Sumner, and James H. Martin. 2025. Enhancing Talk Moves Analysis in Mathematics Tutoring through Classroom Teaching Discourse. In The 31st International Conference on Computational Linguistics (COLING 2025). BibTeX

@inproceedings{talkmove-coling-2024,
  title = {Enhancing Talk Moves Analysis in Mathematics Tutoring through Classroom
      Teaching Discourse},
  author = {Cao, Jie and Suresh, Abhijit and Jacobs, Jennifer and Clevenger, Charis and Howard, Amanda and Brown, Chelsea and Milne, Brent and Fischaber, Tom and Sumner, Tamara and Martin, James H.},
  booktitle = {The 31st International Conference on Computational Linguistics (COLING 2025)},
  year = {2025}
}

| PDF

Baptiste Moreau-Pernet, Yu Tian, Sandra Sawaya, Peter Foltz, Jie Cao, Brent Milne, and Thomas Christie. 2024. Classifying Tutor Discursive Moves at Scale in Mathematics Classrooms with Large Language Models. In Proceedings of the Eleventh ACM Conference on Learning @ Scale, pages 361–365. Association for Computing Machinery. BibTeX

@inproceedings{talkmove-llm-2024,
  author = {Moreau-Pernet, Baptiste and Tian, Yu and Sawaya, Sandra and Foltz, Peter and Cao, Jie and Milne, Brent and Christie, Thomas},
  title = {Classifying Tutor Discursive Moves at Scale in Mathematics Classrooms with Large Language Models},
  year = {2024},
  isbn = {9798400706332},
  publisher = {Association for Computing Machinery},
  url = {https://doi.org/10.1145/3657604.3664664},
  doi = {10.1145/3657604.3664664},
  booktitle = {Proceedings of the Eleventh ACM Conference on Learning @ Scale},
  pages = {361--365},
  numpages = {5},
  keywords = {discourse analysis, llm classification, math tutor training},
  location = {Atlanta, GA, USA},
  series = {L@S '24}
}

| PDF | URL

Zhimin Li, Shusen Liu, Xin Yu, Bhavya Kailkhura, Jie Cao, James Daniel Diffenderfer, Peer-Timo Bremer, and Valerio Pascucci. 2024. “Understanding Robustness Lottery”: A Geometric Visual Comparative Analysis of Neural Network Pruning Approaches. IEEE Transactions on Visualization and Computer Graphics. BibTeX

@article{vis-pruning2024,
  title = {“Understanding Robustness Lottery”: A Geometric Visual Comparative
      Analysis of Neural Network Pruning Approaches},
  author = {Li, Zhimin and Liu, Shusen and Yu, Xin and Kailkhura, Bhavya and Cao, Jie and Diffenderfer, James Daniel and Bremer, Peer-Timo and Pascucci, Valerio},
  url = {https://doi.org/10.1109/tvcg.2024.3514996},
  journal = {IEEE Transactions on Visualization and Computer Graphics},
  year = {2024}
}

| PDF | URL

Jie Cao, Ananya Ganesh, Jon Cai, Rosy Southwell, Margaret Perkoff, Michael Regan, Katharina Kann, James Martin, Martha Palmer, and Sidney D’Mello. 2023. A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse. Proceedings of the 31st ACM Conference on User Modeling Adaptation and Personalization (ACM UMAP 2023). BibTeX

@article{cao-umap23,
  title = {A Comparative Analysis of Automatic Speech Recognition Errors in Small
    Group Classroom Discourse},
  author = {Cao, Jie and Ganesh, Ananya and Cai, Jon and Southwell, Rosy and Perkoff, Margaret and Regan, Michael and Kann, Katharina and Martin, James and Palmer, Martha and D'Mello, Sidney},
  journal = {Proceedings of the 31st ACM Conference on User Modeling Adaptation
      and Personalization (ACM UMAP 2023)},
  year = {2023}
}

| PDF

Jon Cai, Brendan D. King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James Martin, Martha Palmer, Marilyn Walker, and Jeffrey Flanigan. 2022. Dependency Dialogue Acts: Annotation Scheme and Case Study. The 13th International Workshop on Spoken Dialogue Systems Technology. BibTeX

@article{jon-dda2022,
  title = {Dependency Dialogue Acts: Annotation Scheme and Case Study},
  author = {Cai, Jon and King, Brendan D. and Perkoff, Margaret and Dudy, Shiran and Cao, Jie and Grace, Marie and Wojarnik, Natalia and Ganesh, Ananya and Martin, James and Palmer, Martha and Walker, Marilyn and Flanigan, Jeffrey},
  journal = {The 13th International Workshop on Spoken Dialogue Systems Technology},
  year = {2022}
}

| PDF

Jie Cao. 2022. Inductive Biases for Deep Linguistic Structured Prediction with Independent Factorization. Available from ProQuest Dissertations & Theses A&I; ProQuest Dissertations & Theses Global. (2777357718). BibTeX

@article{dissertation-proquest,
  title = {Inductive Biases for Deep Linguistic Structured Prediction with
      Independent Factorization},
  author = {Cao, Jie},
  journal = {Available from ProQuest Dissertations \& Theses A\&I; ProQuest
      Dissertations \& Theses Global. (2777357718)},
  year = {2022}
}

| PDF

Debjyoti Paul*, Jie Cao*, Feifei Li, and Vivek Srikumar. 2021. Database Workload Characterization with Query Plan Encoders. Proceedings of the VLDB Endowment, 15(4):923–935. BibTeX

@article{cao2021dbqencoder,
  title = {Database Workload Characterization with Query Plan Encoders},
  author = {{Debjyoti Paul}* and {Jie Cao}* and Li, Feifei and Srikumar, Vivek},
  journal = {Proceedings of the VLDB Endowment},
  volume = {15},
  number = {4},
  pages = {923--935},
  year = {2021},
  publisher = {VLDB Endowment}
}

| PDF

Jie Cao and Yi Zhang. 2021. A Comparative Study on Schema-Guided Dialogue State Tracking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 782–796. BibTeX

@inproceedings{cao2021comparative,
  title = {A Comparative Study on Schema-Guided Dialogue State Tracking},
  author = {Cao, Jie and Zhang, Yi},
  booktitle = {Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies},
  pages = {782--796},
  year = {2021}
}

| PDF | Poster

Zhiqiang Liu, Zuohui Fu, Jie Cao, Gerard de Melo, Yik-Cheung Tam, Cheng Niu, and Jie Zhou. 2019. Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. BibTeX

@inproceedings{rhetorical-poetry2019,
  title = {Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation},
  author = {Liu, Zhiqiang and Fu, Zuohui and Cao, Jie and {de Melo}, Gerard and Tam, Yik-Cheung and Niu, Cheng and Zhou, Jie},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year = {2019},
  location = {Florence, Italy}
}

| PDF

Jie Cao, Yi Zhang, Adel Youssef, and Vivek Srikumar. 2019. Amazon at MRP 2019: Parsing Meaning Representations with Lexical and Phrasal Anchoring. In Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the Conference on Natural Language Learning (CoNLL), pages 138–148. BibTeX

@inproceedings{cao2019amazon,
  title = {Amazon at MRP 2019: Parsing Meaning Representations with Lexical and
      Phrasal Anchoring},
  author = {Cao, Jie and Zhang, Yi and Youssef, Adel and Srikumar, Vivek},
  booktitle = {Proceedings of the Shared Task on Cross-Framework Meaning
      Representation Parsing at the Conference on Natural Language
      Learning (CoNLL)},
  pages = {138--148},
  year = {2019}
}

| PDF

Jie Cao, Michael Tanana, Zac Imel, Eric Poitras, David Atkins, and Vivek Srikumar. 2019. Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. BibTeX

@inproceedings{cao2019observing,
  title = {Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes},
  author = {Cao, Jie and Tanana, Michael and Imel, Zac and Poitras, Eric and Atkins, David and Srikumar, Vivek},
  booktitle = {Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics},
  year = {2019},
  location = {Florence, Italy}
}

| PDF | Slides

Academic Service

Standing Reviewer for Journals: Computational Linguistics
Area Chair for ACL ARR
PC Member / Reviewer for Conferences and Workshops: ACL, EMNLP, NAACL, EACL, COLING, CoNLL, COLM, AAAI, ACL Rolling Review, AIED, EDM, MRP’2019, BEA, NLP4ConvAI, AmericasNLP’23, SLaTE’23