Publications

Research Overview

My research is primarily focused on Natural Language Processing, particularly in the area of Dialogue and Structured Prediction. As AI challenges become more complex, I’m keen on developing modularized AI systems tailored for future large-scale projects that demand Human-AI Teaming, which requires complex control, reasoning, collaboration, and adaptation.

  • Modularized NLP. To decompose and integrate submodules (e.g., structured knowledge, mixture of experts, multiple modalities) into controllable AI systems, my research spans symbolic language representations and neuro-symbolic interfaces. These have been applied to deep linguistic structured prediction from sentence to dialogue (Dissertation’22, CoNLL’19, ACL’19, NAACL’21, IWSDS’23) and to controllable generation with constraints (DSTC7’19, ACL’19, BEA’23).
  • Controllable Learning. I investigated foundational models that integrate data-driven learning with various inductive biases, which are critical for creating efficient and trustworthy AI systems (especially for Education and Health). I studied graph-based parsing via latent anchoring analysis (CoNLL’19), tree-structured database query plan characterization via self-supervised contrastive learning (VLDB’22), zero-shot dialogue state tracking via description-driven learning and supplementary pretraining (NAACL’21), and LLM prompting and finetuning (L@S’24).
  • Robust Deployment. I studied language modeling under complex real-world conditions, such as distribution shift (ACL’23), unnoticed non-verbal behavior (AIAIC’23), noisy speech in small-group classrooms (UMAP’23), and multi-party, multi-modal dynamics (IWSDS’23), as well as robust evaluation with conversational simulation.

Conference Papers

  • Baptiste Moreau-Pernet, Yu Tian, Sandra Sawaya, Peter Foltz, Jie Cao, Brent Milne, and Thomas Christie. 2024. Classifying Tutor Discursive Moves at Scale in Mathematics Classrooms with Large Language Models. In Proceedings of the Eleventh ACM Conference on Learning @ Scale, pages 361–365. Association for Computing Machinery.    BibTeX |  PDF  |  URL
  • E. Margaret Perkoff, Abhidip Bhattacharyya, Jon Cai, and Jie Cao. 2023. Comparing Neural Question Generation Architectures for Reading Comprehension. 18th Workshop on Innovative Use of NLP for Building Educational Applications, 2023.    BibTeX |  PDF
  • Ananya Ganesh, Jie Cao, E. Margaret Perkoff, Rosy Southwell, Martha Palmer, and Katharina Kann. 2023. Mind the Gap between the Application Track and the Real World. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, 2023.    BibTeX |  PDF
  • Jie Cao, Ananya Ganesh, Jon Cai, Rosy Southwell, Margaret Perkoff, Michael Regan, Katharina Kann, James Martin, Martha Palmer, and Sidney D’Mello. 2023. A Comparative Analysis of Automatic Speech Recognition Errors in Small Group Classroom Discourse. Proceedings of the 31st ACM Conference on User Modeling, Adaptation and Personalization (ACM UMAP 2023).    BibTeX |  PDF
  • Jie Cao, Rachel Dickler, Marie Grace, Alessandro Roncone, Leanne Hirshfield, Marilyn Walker, and Martha Palmer. 2023. Designing an AI Partner for Jigsaw Classrooms. Workshop on Language-Based AI Character Interaction with Children.    BibTeX |  PDF
  • Jon Cai, Brendan D. King, Margaret Perkoff, Shiran Dudy, Jie Cao, Marie Grace, Natalia Wojarnik, Ananya Ganesh, James Martin, Martha Palmer, Marilyn Walker, and Jeffrey Flanigan. 2022. Dependency Dialogue Acts — Annotation Scheme and Case Study. The 13th International Workshop on Spoken Dialogue Systems Technology.    BibTeX |  PDF
  • Jie Cao. 2022. Inductive Biases for Deep Linguistic Structured Prediction with Independent Factorization. Available from ProQuest Dissertations & Theses A&I; ProQuest Dissertations & Theses Global. (2777357718).    BibTeX |  PDF
  • Zhimin Li, Shusen Liu, Xin Yu, Bhavya Kailkhura, Jie Cao, James Daniel Diffenderfer, Peer-Timo Bremer, and Valerio Pascucci. 2022. "Understanding Robustness Lottery": A Comparative Visual Analysis of Neural Network Pruning Approaches. arXiv preprint arXiv:2206.07918.    BibTeX |  PDF
  • Debjyoti Paul*, Jie Cao*, Feifei Li, and Vivek Srikumar. 2021. Database Workload Characterization with Query Plan Encoders. Proceedings of the VLDB Endowment, 15(4):923–935.    BibTeX |  PDF
  • Jie Cao and Yi Zhang. 2021. A Comparative Study on Schema-Guided Dialogue State Tracking. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 782–796.    BibTeX |  PDF  |  Poster
  • Jie Cao, Yi Zhang, Adel Youssef, and Vivek Srikumar. 2019. Amazon at MRP 2019: Parsing Meaning Representations with Lexical and Phrasal Anchoring. In Proceedings of the Shared Task on Cross-Framework Meaning Representation Parsing at the Conference on Natural Language Learning (CoNLL), pages 138–148.    BibTeX |  PDF
  • Jie Cao, Michael Tanana, Zac Imel, Eric Poitras, David Atkins, and Vivek Srikumar. 2019. Observing Dialogue in Therapy: Categorizing and Forecasting Behavioral Codes. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.    BibTeX |  PDF  |  Slides
  • Zhiqiang Liu, Zuohui Fu, Jie Cao, Gerard de Melo, Yik-Cheung Tam, Cheng Niu, and Jie Zhou. 2019. Rhetorically Controlled Encoder-Decoder for Modern Chinese Poetry Generation. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics.    BibTeX |  PDF
  • Shuo Sun*, Yik-Cheung Tam*, Jie Cao*, Canxiang Yan, Zuohui Fu, Cheng Niu, and Jie Zhou. 2019. End-to-end Gated Self-attentive Memory Network for Dialog Response Selection. In AAAI DSTC7 Workshop (* Equal Contribution).    BibTeX |  PDF  |  Poster
  • Xijiang Ke, Hai Jin, Xia Xie, and Jie Cao. 2015. A Distributed SVM Method Based on the Iterative MapReduce. In Semantic Computing (ICSC), IEEE International Conference on, pages 116–119. IEEE.    BibTeX |  PDF
  • Xia Xie, Jie Cao, Hai Jin, Xijiang Ke, and Wenzhi Cao. 2012. JRBridge: A framework of large-scale statistical computing for R. In Services Computing Conference (APSCC), IEEE Asia-Pacific, pages 27–34. IEEE.    BibTeX |  PDF