University of Victoria



Shiyu (Vivenne) Zeng

  • B.S.Eng. (University of Victoria, 2023)

Notice of the Final Oral Examination for the Degree of Master of Science

Topic

Automated Classification of Pull Requests in Scientific Software using LLMs

Department of Computer Science

Date & location

  • Monday, March 30, 2026

  • 3:00 P.M.

  • Engineering Computer Science Building

  • Room 555 and Virtual

Reviewers

Supervisory Committee

  • Dr. Neil Ernst, Department of Computer Science, UVic (Supervisor)

  • Dr. Daniel German, Department of Computer Science, UVic (Member) 

External Examiner

  • Dr. Italo Santos, Department of Information and Computer Science, University of Hawaii 

Chair of Oral Examination

  • Dr. Richard Marcy, School of Public Administration, UVic


Abstract

Scientific software relies on contributions that combine domain-specific expertise with software engineering skills, but identifying which contributions require deep scientific knowledge remains a persistent challenge in project maintenance. We analyzed 1,074 pull requests from three established scientific repositories (Trilinos, Mantid, and AMReX) and developed a binary classification framework that distinguishes contributions requiring scientific knowledge from those focused on software concerns. Our approach achieves near-human reliability, with DeepSeek-R1 reaching a Krippendorff's α of 0.789 through iterative prompt refinement and human validation. The analysis reveals distinct review characteristics for the two classes: scientific contributions require 67% longer review times, involve 64% more unique reviewers, generate twice as many discussion comments, and undergo over 300% more revision cycles than software-focused changes. These patterns persist after controlling for pull request size and repository effects. A validation study on 75 PlasmaPy issues achieves 89.33% accuracy, indicating that the framework generalizes to other contribution types. These findings establish that LLM-based classification can effectively support automated triage in interdisciplinary software teams, enabling more efficient allocation of scarce domain expertise, while empirically confirming that scientific contributions demand different review processes.
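The abstract reports agreement between the LLM and human annotators as a Krippendorff's α of 0.789. For readers unfamiliar with the metric, the sketch below shows how α is computed for the simplest case matching this setup: two coders, nominal (binary) labels, no missing data. This is a generic illustration of the standard formula, not the thesis's actual evaluation code; the label values are hypothetical.

```python
from collections import Counter

def krippendorff_alpha_nominal(coder_a, coder_b):
    """Krippendorff's alpha for two coders, nominal labels, complete data.

    alpha = 1 - D_o / D_e, where D_o is the observed disagreement and
    D_e is the disagreement expected by chance from the pooled labels.
    """
    assert len(coder_a) == len(coder_b)
    # Coincidence matrix: each unit contributes both ordered pairs (a, b) and (b, a).
    o = Counter()
    for a, b in zip(coder_a, coder_b):
        o[(a, b)] += 1
        o[(b, a)] += 1
    # Marginal totals per label value over all pairable values.
    n_v = Counter()
    for (v, _), c in o.items():
        n_v[v] += c
    n = sum(n_v.values())  # 2 * number of units
    d_o = sum(c for (v, w), c in o.items() if v != w) / n
    d_e = sum(n_v[v] * n_v[w] for v in n_v for w in n_v if v != w) / (n * (n - 1))
    return 1.0 if d_e == 0 else 1 - d_o / d_e
```

For example, labels ["sci", "sci", "sw", "sw"] versus ["sci", "sci", "sw", "sci"] (one disagreement in four units) yield α ≈ 0.533; identical label sequences yield α = 1.0. An α near 0.8, as reported, is conventionally read as acceptable agreement.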