This talk discusses the challenge of deciding what should be released in large-scale software development, such as at Meta's scale. To address this challenge, we developed models that estimate the risk that a pull request (diff) will cause an outage (a SEV). We trained the models on historical data and evaluated several types of gating based on the predicted riskiness of an outgoing diff. The models captured a significant percentage of SEVs while gating only a relatively small percentage of diffs as risky. We also compared several model families, including logistic regression, BERT-based models, and generative LLMs, and found that the generative LLMs performed best.
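The abstract does not say how the gating tradeoff is measured. The following is a minimal sketch, assuming each diff has a model-assigned risk score and a label recording whether it later caused a SEV. It gates the riskiest fraction of diffs and reports what share of SEV-causing diffs that gate catches. The function name and the toy data are hypothetical and are not taken from the talk.

```python
from typing import Sequence


def sev_capture_at_gate_rate(
    risk_scores: Sequence[float],
    caused_sev: Sequence[bool],
    gate_fraction: float,
) -> float:
    """Share of SEV-causing diffs that land in the riskiest `gate_fraction` of diffs.

    Hypothetical helper for illustration only.
    """
    if len(risk_scores) != len(caused_sev):
        raise ValueError("risk_scores and caused_sev must have the same length")
    if not 0.0 <= gate_fraction <= 1.0:
        raise ValueError("gate_fraction must be in [0, 1]")
    total_sevs = sum(caused_sev)
    if total_sevs == 0:
        return 0.0
    # Number of diffs held back by the gate.
    n_gated = int(round(gate_fraction * len(risk_scores)))
    # Rank diff indices from riskiest to safest and gate the first n_gated.
    ranked = sorted(range(len(risk_scores)), key=lambda i: risk_scores[i], reverse=True)
    captured = sum(1 for i in ranked[:n_gated] if caused_sev[i])
    return captured / total_sevs


if __name__ == "__main__":
    # Toy, made-up data: ten diffs, three of which caused a SEV.
    scores = [0.9, 0.1, 0.8, 0.2, 0.05, 0.3, 0.7, 0.15, 0.4, 0.01]
    sevs = [True, False, True, False, False, False, False, False, True, False]
    print(f"{sev_capture_at_gate_rate(scores, sevs, 0.2):.2f}")  # 0.67
    print(f"{sev_capture_at_gate_rate(scores, sevs, 0.5):.2f}")  # 1.00
```

Sweeping `gate_fraction` traces the curve behind a claim like "capturing many SEVs while gating few diffs": a better risk model reaches a higher capture share at the same gating rate.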
Rui Abreu holds a Ph.D. in Computer Science (Software Engineering) from Delft University of Technology, The Netherlands, and an M.Sc. in Computer and Systems Engineering from the University of Minho, Portugal. His research revolves around software quality, with an emphasis on automating the testing and debugging phases of the software development life cycle, as well as on self-adaptation. He has extensive expertise in both static and dynamic analysis algorithms for improving software quality. He has received five Best Paper Awards, and his work has attracted considerable attention. Before joining Instituto Superior Técnico of the University of Lisbon as an Associate Professor, he was a member of the Model-Based Reasoning group at PARC's System and Sciences Laboratory. He is also Head of Engineering at DashDash, a startup that aims to let users automate workflows using only Excel skills.