Session Details
Traditional script-based tests, such as those written with Appium, often break when the UI changes, forcing engineers to spend 30-40% of their time fixing them rather than developing features. At Uber’s scale (operating in thousands of cities, supporting 55 languages, and running countless experiments), manual testing is infeasible. Automated testing is therefore a necessity, but legacy methods are too brittle to handle the dynamic nature of modern apps.
DragonCrawl addresses this challenge by leveraging generative AI to create tests that adapt to UI changes rather than breaking. Once trained, it can be deployed across all Uber applications without frequent retraining, significantly reducing maintenance overhead. Despite its capabilities, DragonCrawl is highly efficient, requiring only a fraction of the computational resources of large models like ChatGPT, making it cost-effective to run at scale. It works by extracting view hierarchies, determining possible actions, and selecting the best one while dynamically adapting to unexpected pop-ups or variations in UI. By separating execution and validation, it ensures reliable results while minimizing test fragility.
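The extract, enumerate, and select loop described above can be illustrated with a minimal sketch. This is not DragonCrawl's implementation: the `Node` type, the `possible_actions` and `choose_action` helpers, and the keyword-overlap scorer are all hypothetical stand-ins, with the scorer occupying the place where a trained model would rank candidate actions against the test goal.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    """One element of a hypothetical, simplified view hierarchy."""
    label: str
    clickable: bool = False
    children: List["Node"] = field(default_factory=list)


def possible_actions(root: Node) -> List[Node]:
    """Walk the hierarchy depth-first and collect every clickable element."""
    found: List[Node] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.clickable:
            found.append(node)
        # Reverse so children are visited in their on-screen order.
        stack.extend(reversed(node.children))
    return found


def keyword_score(goal: str, node: Node) -> float:
    """Toy stand-in for the model: fraction of goal words found in the label."""
    goal_words = set(goal.lower().split())
    label_words = set(node.label.lower().split())
    return len(goal_words & label_words) / max(len(goal_words), 1)


def choose_action(
    root: Node,
    goal: str,
    score: Callable[[str, Node], float] = keyword_score,
) -> Optional[Node]:
    """Pick the highest-scoring clickable element, or None if there is none."""
    actions = possible_actions(root)
    if not actions:
        return None
    return max(actions, key=lambda n: score(goal, n))


# A screen where an unexpected promo pop-up sits next to the intended button.
screen = Node("root", children=[
    Node("Dismiss promo", clickable=True),
    Node("Request ride", clickable=True),
    Node("Title"),
])
chosen = choose_action(screen, "request ride")
```

Because the choice is recomputed from whatever is on screen at each step, nothing in the sketch depends on a fixed element ID or screen layout, which is the property that keeps this style of test from breaking when the UI shifts.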
AI-powered assertions further enhance test reliability, moving beyond brittle text or ID checks. Instead, testers can ask high-level questions like “Is there an ad on this screen?” or “Is alcohol visible?” using a generalized visual question-answering (VQA) framework. DragonCrawl’s impact is already significant, catching over 30 critical bugs (each potentially saving Uber millions), resolving 100+ localization issues, and even aiding in language rollouts like Pashto and Dari for Afghan refugees. Additionally, OmegaCrawl is transforming Uber’s internal workflows by automating repetitive operational tasks for developers, reducing inefficiencies.
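As a rough illustration of what a question-based assertion might look like in test code, the sketch below wraps a yes/no visual question in an assertion helper. The `VqaModel` signature, the `assert_screen` helper, the 0.5 threshold, and the `canned_model` stub are assumptions made for this example; the source does not describe the actual interface of Uber's VQA framework.

```python
from typing import Callable, Dict

# Hypothetical model interface: (screenshot bytes, question) -> probability of "yes".
VqaModel = Callable[[bytes, str], float]


def assert_screen(
    model: VqaModel,
    screenshot: bytes,
    question: str,
    expected: bool,
    threshold: float = 0.5,
) -> None:
    """Raise AssertionError if the model's yes/no answer differs from `expected`."""
    p_yes = model(screenshot, question)
    answer = p_yes >= threshold
    if answer != expected:
        raise AssertionError(
            f"{question!r}: expected {expected}, model answered {answer} "
            f"(p_yes={p_yes:.2f})"
        )


def canned_model(answers: Dict[str, float]) -> VqaModel:
    """Stub that returns fixed probabilities so the sketch runs without a real model."""
    def model(screenshot: bytes, question: str) -> float:
        return answers.get(question, 0.0)
    return model


model = canned_model({"Is there an ad on this screen?": 0.92})
assert_screen(model, b"", "Is there an ad on this screen?", expected=True)
assert_screen(model, b"", "Is alcohol visible?", expected=False)
```

The test states what should be true of the screen rather than which text string or element ID must be present, so a redesigned or localized ad would still satisfy the same assertion.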
More broadly, the evolution of AI in testing and development reflects a larger trend in AI research—balancing supervised learning (SL) and reinforcement learning (RL). Early AI models relied heavily on SL, which became too rigid, leading to the rise of RL for adaptability. However, RL alone often results in AI exploiting reward functions, requiring human intervention. The field is now shifting toward a hybrid approach, using SL for structure, RL for adaptability, and operator-driven AI for real-world grounding. The future of AI in software testing and beyond lies in this balance, where models can learn, adapt, and execute tasks while staying aligned with real-world constraints.