Netflix shares observability metrics they captured to identify common problems in the local and CI tooling experience. They also share how remote test execution brought PR build times down from 50 minutes to 5 minutes.
Find out how Netflix’s DevProd organization concentrates on the local development loop and ways to shrink the time spent purely waiting. Points of discussion will include:
- Common problems with long local development loops
- Improving self-serviceability and the build performance and consistency experience
- CI observability and insights using tools like Gradle Build Scan™
- Pipeline acceleration using remote test execution with Gradle Enterprise Test Distribution
- Detecting flaky tests and other avoidable failures using Test and Failure Analytics
Aubrey Chipman loves working on developer tooling and developer experience. She loves being part of a team whose members grow, support, and elevate each other so they can help others. She has a passion for development work and the associated coaching, influencing, and generally making ecosystems better.
Roberto Perez Alcolea is an experienced software engineer focused on microservices, cloud, developer productivity and continuous delivery. He’s a self-motivated, success-driven, and detail-oriented professional interested in solving unique and challenging problems.
Gradle Enterprise build and test acceleration technologies address the pain of unnecessary developer idle time waiting for builds and tests to complete. Build Scan gives you granular analytic information for every build, so you can quickly fix problems and debug performance issues. Test and Failure Analytics leverages historical data to proactively find unreliable builds and flaky tests, report on the people and environments affected by the problem, and guide teams to the root cause efficiently.
Interested in CI Observability and Build Optimization? Try these next steps:
- See how Netflix took a 62-minute Java build down to less than 5 minutes using Remote Test Execution with Gradle Enterprise Test Distribution
- See this short video about how Test and Failure Analytics can make your test sets more reliable and consistent
- Sign up for our free training class, Gradle Enterprise for Developers to learn how to debug build issues locally and in CI faster and more efficiently
Aubrey: Hello, and welcome, everybody. Hopefully you can see everything from up here by the podium. We're going to be talking to you today about a more integrated build and CI experience to accelerate builds at Netflix. First of all, Rooz gave us a wonderful introduction: we're both from the productivity engineering team, like he was just talking about, and we run the JVM Ecosystem team together. I'm Aubrey Chipman, and this is my colleague, Roberto Perez Alcolea.

We wanted to give an overview first. In the introduction we'll talk about where we're coming from and what we typically do over at Netflix. Then we'll go through some common problems with local and CI experiences, talk about enabling and managing constant change, because it keeps on going, and about improving the build performance and consistency experience and improving self-serviceability. After that, we'll talk about things on the horizon that we're getting into and want more of, and then have a Q&A session.

So, first of all, we want to establish a baseline of how build and CI talk to each other. What do we mean by CI? In this case, the practice of continuous integration. Just to set this baseline: there are people committing changes, they push them to a repository of some sort, and then they want to see: is it going to build successfully, is it going to test successfully, and can I get that feedback nice and quickly? The way this varies from your local build is that it happens in a more isolated kind of environment. Overall, in the Netflix JVM landscape, we have about 3,200 active repositories. We're not going to talk much today about monorepo versus multi-repo, but we have chosen multi-repo, so we do tons of binary integration as opposed to getting all the code changes into one spot and merging them together.
We also have a bunch of microservices with clients, and thousands of engineers supporting all of this. We have about 191,000 JVM builds per week, so there are a bunch of them; these are things happening locally and things happening in CI. And we publish about 63,000 artifacts per week. This is just looking at the jars we're producing, not necessarily things that are packaged up for later deployment, but just these published jars. You can take a look at this picture over here, and you might wonder what that funny, splotchy thing is. This is a picture of a dependency graph; we have a few of them out there, a few similar kinds of pictures. This one is from a few years ago, looking at how the various clients all interact together. So we've got really complicated ones. One of our related teams was trying to figure out how to map it as it looks today and ran into a kind of buffer-overflow problem with whatever tool they were using.

When we take a look at the CI landscape as well, we have 35 Jenkins controllers, about 45,000 job definitions across those controllers, about 600,000 builds per week, and a bunch of different agents out there. This is managed by a combination of a central team and individual teams that have very specific setup needs, with about 1 to 100 executors per agent. So this brings us to...
Roberto: Common problems. First of all, people usually go on Twitter or any other social platform and say "you are not Netflix." And in reality, nobody is Netflix, and we are not your company: we have different issues that we try to tackle for the business, and you might have different ones in your own company. But at the very end, the common problems we have seen in local and CI tooling, talking to our engineers across different areas, organizations, and business units, are problems they all suffer from. So we would like to chat a little bit about them, and after that we'll talk about the things we have been doing over the last few years to try to reduce the pain, because the reality is that some pain will always be there, like dependency hell. We have tried to reduce it and make things better for engineers, to make them happier and more joyful, as Hans said in the previous talk.

The first one is a slow feedback loop. We have been talking about the feedback loop in the last few minutes; the things that usually impact it are slow tests, dependency downloads (if I have a build and I need to download the whole internet just to build my project, that's always painful), flaky tests, which slow people down from getting fast feedback, and compute resources. What if we don't have standardized ways to deliver computers to engineers? If you're using a PC with Windows versus Linux versus macOS, and you may have a modern computer with a different architecture that is faster than the others, then you have a different experience for every single thing you do if you don't have some sort of standardization. And test times are really important. We introduced Gradle Enterprise a few years ago, and what we found, because we really weren't measuring this before, is that 80% of our build time is spent on tests. So the low-hanging fruit turned out not to be so low-hanging; it's a really big thing that we needed to address, and it's one of the things we have been working on over the last few years.

Inconsistencies across environments. As I mentioned, compute resources can be different, and they are always different between local and CI. What if you have something locally that is not available on your CI agent, like Jenkins, or vice versa? What if you are building something that requires network access or certain security policies? Or what about Docker? It's really difficult these days, with different architectures in your machines: building images for a given architecture is sometimes hard for engineers, and that understanding might not even be necessary for the average engineer. So we try to work in a way that avoids "works on my machine" or "works only on CI," so people build confidence that building a product locally or in CI gives the same results.

And constant change. Typically, people talk about CI as: a group of engineers is working on a project, making sure they continuously deliver something into the main branch. But in reality, as platform engineers, we also want to get them onto the latest and greatest opinions or frameworks that we provide, so for us that's also part of the CI story. If I own some sort of Java framework opinion for engineers, how can I take them from version 1 to version 2? How can I make sure they get to the latest and greatest without much hassle or friction? And obviously, security vulnerabilities are a huge topic these days. If there is a security vulnerability, what can we do as platform engineers to make that change for them, without engineers having to go and rebuild everything manually every single time?
Aubrey: All right. So, talking about enabling constant change: one of the very first things we do over at Netflix to help with this is make sure people are using the latest version of their Gradle wrapper. We have another item around it called Nebula, and we want people to be on the latest version of that too. We want people to be locking their dependencies so they have consistent and reproducible builds, and we want to make sure people are updating their build files in a nice, manageable way, so they're using the latest versions of plugins, the latest DSLs, these kinds of items.
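For readers following along, dependency locking is a built-in Gradle feature; a minimal sketch of the "consistent and reproducible builds" idea might look like the following. Netflix's Nebula plugins provide their own locking workflow, and the coordinates here are purely illustrative.

```kotlin
// build.gradle.kts — minimal sketch of core Gradle dependency locking.
plugins {
    `java-library`
}

repositories {
    mavenCentral()
}

dependencyLocking {
    lockAllConfigurations()   // enforce lock state for every configuration
}

dependencies {
    // A dynamic version: without a lock file this could resolve differently tomorrow.
    implementation("com.google.guava:guava:31.+")
}

// Generate the lock state once and commit the resulting lock files:
//   ./gradlew dependencies --write-locks
// From then on, resolution fails if it drifts from the locked versions, and an
// automated job (like a Rocket CI enrollment) can refresh the locks nightly.
```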
I mention that because, in the past, every project had its own Jenkins jobs to do this. They would have their own Jenkins Gradle wrapper update job and dependency lock update job; it looked kind of like this over here. Whenever you generated your project with the project generator, it would create these for you, and we ended up with about 6,500 Jenkins jobs and no centralized view. One of the issues we ran into is that there was a lot of skew: if something needed to change for all of them, people had to go modify the underlying XML or click through the UI, and it was not streamlined and really difficult to manage from a centralized perspective.

So what we did from there is keep the same basic idea of keeping everyone up to date automatically, but we introduced Rocket CI. This is a custom Jenkins plugin and Bitbucket plugin, currently internal, that combines a suite of tools to help out with continuous integration. Every JVM project that's newly initialized is automatically enrolled in Rocket CI. You can see up there in the Rocket features that we've got a few different enrollments available: things like keeping your Nebula and Gradle wrapper versions up to date, keeping your dependency locks up to date, and making sure your build hygiene, your DSLs, those kinds of things, are kept up to date as well. These run nightly for everybody, and there are aspects of configuration as code in there too, so if people want to make adjustments it's centralized in their own repository, and we can submit a big batch pull request instead of modifying things in the Jenkins UI itself.

The way this works is that Rocket CI sends updates via pull request. The very first step is that it grabs the source code repository, makes a fork, and tries to run the change by itself. If everything doesn't finish at that point, we know there's some sort of infrastructure failure: maybe a container failed for some reason, maybe something ran out of memory, maybe your project just couldn't configure because it was in a red state to begin with and has some problems right there. A lot of these will self-resolve in the next day or so when it runs again, but a number of them won't, so we send people notifications saying, please go take a look at this, because we currently can't even send you a pull request; the build is not in a good enough state to get there. In the good case, though, if everything is successful and we've run all of the updates on the branch on the fork, we can open a pull request to the repository and then auto-merge it if everything looks good.

This is an example of the pull request, where we go beyond the defaults and explain things like: we have updated your dependencies, here's how you can understand your dependency diff, and here's why different things were updated in this particular way. It really helps people understand why their pull request is what it is when it has this additional information and these additional tools inside it. As well, with Rocket CI, as a platform team we have much more visibility into how this is going. We can see the number of projects enrolled in each of these different enrollments (you can see the different lines on this chart over here), how many enrollments were successful in the past day or past X number of days, how many failed, and how many are in an odd state that needs to be manually investigated. There's just a lot of centralized information we can see to figure out whether we need to make changes in our process to help streamline this for everyone else.
Roberto: But not everything is a happy path, right? One of the things we have been suffering with, where we have honestly failed, is how to validate a change before opening a pull request. Years ago, we started working on a project called Niagara. Niagara basically allows you to say: hey, I want to clone all these repositories based on some information, in our case a build matrix, and then run a particular script. It could be a bash script, it could be anything, and usually it's something like "run a full Gradle build." What Niagara does is talk to Titus, which is our container orchestration engine on top of Kubernetes, and say: I'm going to create thousands of containers, each one will work on a different repo and run your script, and then give you back the results.

This is a noble idea, really nice. But the reality is that when you follow this pattern, it's really difficult to validate changes, because sometimes a given build requires network access, security policies, or databases. We really don't know how developers approach testing their applications. So we found that testing at this layer is practically impossible, at least at this point. We can only do specific checks, like: can I compile the code? That gives us enough information to open a pull request and then see what happens at the pull request level. What we want to do in the future is figure out how, when we clone all these repos, we can really run all the tests: upgrade a library, run the tests, and only open a pull request if everything looks great, because the last thing we want to do to our customers is open a broken pull request. They will say: why are you opening this thing that is broken? It's noisy and I don't trust your tooling. It's really difficult in life to get confidence back once you've lost it in someone or something, right? We can have a lot of wins, then make one mistake with a change, and people will unenroll from this and we will probably never get them back. So we need to spend more time on this. And it was interesting for us to share that even when you try to clone multiple repos, there might be interesting things in your company that prevent you from just running a full build.
That leads us to the next thing: how do we improve the performance and consistency of the experience across local and CI? I believe this is the most important part of this talk. One of the things we've found over the years is that people tend to use CI agents in their own way. They might decide that every single time they run a build, they want to clean up the whole workspace and use a new Gradle home, things like that. I'm using Gradle as the example because we are very much into Gradle, but this definitely applies to other build tools: you might have dependency caching, and binary integration is hard, honestly. What we have found is that most of the time people decide to just clean everything because they don't fully trust the build, and that means downloading again every single thing you need to build the project. When you think about a modern application using a Java framework like Spring Boot or Micronaut, you name it, you're downloading a huge chunk of the internet just to do some work.

So we decided to start distributing a commonly used dependency cache across all the CI agents. Internally we have a wrapper on top of Spring Boot that publishes a BOM containing all the dependencies developers should be using to build their applications. We analyze that BOM, we also look at all the dependencies people download from Maven Central and take the most popular ones, and then we create a read-only cache that we ship to every single CI agent, and also to Docker images; for the process I mentioned around opening pull requests, we have this distributed cache too. What happens is that when you have a new project with no cache at all, we have seen people take three or four minutes to download dependencies; it's probably 400 to 500 megabytes, hundreds of files. When you have something that is shared across multiple agents, and is also consistent because you know your whole team should be building against a specific version of a platform, then we were able to reduce that download time on CI to just a few seconds. So we gave engineers back three minutes whenever they had a brand-new build or a brand-new Jenkins job, or whenever they decided to wipe out the whole workspace.
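The talk doesn't show the exact mechanics, but Gradle does ship a read-only dependency cache feature that fits this description. A minimal sketch, assuming a pre-seeded cache directory baked into the agent image and a hypothetical internal platform BOM coordinate, might look like this:

```kotlin
// build.gradle.kts — sketch only; the platform BOM coordinates are hypothetical.
plugins {
    `java-library`
}

repositories {
    mavenCentral()   // stands in for the internal artifact repository
}

dependencies {
    // Align every consumer on the curated versions the internal platform BOM defines,
    // which also keeps the pre-seeded agent cache hit rate predictable.
    implementation(platform("com.example.platform:springboot-platform-bom:2022.3.0"))
    implementation("org.springframework.boot:spring-boot-starter-web")   // version supplied by the BOM
}

// On the CI agent (or baked into its Docker image), Gradle is pointed at the shared,
// pre-seeded cache before any build runs:
//
//   export GRADLE_RO_DEP_CACHE=/opt/gradle-ro-dep-cache
//
// Gradle reads artifacts from that read-only cache first and only downloads
// (into the normal writable Gradle home) whatever is missing.
```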
Aubrey: Thank you. Yeah. So that covers the most common artifacts. But what about artifacts that are brand new, artifacts that only a few projects need, artifacts that are just rarer? We're going to talk about a different kind of artifact caching now. We use this tool called Varnish, which is really wonderful, to help us lighten the load on our artifact storage. Another thing that's really big for us is that it helps us cache 404s. We have a number of virtual repositories under the hood, so when you're requesting a particular artifact and asking whether a version exists, you'd normally check all of these different repositories. We're able to use Varnish to say: hey, we didn't find it, and it's still within the time-to-live for this particular request, so don't go looking anymore; you don't need to make all those extra metadata or artifact requests to everything else, because we already know the answer. So we're able to use Varnish Cache to serve all of the artifacts that are used, we use it as a very fast dependency cache, and we're also using it as a build task output cache.

What that means is the following: it's based on the fingerprinting of a particular build task. Let's say you're running compileJava, or some code-generation task. Each task has various inputs you can define, such as the files it looks at, the properties files, environment variables, whatever it is, and from those attributes a hash is computed. As long as you get a hit for that fingerprint, you're able to say: hey, I ran this compileJava task before with exactly the same inputs, so give me the result from the cache and I don't need to do it again myself. This is a really great way to avoid work, so you're not redoing the same things over and over, and especially if you have very intensive tasks, you can save tons of time this way. We did this with Varnish Cache because we had it set up prior to running Gradle Enterprise; Gradle Enterprise also does this, but we have it with Varnish.
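For reference, pointing a Gradle build at a remote HTTP task-output cache looks roughly like the sketch below. The endpoint URL is a placeholder, and the push-only-from-CI rule is a common convention rather than something stated in the talk.

```kotlin
// settings.gradle.kts — sketch of a remote build cache configuration.
// The cache endpoint is hypothetical; Netflix fronts theirs with Varnish.
buildCache {
    local {
        isEnabled = true   // keep the local cache for fast repeat builds on the same machine
    }
    remote<HttpBuildCache> {
        setUrl("https://build-cache.example.internal/cache/")
        // Common policy: everyone reads cached task outputs, but only CI writes them,
        // so a misconfigured developer machine can't poison the shared cache.
        isPush = System.getenv("CI") != null
    }
}

// Caching still has to be switched on, e.g. with --build-cache on the command line
// or org.gradle.caching=true in gradle.properties.
```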
A couple of challenges we ran into when doing this. The first one was cross-region replication: we have a number of builds running in one particular region, but we also have builds running in other regions, some even further away. If someone requests an artifact, we want it to be equally accessible wherever the build is running. Similarly, we had challenges with cache eviction: once something has entered the cache, it is now everywhere, and sometimes we want it to not be everywhere. For instance, we've had artifacts published that had a really bad performance problem, bad enough that we said, yes, removing it from artifact storage will impact everyone, but it's worthwhile because we really don't want anyone to use it at all. So we go ahead and remove it from artifact storage, from Artifactory, and then we want to make sure it's also not going to be served from the cache itself. Varnish comes with a really wonderful feature called broadcasting, where we can say: we're going to purge this, and send that to all the instances across all the regions at once. We make one request and it removes everything right away.

Similarly, we've also run into situations where a particular task or plugin didn't have all its task inputs configured correctly right out of the box. Sometimes people would run a build and see that, hey, we used this cached task output, but it's not correct: it was produced on a different flavor of machine and is no longer applicable. One of the areas where we've seen this is something being Mac-specific versus Linux-specific. So we wanted a good way to evict many build-specific cache keys. Currently, the way we do it is we have to find each cache key and enter it as an eviction request, so that's definitely one of the things we're working on to make a little more streamlined. Oh, goodness. I hope everything's all right there.
Roberto: Yeah. Yeah.
Aubrey: I'll wait for a second till some of these things get quiet once
more.
Roberto: Write about them or something happens anyway.
Aubrey: There we go. All right. Well, I hope everything is all right with those alerts, but we're going to continue. Taking a look at the artifact cache effectiveness, this is one of the things we're really excited about: we're able to see that most of the time we have about a 70% cache hit rate. You can see it going up and down over time, and then you might notice this spot right in the center where there's a bunch of cache hits. This is related to one of the things we talked about earlier: between about 2 a.m. and 6 a.m. we send tons of pull requests to different builds, so they're running the same kinds of things over and over. We're not necessarily changing dependencies (some of them do change), but we are resolving a bunch of builds and a bunch of dependencies, so you see a lot more cache effectiveness during that particular window. This brings us to remote test execution.
Roberto: By now you probably sense the need for speeding things up, especially builds, and you might wonder: okay, how did you survive with this for the last few years until now? The next thing we are looking into, or rather already working on, is remote test execution. This is something we do with Gradle and Gradle Enterprise, but it's a concept people can apply on different platforms for sure; people are already doing it with other build tools too. Basically, the idea is to take any local or CI build and, instead of running the tests on the machine that is executing the build, ship that work to remote workers. This way you can have better prioritization, reduce the usage of compute resources locally, and also get a more consistent experience across local development and CI. We don't want to go into more detail on this here; I'm going to be speaking with Marc Philipp about how they built Test Distribution in Gradle and the learnings we have had with this approach at Netflix, at 3 p.m. in this room, so if you're interested, definitely come.

But it's remarkable to call this out: when we started looking into this particular feature, we were working with teams that support Open Connect. If you're not familiar with it, Open Connect is the CDN at Netflix: basically, every time you go and watch a movie, instead of streaming it from the cloud, we have a box close to your home at an ISP and you get the movie from there. So at this point you can imagine how important they are to us. This particular build used to take 15 minutes prior to Test Distribution; once we enabled Test Distribution, with more parallelization, better compute usage, and the consistency we mentioned, we took it down to almost 6 minutes. That's more than a 50% win, and people are loving it, in the sense that when we have an issue or an outage, because we are still learning the tool, they come to us and say: hey, don't take the nice things away from us, please fix it, rather than saying this doesn't work.
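For context on what "shipping test execution to remote workers" looks like from the build side, the Gradle Enterprise Test Distribution plugin exposes a per-test-task toggle roughly like the sketch below. The block and property names follow its public documentation as we recall it, so treat them as an assumption, and the executor counts are arbitrary.

```kotlin
// build.gradle.kts — sketch of enabling Test Distribution,
// assuming the Gradle Enterprise / Test Distribution plugin is already applied.
tasks.withType<Test>().configureEach {
    distribution {
        enabled.set(true)
        // Let a handful of remote agents pick up partitions of the test suite,
        // while still allowing the local machine to run one partition itself.
        maxRemoteExecutors.set(10)
        maxLocalExecutors.set(1)
    }
}
```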
One very important part is that we have to spend time not only on improving build performance, but also on making sure that if you are an engineer, a support engineer, or just a customer, you can see what happened and why. We enhance the build insight not only by using Gradle Enterprise, but also by exporting information in a way that is more digestible for people. For example, on a Jenkins job, simple things like sharing a link are always helpful: people don't have to go crawl all the logs and try to find where something happened, they can just click through to a Build Scan. It's a small thing that might take a few minutes to set up, but it really saves tons of time. Asking an engineer who didn't write the build tooling, who is running a build configuration somebody else set up for their team, to start looking at logs and make sense of them is a lot; if you spend time surfacing as much as possible in a friendlier way, it always pays off.

In addition, with Gradle Enterprise we're able to filter by many facets, many custom values that we create, and we were able to export this data to create a proper dashboard for our team to understand things across Nebula and Gradle versions, how things are getting built, local versus CI, or things like: who is using IntelliJ, which versions are they using, can we roll out new plugins, and so on. Having this visibility as a platform team has been very important to us in order to decide what to do next: can we introduce breaking changes, should we do it? Probably we shouldn't. But having this data is always helpful for us.
Aubrey: All right. Another thing we want to take a look at is flaky tests. First of all, what are flaky tests? Here's a great example where you can see that something was succeeding some of the time but also failing some of the time, with no code changes having been made. Typically there is some sort of infrastructure problem: maybe you're talking to a database but the data has changed, maybe you're reaching out across a network and connectivity drops for some reason. There are a number of causes for a flaky test, but overall this is what they all look like; here's your test report view. We wanted to see how often this is happening. As you're looking at these images: I think 15% of our builds running this particular week had some sort of failing test in them, and of all those builds, 1% had a flaky test somewhere in there. These are things which traditionally would have failed someone's build, either locally or in CI, but it doesn't necessarily mean the test is broken if it's flaky; it's giving a weird signal. So we want to save people time by defaulting to running these tests up to three times. If by the end it has still failed, well, you have a failed test; but if it has succeeded by at least the third attempt, then you have a flaky test.
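The "run it up to three times" behavior described here matches what Gradle's open-source test-retry plugin provides (Gradle Enterprise layers its flaky-test analytics on top of that retry data). A minimal sketch, with the retry counts chosen to mirror the talk, might be:

```kotlin
// build.gradle.kts — sketch of automatic test retries.
plugins {
    java
    id("org.gradle.test-retry") version "1.5.0"   // version is illustrative; pick a current release
}

tasks.test {
    retry {
        // Up to 3 executions total: a test that fails and then passes is reported
        // as flaky instead of failing the build outright.
        maxRetries.set(2)
        // Safety valve: if many tests fail, something real is broken, so stop retrying.
        maxFailures.set(20)
        failOnPassedAfterRetry.set(false)
    }
}
```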
We're able to see individual breakdowns in the Gradle Enterprise dashboard. In the top image, you can see some things were going pretty wrong for a while: early in the period there's a lot of flakiness, and some of the builds actually went red, they didn't succeed at all after the three tries, but later in the week things resolved themselves. So there was probably some sort of network or infrastructure issue, and it could be fixed in the future if the team goes and takes a look at it. In the second project shown there (these are real projects we have), you can see there's still test flakiness across the entire duration, so there's probably some issue that's still going on, perhaps something with timing in the test setup. The ask we have for our teams is: while we make this easier, so that you don't need to investigate a test that will succeed on a rerun, and we save you time by just marking it as flaky, please do still go take a look at these. Don't just leave them flaky forever, because that's not useful for the team itself, even if we're helping you out by not making you investigate these kind-of-not-really-red tests.
All right, time for us to take a look at what's on the horizon. One of the things we're super excited about is Predictive Test Selection. This is a new item from Gradle, and we are rolling it out; right now we're in the early friends-and-family phase and looking at rolling it out even further. So what is it? We're going to be hearing a lot more about it from Eric and Luke; I think they have a talk tomorrow at 3:00, but don't hold me to that. Overall, Predictive Test Selection helps increase developer productivity by figuring out which tests are the most useful to run for a particular code change. It looks at which tests have failed in the past and reruns those, and which tests are related to the particular code change you have made and runs those. The predictive model figures out which tests are the best ones to run for these particular changes, and it seems to have a very powerful impact on time savings.

So when do we use it? Currently we have it configured to run on local builds that are not executing a publishing task, on CI builds that are pull requests, and on CI builds where the branch is not the default main branch and is not a release branch. The reason we chose these is that we wanted the areas where people feel the pain of long tests the most: if I'm running things locally, I want the feedback to be as fast as humanly possible; I really want to run only the things related to my particular code changes and be told whether things look good. Similarly, if it's a pull request, I'm probably iterating on something and waiting for it to go green before I can merge it, so we want to make sure we're providing great feedback as fast as we can.
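As a rough illustration of those enablement rules, the Gradle Enterprise plugin exposes a per-test-task Predictive Test Selection toggle. The `predictiveSelection` block name is taken from its documentation as best we recall and should be treated as an assumption, and the environment variable names used to detect CI, branch, and publishing below are placeholders.

```kotlin
// build.gradle.kts — sketch only; env var names and the extension name are assumptions.
val isCi = System.getenv("CI") != null
val branch = System.getenv("BRANCH_NAME") ?: ""            // placeholder CI-provided variable
val isPublishing = gradle.startParameter.taskNames.any {   // crude "publishing build" check
    it.contains("publish", ignoreCase = true)
}

// Mirror the conditions described in the talk: local non-publishing builds,
// and CI builds that are not on the default or a release branch.
val selectionEnabled =
    if (isCi) branch.isNotEmpty() && branch != "main" && !branch.startsWith("release")
    else !isPublishing

tasks.withType<Test>().configureEach {
    predictiveSelection {
        enabled.set(selectionEnabled)
    }
}
```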
So far, the simulations are very promising. This is one of the really wonderful things that I love here: we have only rolled this out in reality to a few friends-and-family projects, but we're able to simulate across all of our builds what it would look like if we turned it on tomorrow. With this, we can really see the dramatic savings that can be found once we fully roll out this feature, and we can go to teams and say: hey, this is coming, we're very excited about it, and you're going to save tons of time. Here's an example of one project with Predictive Test Selection: you can see the tests that were selected versus not selected, and it varies over time depending on whatever code change was put in. And this brings us to another item about testing.
Roberto: Yeah. One of the pain points we have seen at Netflix is integration testing. People need to write tests against databases: say you have a Postgres database, a Cassandra database, you name it. They have spent time on things like embedded databases, sometimes they use Docker Compose, and sometimes, if they are mature enough, they go and use something like Testcontainers. The reality is that the experience, as we mentioned before, is different between local and CI. If you're on macOS with an Apple M1, Docker is different from what you might get in Jenkins. Similarly, we have seen issues where IPv6 support is not the same. The CI team doesn't really own the experience around Docker: they provide Docker for you, but it's up to you to spin up a container and clean it all up. Thankfully, things like Testcontainers do that cleanup for you, but if you are using Docker Compose, well, good luck with that; you need to make sure you shut those services down, otherwise you're just going to kill the agent and be a bad tenant and a bad neighbor to the other projects sharing it with you. Plus, running Docker locally or on CI costs memory and resources, and, as I said, in our case we'd like to remove the need for Test Distribution to run agents that require Docker.

So we are looking into Testcontainers Cloud, which is a beta product from AtomicJar. If you're not familiar with Testcontainers, it's a library for Java and multiple other languages that allows you to spin up containers programmatically, shut them down, and write integration tests the proper way, I would call it. The thing with Testcontainers Cloud is that it allows you to spin up all these containers remotely, in the cloud, on infrastructure they manage, so we don't need to care about operating Docker at scale for local or CI, and it can be the same experience for engineers. So we're really looking forward to seeing what the learnings are here.
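For readers who haven't used it, here is a minimal sketch of the Testcontainers style of integration test described above (JUnit 5 with a Postgres container). Pointing the Testcontainers client at local Docker, a CI agent, or Testcontainers Cloud is an environment-level configuration change, not a change to the test code itself.

```kotlin
// Sketch of a Testcontainers-based integration test (JUnit 5 + Kotlin).
import org.junit.jupiter.api.Test
import org.testcontainers.containers.PostgreSQLContainer
import org.testcontainers.junit.jupiter.Container
import org.testcontainers.junit.jupiter.Testcontainers
import java.sql.DriverManager

@Testcontainers
class PostgresIntegrationTest {

    // Started before each test and torn down afterwards by the Testcontainers
    // JUnit extension, wherever the Docker (or remote) environment happens to live.
    @Container
    val postgres = PostgreSQLContainer<Nothing>("postgres:15-alpine")

    @Test
    fun `can open a JDBC connection against the containerized database`() {
        DriverManager.getConnection(postgres.jdbcUrl, postgres.username, postgres.password).use { conn ->
            check(conn.isValid(2)) { "expected a live connection to the test database" }
        }
    }
}
```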
Aubrey: Another area we're taking a look at is IDE enhancements. One of the things we're very excited to investigate and move forward with is shared indexes for internal projects. Right now there are shared indexes for IntelliJ, for example, for things in Maven Central, but as you may recall, we publish a bunch of artifacts to our internal repositories, and we'd like similar shared indexes for those. This is something we haven't rolled out or even started working on yet, but we're looking forward to it. Another item is helping manage the IDE versions in use. One of the slides Roberto showed earlier had the skew of IDE versions, specifically IntelliJ IDEA, and there are a lot of them spanning various years. We'd really like people to be on just this year's, or this year's and last year's, versions, and to help folks upgrade them along with their code bases. One of the tools we've talked about is JetBrains Toolbox Enterprise, which can help manage IDE versions and plugin versions, and we're taking a look at getting that set up so things are easier for the local development process. Another one: we have a few centralized teams rolling out IDE-based plugins to help with code migrations or with their particular workflows, and as this becomes more popular, we want these teams to be able to test against the latest IDE versions. Currently you have to specify which IDE version you want to test against, and when a new version comes out you have to update your build and make sure you're still testing against it; that should be part of our shared infrastructure.

So, we have some key takeaways for you. Make sure that as you're running your builds, locally and in CI, you're avoiding work at every level wherever possible. That can be helped with read-only dependency caches and with task output caches, so you don't do the same work over again. You also want to be streamlining your work; in our case we do it with Test Distribution and Test Selection, which are wonderful things that help everything go nice and easy. There's really always low-hanging fruit; whatever it looks like at your particular company, it could look like these items or like others, but there are things to address. And the local and CI experience should be consistent: that makes for more understandability and better reproducibility, and it's a good way to do it. Please also come see us at these other talks: Roberto and Marc Philipp are going to be talking about how we built a distributed testing platform at 3:00 right here, and then later today at 4:00 I'll be part of the live DPE Showdown with some other fabulous people, right up here on stage. So thank you very much. We hope you have enjoyed it.