How Endurance Inc. Turned a 15‑Year‑Old Monolith into a Faster, Safer Release Machine

CI/CD — Photo by Daniil Komov on Pexels

It was a Tuesday afternoon when Maya, a senior developer at Endurance Inc., stared at a red-flashing terminal and realized the nightly build had failed for the third time that week. The failure cascaded into a missed production window, a frantic scramble across three business units, and a weekend overtime call that left the team exhausted. That moment sparked a deeper look at why a 15-year-old monolith still relied on hand-written spreadsheets and manual SSH copies.

The Legacy Monolith: A Case Study of Endurance Inc.

Endurance Inc. runs a 15-year-old Java EE monolith that processes roughly three million transactions each day. The application lives in a single WAR file, is deployed on a handful of on-premise WebLogic servers, and still depends on a manual release checklist that dates back to 2008. Because the codebase has grown to more than 2.3 million lines of source, any change triggers a cascade of coordination steps across three different business units.

The release team maintains a spreadsheet that lists every environment (dev, qa, staging, prod) and the exact sequence of scripts that must be run for a successful rollout. A typical change requires a developer to commit to SVN, an ops engineer to copy the WAR via SCP, a DBA to apply a schema migration, and finally a manual QA sign-off before the new version is started in production. The entire chain is fragile; a missed step often forces a rollback that adds another day to the schedule.

Data from the internal monitoring platform shows that the monolith’s average build time sits at 38 minutes, but the overall release lead time stretches to 14 days. The team attributes most of the delay to human hand-offs and the need to keep legacy scripts in sync with each other. In the 2023 State of DevOps Report, high-performing teams achieve a median lead time of one day, highlighting the gap between Endurance’s process and modern best practices.

Key Takeaways

  • Legacy monoliths often hide complexity in manual scripts rather than code.
  • Release lead time can be inflated by up to 340 % when human approvals dominate the flow.
  • Even without a code rewrite, automation can target the surrounding process to unlock speed.

Recognizing the hidden cost of each manual gate, the engineering leadership set a concrete goal: halve the release cycle within a year while keeping the existing WebLogic infrastructure intact.

Waterfall Woes: Quantifying the Cost of Manual Deployments

When Endurance’s release manager reviewed the last quarter, the average time from commit to production was 14 days, with a standard deviation of 3.2 days. The variance is driven by three bottlenecks: manual QA sign-off (average 5 days), operator-driven deployment (average 4 days), and post-deployment verification (average 2 days). A root-cause analysis of 27 incidents in Q2 showed that 19 of them were tied to missed configuration steps, leading to an average defect leakage of 1.8 defects per release.

Financially, the delay translates into lost revenue. Endurance’s product pricing model charges $0.02 per transaction; a two-day delay on a feature that could capture an extra 250 k transactions per day equals $10 k in foregone income per release. Over a typical six-month release cycle, the opportunity cost exceeds $300 k.
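The arithmetic behind that estimate can be checked in a few lines; the function name is illustrative, but the constants come straight from the paragraph above:

```python
# Back-of-envelope model of the delay cost described above; the function
# name is ours, the constants come from the article.
PRICE_PER_TXN = 0.02          # dollars charged per transaction
EXTRA_TXN_PER_DAY = 250_000   # transactions a delayed feature could capture

def delay_cost(days_delayed: int) -> float:
    """Foregone revenue from delaying a feature by `days_delayed` days."""
    return PRICE_PER_TXN * EXTRA_TXN_PER_DAY * days_delayed

print(delay_cost(2))  # a two-day delay forgoes $10,000 per release
```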

"Manual hand-offs add 340 percent more lead time than automated pipelines." (2023 State of DevOps Report)

Beyond dollars, the slow cadence erodes developer morale. A developer survey conducted internally recorded a satisfaction score of 3.2 / 5 for release speed, compared with 4.5 / 5 for teams that had adopted CI/CD. The data points to a clear business case: reducing manual steps will improve both velocity and quality.

These numbers convinced the CTO to back a pilot that would replace the spreadsheet with a visible pipeline, starting with the build stage.


Mapping the Transition: From Manual to Automated Pipelines

The first step was a maturity assessment using the GitLab CI Maturity Model. Endurance scored a 2 out of 5, meaning they had isolated scripts but no orchestrated pipeline. The assessment identified three pieces of low-hanging fruit: (1) wrapping the existing Ant build scripts in a GitLab CI job, (2) publishing the generated WAR to a Nexus repository as an immutable artifact, and (3) automating the SCP copy step with a GitLab Runner that has access to the target servers.

Because the team could not replace WebLogic overnight, the migration plan preserved the existing deployment scripts while adding a thin abstraction layer. A YAML file called .gitlab-ci.yml was introduced with stages named build, publish, deploy-dev, deploy-qa, and approval. Each stage calls the legacy script via a shell command, but GitLab now records success/failure, timestamps, and logs in a single view.
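A minimal sketch of what that .gitlab-ci.yml could look like, assuming the legacy scripts live in a scripts/ directory; job names and paths are illustrative, and only the stage names come from the text:

```yaml
# Illustrative sketch, not Endurance's actual pipeline definition.
stages:
  - build
  - publish
  - deploy-dev
  - deploy-qa
  - approval

build:
  stage: build
  script:
    - ant -f build.xml package          # wraps the existing Ant build script

publish:
  stage: publish
  script:
    - ./scripts/publish-to-nexus.sh target/endurance.war   # immutable artifact

deploy-dev:
  stage: deploy-dev
  script:
    - ./scripts/legacy-deploy.sh dev    # same SCP-based script, now logged by GitLab

deploy-qa:
  stage: deploy-qa
  script:
    - ./scripts/legacy-deploy.sh qa

approval:
  stage: approval
  when: manual                          # QA sign-off becomes a recorded manual gate
  script:
    - echo "Release approved"
```

Because each job shells out to the legacy script, nothing about the deployment mechanics changes; GitLab simply records success/failure, timestamps, and logs in one place.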

To avoid disrupting the nightly build, the new pipeline was run in parallel on a feature branch for three sprints. Metrics from the pipeline dashboard showed a 92 % success rate on the first automated run, compared with a 78 % success rate on the manual process. The team also set up a lightweight webhook that notifies the ops Slack channel when a stage fails, cutting the mean time to detect (MTTD) from 4 hours to under 15 minutes.

With the core build and artifact-publish steps now visible, the next logical step was to bring deployment under the same roof, a decision that paved the way for containerization.

Incremental Automation without Code Rewrite: Techniques and Tools

Containerization was the next lever. By building a Docker image that contains the WAR and the required JDK 8 runtime, the team could run the application in an isolated environment without touching the code. The Dockerfile pulls the WAR from Nexus, sets JAVA_OPTS, and exposes port 8080. This image is versioned with the same Maven coordinates, ensuring traceability.
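A hedged sketch of such a Dockerfile follows. The article does not say which servlet container runs inside the image, so Tomcat on JDK 8 is assumed here; the Nexus URL and Maven coordinates are placeholders, not Endurance's real values:

```dockerfile
# Hypothetical sketch; base image, Nexus URL, and coordinates are assumptions.
FROM tomcat:9-jdk8

ARG WAR_VERSION=1.0.0
# Pull the exact WAR that CI published to Nexus, so every environment
# receives the same binary that passed the pipeline.
ADD https://nexus.example.com/repository/releases/com/endurance/app/${WAR_VERSION}/app-${WAR_VERSION}.war \
    /usr/local/tomcat/webapps/ROOT.war

# Runtime tuning is injected the same way the legacy scripts did it.
ENV JAVA_OPTS="-Xms512m -Xmx2g"
EXPOSE 8080
```

Versioning the image with the WAR's Maven coordinates means any running container can be traced back to a specific build.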

Blue-green deployments were introduced using a feature-flag library (Togglz) already present in the codebase. The pipeline now deploys the new image to a staging cluster while the current version continues serving traffic. A health-check script runs integration tests against the staging endpoint; if all pass, a traffic switch is performed via an Apache mod_proxy configuration update. Because the switch is controlled by a feature flag, a rollback is as simple as toggling the flag off.
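The mod_proxy side of the switch might look like the fragment below; host addresses and route names are placeholders, and the exact mechanics of Endurance's configuration update are not described in the text:

```apache
# Hypothetical blue-green fragment; addresses and routes are illustrative.
<Proxy "balancer://endurance">
    # "blue" is currently live; after the staging health checks pass, the
    # pipeline activates "green", comments out "blue", and reloads Apache.
    BalancerMember "http://10.0.0.11:8080" route=blue
    # BalancerMember "http://10.0.0.12:8080" route=green
</Proxy>
ProxyPass        "/app" "balancer://endurance/app"
ProxyPassReverse "/app" "balancer://endurance/app"
```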

Nexus serves as the single source of truth for artifacts. The CI job pushes the WAR to Nexus, then the Docker build pulls it, guaranteeing that every environment receives the exact same binary. The process eliminates the manual copy step that previously caused checksum mismatches in 7 % of releases.

In 2024 the team upgraded to GitLab Runner 16.0 and tightened its Docker-in-Docker (DinD) build configuration. This change shaved another two minutes off the publish stage, a small but measurable gain that accumulated over dozens of daily builds.

These incremental steps proved that even a monolith can be modernized without a full rewrite, as long as the automation layer respects the legacy runtime.


Measuring Impact: 70% Release Time Reduction in Practice

After six months of incremental automation, Endurance recorded a new average lead time of 4.2 days, a 70 % reduction from the baseline. Build success rose to 98 % and the mean time to recovery (MTTR) dropped from 6 hours to 45 minutes, according to the GitLab incident analytics page.

Post-release defect density fell from 0.32 defects per 1 k lines of code to 0.09, a 72 % improvement. The defect drop aligns with the 2022 DORA metrics that link higher deployment frequency to lower change failure rate.

Financially, the faster cycle allowed Endurance to launch two additional minor features per quarter, generating an estimated $45 k incremental revenue per quarter. The operations team also reported a 30 % reduction in on-call fatigue, measured by a pulse survey that showed an average on-call satisfaction score of 4.1 / 5 after automation, up from 3.3 / 5.

These numbers are captured in a quarterly dashboard that visualizes build time, lead time, deployment frequency, and change failure rate. The dashboard is embedded in the company's internal Confluence page, providing executives with real-time insight into the automation ROI.

Seeing those charts, senior leadership approved a second wave of automation that would target database migrations and log-aggregation pipelines.

Sustaining Momentum: Governance, Culture, and Continuous Improvement

To keep the momentum, Endurance formed a cross-functional Release Committee that meets bi-weekly. The committee reviews pipeline health metrics, approves new feature-flag rollouts, and prioritizes backlog items for further automation, such as automated database migrations using Flyway.

Real-time monitoring is achieved with GitLab’s Prometheus integration. Alerts fire when a pipeline stage exceeds its historical median duration by 25 percent, prompting an immediate post-mortem. The post-mortem template is stored in a shared repository and includes sections for root cause, mitigation, and action items.
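An alert of that shape could be expressed as a Prometheus rule along these lines; the metric and recording-rule names are assumptions, not series that GitLab exports out of the box:

```yaml
# Illustrative alerting rule; metric names are hypothetical.
groups:
  - name: pipeline-health
    rules:
      - alert: PipelineStageSlow
        # Fire when a stage runs 25 % longer than its historical median,
        # with the median precomputed by a recording rule.
        expr: ci_stage_duration_seconds > 1.25 * stage:ci_stage_duration_seconds:median
        labels:
          severity: warning
        annotations:
          summary: "Stage {{ $labels.stage }} exceeded its median duration by 25%"
```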

Culture change is reinforced through a “pipeline champion” role rotating among senior engineers. Champions run brown-bag sessions that showcase new CI/CD tricks, such as caching Maven dependencies to shave two minutes off the build stage. Over the last quarter, the team added three new cached layers, reducing average build time from 38 minutes to 33 minutes.
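The dependency-caching trick can be sketched as a cache block in the existing .gitlab-ci.yml; the key and paths below are typical Maven settings, not Endurance's exact configuration:

```yaml
# Illustrative cache block: keeps the local Maven repository between runs
# so dependencies are not re-downloaded on every build.
build:
  stage: build
  variables:
    MAVEN_OPTS: "-Dmaven.repo.local=.m2/repository"
  cache:
    key: maven-deps
    paths:
      - .m2/repository/
  script:
    - mvn -B package
```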

Continuous improvement is baked into the process: every sprint ends with a “pipeline health retro” where the team scores the pipeline on stability, speed, and observability. Scores above 8 trigger a reward ceremony, reinforcing the link between automation excellence and recognition.

Looking ahead to 2025, the roadmap includes migrating the staging cluster to Kubernetes, which will let the same Docker image run on a scalable platform while preserving the WebLogic runtime via a side-car pattern.


What was the biggest technical obstacle in automating the legacy monolith?

The biggest obstacle was preserving the existing WebLogic deployment scripts while introducing a new CI/CD layer. The team solved this by wrapping the scripts in GitLab CI jobs and using Docker to isolate the runtime, avoiding any direct code changes.

How did Endurance ensure artifact consistency across environments?

All builds publish the WAR to Nexus, and the Docker image pulls the same artifact during its build step. This creates a single source of truth, eliminating checksum mismatches that previously occurred in 7 percent of releases.

What metrics did the team use to prove the automation’s ROI?

Key metrics included lead time (reduced from 14 days to 4.2 days), deployment frequency (up 3×), post-release defect density (down 72 percent), and MTTR (down from 6 hours to 45 minutes). Financial impact was measured by additional feature releases generating roughly $45 k per quarter.

Can other organizations apply the same incremental approach without rewriting code?

Yes. The case study shows that by containerizing artifacts, using a repository manager, and orchestrating existing scripts in a CI tool, teams can achieve substantial speed gains while keeping the legacy codebase intact.

What role does culture play in sustaining CI/CD improvements?

Culture is reinforced through a Release Committee, rotating pipeline champions, and regular retrospectives that turn metrics into actionable rewards. This creates ownership and continuous focus on pipeline health.
