What it is. Why it matters. How to do it.
Break down silos. Automate relentlessly. Deploy with confidence.
Definition
DevOps is both a culture and a set of practices that unifies software development and IT operations [5]. The goal: deliver software more reliably, with continuous improvement [1].
Many teams have moved away from the old model, where developers handed off code to a separate operations team that deployed and ran it with little context. DevOps emphasizes cross-functional teams owning the full lifecycle. Build it, ship it, run it.
This site covers several areas where DevOps thinking tends to show up in practice: the recurring principles that appear in the literature, the practices teams commonly adopt, the feedback loops that catch problems earlier, observability, delivery metrics, organizational change, and the tooling teams reach for. These reflect how we think about it at Creo Design — not a canonical definition.
History
The term DevOps emerged around 2009, building on earlier ideas from Agile software development, Lean manufacturing, and the practitioner community forming around the annual Velocity conference. Patrick Debois is widely credited with coining the term after organizing the first DevOpsDays in Ghent, Belgium [6]. The movement grew from there — driven by practitioners sharing real experience, not a standards body or academic institution.
The Phoenix Project (2013) and Accelerate (2018) turned early practitioner ideas into a broader movement. High-performing teams commonly practice CI/CD, infrastructure as code, and blameless post-mortems — patterns consistently associated with better delivery performance in longitudinal research [1][4]. How widely DevOps is "adopted" depends heavily on how it is defined, which varies across organizations and sources.
There is no single industry-agreed definition. DevOps has evolved alongside Agile, Lean, and SRE, and continues to be shaped by the organizations that practice it. AWS, Microsoft, and Atlassian each describe it differently.
01
These themes consistently appear across DevOps literature and high-performing engineering teams. They are not a fixed standard — interpretations vary by organization and context.
Break down silos between dev, ops, and security. Cross-functional teams with shared goals, shared on-call, and shared ownership.
Automate the path from commit to production. CI/CD pipelines that build, test, and deploy with every change — minimizing manual gates.
The faster you learn, the faster you improve. Feedback loops at every stage: from IDE linting to production monitoring to customer insights.
Treat infrastructure like application code. Version-controlled, peer-reviewed, automatically provisioned. Eliminate click-ops.
Shift security left. Automated scanning in CI, secrets in vaults, least-privilege IAM, and compliance as code. Baked in, not bolted on.
Blameless post-mortems. Retrospectives that drive action. A culture where failure is a learning opportunity, not a career risk.
DevOps is frequently misunderstood. These are misconceptions that come up often — and why they miss the mark.
It describes a culture and set of practices, not a role. "DevOps Engineer" is common shorthand, but DevOps itself is bigger than any single position.
CI/CD is one practice among many. DevOps encompasses culture, collaboration, observability, feedback loops, and organizational change.
Shared responsibility, not elimination of operational expertise. Someone still owns infrastructure, reliability, and incident response.
Definitions vary across practitioners, academics, and vendors. This site presents one structured lens, not the only one.
02
These are widely accepted DevOps practices, though interpretations vary across teams and organizations.
Git, trunk-based development, pull requests. Everything starts with code collaboration. Not just app code, but infrastructure, config, and docs.
Automated build, test, and deploy on every push. GitHub Actions, GitLab CI, Jenkins. The tool matters less than the discipline of automating the path to production.
Docker for consistent environments. Kubernetes for orchestration. Ship the same artifact from dev to staging to production.
Logs, metrics, and traces. Datadog, Prometheus, Grafana. You cannot fix what you cannot see. Make your systems transparent.
Decouple deployment from release. Ship code dark, enable for a subset, roll out gradually. Dramatically reduce blast radius.
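As an illustration of decoupling deployment from release, here is a minimal TypeScript sketch of a feature flag check. The flag name, the checkout example, and the percentage rollout are hypothetical; in practice teams use a flag service or a shared config store rather than an in-memory map.

// Minimal feature-flag sketch: the code for the new checkout flow is already
// deployed, but it stays dark until the flag enables it for a given user.
type FlagConfig = { enabled: boolean; rolloutPercent: number };

// Hypothetical in-memory flag store; normally a flag service or config table.
const flags: Record<string, FlagConfig> = {
  "new-checkout-flow": { enabled: true, rolloutPercent: 10 }, // 10% of users
};

// Deterministic bucketing so a given user always gets the same answer.
function bucket(userId: string): number {
  let hash = 0;
  for (const ch of userId) hash = (hash * 31 + ch.charCodeAt(0)) >>> 0;
  return hash % 100;
}

export function isEnabled(flagName: string, userId: string): boolean {
  const flag = flags[flagName];
  if (!flag || !flag.enabled) return false;
  return bucket(userId) < flag.rolloutPercent;
}

// Deployment shipped the code days ago; release happens by raising rolloutPercent.
if (isEnabled("new-checkout-flow", "user-123")) {
  console.log("render the new flow");
} else {
  console.log("render the existing flow");
}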
GitHub Actions, GitLab CI, and the major cloud providers support OIDC. CI requests a short-lived token; the cloud validates the workflow's identity and grants least-privilege access. No long-lived credentials to store or rotate.
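A sketch of the cloud side of that handshake, using the AWS CDK in TypeScript: an OIDC identity provider for GitHub Actions plus an IAM role that only workflows from one (hypothetical) repository can assume. The repository, role name, and attached policy are placeholders; equivalent setups exist for GitLab CI and other providers.

import { Duration, Stack, StackProps } from "aws-cdk-lib";
import * as iam from "aws-cdk-lib/aws-iam";
import { Construct } from "constructs";

export class CiOidcStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // Trust GitHub's OIDC issuer. No long-lived keys are created anywhere.
    const githubOidc = new iam.OpenIdConnectProvider(this, "GithubOidc", {
      url: "https://token.actions.githubusercontent.com",
      clientIds: ["sts.amazonaws.com"],
    });

    // Only workflows from this (hypothetical) repo may assume the role,
    // and only for a short-lived session.
    const deployRole = new iam.Role(this, "CiDeployRole", {
      assumedBy: new iam.WebIdentityPrincipal(
        githubOidc.openIdConnectProviderArn,
        {
          StringLike: {
            "token.actions.githubusercontent.com:sub": "repo:creo-design/example-service:*",
          },
        }
      ),
      maxSessionDuration: Duration.hours(1),
    });

    // Least privilege: grant only what the pipeline actually needs (placeholder policy).
    deployRole.addManagedPolicy(
      iam.ManagedPolicy.fromAwsManagedPolicyName("AmazonS3ReadOnlyAccess")
    );
  }
}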
Short-lived branches, merge to main daily. Feature flags instead of long-lived feature branches. Smaller PRs, faster reviews, fewer merge conflicts.
These are examples of commonly used tools and techniques; DevOps is not defined by any specific vendor or platform.
03
One of the core principles of DevOps is rapid feedback loops. Most problems are found eventually. The question is when. The further right a bug travels, the more expensive it is to fix [2]. Many high-performing teams invest in pushing feedback earlier.
Worst
"Something is broken." A support ticket, a frustrated tweet, a churned user. They found the bug in production, after it shipped, after it deployed, after every check failed to catch it. High cost and significant trust damage.
Late
Manual testing in staging. A human clicked through the flow and found the regression. Better than production, but often slow, expensive, and hard to scale. The deploy is already queued. Everyone context-switches.
Better
E2E tests, integration tests, load tests, security scans. Automated gates run on every push, and the PR can't merge until they pass. Regressions are caught before they ever reach staging.
Good
Unit tests fail before the code even reaches CI. A teammate reviews the PR and spots the logic error. Automated review tools can also flag edge cases. Many bugs are caught before they leave the branch.
Best
TypeScript squiggles. ESLint rules. Pre-commit hooks. IDE plugins. Inline feedback. Bugs are often caught as you type, before you save, before you commit, before anyone else ever sees them. Low cost. Low delay. This is the goal.
The principle: invest in feedback loops that fire earlier. Every lint rule, every type annotation, every pre-commit hook, every unit test moves discovery closer to creation. That's the leverage.
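A small TypeScript illustration of the earliest loop: a type annotation plus an exhaustive switch turns a whole class of bugs into editor squiggles instead of production incidents. The deployment-status example is invented for the sake of the sketch.

// Deployment statuses the system understands.
type DeployStatus = "queued" | "running" | "succeeded" | "failed";

function describe(status: DeployStatus): string {
  switch (status) {
    case "queued":
      return "Waiting for a runner";
    case "running":
      return "Build in progress";
    case "succeeded":
      return "Deployed to production";
    case "failed":
      return "Rolled back";
    default: {
      // If a new status is added to the union and not handled above,
      // this assignment fails to compile -- the bug is caught as you type.
      const unhandled: never = status;
      return unhandled;
    }
  }
}

// describe("cancelled") is a compile-time error, not a 2am page.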
04
Observability is the ability to understand what your system is doing, and why, from its external outputs alone. Not just dashboards. Not just alerts. The payoff is confidence that your product is healthy, and the clarity to trace problems to root causes faster.
For the product team
Real-time visibility into what users experience. If something breaks, the team often knows before the first support ticket lands. Less guessing, less "it works on my machine". Data showing what's happening in production right now.
For the engineering team
When an alert fires at 2am, the on-call engineer needs to understand the whole picture fast. Good observability means tracing from symptom to root cause across services, seeing the full journey of a request, and resolving incidents without the person who wrote the code being available.
These are examples of common patterns; observability is not defined by any specific tool or vendor.
A single request ID that follows the entire journey across services. In our AWS stack, that looks like: API Gateway to Lambda, through SQS, into another Lambda, out to DynamoDB — one ID, one query, the full story. The same concept applies in any distributed system, regardless of cloud provider or tech stack. Without it, debugging means correlating logs across services and hoping timestamps align.
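A minimal TypeScript sketch of what carrying that one ID looks like inside a Lambda handler: read the ID the gateway assigned (or generate one), attach it to every log line, and pass it along when publishing downstream. The event shape, field names, and publish helper are illustrative, not a prescribed schema.

import { randomUUID } from "node:crypto";

// Illustrative event shape: API Gateway provides a requestId; fall back to a fresh UUID.
interface IncomingEvent {
  requestContext?: { requestId?: string };
  body?: string;
}

export async function handler(event: IncomingEvent): Promise<void> {
  const requestId = event.requestContext?.requestId ?? randomUUID();

  // Every log line carries the same ID, so one query reconstructs the whole journey.
  console.log(JSON.stringify({ requestId, action: "order.received" }));

  // Pass the ID along with the message so the next Lambda can keep the thread.
  await publishToQueue({ requestId, payload: event.body ?? "{}" });
}

// Placeholder for an SQS send; in practice this would use @aws-sdk/client-sqs.
async function publishToQueue(message: { requestId: string; payload: string }): Promise<void> {
  console.log(JSON.stringify({ requestId: message.requestId, action: "queue.publish" }));
}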
Step-by-step guides for common incidents. The person who wrote the service may not be awake, or still with the company, when it breaks at 3am. Runbooks capture their knowledge: what the service does, how to diagnose it, what to restart, who to escalate to, and what not to touch.
Set up your log insights, saved searches, and dashboard queries before the incident. In AWS, this means CloudWatch Logs Insights query definitions ready to go. A library of common failure patterns one click away. During an outage is the worst time to learn your query syntax.
Document known failure modes. Then brainstorm the likely ones that haven't happened yet. "What happens if the SQS queue backs up?" "What if the third-party API starts returning 429s?" Write it down, set up the alerts, prepare the response. Setting up for success means preparing for failure.
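Taking the queue-backlog question as an example, the matching alert could look like the following AWS CDK (TypeScript) sketch. The queue name, threshold, and evaluation periods are placeholders to be tuned per service.

import { Duration, Stack, StackProps } from "aws-cdk-lib";
import * as sqs from "aws-cdk-lib/aws-sqs";
import * as cloudwatch from "aws-cdk-lib/aws-cloudwatch";
import { Construct } from "constructs";

export class OrdersQueueStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    const ordersQueue = new sqs.Queue(this, "OrdersQueue"); // hypothetical queue

    // "What happens if the SQS queue backs up?" -- answer it with an alarm
    // set up before the incident, not during it.
    ordersQueue
      .metricApproximateNumberOfMessagesVisible({
        period: Duration.minutes(5),
        statistic: "Maximum",
      })
      .createAlarm(this, "OrdersQueueBacklogAlarm", {
        threshold: 1000,        // placeholder: backlog depth that signals trouble
        evaluationPeriods: 3,   // sustained for 15 minutes, not a brief spike
        comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
        alarmDescription: "Orders queue is backing up; see the runbook.",
      });
  }
}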
JSON logs with context: request ID, user ID, action, duration, status code. Not console.log('something broke'). Structured logs are queryable, indexable, and traceable. They turn your logging from a wall of text into a searchable database.
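A minimal structured-logging sketch in TypeScript. The field names (requestId, userId, action, durationMs, statusCode) follow the list above; the log function is illustrative, not a specific library's API.

// One JSON object per line: queryable, indexable, traceable.
interface LogFields {
  requestId: string;
  userId?: string;
  action: string;
  durationMs?: number;
  statusCode?: number;
  [key: string]: unknown;
}

function log(level: "info" | "warn" | "error", message: string, fields: LogFields): void {
  console.log(JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...fields,
  }));
}

// Instead of console.log('something broke'):
log("error", "checkout failed", {
  requestId: "req-7f3a",
  userId: "user-123",
  action: "checkout.submit",
  durationMs: 412,
  statusCode: 502,
});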
The principle: observability is not a feature you add after launch. It's a design decision you make from the start. Services benefit from structured logs, trace IDs, and runbooks. The goal is not to prevent all failures. It's to make failures easier to understand, find, and fix.
05
The DORA metrics quantify software delivery performance. DORA's research suggests speed and stability can reinforce each other — many high-performing teams excel at both [1].
Deployment Frequency. Elite benchmark: on-demand, multiple deploys per day.
Lead Time for Changes, from commit to production. Elite benchmark: under 1 hour.
Time to Restore, from incident to recovery. Elite benchmark: under 1 hour.
Change Failure Rate, the share of deploys causing incidents. Elite benchmark: under 5%.
Source: DORA — Accelerate State of DevOps Report, 2023 [4]. dora.dev
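For concreteness, a small TypeScript sketch of how two of these metrics fall out of a deploy log: change failure rate is failed deploys over total deploys, and lead time is the gap from commit to production, reported here as a median. The record shape is invented for illustration.

// Hypothetical deploy record pulled from a pipeline's history.
interface Deploy {
  commitAt: Date;       // when the change was committed
  deployedAt: Date;     // when it reached production
  causedIncident: boolean;
}

function changeFailureRate(deploys: Deploy[]): number {
  if (deploys.length === 0) return 0;
  const failed = deploys.filter((d) => d.causedIncident).length;
  return failed / deploys.length;
}

function medianLeadTimeHours(deploys: Deploy[]): number {
  const hours = deploys
    .map((d) => (d.deployedAt.getTime() - d.commitAt.getTime()) / 3_600_000)
    .sort((a, b) => a - b);
  if (hours.length === 0) return 0;
  const mid = Math.floor(hours.length / 2);
  return hours.length % 2 ? hours[mid] : (hours[mid - 1] + hours[mid]) / 2;
}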
06
This is how we've approached DevOps adoption at Creo Design. It won't be universal — every org is different. But the pattern that has worked for us: pick a specific, painful problem, build a case for it, get buy-in, execute with support, and report results. What follows is our experience, not a prescription.
Step 01
Look for pain that's real and measurable. Maybe teams are deploying manually and every release is a white-knuckle event. Maybe on-call is burning people out and nobody tracks why. Maybe cloud costs doubled last quarter and nobody knows where. Pick the problem that hurts the most, the one people already complain about. That's your campaign. The best ones are specific: "every service gets a CI/CD pipeline with automated tests by Q3," not "improve DevOps maturity."
Step 02
Quantify the problem. How many hours per week does manual deployment cost? How many incidents trace back to missing tests? What's the monthly spend on idle resources? Then define what success looks like and what it takes to get there. Package it into a proposal and pitch it to leadership. If they see the value — reduced risk, lower costs, faster delivery, happier engineers — they'll push it. Leadership buy-in is often a force multiplier. Without it, progress is slower. With it, change accelerates.
Step 03
Don't send a wiki link and call it done. DevOps partners with each team to make the changes — pair on pipeline setup, review infrastructure, investigate cost anomalies together. The goal is enablement, not enforcement. Teams that feel supported adopt faster than teams that feel policed. Build the golden path, then walk it with them.
Step 04
Send regular status updates to the leadership group. Which teams have adopted? Which are blocked? What's improved? Make progress visible — dashboards, weekly digests, whatever the org responds to. When leaders see their teams' names on the board, they push. When teams see others ahead of them, they move. Transparency creates accountability without micromanagement. The campaign isn't done when the first team ships — it continues until all teams have adopted.
These three problems come up at most organizations we've worked with. They're not the only starting points, but they're usually the most visible.
Confidence
Every service gets automated build, test, and deploy. Not just a CI check — full continuous delivery with integration tests, security scans, and guardrails that let teams ship without fear. Measure: deployment frequency, change failure rate, time from commit to production.
Operational health
Regular on-call debriefs to surface what's actually burning time. How many engagements last rotation? How many were repeat issues? Were runbooks accurate? Are alerts firing at the right thresholds? Track touch points — every support ping is a signal that something needs to mature. Measure: on-call burden, repeat incidents, KTLO hours per team.
Cost
Identify the top cost drivers and evaluate paths to savings — right-sizing, reserved capacity, cleaning up orphaned resources, tagging for accountability. Not about cutting corners; about eliminating waste. Teams that see their own spend make better decisions. Measure: cost per service, month-over-month trend, idle resource spend.
These are structural choices that have made improvement easier to sustain at Creo. Your mileage will vary.
Build golden paths that teams choose to follow — reusable workflows, starter templates, shared Actions catalogs. When a data team needs a pipeline, they shouldn't need a DevOps ticket. When an app team needs deployment, they shouldn't need to learn Kubernetes from scratch. When the right thing is the easiest thing, adoption is more likely.
Every time someone asks for help instead of finding the answer themselves, that's a signal. Count touch points per product — every support ping, every doc search that came up empty, every "who do I ask about X?" in Slack. If Product A generates thirty support requests a month, that's where investment goes: better logs, more intuitive interfaces, clearer error messages, searchable documentation. Each improvement compounds.
Some actions need a gate — secret generation, OIDC roles, production access. Build self-service paths: teams submit a request, automation handles approval and provisioning. Meet people where they are — some live in code and want full GitHub access; others prefer a portal that deploys, rolls back, and audits without requiring commits. Both paths should lead to the same outcomes. Teams stay unblocked. Security stays in control.
The principle: organizational change tends to follow a pattern. Identify the pain. Build the case. Get buy-in. Execute with support. Report progress. In our experience, transformation is less about which tools a team uses and more about whether someone committed to seeing it through.
08
Emerging Trend
This is how Creo Design documents project context for AI coding tools. It's not a DevOps standard — just a workflow that's reduced friction for us. Three files that give an AI coding assistant the context it needs to be useful.
CLAUDE.md
Architecture, conventions, file map, design system, verification commands. Claude Code reads this automatically to understand your project.
AGENTS.md
Core rules, naming conventions, design system tokens, verification commands. Works with Claude Code, Cursor, GitHub Copilot, and more.
DEVOPS.md
Ship small, automate everything, secure by default, observe everything. Reusable across any repo — drop it in and every AI agent learns your standards.
your-project/
├── CLAUDE.md # Claude Code context
├── AGENTS.md # Universal agent reference
├── DEVOPS.md # DevOps & security principles
└── .cursor/
└── rules/
├── project.mdc # Project overview (alwaysApply)
├── code-style.mdc # Naming, formatting (alwaysApply)
├── typescript.mdc # TS conventions (*.ts, *.tsx)
├── components.mdc # Component standards (*.tsx)
└── security.mdc # Security patterns

Feed the raw URL to any LLM as context: what-is-devops-fdly884ew-creo-design.vercel.app/api/raw/DEVOPS
09
The relationship between DevOps and Site Reliability Engineering is actively debated among practitioners. Some view SRE as a specific way to implement DevOps values. Others treat them as related but distinct disciplines — SRE focuses on reliability engineering, service-level objectives, and error budgets, while DevOps is a broader cultural philosophy. Neither characterization is universally agreed upon.
Google's SRE book [3] offered a well-known framing: “class SRE implements interface DevOps.” This captures real overlap, but it reflects Google's interpretation of their own practice — not an industry-standard definition.
Where they clearly converge: both emphasize shared ownership between development and operations, blameless post-mortems, automation over manual toil, and data-driven decisions. SRE tends to be more prescriptive (specific roles, error budget policies, SLO thresholds); DevOps tends to be more philosophical.
A practical distinction at Creo: DevOps is the philosophy and culture applied across the organization. SRE-style practices — SLOs, runbooks, on-call rotation, reliability reviews — are specific engineering disciplines that often live inside that broader DevOps context.
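To make the error-budget idea concrete: a 99.9% availability SLO over a 30-day window leaves roughly 43 minutes of allowed downtime. A tiny TypeScript sketch of that arithmetic, with the SLO target and window as example values:

// Error budget: the downtime an SLO permits over a given window.
function errorBudgetMinutes(sloTarget: number, windowDays: number): number {
  const totalMinutes = windowDays * 24 * 60;
  return totalMinutes * (1 - sloTarget);
}

// 99.9% over 30 days => 43.2 minutes of budget to "spend" on incidents and risky changes.
console.log(errorBudgetMinutes(0.999, 30)); // 43.2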
Less about tools. More about shared responsibility, continuous improvement, and creating space to try things without fear.
Sources
DORA Research
The data behind elite teams. Deployment frequency, lead time, change failure rate, recovery time.
DevOpsDays
The community conference where the term DevOps was coined. Practitioner-led, worldwide.
Martin Fowler
Foundational writing on continuous integration, continuous delivery, and infrastructure as code.
The Twelve-Factor App
Methodology for building modern, portable, deployment-ready applications. Config, dependencies, logs, disposability.
DevOps Roadmap
Community-driven learning path. From Linux basics to CI/CD, containers, monitoring, and cloud.
References
Whatisdevops.com is written and maintained by engineers at Creo Design. It reflects our experience and perspective — not an independently peer-reviewed academic resource.
We periodically review the content for accuracy and update it as our understanding evolves. Where we make claims about industry research, we cite primary sources.
Last reviewed: February 2026.
If you spot an error, an outdated reference, or a claim that needs stronger sourcing, open an issue on GitHub. We take corrections seriously.