Hands-On DevOps Engineering

Day 25: Policy as Code Implementation

sdr
Dec 13, 2025

What We’re Building Today

Today you’ll deploy a complete policy enforcement system using Open Policy Agent (OPA) and Gatekeeper. We’ll build a dashboard that shows policy violations in real time, automatically blocks non-compliant deployments, and generates compliance reports. Think of it as your cluster’s security guard: it checks every resource against your organization’s rules before letting it in.

High-Level Agenda:

  • Deploy OPA and Gatekeeper for admission control

  • Write Rego policies for compliance (container security, resource limits, labels)

  • Build a policy violation monitoring dashboard

  • Implement automated remediation workflows

  • Create compliance report generation system

Core Concepts: Policy as Code

What is Policy as Code?

Policy as Code treats security and compliance rules like software: version controlled, tested, and automated. Instead of manually reviewing every deployment, you write policies once and enforce them automatically. Netflix uses OPA to ensure thousands of microservices comply with security standards without human intervention. Spotify enforces resource quotas and label requirements across 100+ engineering teams using Gatekeeper.
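To make "tested" concrete, here is a minimal sketch that uses a plain Python function in place of a Rego rule. In a real OPA setup the policy would be Rego and the cases would be Rego unit tests run with `opa test`. The policy name deny_privileged and the cases are hypothetical. The pattern is what matters: the rule and its table-driven cases live side by side in Git and run in CI on every change.

```python
def deny_privileged(pod: dict) -> list[str]:
    """Hypothetical policy: return one violation per container that runs privileged."""
    return [
        f"container {c['name']!r} must not run privileged"
        for c in pod.get("spec", {}).get("containers", [])
        if (c.get("securityContext") or {}).get("privileged", False)
    ]


# Table-driven cases that would be versioned next to the policy and run in CI.
CASES = [
    ("no securityContext", {"spec": {"containers": [{"name": "app"}]}}, 0),
    ("privileged false",
     {"spec": {"containers": [{"name": "app", "securityContext": {"privileged": False}}]}}, 0),
    ("privileged true",
     {"spec": {"containers": [{"name": "app", "securityContext": {"privileged": True}}]}}, 1),
]


def run_cases() -> None:
    """Fail loudly if the policy's behavior drifts from the expected violation counts."""
    for name, pod, expected in CASES:
        got = len(deny_privileged(pod))
        assert got == expected, f"{name}: expected {expected} violations, got {got}"
    print(f"{len(CASES)} policy cases passed")


if __name__ == "__main__":
    run_cases()
```

A change that accidentally loosens the rule fails the pipeline before it is merged. This catches the kind of regression a manual review tends to miss.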

Open Policy Agent (OPA)

OPA is a general-purpose policy engine that uses the Rego language to define rules. It can run as a sidecar or standalone service, or, in Kubernetes, behind an admission webhook, evaluating JSON payloads against your policies. When a developer tries to deploy a pod without resource limits, the API server forwards the request to the webhook. OPA evaluates it and finds that it violates your policy, and the API server then rejects the request before the object is persisted.
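The sketch below is a plain-Python stand-in for that flow, not OPA itself. It plays the webhook's role: it checks the Pod in an AdmissionReview request for missing resources.limits and returns an AdmissionReview v1 response. The request is trimmed to the fields the check uses. A real request also carries kind, resource, operation, userInfo, and more. The uid is a sample value.

```python
import json


def missing_limits(pod: dict) -> list[str]:
    """Return one message per container that declares no resources.limits."""
    messages = []
    for container in pod.get("spec", {}).get("containers", []):
        limits = (container.get("resources") or {}).get("limits")
        if not limits:
            messages.append(f"container {container.get('name', '?')!r} has no resource limits")
    return messages


def review(admission_review: dict) -> dict:
    """Build an AdmissionReview v1 response; the uid must echo the request's uid."""
    request = admission_review["request"]
    messages = missing_limits(request["object"])
    response = {"uid": request["uid"], "allowed": not messages}
    if messages:
        # The API server surfaces status.message to the user who ran kubectl apply.
        response["status"] = {"code": 403, "message": "; ".join(messages)}
    return {"apiVersion": "admission.k8s.io/v1", "kind": "AdmissionReview", "response": response}


if __name__ == "__main__":
    sample = {
        "apiVersion": "admission.k8s.io/v1",
        "kind": "AdmissionReview",
        "request": {
            "uid": "705ab4f5-6393-11e8-b7cc-42010a800002",  # sample value
            "object": {"kind": "Pod", "spec": {"containers": [{"name": "web", "image": "nginx:1.27"}]}},
        },
    }
    print(json.dumps(review(sample), indent=2))
```

Because the decision is returned to the API server, rejection happens inside the request path. The client sees the 403 and the message, and nothing is stored.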

Key Components:

  • Rego Policies: Declarative rules, written in Rego, that define what’s allowed

  • Data Documents: JSON context data used for policy decisions (current state, configurations)

  • Query Engine: Evaluates policies against input and returns allow/deny decisions
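The split between these three components can be sketched in Python. Rules are functions of an input document (the resource under review) and a data document (shared context). A tiny query function evaluates every rule and folds the results into a decision. The rule names, the allowed_registries key, and the {"allow", "deny"} result shape are assumptions for illustration, not OPA's API.

```python
from typing import Callable

# A rule takes (input document, data document) and returns deny messages.
Rule = Callable[[dict, dict], list[str]]


def untrusted_registry(input_doc: dict, data: dict) -> list[str]:
    """Deny images whose name does not start with an allowed registry prefix from data."""
    allowed = tuple(data.get("allowed_registries", []))
    return [
        f"image {c['image']!r} is not from an allowed registry"
        for c in input_doc.get("spec", {}).get("containers", [])
        if not c["image"].startswith(allowed)
    ]


def latest_tag(input_doc: dict, data: dict) -> list[str]:
    """Deny images pinned to the mutable ':latest' tag (this rule ignores data)."""
    return [
        f"image {c['image']!r} uses the mutable ':latest' tag"
        for c in input_doc.get("spec", {}).get("containers", [])
        if c["image"].endswith(":latest")
    ]


def query(rules: list[Rule], input_doc: dict, data: dict) -> dict:
    """Evaluate every rule against input + data and fold the results into allow/deny."""
    deny = [msg for rule in rules for msg in rule(input_doc, data)]
    return {"allow": not deny, "deny": deny}


if __name__ == "__main__":
    data = {"allowed_registries": ["registry.example.com/"]}
    pod = {"spec": {"containers": [{"name": "app", "image": "docker.io/library/nginx:latest"}]}}
    print(query([untrusted_registry, latest_tag], pod, data))
```

Keeping the registry list in data rather than in the rule means operators can update it without touching policy logic. OPA's data documents serve the same purpose.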

Gatekeeper: OPA for Kubernetes

Gatekeeper is a Kubernetes-native admission controller built on OPA and configured through CustomResourceDefinitions (CRDs). It introduces ConstraintTemplates (reusable policy blueprints that hold the Rego logic and declare which parameters it accepts) and Constraints (instances of a template with specific parameters and match rules). Instead of writing raw admission webhooks, you define a ConstraintTemplate once and create multiple Constraints with different parameters.
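The template/constraint split amounts to "one function, many parameter sets". In this sketch a Python function stands in for the Rego in a ConstraintTemplate. The dictionaries are shaped loosely like Constraint objects of kind K8sRequiredLabels, the kind used in Gatekeeper's documentation. The constraint names and label keys are hypothetical, and match handling is omitted to keep the focus on reuse.

```python
def k8s_required_labels(obj: dict, parameters: dict) -> list[str]:
    """Template logic: every label key in parameters['labels'] must be present on obj."""
    present = set((obj.get("metadata", {}).get("labels") or {}).keys())
    missing = sorted(set(parameters.get("labels", [])) - present)
    return [f"missing required labels: {missing}"] if missing else []


# One template, registered under the constraint kind it creates.
TEMPLATES = {"K8sRequiredLabels": k8s_required_labels}

# Two constraints reusing the same template with different parameters (hypothetical names).
CONSTRAINTS = [
    {"kind": "K8sRequiredLabels", "metadata": {"name": "must-have-owner"},
     "spec": {"parameters": {"labels": ["owner"]}}},
    {"kind": "K8sRequiredLabels", "metadata": {"name": "must-have-app-and-team"},
     "spec": {"parameters": {"labels": ["app", "team"]}}},
]


def evaluate(constraint: dict, obj: dict) -> list[str]:
    """Look up the constraint's template and run it with the constraint's parameters."""
    template = TEMPLATES[constraint["kind"]]
    return template(obj, constraint["spec"].get("parameters", {}))


if __name__ == "__main__":
    deployment = {"kind": "Deployment", "metadata": {"name": "api", "labels": {"app": "api"}}}
    for c in CONSTRAINTS:
        print(c["metadata"]["name"], evaluate(c, deployment) or "ok")
```

Adding a third policy such as "must have cost-center" is then a new parameter set rather than new code. This is the main payoff of the template model.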

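How It Works: When you apply a Kubernetes resource, the API server sends it to Gatekeeper’s validating webhook. Gatekeeper evaluates it against every active Constraint whose match criteria cover that resource. If a Constraint with the "deny" enforcement action (the default) is violated, the resource is rejected with a detailed error message. Constraints set to "warn" or "dryrun" only return a warning or record the violation. This happens before the resource is written to etcd, so non-compliant resources are never created. Resources that existed before a policy was added are caught by Gatekeeper’s periodic audit instead.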
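The admission decision can be sketched as a loop over constraints. It skips constraints whose match.kinds don't cover the resource, runs the rest, and sorts each violation by enforcementAction. The "deny", "warn", and "dryrun" values are Gatekeeper's. The RequireLabel template kind, the constraint names, and the "[constraint-name] message" formatting are assumptions loosely modeled on Gatekeeper's behavior, not its implementation.

```python
def matches(constraint: dict, group: str, kind: str) -> bool:
    """True if the constraint's match.kinds covers (group, kind); '*' acts as a wildcard."""
    kinds = constraint["spec"].get("match", {}).get("kinds")
    if not kinds:
        return True  # simplification: no match block means "applies to everything"
    return any(
        (group in k.get("apiGroups", []) or "*" in k.get("apiGroups", []))
        and (kind in k.get("kinds", []) or "*" in k.get("kinds", []))
        for k in kinds
    )


def require_label(obj: dict, params: dict) -> list[str]:
    """Hypothetical template: obj must carry the label named by params['key']."""
    labels = obj.get("metadata", {}).get("labels") or {}
    return [] if params["key"] in labels else [f"missing label {params['key']!r}"]


TEMPLATES = {"RequireLabel": require_label}

PODS = {"kinds": [{"apiGroups": [""], "kinds": ["Pod"]}]}
CONSTRAINTS = [
    {"kind": "RequireLabel", "metadata": {"name": "pods-need-owner"},
     "spec": {"match": PODS, "parameters": {"key": "owner"}}},
    {"kind": "RequireLabel", "metadata": {"name": "pods-should-have-cost-center"},
     "spec": {"match": PODS, "enforcementAction": "warn", "parameters": {"key": "cost-center"}}},
    {"kind": "RequireLabel", "metadata": {"name": "deployments-need-team"},
     "spec": {"match": {"kinds": [{"apiGroups": ["apps"], "kinds": ["Deployment"]}]},
              "parameters": {"key": "team"}}},
]


def admit(obj: dict, group: str, kind: str, constraints: list[dict]) -> dict:
    """Reject only on 'deny' violations; 'warn' becomes a warning; 'dryrun' is left to audit."""
    denials, warnings = [], []
    for c in constraints:
        if not matches(c, group, kind):
            continue
        action = c["spec"].get("enforcementAction", "deny")
        for msg in TEMPLATES[c["kind"]](obj, c["spec"].get("parameters", {})):
            entry = f"[{c['metadata']['name']}] {msg}"
            if action == "deny":
                denials.append(entry)
            elif action == "warn":
                warnings.append(entry)
    return {"allowed": not denials, "denials": denials, "warnings": warnings}


if __name__ == "__main__":
    pod = {"metadata": {"name": "web", "labels": {"app": "web"}}}
    print(admit(pod, "", "Pod", CONSTRAINTS))
```

Rolling a new policy out as "dryrun" or "warn" first, then switching it to "deny", lets teams see who would be blocked before anyone actually is.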

Policy Workflow

  1. Definition Phase: Engineers write ConstraintTemplates defining policy logic in Rego

  2. Application Phase: Ops teams create Constraints from templates with specific parameters

  3. Enforcement Phase: Gatekeeper intercepts API requests and evaluates them

  4. Monitoring Phase: Dashboard collects violations, audit logs, and compliance metrics

  5. Remediation Phase: Automated workflows fix violations or alert responsible teams
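For the monitoring phase, Gatekeeper's audit writes results into each Constraint's status, as a totalViolations count and a (capped) violations list. Each entry carries fields such as enforcementAction, kind, name, namespace, and message. This sketch assumes you already have the constraint objects as parsed JSON and rolls them up into dashboard-ready counts. The sample data is made up.

```python
from collections import Counter


def summarize(constraints: list[dict]) -> dict:
    """Roll audit results up per constraint, per namespace, and per enforcement action."""
    per_constraint: dict[str, int] = {}
    per_namespace: Counter = Counter()
    per_action: Counter = Counter()
    for constraint in constraints:
        status = constraint.get("status") or {}
        # totalViolations is authoritative; the violations list may be truncated.
        per_constraint[constraint["metadata"]["name"]] = status.get("totalViolations", 0)
        for v in status.get("violations") or []:
            per_namespace[v.get("namespace", "<cluster-scoped>")] += 1
            per_action[v.get("enforcementAction", "deny")] += 1
    return {
        "total": sum(per_constraint.values()),
        "per_constraint": per_constraint,
        "per_namespace": dict(per_namespace),
        "per_action": dict(per_action),
    }


SAMPLE = [  # made-up audit results for illustration
    {"metadata": {"name": "pods-need-owner"},
     "status": {"totalViolations": 2, "violations": [
         {"enforcementAction": "deny", "kind": "Pod", "name": "web-1", "namespace": "shop",
          "message": "missing label 'owner'"},
         {"enforcementAction": "deny", "kind": "Pod", "name": "job-7", "namespace": "batch",
          "message": "missing label 'owner'"}]}},
    {"metadata": {"name": "ns-need-team"},
     "status": {"totalViolations": 1, "violations": [
         {"enforcementAction": "dryrun", "kind": "Namespace", "name": "sandbox",
          "message": "missing label 'team'"}]}},
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

Per-namespace counts map naturally onto owning teams, so the same roll-up can drive both the dashboard and the alerts in the remediation phase.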

Context in Distributed Systems

Policy Enforcement in Production Systems

Major cloud platforms use policy engines extensively. AWS Service Control Policies, Azure Policy, and GCP Organization Policies all implement similar patterns. Shopify enforces policies on thousands of merchants’ stores to ensure PCI compliance. Uber uses OPA to enforce data access policies across microservices handling millions of ride requests.

System Design Integration

Policy as Code sits in the Kubernetes admission control layer: the API server calls the policy webhook after authenticating and authorizing a request but before persisting the object. It integrates with your GitOps workflow from Day 24: policies are stored in Git, synced via ArgoCD, and enforced automatically. It also complements your External Secrets Operator setup, because a policy can reject workloads that hardcode credentials instead of referencing the Secrets the operator syncs. The monitoring system feeds into your observability stack, creating a closed-loop security system.
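One way to tie policies to the secrets workflow is a check that flags credential-looking environment variables set as literal values rather than through valueFrom.secretKeyRef. The name-matching heuristic below is an assumption, not a standard rule. The same function could run in CI against manifests in Git as well as at admission time.

```python
import re

# Assumed heuristic: env var names that usually carry credentials.
SECRET_LIKE = re.compile(r"PASSWORD|SECRET|TOKEN|API_?KEY", re.IGNORECASE)


def hardcoded_secrets(pod_spec: dict) -> list[str]:
    """Flag credential-like env vars that carry a literal value instead of a Secret reference."""
    findings = []
    containers = pod_spec.get("containers", []) + pod_spec.get("initContainers", [])
    for container in containers:
        for env in container.get("env", []):
            # A literal "value" means the credential lives in the manifest (and in Git);
            # "valueFrom.secretKeyRef" means it is read from a Secret at runtime.
            if SECRET_LIKE.search(env["name"]) and "value" in env:
                findings.append(
                    f"{container['name']}: env {env['name']} is hardcoded; "
                    "reference a Secret via valueFrom.secretKeyRef instead"
                )
    return findings


if __name__ == "__main__":
    spec = {"containers": [{"name": "api", "env": [
        {"name": "LOG_LEVEL", "value": "info"},
        {"name": "DB_PASSWORD", "value": "hunter2"},
        {"name": "API_TOKEN", "valueFrom": {"secretKeyRef": {"name": "api-creds", "key": "token"}}},
    ]}]}
    for finding in hardcoded_secrets(spec):
        print(finding)
```

Name-based matching will miss oddly named variables and can flag harmless ones. Treat it as a first line of defense and tune the pattern to your team's naming conventions.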

Production Impact:

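  • Prevents misconfigurations: Catches common deployment errors, such as missing resource limits, privileged containers, and missing labels, before they reach production

  • Supports compliance: Automated checks for PCI DSS, SOX, and GDPR technical controls replace much of the manual review work

  • Reduces incidents: Enforcing resource limits stops a single workload from exhausting node resources and starving its neighbors

  • Speeds audits: Automated compliance reports can cut audit preparation from weeks to hours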
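A compliance report is, at its core, a per-policy pass rate over every resource checked. This minimal sketch assumes check results are already collected as (resource, policy, passed) records. The record type and the policy names are hypothetical. It renders a plain-text summary that could be attached to an audit ticket.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    resource: str  # e.g. "prod/Deployment/api"
    policy: str    # e.g. "require-limits" (hypothetical policy name)
    passed: bool


def compliance_report(results: list[CheckResult]) -> dict[str, float]:
    """Percentage of checked resources passing each policy, rounded to one decimal."""
    totals = defaultdict(lambda: [0, 0])  # policy -> [passed, checked]
    for r in results:
        totals[r.policy][1] += 1
        if r.passed:
            totals[r.policy][0] += 1
    return {policy: round(100 * p / t, 1) for policy, (p, t) in sorted(totals.items())}


def render(report: dict[str, float]) -> str:
    """One aligned line per policy."""
    return "\n".join(f"{policy:<20} {pct:5.1f}%" for policy, pct in report.items())


if __name__ == "__main__":
    results = [
        CheckResult("prod/Deployment/api", "require-limits", True),
        CheckResult("prod/Deployment/web", "require-limits", True),
        CheckResult("prod/Deployment/batch", "require-limits", False),
        CheckResult("prod/Deployment/api", "require-labels", True),
    ]
    print(render(compliance_report(results)))
```

Generating the report from the same results the admission and audit paths produce keeps the evidence auditors see consistent with what is actually enforced.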

Architecture

Component Architecture

© 2025 ctoi