The Detection Rebuild, Part 2: Automating Detection Engineering Without Breaking the SOC


Hot on the heels of Part 1, where we focused on fixing the signal problem, Part 2 is all about scale. Because once you’ve cleaned up your alerts and improved your detection quality, the next question is: how do you keep it that way without burning your team out?

This post is a practical look at how to automate your detection engineering pipeline—safely, reliably, and without flooding your environment with untested rules or brittle logic. We’re not here to add DevOps jargon for the sake of it. We’re here to build sustainable detection systems that evolve without collapsing under their own complexity.

Let’s go.


What Detection Engineering Automation Shouldn’t Be

First, let’s set some boundaries. Automation doesn’t mean:

  • Generating 500 rules from every new threat report
  • Enabling every Sigma rule in the repo “just in case”
  • Bulk-deploying untested detection logic into production

Automation without validation is just accelerating failure.


The Goal: Detection-as-Code, Built Like Software

To scale detection engineering, you need to treat rules like code. That means:

  • Version control: Every rule lives in Git.
  • Peer review: No rule gets deployed without review.
  • CI/CD pipeline: Automated tests run before anything hits production.
  • Rollback: If something breaks, you can revert instantly.

This isn’t just clean—it’s survival.


Building a Detection CI/CD Pipeline

1. Rule Linting & Syntax Validation

Run static analysis on detection rules to catch formatting or logic errors. Example tools:

  • sigmac (or its successor, sigma-cli) for Sigma rules
  • Custom YAML/JSON schema validators

Example:

In one org, broken YAML indentation silently caused several detection rules to fail during a weekly sync—but no one noticed because there was no validation step. After adding a pre-commit linter, rule failures dropped to zero, and several key rules that had silently stopped alerting came back online.
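
A minimal version of that pre-commit check can be a few lines of Python. The sketch below assumes rules live as Sigma-style YAML under a rules/ directory and only checks for parse errors plus a handful of required fields; the paths and field list are placeholders for your own layout.

```python
#!/usr/bin/env python3
"""Minimal rule linter: fail the commit if any detection rule is malformed.

Assumes rules live under rules/ as Sigma-style YAML; adjust the directory
and required fields to match your own repo layout.
"""
import sys
from pathlib import Path

import yaml  # pip install pyyaml

REQUIRED_FIELDS = {"title", "id", "logsource", "detection"}  # adjust as needed


def lint_rule(path: Path) -> list[str]:
    """Return a list of human-readable problems for one rule file."""
    try:
        rule = yaml.safe_load(path.read_text())
    except yaml.YAMLError as exc:
        return [f"{path}: YAML parse error: {exc}"]
    if not isinstance(rule, dict):
        return [f"{path}: expected a mapping at the top level"]
    missing = REQUIRED_FIELDS - rule.keys()
    if missing:
        return [f"{path}: missing required fields: {sorted(missing)}"]
    return []


def main() -> int:
    errors = []
    for path in sorted(Path("rules").rglob("*.yml")):
        errors.extend(lint_rule(path))
    for err in errors:
        print(err, file=sys.stderr)
    return 1 if errors else 0


if __name__ == "__main__":
    sys.exit(main())
```

Wire it into a pre-commit hook or a CI job so a malformed rule can never reach the deploy step.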

2. Test Against Historical Data

Don’t ship rules blind. Run them in dry-run mode against the last 30 days of data:

  • Count how many matches would have occurred
  • Compare against known alerts
  • Identify noise before it hits production

Example:

A detection for “rare login location” matched 1,200 events over 30 days. A quick review showed 90% were due to employees traveling for conferences. After tuning with known event calendars and tagging approved destinations, the final rule averaged just 3 alerts a week—and caught a real stolen token incident the next quarter.
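
If your backend is Elasticsearch, the dry run can be as simple as counting how often the translated query would have matched. A rough sketch, assuming the 8.x elasticsearch-py client and a query already translated from the rule (for example via sigma-cli); the host, index pattern, and sample query are placeholders.

```python
"""Dry-run a translated detection query against the last 30 days of data.

Assumes an Elasticsearch backend, the 8.x elasticsearch-py client, and a
query string already translated from the rule; index and host are placeholders.
"""
from elasticsearch import Elasticsearch  # pip install elasticsearch

es = Elasticsearch("http://localhost:9200")


def dry_run(query_string: str, index: str = "logs-*") -> int:
    """Return how many events the rule would have matched in the last 30 days."""
    query = {
        "bool": {
            "must": [{"query_string": {"query": query_string}}],
            "filter": [{"range": {"@timestamp": {"gte": "now-30d/d"}}}],
        }
    }
    resp = es.count(index=index, query=query)
    return resp["count"]


if __name__ == "__main__":
    # Hypothetical "rare login location" style query for illustration only.
    hits = dry_run('event.action:"logon" AND NOT source.geo.country_iso_code:US')
    print(f"Would have fired on {hits} events in the last 30 days")
```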

3. Simulate Alerts with Controlled Inputs

For behavioral detections:

  • Replay simulated attack data into your test environment
  • Validate that your rule fires under known conditions

Example:

After deploying a lateral movement rule, one team replayed Atomic Red Team modules through a sandbox. Half the expected alerts didn’t trigger. Root cause? Their rule was too tightly scoped to specific process names. With broader parent-child logic and command-line keyword matching, they got full coverage without adding noise.
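
One way to make that check repeatable: after each replay, query your alert store for the detections you expected and report the gaps. A sketch along those lines, assuming alerts land in an 'alerts-*' index with a 'rule.name' field; both are stand-ins for your own alert schema.

```python
"""Check that expected detections fired after a simulated attack replay.

Run this after replaying test data (e.g. Atomic Red Team) into the sandbox.
The 'alerts-*' index and 'rule.name' field are assumptions; adjust to your
own alert schema (keyword mappings may differ).
"""
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

EXPECTED_RULES = [
    "Lateral Movement - Remote Service Creation",   # hypothetical rule names
    "Lateral Movement - Admin Share Access",
]


def rule_fired(rule_name: str, window: str = "now-1h") -> bool:
    """True if at least one alert for the rule exists within the test window."""
    resp = es.count(
        index="alerts-*",
        query={
            "bool": {
                "must": [{"term": {"rule.name": rule_name}}],
                "filter": [{"range": {"@timestamp": {"gte": window}}}],
            }
        },
    )
    return resp["count"] > 0


if __name__ == "__main__":
    missing = [name for name in EXPECTED_RULES if not rule_fired(name)]
    if missing:
        print("Coverage gaps, rules that did not fire:")
        for name in missing:
            print(f"  - {name}")
    else:
        print("All expected detections fired.")
```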

4. Tag & Route by Environment

Tag rules for:

  • Production
  • Audit-only / test mode
  • High-risk / watchlist

Route alerts accordingly. Not all rules need to page the on-call engineer.

Example:

An overzealous detection for suspicious registry changes triggered hundreds of tickets in prod before the team realized it was related to an internal update script. Now, any new rule starts in “audit-only” mode for a week, with tags for environment, priority, and ownership. Tickets are only created after review.
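
Routing can be a small function that reads the rule’s tags and decides whether an alert pages, opens a ticket, or just gets logged for weekly review. The tag names and destinations below are illustrative, not prescriptive.

```python
"""Route alerts based on rule tags instead of paging on everything.

A minimal sketch: the tag names ('audit-only', 'high-risk', 'production')
and the destinations are placeholders for your own integrations.
"""
from dataclasses import dataclass, field


@dataclass
class Alert:
    rule_name: str
    tags: set[str] = field(default_factory=set)


def route(alert: Alert) -> str:
    """Decide where an alert goes based on its rule's tags."""
    if "audit-only" in alert.tags:
        return "log-for-weekly-review"   # new rules start here for a week
    if "high-risk" in alert.tags:
        return "page-on-call"            # only watchlist rules wake anyone up
    if "production" in alert.tags:
        return "create-ticket"           # normal triage queue
    return "log-for-weekly-review"       # untagged rules never page anyone


if __name__ == "__main__":
    new_rule_alert = Alert("Suspicious Registry Change", {"audit-only", "windows"})
    print(route(new_rule_alert))  # -> log-for-weekly-review
```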


Automating the Boring, Not the Broken

Focus your automation on the high-friction, low-risk parts of the workflow:

  • Rule deployment: Auto-publish validated rules to your SIEM
  • Alert suppression: Auto-suppress known false positives based on analyst feedback (see the sketch after this list)
  • Metadata enrichment: Auto-tag alerts with threat intel, asset context, or prior alert history
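
A suppression layer doesn’t have to be fancy. Here’s a sketch of what analyst-approved suppressions might look like if you kept them as structured entries alongside the rules; the entry format and field names are assumptions.

```python
"""Suppress alerts that match analyst-approved false-positive entries.

The suppression entry format (rule name plus one field/value match) and the
idea of keeping the list in the same Git repo as the rules are assumptions.
"""
from dataclasses import dataclass


@dataclass
class Suppression:
    rule_name: str
    field: str
    value: str
    reason: str  # why an analyst approved this, for the audit trail


SUPPRESSIONS = [
    Suppression(
        "Suspicious Registry Change",
        "process.name",
        "patch_runner.exe",              # hypothetical internal update script
        "internal update script, approved during FP review",
    ),
]


def is_suppressed(rule_name: str, event: dict) -> bool:
    """True if an analyst-approved suppression covers this alert."""
    return any(
        s.rule_name == rule_name and event.get(s.field) == s.value
        for s in SUPPRESSIONS
    )


if __name__ == "__main__":
    event = {"process.name": "patch_runner.exe", "host.name": "wks-042"}
    print(is_suppressed("Suspicious Registry Change", event))  # -> True
```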

Avoid:

  • Blindly generating rules from threat feeds
  • Auto-enabling rules without signal testing

Example:

One company added auto-enrichment for all file hash alerts using the VirusTotal API. Within a week, analysts could triage commodity malware in seconds instead of minutes. No extra alerts were added—just more context per event.
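
The enrichment itself is a single API call. A sketch using the VirusTotal v3 files endpoint; verify the response fields against your API version, and treat the error handling here as a starting point.

```python
"""Enrich a file-hash alert with VirusTotal context before it reaches an analyst.

A minimal sketch against the VirusTotal v3 files endpoint; the response
parsing follows the documented last_analysis_stats structure, but confirm
the exact fields for your API plan and version.
"""
import os

import requests  # pip install requests

VT_API_KEY = os.environ["VT_API_KEY"]  # raises if the key isn't configured


def enrich_hash(sha256: str) -> dict:
    """Return a small context dict to attach to the alert."""
    resp = requests.get(
        f"https://www.virustotal.com/api/v3/files/{sha256}",
        headers={"x-apikey": VT_API_KEY},
        timeout=10,
    )
    if resp.status_code == 404:
        return {"vt_known": False}       # hash never seen by VirusTotal
    resp.raise_for_status()
    stats = resp.json()["data"]["attributes"]["last_analysis_stats"]
    return {
        "vt_known": True,
        "vt_malicious": stats.get("malicious", 0),
        "vt_suspicious": stats.get("suspicious", 0),
    }
```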


Detection Rule Lifecycle: A Healthy Flow

To keep your detection stack healthy and scalable, every rule should follow a lifecycle:

  1. Draft → created and documented in Git
  2. Reviewed → peer-reviewed with threat context and test plan
  3. Test Mode → dry-run alerts captured and evaluated
  4. Production → alerting enabled, metrics tracked
  5. Monitored → rule performance reviewed regularly
  6. Tuned/Retired → based on feedback and activity

This helps avoid stale rules, redundant logic, and alert fatigue from the inside out.
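
If you want the lifecycle enforced rather than just documented, encode the stages and allowed transitions explicitly so a rule can’t jump straight from draft to production. In this sketch, monitoring and tuning are folded into the production stage for brevity, and how the stage is stored (for example, a status field in the rule’s YAML) is an assumption.

```python
"""Encode the rule lifecycle as explicit stages with allowed transitions.

Stage names mirror the lifecycle above; monitoring/tuning are treated as
part of production here, and storage of the stage is left to you.
"""
from enum import Enum


class Stage(Enum):
    DRAFT = "draft"
    REVIEWED = "reviewed"
    TEST_MODE = "test_mode"
    PRODUCTION = "production"
    RETIRED = "retired"


ALLOWED = {
    Stage.DRAFT: {Stage.REVIEWED},
    Stage.REVIEWED: {Stage.TEST_MODE, Stage.DRAFT},     # review can bounce back
    Stage.TEST_MODE: {Stage.PRODUCTION, Stage.DRAFT},   # noisy rules go back to draft
    Stage.PRODUCTION: {Stage.TEST_MODE, Stage.RETIRED}, # tuning or retirement
    Stage.RETIRED: set(),
}


def transition(current: Stage, new: Stage) -> Stage:
    """Refuse lifecycle jumps like draft -> production."""
    if new not in ALLOWED[current]:
        raise ValueError(f"Illegal transition: {current.value} -> {new.value}")
    return new


if __name__ == "__main__":
    stage = transition(Stage.DRAFT, Stage.REVIEWED)
    stage = transition(stage, Stage.TEST_MODE)
    print(stage.value)  # -> test_mode
```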


Metrics That Matter

You can’t improve what you don’t measure. Track:

  • Alerts per rule, per week
  • % of alerts leading to investigation
  • % of alerts closed as false positives
  • Time-to-triage per rule category
  • Rules that haven’t fired in X days

Example:

One team used a dashboard to track “high-volume, low-action” rules: anything with more than 100 alerts per week and a triage rate under 1% got flagged for review.
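
Computing that flag doesn’t require a BI platform. A sketch that works off a week of alert records, assuming each record carries the rule name and an analyst disposition; the thresholds match the example above.

```python
"""Flag high-volume, low-action rules from a week of alert records.

The input shape (one dict per alert with 'rule' and 'disposition') is an
assumption; thresholds follow the example above: more than 100 alerts per
week with less than 1% of them leading to investigation.
"""
from collections import defaultdict


def flag_noisy_rules(alerts: list[dict]) -> list[str]:
    """Return rule names that are high-volume but rarely investigated."""
    totals: dict[str, int] = defaultdict(int)
    investigated: dict[str, int] = defaultdict(int)

    for alert in alerts:
        rule = alert["rule"]
        totals[rule] += 1
        if alert.get("disposition") == "investigated":
            investigated[rule] += 1

    flagged = []
    for rule, count in totals.items():
        triage_rate = investigated[rule] / count
        if count > 100 and triage_rate < 0.01:
            flagged.append(rule)
    return flagged


if __name__ == "__main__":
    week = [{"rule": "Rare Login Location", "disposition": "closed_fp"}] * 150
    week += [{"rule": "Stolen Token Use", "disposition": "investigated"}] * 3
    print(flag_noisy_rules(week))  # -> ['Rare Login Location']
```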


Common Pitfalls to Avoid

  • No testing before deployment: Don’t learn the hard way.
  • Too much abstraction: Building a “platform” before proving the workflow.
  • Unclear rule ownership: Nobody wants to fix what nobody owns.
  • CI/CD without rollback: If your deploy fails, can you recover instantly?
  • Duplication across data sources: Alert storms from the same event via EDR, SIEM, and cloud logs.

Trust in automation is fragile. One false-positive firestorm can undo months of good work.


Tooling Recommendations (Real & Practical)

  • Rule Management:
      • Sigma for standardized rule formats
      • detection-rules (Elastic’s public repo)
      • Git + pull requests for audits
  • CI/CD:
      • GitHub Actions / GitLab Pipelines
      • Pre-commit hooks for schema validation
      • Slack/Jira integrations for peer review flow
  • Testing/Replay:
      • Custom Python scripts with Elastic queries
      • Atomic Red Team or custom replay tools
      • Test harnesses using ECS data snapshots
  • Feedback:
      • Analyst tagging in SIEM
      • Jira or ServiceNow auto-linking
      • Slack bots for in-line feedback capture
Final Thought

You can’t scale detection engineering by working harder. You scale it by working smarter—building pipelines that catch issues before they cause pain, and automating workflows with safety rails.

Think of automation like detection plumbing: invisible when it works, catastrophic when it leaks.

Build for trust. Build for clarity. Build for scale.

That’s the rebuild.

