The Detection Rebuild, Part 1: Fixing the Signal Problem


How to Stop Drowning in False Positives and Start Surfacing Real Threats

Let’s be honest: most security teams aren’t short on alerts—they’re short on good ones. Every SOC eventually hits the same wall: too many alerts, not enough signal, and a growing pile of detection rules no one wants to touch because something might break.

This post is about flipping that dynamic—moving from “generate everything and hope something hits” to building lean, reliable, high-signal detections your team can actually trust.


The Real Cost of Noise

Alert fatigue isn’t just annoying. It erodes trust, slows response times, and numbs analysts into ignoring the very system that’s supposed to protect them.

  • Analyst burnout: Alert queues turn into graveyards.
  • Broken feedback loops: No one triages false positives, so nothing improves.
  • Missed threats: Buried in noise, real incidents slip through.

If you’ve ever caught something serious by luck because someone “just happened to notice” it—congrats, your detection stack is lying to you.


Why Most Detections Suck

Let’s get brutally honest: a lot of detection logic out there is either too generic, too brittle, or too context-blind to be useful.

  • Overly broad rules: “Alert on PowerShell” is not detection; it’s panic.
  • Static thresholds: Hardcoded values rarely reflect real-world variability.
  • No enrichment: An alert on an IP address means nothing if you don’t know what system it hit or what that system does.
  • Pre-built rules from vendors: A great starting point, but rarely tuned for your environment.

Take cloud alerts, for example. We once deployed a prebuilt alert for “S3 Bucket Access from Unusual IP.”
It sounded useful—until it fired hundreds of times a day.

Why? Because it had no knowledge of which buckets were public, which accounts were service accounts, or which roles were expected to access what.

Once we:

  • Added a list of known, public S3 buckets,
  • Mapped expected service accounts to roles,
  • Suppressed alerts from common automation paths,

…the alert volume dropped by over 95% and what remained were actual anomalies—including a misconfigured deployment script pushing to the wrong bucket.

This is the difference between a rule that exists and a rule that works.
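To make that concrete, here’s a minimal sketch of the tuned logic in Python. The bucket names, ARNs, and field names are invented for illustration, and in production this would live in your SIEM’s query language rather than a script, but the shape of the rule is the point:

    # Minimal sketch of the tuned S3 access rule.
    # Bucket names, ARNs, and field names are illustrative assumptions,
    # not a specific CloudTrail schema.

    PUBLIC_BUCKETS = {"static-assets-prod", "public-downloads"}        # known public buckets
    EXPECTED_ROLE_ACCESS = {                                           # role -> buckets it may touch
        "deploy-role": {"artifacts-prod"},
        "backup-role": {"db-backups-prod"},
    }
    AUTOMATION_PRINCIPALS = {"arn:aws:iam::111122223333:role/ci-runner"}  # suppressed automation paths

    def should_alert(event: dict) -> bool:
        bucket = event["bucket_name"]
        principal = event["principal_arn"]
        role = event.get("source_role", "")

        if bucket in PUBLIC_BUCKETS:                          # public buckets: access is expected
            return False
        if principal in AUTOMATION_PRINCIPALS:                # known automation path, suppress
            return False
        if bucket in EXPECTED_ROLE_ACCESS.get(role, set()):   # role is mapped to this bucket
            return False
        return True                                           # everything else is worth a look

    # Example: a deploy role writing to a bucket it isn't mapped to fires the alert.
    print(should_alert({
        "bucket_name": "db-backups-prod",
        "principal_arn": "arn:aws:iam::111122223333:role/deploy-role-session",
        "source_role": "deploy-role",
    }))  # True

Every suppression in that sketch encodes a piece of environment knowledge the prebuilt rule never had.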


Principles for High-Signal Detections

Let’s shift from reactive to intentional. Here are five principles to guide better detection engineering:

1. Context Is King

A good detection rule should answer: “Is this suspicious for this user, on this host, at this time?” That means:

  • Asset tags (critical system? test server?)
  • User identity and role
  • Time of day, location, behavior history

How to implement it:
Use data from your CMDB, HR systems, or asset inventory to enrich logs. In Elastic or Splunk, use lookup tables or joins to tag critical systems or privileged users. You can also enrich alerts with user risk scores from UEBA tools.
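Here’s a rough sketch of what that looks like outside any particular SIEM: two lookup tables joined onto every event before detection logic runs. The CSV exports and field names are assumptions for the example.

    import csv

    # Hypothetical exports: asset_inventory.csv (hostname, criticality, owner_team)
    # and user_directory.csv (username, role, privileged).
    def load_lookup(path: str, key_field: str) -> dict:
        with open(path, newline="") as f:
            return {row[key_field]: row for row in csv.DictReader(f)}

    ASSETS = load_lookup("asset_inventory.csv", "hostname")
    USERS = load_lookup("user_directory.csv", "username")

    def enrich(event: dict) -> dict:
        """Tag the raw event with asset and user context before detection logic runs."""
        asset = ASSETS.get(event.get("hostname", ""), {})
        user = USERS.get(event.get("username", ""), {})
        event["asset_criticality"] = asset.get("criticality", "unknown")
        event["asset_owner"] = asset.get("owner_team", "unknown")
        event["user_role"] = user.get("role", "unknown")
        event["user_privileged"] = user.get("privileged", "false") == "true"
        return event

In Elastic, an enrich processor in an ingest pipeline does this at ingest time; in Splunk, a lookup at search time gets you the same result.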

2. Correlate Across Data Sources

No single log source tells the full story. Combine EDR, network logs, authentication logs, and cloud activity to:

  • Reduce false positives (“Yes it was odd, but the user also signed in via SSO seconds earlier”)
  • Spot real abuse patterns that span systems

Example:
Correlate Okta login data with AWS CloudTrail to spot access from new geo-locations immediately followed by role assumption. Correlate by user.email or principalId across sources.
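A minimal sketch of that correlation in Python, assuming the Okta and CloudTrail events have already been pulled into lists with the simplified field names shown (real schemas differ, so treat these as placeholders):

    from datetime import timedelta

    # okta_logins: [{"user": "alice@example.com", "country": "RO", "time": datetime(...)}]
    # cloudtrail:  [{"principal": "alice@example.com", "event": "AssumeRole", "time": datetime(...)}]

    KNOWN_COUNTRIES = {"US", "DE"}      # assumption: per-user history would normally drive this
    WINDOW = timedelta(minutes=15)      # role assumption shortly after the unusual login

    def correlate(okta_logins, cloudtrail):
        hits = []
        for login in okta_logins:
            if login["country"] in KNOWN_COUNTRIES:
                continue                # familiar geo, skip
            for ct in cloudtrail:
                same_user = ct["principal"] == login["user"]
                soon_after = timedelta(0) <= ct["time"] - login["time"] <= WINDOW
                if same_user and ct["event"] == "AssumeRole" and soon_after:
                    hits.append({"user": login["user"],
                                 "country": login["country"],
                                 "assumed_at": ct["time"]})
        return hits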

3. Use Behavioral Baselines

What’s normal for one account or host may be completely off for another. Track behavioral norms:

  • Rare parent-child processes
  • New external destinations per host
  • Login patterns per user

Implementation tip:
Use time-bucketed aggregations to generate per-user or per-host baselines. In Elastic, you can use transforms and rollups to calculate average external destinations per week, then alert on 3x deviation.
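Outside of transforms, the same idea is just a grouped aggregation. This sketch buckets events by host and week, averages the historical count of distinct external destinations, and flags the current week when it exceeds three times that baseline; the field names and data shape are assumptions:

    from collections import defaultdict
    from statistics import mean

    # events: [{"host": "web-01", "dest": "203.0.113.7", "week": "2024-22"}, ...]
    def weekly_destination_counts(events):
        buckets = defaultdict(set)                  # (host, week) -> distinct destinations
        for e in events:
            buckets[(e["host"], e["week"])].add(e["dest"])
        counts = defaultdict(dict)                  # host -> {week: count}
        for (host, week), dests in buckets.items():
            counts[host][week] = len(dests)
        return counts

    def anomalous_hosts(events, current_week, factor=3.0):
        counts = weekly_destination_counts(events)
        flagged = []
        for host, by_week in counts.items():
            history = [c for w, c in by_week.items() if w != current_week]
            current = by_week.get(current_week, 0)
            if history and current > factor * mean(history):
                flagged.append((host, current, mean(history)))
        return flagged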

4. Enrich Everything

Data without context is noise. Enrich with:

  • Threat intelligence (e.g., is this IP known-bad?)
  • CMDB or asset context
  • User risk scoring or historical incidents

Tools & methods:

  • Add VirusTotal or URLhaus lookups to alert pipelines.
  • In your detection engine (Elastic, Splunk, Sentinel), enrich incoming events using lookup tables, scripted fields, or ingest/enrichment pipelines.
  • Use alert metadata to route alerts differently: critical assets vs dev systems.
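A sketch of the pipeline shape, with the threat-intel lookup stubbed out (swap in a real VirusTotal, URLhaus, or internal TI client) and routing thresholds that are purely illustrative:

    def lookup_threat_intel(ip: str) -> dict:
        """Stub for a real TI lookup (VirusTotal, URLhaus, an internal MISP, ...)."""
        known_bad = {"198.51.100.23"}               # placeholder data for the sketch
        return {"known_bad": ip in known_bad, "source": "stub"}

    def enrich_and_route(alert: dict, asset_context: dict) -> dict:
        ip = alert.get("remote_ip", "")
        alert["ti"] = lookup_threat_intel(ip)
        alert["asset"] = asset_context.get(alert.get("hostname", ""), {"criticality": "unknown"})

        # Route on the enrichment, not just on the raw match:
        if alert["ti"]["known_bad"] and alert["asset"]["criticality"] == "critical":
            alert["queue"] = "pager"                # wake someone up
        elif alert["asset"]["criticality"] in ("dev", "test"):
            alert["queue"] = "low-priority"         # dev noise goes to a slower lane
        else:
            alert["queue"] = "triage"
        return alert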

5. Measure Signal, Not Just Coverage

A high-fidelity rule should have a measurable true positive rate. Track:

  • How many alerts lead to investigation
  • How many are closed as false positive
  • Analyst feedback (and act on it!)

Pro tip:
Maintain a dashboard showing per-rule alert count, % closed as false positive, and time-to-triage. If a rule fires 100x and gets ignored 99x, it’s time to kill or fix it.
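A small sketch of the scoreboard behind that dashboard, assuming each closed alert record carries a rule name, a disposition, and open/close timestamps (field names are made up for the example):

    from collections import defaultdict

    # alerts: [{"rule": "s3-nonpublic-access", "disposition": "false_positive",
    #           "opened": datetime(...), "closed": datetime(...)}, ...]
    def rule_scoreboard(alerts):
        stats = defaultdict(lambda: {"count": 0, "false_positives": 0, "triage_minutes": []})
        for a in alerts:
            s = stats[a["rule"]]
            s["count"] += 1
            if a["disposition"] == "false_positive":
                s["false_positives"] += 1
            s["triage_minutes"].append((a["closed"] - a["opened"]).total_seconds() / 60)

        report = {}
        for rule, s in stats.items():
            report[rule] = {
                "alerts": s["count"],
                "fp_rate": round(s["false_positives"] / s["count"], 2),
                "median_triage_min": sorted(s["triage_minutes"])[len(s["triage_minutes"]) // 2],
            }
        return report   # any rule with fp_rate near 1.0 is a kill-or-fix candidate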


Examples of Smarter Detections

Instead of: “Alert on PowerShell Execution”
Try: “Alert on PowerShell spawning with base64-encoded arguments from a domain user on a high-value asset, outside of business hours.”

Instead of: “Alert on Office macro execution”
Try: “Alert on Office macro spawning WScript, followed by outbound network connection to rare external domain.”

Instead of: “S3 bucket accessed from external IP”
Try: “S3 access to a non-public bucket from an unexpected role, in a region not associated with known workloads.”
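To show how the first rewrite composes, here’s the PowerShell example as a plain predicate over a single process event. In practice this would be an EDR or SIEM query; every field name and the asset list here are assumptions:

    HIGH_VALUE_ASSETS = {"dc-01", "finance-app-01"}     # illustrative asset tags
    BUSINESS_HOURS = range(8, 18)                       # 08:00-17:59 local

    def powershell_detection(event: dict) -> bool:
        cmd = event["command_line"].lower()
        is_powershell = event["process_name"].lower() == "powershell.exe"
        encoded_args = "-enc" in cmd                    # covers -enc and -EncodedCommand
        domain_user = event["user_domain"].upper() != "NT AUTHORITY"
        high_value = event["hostname"] in HIGH_VALUE_ASSETS
        after_hours = event["timestamp"].hour not in BUSINESS_HOURS  # timestamp is a datetime
        return is_powershell and encoded_args and domain_user and high_value and after_hours

Each extra condition trades a little coverage for a lot of signal, which is exactly the trade you want on a rule that would otherwise fire constantly.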


Fix the Feedback Loop

Your detection pipeline should be a feedback loop, not a firehose.

  • Include analysts in rule reviews.
  • Tag false positives directly in the platform (or via ticket systems).
  • Use feedback to tune suppression logic, thresholds, or context requirements.

Pre-deployment testing workflow:

  • Run your new detection logic against 30 days of historical data.
  • Output matching events to an audit-only channel.
  • Collect analyst feedback during a dry-run period.
  • Only promote to production once the signal-to-noise ratio is acceptable.
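A rough harness for the dry-run step, assuming your rules are expressed as predicates like the sketches above and that you can export historical events to replay (the audit-only channel is stubbed as a local file):

    import json

    def backtest(rule, events, rule_name):
        """Replay historical events through a candidate rule and report would-be alert volume."""
        matches = [e for e in events if rule(e)]
        print(f"[dry-run] {rule_name}: {len(matches)} hits over {len(events)} events")
        # Write matches to an audit-only channel instead of the production alert queue.
        with open(f"dryrun_{rule_name}.jsonl", "w") as audit:
            for m in matches:
                audit.write(json.dumps(m, default=str) + "\n")
        return matches

    # Promote only once analysts have reviewed the dry-run output and the
    # signal-to-noise ratio looks acceptable.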

Final Thought

Good detections are part art, part science. They require technical depth, business context, and constant refinement. But the payoff is massive: fewer wasted cycles, faster triage, and higher trust in your security stack.

In Part 2, we’ll cover how to automate and scale this process without turning your rule repo into a garbage fire.

Until then: tune smarter, alert less, and signal better.

