LLMs in Security Operations: Helpful Sidekick or Hallucinating Intern?


Large language models (LLMs) are everywhere now. Your inbox, your SIEM, maybe even embedded in your security tool’s new “AI assistant” tab. It’s tempting to believe these tools are ready to triage alerts, write detections, and handle analyst fatigue all on their own.

They aren’t. Not yet.

But that doesn’t mean they’re useless. Like any tool, the trick is understanding their strengths, weaknesses, and sharp edges before you put them in production.

This post breaks down where LLMs can help in security operations, where they fail (spectacularly), and how to use them without losing your mind—or your SOC’s credibility.


Where LLMs Actually Help

1. Summarizing Noisy Alert Data

Given a blob of log data or an alert cluster, LLMs are surprisingly good at turning it into a plain-English summary.

“This alert is based on a PowerShell process executed by a domain user on a critical host. It used base64 encoding and contacted an external IP not previously seen in the environment.”

This is helpful in:

  • Slack summaries
  • Ticket descriptions
  • Post-incident writeups

2. Writing Detection Rules (Carefully)

LLMs can assist with rule generation when given:

  • Structured inputs (TTPs, Sigma templates)
  • Examples to work from

They’re great at:

  • Converting natural language like “alert on unusual login hours” into Sigma syntax
  • Translating Splunk queries into Elastic DSL (or vice versa)

Just don’t blindly trust the output. Always validate logic and run tests.
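
To make that concrete, here is a minimal, illustrative Sigma sketch of the kind of rule an LLM might draft for the encoded-PowerShell behavior from the summary example earlier (the “unusual login hours” idea from the first bullet needs a notion of “unusual” defined outside the rule itself). Every field name here assumes Sysmon-style process-creation logs and should be treated as an assumption to verify against your own source:

  title: Encoded PowerShell Command Line (illustrative sketch)
  status: experimental
  logsource:
      category: process_creation
      product: windows
  detection:
      selection:
          Image|endswith: '\powershell.exe'
          CommandLine|contains:
              - ' -enc '
              - ' -EncodedCommand'
      condition: selection
  falsepositives:
      - Administrative or backup scripts that legitimately use encoded commands
  level: medium

Even a rule this small bakes in assumptions (field names, flag spellings, what counts as “encoded”) that only hold for a specific log source.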

3. Reverse Engineering Logs

If you’ve ever stared at a Windows event and wondered what it actually means, LLMs can:

  • Describe obscure fields
  • Suggest what a given process + command line might be doing
  • Help with weird vendor logs you rarely see

4. Enriching Threat Intelligence

Feed it an IOC or phishing email and it can:

  • Summarize likely behavior
  • Recommend MITRE techniques
  • Draft enrichment notes or analyst commentary

Where LLMs Fall Apart

1. Hallucinating Syntax and Facts

LLMs are language prediction machines, not security engines. They’ll:

  • Invent detection logic that looks right but does nothing
  • Mix up fields (e.g., swapping source.ip and destination.ip)
  • Cite MITRE technique mappings that don’t exist

2. Misunderstanding Sequences of Events

They’re not great at:

  • Determining causality (e.g., what came first in a process tree)
  • Understanding context spread across multiple logs
  • Distinguishing expected from suspicious behavior in enterprise environments

3. Security Context Is Missing

Most LLMs don’t:

  • Know what’s normal in your environment
  • Understand real asset value, privilege level, or business context
  • Detect nuance in policy or team-specific suppression logic

They don’t know which service accounts are allowed to access your production buckets.
They don’t know that powershell.exe is part of your automated backup process on Wednesdays.
They can’t tell the difference between a test system and a domain controller.

In other words: they can mimic syntax, but not intent.

Without environment-specific context, an LLM might flag perfectly legitimate behavior as malicious—or worse, ignore something subtle but dangerous.

This is why plugging an LLM into your alert pipeline without strong context-aware rules, environment tagging, or human oversight is asking for trouble.


Real-World Example: ChatGPT in the SOC

We’ve used ChatGPT internally for:

  • Explaining strange PowerShell commands in alerts
  • Writing regex for detection tuning
  • Generating summaries of long event chains in postmortems

In one case, we asked ChatGPT to convert a detection idea into a Sigma rule:
“Alert on DNS queries to rare domains from a non-standard process.”

It generated something plausible. But upon testing it:

  • The process field was incorrect for our EDR source
  • The logic didn’t actually filter for “rare” domains—it just listed any external DNS queries
  • It missed edge cases like nslookup.exe spawning from a script
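
To make those gaps concrete, here is a rough reconstruction of the shape of rule it produced (not the verbatim output; the details are illustrative), with comments marking the problems:

  title: DNS Query to Rare Domain from Non-Standard Process
  status: experimental
  logsource:
      category: dns_query
      product: windows              # our EDR exposes DNS data under a different source and field set
  detection:
      selection:
          QueryName|contains: '.'   # matches essentially every lookup; nothing here models "rare"
      filter_browsers:
          Image|endswith:
              - '\chrome.exe'
              - '\msedge.exe'       # no handling for nslookup.exe spawned from a script
      condition: selection and not filter_browsers
  level: medium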

We revised the prompt and iterated until the output was usable. But it took validation, testing, and tuning before we’d trust it in production.

Bottom line: ChatGPT saved us time—but only once we treated it like a junior analyst, not a detection engineer.


Prompt Best Practices for Security Workflows

LLMs are only as good as the prompt you give them. Here’s how to get better results:

1. Be Structured

Instead of:

“Write a detection for suspicious login behavior”

Try:

“Write a Sigma rule to alert when a domain admin logs in from a country not seen in the last 30 days. Use ECS fields and include a false_positives section.”
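
Even the structured version still needs review. A hedged sketch of what a plausible response might look like, using ECS field names, with a static allow-list standing in for the 30-day baseline (which plain Sigma can’t track on its own) and a placeholder for how domain admins are identified:

  title: Domain Admin Logon from Previously Unseen Country
  status: experimental
  logsource:
      product: windows
      service: security
  detection:
      selection:
          event.code: '4624'
          user.name|contains: 'admin'          # placeholder; match your real domain-admin group instead
      filter_expected_countries:
          source.geo.country_iso_code:
              - 'US'
              - 'DE'                           # hypothetical baseline; ideally generated from the last 30 days of logons
      condition: selection and not filter_expected_countries
  falsepositives:
      - Administrator travel
      - New VPN or cloud egress points
  level: high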

2. Include Examples

Show what a good input and its expected output look like. The more structured the example, the better the response.

3. Give Context (When Safe)

Tell it:

  • What log format you use (ECS? Windows native?)
  • What tooling it’s for (Elastic? Splunk? Sentinel?)
  • What your intent is (triage alert? full detection? enrichment?)

4. Set the Role

Prompt like:

“You are a detection engineer helping to draft high-fidelity alert rules. Please provide the logic with comments explaining your assumptions.”

5. Always Validate Output

Check field names, logic flow, syntax, and assumptions.
Run it in test mode. Triage it like any other rule.


How to Use LLMs Safely in the SOC

Treat It Like a Junior Analyst

  • Never deploy suggestions without review
  • Use it for first drafts, summaries, and ideas
  • Build workflows around review, testing, and validation

Use Guardrails

  • Pre-define prompt structures and role-based contexts
  • Limit scope: detection assistant, log explainer, etc.
  • Pipe outputs into staging areas (not production alerts)

Don’t Expect ROI from Magic

  • LLMs are productivity boosters, not magic threat hunters
  • If your detection strategy is broken, this won’t fix it
  • If your data’s messy, LLMs will just make prettier messes

Final Thought

LLMs are powerful—but only when you respect their limits. They won’t replace your analysts. But they can absolutely speed up your work, reduce cognitive overhead, and make bad documentation a little less painful.

Use them for summaries, translations, and enrichment. Be skeptical of their outputs. And never forget: they’re still guessing.

Helpful sidekick? Absolutely.
Hallucinating intern? Also yes.

It’s your job to know the difference.

