Why security logs don’t behave like language (and never will)

Large language models (LLMs) excel at parsing and generating human language. Security logs, however, are not language; they’re behavioral exhaust, a compact record of machine events. When we train models on internet-scale text, we teach them:

  • Syntax
  • Semantics
  • Statistical relationships between words

But a firewall entry like SRC=10.1.5.23 DST=185.224.x.x DPT=443 ACTION=ALLOW is not a sentence; it’s a single data point that gains meaning only when placed in its operational context: time, surrounding traffic, policy rules, and asset relationships. Ignoring that context means missing the very signals that differentiate benign activity from an emerging threat.

Logs are about behavior, not text

Unlike sentences, logs encode discrete events. The question we ask isn’t ‘what does this sentence mean?’ but ‘what does this event tell us about the system’s behavior?’ The analogy: a language model predicts the next word in a novel, while behavioral modeling predicts the next move in a chess game. Generally, these investigations fall into three broad categories: historical frequency, temporal dynamics, and topology-aware state.

  • Historical frequency: “Has this source IP appeared before?” “Is this destination typical for this subnet?”
  • Temporal dynamics: “Did this pattern change in the last hour?” “Is this burst symmetric or asymmetric?”
  • Topology & asset state: “Is this host newly joined to the domain?” “How does this event fit into the network map?”
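The first two categories need nothing more exotic than counters and time windows. A minimal sketch, using invented event tuples (real pipelines would stream these from the firewall log; topology-aware state additionally needs an asset graph, which lives outside the log stream):

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical parsed events: (timestamp, src_ip, dst_ip).
events = [
    (datetime(2024, 5, 1, 9, 0), "10.1.5.23", "185.224.0.1"),
    (datetime(2024, 5, 1, 9, 1), "10.1.5.23", "185.224.0.1"),
    (datetime(2024, 5, 1, 9, 2), "10.1.5.99", "8.8.8.8"),
]

# Historical frequency: has this source IP appeared before?
seen_sources = Counter(src for _, src, _ in events)

def is_new_source(src):
    return seen_sources[src] == 0

# Temporal dynamics: how many events landed inside a recent window?
def rate_in_window(now, window=timedelta(hours=1)):
    return sum(1 for ts, _, _ in events if now - window <= ts <= now)
```

The point is not the code but the shape of the question: both functions answer “what has this system done before?”, a question no amount of token prediction can answer.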

Language models learn probability distributions over token sequences; behavior modeling learns probability distributions over state transitions and structural relationships within a network. Compressing text into probabilistic token relationships is not enough. Security requires reasoning over four dimensions, each one a place where standard LLMs fall short:

  • State transitions: LLMs lack an explicit notion of system state; they treat tokens independently of external context.
  • Frequency shifts: LLMs capture corpus-wide token frequencies, not localized, time-windowed event rates.
  • Rarity within local baselines: LLMs learn global rarity (e.g., rare words), not rarity relative to a specific host or subnet.
  • Infrastructure-specific normality: LLMs have no built-in representation of network topology or asset inventory.

Because security analytics demand reasoning over state, frequency, and topology, we need models that natively incorporate these dimensions, whether through graph neural networks, time-series point processes, or hybrid systems that augment LLMs with structured context. The consequences of skipping this:

  • LLM hallucinations become “detections”
  • False positives increase
  • Analysts lose trust
  • Systems become non-auditable (important for NIS2 / ISO 27001)

Training on “everything” removes what matters

General-purpose LLMs are trained on the entire internet. That scale gives them breadth, but breadth comes at a cost: loss of specificity. In cybersecurity, specificity is everything. I work for a multinational seafood company. Normal traffic in a company that produces sushi-grade salmon, runs a lot of OT and local AI, and shifts with seasonal production is going to be far from any “internet normality.” And if your setting is a hospital, a fintech startup, or your own home lab, they’re all radically different again.

The baseline is local. The anomalies are relative. The risk is contextual. A model trained on “everything” cannot know what is normal here. And without a strong concept of local normality, anomaly detection becomes guesswork.
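A minimal sketch of “local normality”: score each observation against that host’s own history rather than any global corpus. The hosts and numbers below are invented for illustration:

```python
import statistics

# Hypothetical per-host baselines: bytes/hour learned from each host's
# own history.
baseline = {
    "ot-sensor-01": [120, 130, 125, 118, 122],                 # quiet OT device
    "build-server": [50_000, 61_000, 48_000, 55_000, 58_000],  # chatty by design
}

def is_anomalous(host, observed, threshold=3.0):
    """Flag values more than `threshold` standard deviations from this
    host's own mean. The same number can be normal on one host and
    wildly abnormal on another."""
    history = baseline[host]
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    return abs(observed - mean) > threshold * stdev
```

With these baselines, 5,000 bytes/hour is a screaming anomaly for the OT sensor, while 56,000 is an unremarkable hour for the build server. A globally trained model sees neither.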

Security context is architectural

Log entries only become meaningful when they’re interpreted against the organization’s architecture. The necessary context lives outside the logs themselves, in the design of the environment.

Key architectural dimensions that give logs relevance

  • Network segmentation: Determines which traffic flows are expected and which are anomalous.
  • Asset criticality: Highlights which hosts or services require tighter monitoring.
  • Business processes: Links events to legitimate workflow steps, reducing false positives.
  • Change windows: Provides a baseline for scheduled activity versus unexpected behavior.
  • Policy design: Sets the rules that define acceptable versus suspicious actions.

Because this context isn’t embedded in the log text, generic AI models that merely “summarize” logs miss the deeper security insight. Effective security analysis must go beyond summarization and incorporate the architectural backdrop.

Detection and explanation are different problems

This is where many AI security tools blur the line. Detecting abnormal behavior is a statistical problem. Explaining that behavior in human language is a linguistic problem. They are related, but not the same. Treating logs as “just another text corpus” collapses those layers. And when the layers collapse, so does reliability.

What true security analysis looks like

  • Baseline awareness: Establish normal patterns for each asset and network segment.
  • Behavioral comparison: Continuously contrast observed activity against those baselines.
  • Infrastructure grounding: Map every event to its place in the architecture (e.g., which segment, asset tier, business process, or policy rule it touches).
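The grounding step can be sketched with a dict-based asset inventory. The IPs, segments, and flow rules below are invented; in practice this data would come from a CMDB, IPAM, or the firewall policy itself:

```python
# Hypothetical asset inventory and expected-flow policy.
ASSETS = {
    "10.1.5.23": {"segment": "OT", "criticality": "high", "process": "filleting-line"},
    "10.2.0.14": {"segment": "office", "criticality": "low", "process": "hr"},
}
EXPECTED_FLOWS = {("OT", "OT"), ("office", "internet")}

def ground_event(src_ip, dst_segment):
    """Attach architectural context to a raw event and check whether the
    flow matches segmentation policy."""
    asset = ASSETS.get(src_ip, {"segment": "unknown", "criticality": "unknown"})
    expected = (asset["segment"], dst_segment) in EXPECTED_FLOWS
    return {**asset, "dst_segment": dst_segment, "expected_flow": expected}
```

An OT host talking to the internet comes back with expected_flow=False plus everything an analyst (or an LLM) needs to reason about why that matters.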

Architectural implications

When I designed PyLog, the intent was to build a cybersecurity system where:

  • Detection should be grounded in local behavioral models
  • Context must be derived from infrastructure, not vocabulary
  • Language models should assist explanation, not define truth

How to apply this in practice

You don’t need your own PyLog-style tool to follow the same principles:

  1. Enrich logs with metadata – Attach tags for segment, asset criticality, associated business process, etc., at ingestion time.
  2. Build a contextual knowledge base – Store architecture diagrams, change schedules, and policy definitions in a searchable repository that your SIEM or analytics platform can reference.
  3. Develop detection rules that reference context – Instead of “failed login,” use “failed login on a high‑criticality asset outside approved change windows.”
  4. Leverage AI that can consume structured context – Feed the enriched logs and the knowledge base into models designed for security reasoning rather than pure summarization.
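Steps 1–3 can be sketched together. The event fields, host names, and change windows below are hypothetical placeholders for what your ingestion pipeline and knowledge base would supply:

```python
from datetime import datetime

# Step 1: an event enriched with metadata at ingestion time (hypothetical).
event = {
    "type": "failed_login",
    "host": "db-prod-01",
    "criticality": "high",
    "timestamp": datetime(2024, 5, 1, 3, 17),
}

# Step 2: contextual knowledge base -- approved change windows per host.
CHANGE_WINDOWS = {
    "db-prod-01": [(datetime(2024, 5, 1, 22, 0), datetime(2024, 5, 2, 2, 0))],
}

def contextual_alert(ev):
    """Step 3: fire only on a failed login on a high-criticality asset
    outside its approved change windows."""
    if ev["type"] != "failed_login" or ev["criticality"] != "high":
        return False
    windows = CHANGE_WINDOWS.get(ev["host"], [])
    in_window = any(start <= ev["timestamp"] <= end for start, end in windows)
    return not in_window
```

The same raw log line ("failed login") produces an alert at 03:17 but stays silent during the 22:00–02:00 change window, which is exactly the specificity a context-free model cannot provide.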

By grounding log analysis in the architectural context, you move from surface‑level summaries to actionable security insights.

Postscript

It is possible to use LLMs for log analysis, but only with a few supporting steps; without context they’re not useful. One usable architecture:

[Logs / Metrics / Events]
          ↓
[Behavior models / baselines / rules / graphs]
          ↓
[Signal / finding / anomaly]
          ↓
[LLM]
          ↓
[Explanation, triage, narrative, decision support]

The LLM doesn’t define ‘truth’; it explains and supports decision-making. If you give the LLM your specific context, you can gain significant insight without retraining the model.
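One way to wire the last two stages together is to hand the LLM a pre-computed finding plus its local context and constrain it to explanation only. This prompt-building sketch uses invented field names; any LLM client would consume the resulting string:

```python
def build_explanation_prompt(finding, context):
    """Build a prompt where detection has already happened upstream.
    The LLM explains; it does not detect. Field names are illustrative."""
    return (
        "You are a SOC assistant. A detection system (not you) flagged:\n"
        f"- Finding: {finding['summary']}\n"
        f"- Asset: {finding['host']} (criticality: {context['criticality']})\n"
        f"- Segment: {context['segment']}\n"
        "Explain the likely cause and suggest triage steps. "
        "Do not invent events that are not listed above."
    )
```

Because the statistical verdict arrives pre-made, a hallucinated sentence in the explanation is an annoyance, not a false detection, which is the separation of layers this article argues for.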