BreachSleuth: Breach Data Triage & Analysis

Why BreachSleuth

Built from 20 years in security and incident response.

Sifting through gigabytes of data that may have been accessed or exfiltrated, trying to identify credentials, PII, confidential documents, and TLP-marked material, is a problem practitioners face repeatedly. Most still do this manually or with basic scripts. It works, but it's slow, and you're limited to the patterns you already know.

BreachSleuth is a local, offline triage tool built on deterministic risk scoring. No cloud. No uploads. Add a local LLM and it analyses context and intent beyond what regex can detect. Read the full story in the announcement post, by Adrian McAlister.

What it does

Scan. Triage. Investigate. Report.

BreachSleuth handles the full workflow from raw dataset to finished incident report, without any data leaving your machine.

🗂

Folder scan & risk scoring

Recursive file inventory with magic-byte type detection, automatic High/Medium/Low/No Findings classification, and duplicate flagging.

🔍

Content extraction & preview

Reads PDFs, Word documents, spreadsheets, emails, SQLite databases, and images (OCR). Risk hits highlighted in the preview.

📦

Archive support

ZIP, RAR, 7z, and tar archives are unpacked and scanned automatically. Encrypted archives are flagged.

🔎

Pattern search & YARA

Search across all files with keywords or regex. Run YARA rules to surface malware indicators and custom signatures alongside built-in detections.

🤖

Local AI analysis

Deep-dive investigation via Ollama. Ask questions about individual files or the full dataset. No API keys, no cloud.

📋

Incident report generation

Produce a structured HTML report suitable for client delivery, regulatory submission, or legal documentation. Classification marking, jurisdiction, and section selection included.

🔒

Privacy filter

Redact PII from the preview before you read it, and optionally before anything reaches the AI. Original files are never modified.

💼

Case management

Save and reload full investigation sessions as portable case files. Triage decisions, AI notes, chat history, and audit log all included.

🔏

Evidence-grade audit trail

Hash-chained, append-only logs with tool version pinning. Designed for court and regulator handoff where the integrity of findings must be demonstrable.

High: Credentials, keys, credit cards, SSNs

Medium: PII, emails, phones, addresses, DOBs

Low: Confidentiality markers, hashes, URLs

No Findings

breachsleuth: scan results

Scanning /datasets/acmecorp_breach ... Found 14 files across 4 subfolders Filename Risk Patterns Matched ──────────────────────────────────────────────────────── db_backup.sql HIGH Password in config · SSN (US) · Credential Pair (5) credential_dump.zip HIGH AWS Access Key · Credential Pair · Secret/Token breach_notification.eml HIGH Password in config · AWS Access Key · Email Address payment_records.txt HIGH Credit Card (7) · IBAN (2) user_records.csv MEDIUM Email · Phone · DOB · Street Address · UK Postcode internal_report.txt MEDIUM IBAN · IP Address · Confidential Marker config.env HIGH AWS Access Key · Secret/Token · Password in config notes.txt NO FINDINGS Scan complete in 1.4s · 8 files · 5 High · 2 Medium · 1 No Findings

Build history

Shipped features

BreachSleuth is developed one complete practitioner workflow at a time. Here's what's been built and what's coming next.

Early May 2026: Core engine

✓

Shipped

Folder scan & triage

Recursive scan with magic-byte fingerprinting, type mismatch detection, protected archive flagging, and automatic High / Medium / Low / No Findings scoring.

✓

Shipped

Archive extraction

ZIP, RAR, 7z, and tar archives unpacked and scanned automatically, including nested content.

✓

Shipped

Content extraction & risk preview

Text extracted from PDFs, DOCX, XLSX, EML, SQLite, images (OCR), and plain text. Pattern search with regex across the full dataset. HTML and CSV export.

✓

Shipped

LLM analysis & chat

Per-file and batch analysis via Ollama. Ask questions about any file in plain English. Fully offline, no API keys.

Mid May 2026: Investigation layer

✓

Shipped

Case management & persistent storage

SQLite backend keeps scan results, triage decisions, and AI notes between restarts. Save and reload full sessions as portable case files.

✓

Shipped

Privacy filter & audit trail

PII redaction in the preview and optionally in AI prompts. Every practitioner action timestamped and logged to the case database.

✓

Shipped

One-click launchers

Double-click to launch on Mac and Windows. Guided setup installs Tesseract, Poppler, 7-Zip, and Ollama. Fully offline after first run, no terminal needed.

Late May 2026: Advanced analysis

✓

Shipped

Regex builder from selection

Highlight any value in the file preview (email, IP, credit card, UUID, API key) and BreachSleuth generates the right search pattern automatically. Push it to Pattern Search with one click.

✓

Shipped

Formal incident report

Structured HTML report for client delivery or regulatory submission. Executive summary, key findings, data type breakdown, and prioritised recommendations. Classification marking and jurisdiction fields included.

✓

Shipped

YARA rule support

Paste or upload YARA rules and scan the full dataset. Matches show rule name, matched strings, and file path. Matching files can be automatically promoted to High Risk.

✓

Shipped

Expanded detection coverage

100+ built-in patterns across credentials, PII, financial data, and infrastructure secrets. Global coverage including payment track data, crypto private keys, modern AI/SaaS API tokens, and national identifiers across 15+ countries.

June 2026: Security hardening

✓

Shipped

Evidence-grade audit trail

SHA-256 hash-chained, append-only logs with tool and Python version pinning per entry. External anchor file for tamper detection, Verify Integrity check, and full CSV/HTML export designed for court and regulator handoff.

✓

Shipped

Prompt injection protection

Four-layer defence against attacker-controlled file content manipulating the LLM: structural separation of instructions from analysed content, fuzzy injection pattern detection, output monitoring, and model self-reporting, with a regex-vs-AI cross-check as a secondary signal.

Privacy & security

Offline by design.

Breach data is sensitive by definition. BreachSleuth runs entirely on your machine, with no telemetry, no uploads, and no cloud dependency.

🔒

No network calls

Scanning, analysis, and reporting all run locally. The only outbound connection is your Ollama instance, which is also on your machine.

⚡

Air-gap safe

Designed for use in isolated environments. Works with no internet connection after first-time setup.

🔏

Privacy filter

Three modes: no redaction, preview-only redaction, or full redaction before any LLM call. Original files are never modified.

📁

Files stay put

Nothing is copied, moved, or uploaded. Files are read in place; the dataset on disk is untouched throughout the investigation.

💾

SQLite case storage

All findings are written to a single local .db file. Copy it, back it up, store it on an encrypted volume; you control it entirely.

Get started

Available now on GitHub.

Clone the repo and double-click a launcher, no terminal needed after the first run. Free to use, fully offline.

View on GitHub Download Sample Dataset

Triage breach data.Find what matters.

Built from 20 years in security and incident response.

Scan. Triage. Investigate. Report.

Folder scan & risk scoring

Content extraction & preview

Archive support

Pattern search & YARA

Local AI analysis

Incident report generation

Privacy filter

Case management

Evidence-grade audit trail

Shipped features

Folder scan & triage

Archive extraction

Content extraction & risk preview

LLM analysis & chat

Case management & persistent storage

Privacy filter & audit trail

One-click launchers

Regex builder from selection

Formal incident report

YARA rule support

Expanded detection coverage

Evidence-grade audit trail

Prompt injection protection

Offline by design.

No network calls

Air-gap safe

Privacy filter

Files stay put

SQLite case storage

Available now on GitHub.

Triage breach data.
Find what matters.