Private Beta

Triage breach data.
Find what matters.

Drop a folder of leaked data onto BreachSleuth and get a risk-scored file inventory in seconds. Credentials, PII, and financial data highlighted, AI investigation ready, nothing leaving your machine.

Public release coming soon

What it does

Scan. Triage. Investigate. Report.

BreachSleuth handles the full workflow from raw dataset to finished incident report, without any data leaving your machine.

🗂

Folder scan & risk scoring

Recursive file inventory with magic-byte type detection, automatic High/Medium/Low/No Findings classification, and duplicate flagging.

🔍

Content extraction & preview

Reads PDFs, Word documents, spreadsheets, emails, SQLite databases, and images (OCR). Risk hits highlighted in the preview.

📦

Archive support

ZIP, RAR, 7z, and tar archives are unpacked and scanned automatically. Encrypted archives are flagged.

🔎

Pattern search & YARA

Search across all files with keywords or regex. Run YARA rules to surface malware indicators and custom signatures alongside built-in detections.

🤖

Local AI analysis

Deep-dive investigation via Ollama. Ask questions about individual files or the full dataset. No API keys, no cloud.

📋

Incident report generation

Produce a structured HTML report suitable for client delivery, regulatory submission, or legal documentation. Classification marking, jurisdiction, and section selection included.

🔒

Privacy filter

Redact PII from the preview before you read it, and optionally before anything reaches the AI. Original files are never modified.

💼

Case management

Save and reload full investigation sessions as portable case files. Triage decisions, AI notes, chat history, and audit log all included.

🔏

Evidence-grade audit trail

Hash-chained, append-only logs with tool version pinning. Designed for court and regulator handoff where the integrity of findings must be demonstrable.

High: Credentials, keys, credit cards, SSNs
Medium: PII, emails, phones, addresses, DOBs
Low: Confidentiality markers, hashes, URLs
No Findings
breachsleuth: scan results
Scanning /datasets/acmecorp_breach ... Found 14 files across 4 subfolders Filename Risk Patterns Matched ──────────────────────────────────────────────────────── db_backup.sql HIGH Password in config · SSN (US) · Credential Pair (5) credential_dump.zip HIGH AWS Access Key · Credential Pair · Secret/Token breach_notification.eml HIGH Password in config · AWS Access Key · Email Address payment_records.txt HIGH Credit Card (7) · IBAN (2) user_records.csv MEDIUM Email · Phone · DOB · Street Address · UK Postcode internal_report.txt MEDIUM IBAN · IP Address · Confidential Marker config.env HIGH AWS Access Key · Secret/Token · Password in config notes.txt NO FINDINGS Scan complete in 1.4s · 8 files · 5 High · 2 Medium · 1 No Findings

Build history

Shipped features

BreachSleuth is developed one complete practitioner workflow at a time. Here's what's been built and what's coming next.

Early May 2026: Core engine
Shipped

Folder scan & triage

Recursive scan with magic-byte fingerprinting, type mismatch detection, protected archive flagging, and automatic High / Medium / Low / No Findings scoring.

Shipped

Archive extraction

ZIP, RAR, 7z, and tar archives unpacked and scanned automatically, including nested content.

Shipped

Content extraction & risk preview

Text extracted from PDFs, DOCX, XLSX, EML, SQLite, images (OCR), and plain text. Pattern search with regex across the full dataset. HTML and CSV export.

Shipped

LLM analysis & chat

Per-file and batch analysis via Ollama. Ask questions about any file in plain English. Fully offline, no API keys.

Mid May 2026: Investigation layer
Shipped

Case management & persistent storage

SQLite backend keeps scan results, triage decisions, and AI notes between restarts. Save and reload full sessions as portable case files.

Shipped

Privacy filter & audit trail

PII redaction in the preview and optionally in AI prompts. Every practitioner action timestamped and logged to the case database.

Shipped

One-click launchers

Double-click to launch on Mac and Windows. Guided setup installs Tesseract, Poppler, 7-Zip, and Ollama. Fully offline after first run, no terminal needed.

Late May 2026: Advanced analysis
Shipped

Regex builder from selection

Highlight any value in the file preview (email, IP, credit card, UUID, API key) and BreachSleuth generates the right search pattern automatically. Push it to Pattern Search with one click.

Shipped

Formal incident report

Structured HTML report for client delivery or regulatory submission. Executive summary, key findings, data type breakdown, and prioritised recommendations. Classification marking and jurisdiction fields included.

Shipped

YARA rule support

Paste or upload YARA rules and scan the full dataset. Matches show rule name, matched strings, and file path. Matching files can be automatically promoted to High Risk.

Shipped

Expanded detection coverage

100+ built-in patterns across credentials, PII, financial data, and infrastructure secrets. Global coverage including payment track data, crypto private keys, modern AI/SaaS API tokens, and national identifiers across 15+ countries.

June 2026: Security hardening
12
Planned

Evidence-grade audit trail

Hash-chained, append-only logs with tool version pinning. Designed for court and regulator handoff where the integrity of findings must be demonstrable.

13
Planned

Prompt injection hardening

Attacker-controlled file content fed to an LLM is a real attack surface. Structural separation between system instructions and analysed content will isolate analyst-controlled data from attacker-controlled content.


Privacy & security

Offline by design.

Breach data is sensitive by definition. BreachSleuth runs entirely on your machine, with no telemetry, no uploads, and no cloud dependency.

🔒

No network calls

Scanning, analysis, and reporting all run locally. The only outbound connection is your Ollama instance, which is also on your machine.

Air-gap safe

Designed for use in isolated environments. Works with no internet connection after first-time setup.

🔏

Privacy filter

Three modes: no redaction, preview-only redaction, or full redaction before any LLM call. Original files are never modified.

📁

Files stay put

Nothing is copied, moved, or uploaded. Files are read in place; the dataset on disk is untouched throughout the investigation.

💾

SQLite case storage

All findings are written to a single local .db file. Copy it, back it up, store it on an encrypted volume; you control it entirely.