Drop a folder of leaked data onto BreachSleuth and get a risk-scored file inventory in seconds. Credentials, PII, and financial data highlighted, AI investigation ready, nothing leaving your machine.
BreachSleuth handles the full workflow from raw dataset to finished incident report, without any data leaving your machine.
Recursive file inventory with magic-byte type detection, automatic High/Medium/Low/No Findings classification, and duplicate flagging.
Reads PDFs, Word documents, spreadsheets, emails, SQLite databases, and images (OCR). Risk hits highlighted in the preview.
ZIP, RAR, 7z, and tar archives are unpacked and scanned automatically. Encrypted archives are flagged.
Search across all files with keywords or regex. Run YARA rules to surface malware indicators and custom signatures alongside built-in detections.
Deep-dive investigation via Ollama. Ask questions about individual files or the full dataset. No API keys, no cloud.
Produce a structured HTML report suitable for client delivery, regulatory submission, or legal documentation. Classification marking, jurisdiction, and section selection included.
Redact PII from the preview before you read it, and optionally before anything reaches the AI. Original files are never modified.
Save and reload full investigation sessions as portable case files. Triage decisions, AI notes, chat history, and audit log all included.
Hash-chained, append-only logs with tool version pinning. Designed for court and regulator handoff where the integrity of findings must be demonstrable.
BreachSleuth is developed one complete practitioner workflow at a time. Here's what's been built and what's coming next.
Recursive scan with magic-byte fingerprinting, type mismatch detection, protected archive flagging, and automatic High / Medium / Low / No Findings scoring.
ZIP, RAR, 7z, and tar archives unpacked and scanned automatically, including nested content.
Text extracted from PDFs, DOCX, XLSX, EML, SQLite, images (OCR), and plain text. Pattern search with regex across the full dataset. HTML and CSV export.
Per-file and batch analysis via Ollama. Ask questions about any file in plain English. Fully offline, no API keys.
SQLite backend keeps scan results, triage decisions, and AI notes between restarts. Save and reload full sessions as portable case files.
PII redaction in the preview and optionally in AI prompts. Every practitioner action timestamped and logged to the case database.
Double-click to launch on Mac and Windows. Guided setup installs Tesseract, Poppler, 7-Zip, and Ollama. Fully offline after first run, no terminal needed.
Highlight any value in the file preview (email, IP, credit card, UUID, API key) and BreachSleuth generates the right search pattern automatically. Push it to Pattern Search with one click.
Structured HTML report for client delivery or regulatory submission. Executive summary, key findings, data type breakdown, and prioritised recommendations. Classification marking and jurisdiction fields included.
Paste or upload YARA rules and scan the full dataset. Matches show rule name, matched strings, and file path. Matching files can be automatically promoted to High Risk.
100+ built-in patterns across credentials, PII, financial data, and infrastructure secrets. Global coverage including payment track data, crypto private keys, modern AI/SaaS API tokens, and national identifiers across 15+ countries.
Hash-chained, append-only logs with tool version pinning. Designed for court and regulator handoff where the integrity of findings must be demonstrable.
Attacker-controlled file content fed to an LLM is a real attack surface. Structural separation between system instructions and analysed content will isolate analyst-controlled data from attacker-controlled content.
Breach data is sensitive by definition. BreachSleuth runs entirely on your machine, with no telemetry, no uploads, and no cloud dependency.
Scanning, analysis, and reporting all run locally. The only outbound connection is your Ollama instance, which is also on your machine.
Designed for use in isolated environments. Works with no internet connection after first-time setup.
Three modes: no redaction, preview-only redaction, or full redaction before any LLM call. Original files are never modified.
Nothing is copied, moved, or uploaded. Files are read in place; the dataset on disk is untouched throughout the investigation.
All findings are written to a single local .db file. Copy it, back it up, store it on an encrypted volume; you control it entirely.