FIELD NOTES — THE CRAFT OF SMALL TOOLS
I Spent an Hour Breaking My Own Privacy Tool
The 6 vulnerabilities I found — and the 2 I can't fix
February 6, 2026 • 9 min read
"The only truly secure system is one that is powered off, cast in a block of concrete and sealed in a lead-lined room with armed guards."
— Gene Spafford
The Adversarial Hour
Tuesday, 9 PM. Coffee poured. Notes app open. One rule: think like someone who wants to break this.
Building a privacy tool and shipping it without red-teaming yourself isn't just lazy. It's professional malpractice.
So I spent an hour systematically trying to break Anancy. Not casually—methodically. Attack vectors, edge cases, failure modes.
I found six vulnerabilities. Two I fixed on the spot. Two more are fixable but on the roadmap. Two aren't—at least not without fundamentally changing what the tool is.
Here's what I learned.
The Framework
I used three questions:
- Failure modes: What happens when inputs aren't what I expect?
- Blind spots: What am I assuming works that I haven't verified?
- Second-order effects: If this works as designed, what unintended consequences follow?
For each category, I tried to think like someone who wanted to misuse, abuse, or circumvent the tool. Not paranoid. Realistic.
Vulnerability 1: Deterministic Scaling
Category: Failure mode
The problem: When masking currency amounts, I scaled deterministically. $100 always became $83. This meant someone with access to both original and masked documents could reverse-engineer the scaling factor and uncover all amounts.
Original code:
```python
def mask_amount(original: float, seed: str) -> float:
    factor = hash_to_float(seed, 0.7, 1.3)  # Deterministic based on seed
    return round(original * factor, 2)
```
The fix: Add randomness within a range.
```python
import random

def mask_amount(original: float, seed: str) -> float:
    base_factor = hash_to_float(seed, 0.7, 1.3)
    random_variation = random.uniform(0.95, 1.05)  # Random element
    return round(original * base_factor * random_variation, 2)
```
Now $100 becomes somewhere between roughly $79 and $87 (the base factor of 0.83, jittered by ±5%). Consistent enough to preserve meaning, random enough to prevent exact reversal.
Status: Fixed.
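The post doesn't show `hash_to_float`, so here's a minimal self-contained sketch of how the pair might fit together. The helper's name comes from the snippet above; its body, and the SHA-256 approach, are my assumptions, not Anancy's actual implementation:

```python
import hashlib
import random

def hash_to_float(seed: str, low: float, high: float) -> float:
    """Map a seed string deterministically into [low, high)."""
    digest = hashlib.sha256(seed.encode("utf-8")).digest()
    # First 8 bytes as an integer, normalized to [0, 1)
    fraction = int.from_bytes(digest[:8], "big") / 2**64
    return low + (high - low) * fraction

def mask_amount(original: float, seed: str) -> float:
    base_factor = hash_to_float(seed, 0.7, 1.3)   # same seed -> same base
    random_variation = random.uniform(0.95, 1.05) # per-call jitter
    return round(original * base_factor * random_variation, 2)
```

One caveat worth noting: `random.uniform` uses a non-cryptographic PRNG; for a security tool, `random.SystemRandom().uniform(0.95, 1.05)` would draw the jitter from the OS entropy source instead.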
Vulnerability 2: Phone Regex Missing Parentheses
Category: Blind spot
The problem: The phone regex didn't handle parentheses-formatted numbers.
Input: "Call (555) 123-4567"
Expected: Masked
Actual: Not detected
I'd tested formats I used. I hadn't tested formats other people use.
The fix: Add parentheses support.
```python
PHONE_PATTERN = r'(\+1\s?)?(\(?\d{3}\)?[\s.-]?)?\d{3}[\s.-]?\d{4}'
```
Status: Fixed.
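A quick sanity check that the updated pattern catches the parenthesized format alongside the originals (the test strings are mine):

```python
import re

PHONE_PATTERN = r'(\+1\s?)?(\(?\d{3}\)?[\s.-]?)?\d{3}[\s.-]?\d{4}'

for text in ["Call (555) 123-4567", "Call 555-123-4567", "Call 555.123.4567"]:
    match = re.search(PHONE_PATTERN, text)
    print(text, "->", match.group() if match else "NOT DETECTED")
```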
Vulnerability 3: Western Name Bias
Category: Blind spot
The problem: Name detection relies on a dictionary of common names. That dictionary is heavily weighted toward Western (specifically American) names.
"John Smith" gets detected. "Priya Sharma" might not. "Wei Chen" probably won't be detected correctly.
Why it's hard to fix:
- Expanding the dictionary helps but never covers everything
- Name patterns vary dramatically across cultures
- False positives increase with broader coverage
- NER models have the same biases, just hidden
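To make the bias concrete, here is a toy version of dictionary-based detection. The name list and function are illustrative, not Anancy's actual dictionary:

```python
# Toy dictionary lookup -- illustrates the Western-name bias, not the real list
COMMON_FIRST_NAMES = {"john", "mary", "james", "patricia", "robert", "jennifer"}

def detect_names(text: str) -> list[str]:
    """Flag capitalized words whose lowercase form is in the dictionary."""
    hits = []
    for word in text.split():
        cleaned = word.strip(".,!?")
        if cleaned.istitle() and cleaned.lower() in COMMON_FIRST_NAMES:
            hits.append(cleaned)
    return hits

print(detect_names("John met Priya and Wei."))  # only "John" is flagged
```

Every name the dictionary doesn't contain passes through unmasked, which is exactly the failure mode described above.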
Honest approach: Document the limitation clearly.
```markdown
## Known Limitations

Name detection is optimized for common Western names.
Detection rates for names from other cultural backgrounds
may be lower. Review masked output manually.
```
Status: Documented, not fixed. Some problems don't have technical solutions. They have honest disclosure.
Vulnerability 4: Context Leakage
Category: Second-order effect
The problem: Masking removes explicit PII but doesn't address context that reveals identity.
"The CEO of [Company] announced..." — even with [Company] masked, if the document clearly discusses topics only one company would be involved in, identity leaks through context.
"The 6'8" basketball player from [City]..." — height plus sport plus city narrows the population to dozens. Maybe fewer.
Why it's unfixable (without AI):
Context leakage requires understanding semantics, not just patterns. Regex can't reason about whether "the only female board member" reveals identity in context.
Real solutions:
- AI-based semantic analysis (expensive, slow, overkill)
- Human review (always recommended anyway)
- User awareness (documentation)
Status: Documented as limitation. Human review recommended.
This isn't failure. This is knowing what your tool does and doesn't do.
Vulnerability 5: Mapping File Security
Category: Failure mode
The problem: The mapping file (original → masked) is stored in plaintext JSON. Anyone with access to that file can reverse all masking.
```json
{
  "John Smith": "Marcus Webb",
  "555-123-4567": "555-987-6543"
}
```
Why it matters: The mapping file is essential for reversibility (a feature, not a bug). But it's also the skeleton key.
Potential fix: Encrypt the mapping file with a user-provided key.
Status: On roadmap. Current mitigation: document that the mapping file should be protected with the same care as original data.
Vulnerability 6: Partial Masking Correlation
Category: Second-order effect
The problem: When the same PII appears multiple times, it's masked consistently. Same original → same replacement. This preserves document coherence.
But it also means: if an attacker knows ONE original-masked pair, they've cracked ALL instances of that value.
Why it's a tradeoff:
- Inconsistent masking breaks document coherence
- Consistent masking enables correlation attacks
- Both are valid choices depending on threat model
Status: Documented as design choice with tradeoffs.
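The tradeoff is easy to demonstrate with a toy consistent masker (the replacement pool and hashing scheme are illustrative, not Anancy's):

```python
import hashlib

REPLACEMENTS = ["Marcus Webb", "Dana Cole", "Alex Reyes", "Sam Ortiz"]

mapping: dict[str, str] = {}

def mask(value: str) -> str:
    """Consistent masking: the same original always gets the same replacement."""
    if value not in mapping:
        index = int(hashlib.sha256(value.encode()).hexdigest(), 16) % len(REPLACEMENTS)
        mapping[value] = REPLACEMENTS[index]
    return mapping[value]

# Same original -> same replacement, which preserves coherence...
assert mask("John Smith") == mask("John Smith")
# ...but it also means one known (original, masked) pair
# identifies every occurrence of that value in the document.
```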
Not every security decision has a right answer. Sometimes you choose and document.
The Limitations Section
The README now includes honest limitations:
```markdown
## Limitations and Honest Assessment

Anancy is designed for common PII patterns in English-language
documents. It is NOT a substitute for professional data handling
or legal review.

**Known limitations:**

1. Name detection optimized for Western names. Detection
   rates vary for other naming conventions.
2. Context leakage - masking explicit PII does not prevent
   identification through contextual information.
3. Mapping security - the mapping file contains the reversal
   key. Protect it appropriately.
4. Format-specific - designed for text documents.
5. Not for regulated data - if handling HIPAA, PCI, or
   similarly regulated data, use certified tools.

**Always review masked output before sharing.**
```
The Philosophy
"Honesty is the first chapter of the book of wisdom."
— Thomas Jefferson
Most security content falls into two categories:
- "Here's how to hack things" (offensive)
- "Here's how to prevent attacks" (defensive)
This is a third category: "Here's how to honestly assess your own work."
The goal isn't perfection—it's informed use. A user who understands the limitations can compensate. A user who assumes the tool handles everything will make mistakes.
Honest documentation isn't weakness. It's the feature that keeps users safe.
The Takeaway
Red team yourself before shipping anything security-adjacent.
The process:
- Clear dedicated time. Not "I'll think about it later."
- Use a framework: failure modes, blind spots, second-order effects.
- Document everything, including what you can't fix.
- Be honest in public. Limitations aren't embarrassing if they're disclosed.
I found six vulnerabilities in an hour. Fixed two, roadmapped two, documented the two I can't solve. The tool is better for it—not despite the unfixable issues, but because they're now visible.
The shadows define the light.
Part 8 of "The Craft of Small Tools" series.
Building tools that handle sensitive data?
Textstone Labs helps teams build AI and automation with security baked in — not bolted on. We red-team before we ship.
Let's Talk →

Want more Field Notes?
Practical lessons from the field, delivered to your inbox. No spam.
Textstone Labs — AI implementation for people who build things.