What 'privacy' means when your copilot sees your warehouse

Privacy in AI data tools is often marketing language. An honest account of the threat model, what l0l1 does at each boundary, and what it doesn't claim.

May 22, 2026 · #privacy #pii #trust

When a tool says it “respects your privacy,” that phrase is almost always doing more work than it should. Privacy is not a single property. It is a set of constraints across several boundaries, and the only useful thing a tool can do is be specific about which boundaries it enforces and which it doesn’t.

This post is an attempt to be specific.

The threat model

Start from the boundaries. An LLM-assisted SQL tool that talks to a warehouse crosses, at minimum, four boundaries that matter:

User → tool. A person types a question. The question itself can contain sensitive content: literal customer names, account IDs, internal project codenames.
Tool → model provider. Whatever the tool decides to send to the LLM leaves your environment. OpenAI, Anthropic, anyone else — once the request is sent, that text exists on someone else’s infrastructure, subject to their retention policy, their employees, and their security incidents.
Tool → warehouse. The tool runs queries. Those queries return rows. Those rows contain whatever the warehouse contains, including columns that were never supposed to be touched by anything but a service account.
Warehouse → user. The rows come back. Where do they end up? In a terminal? In a notebook checked into a repo? In a Slack message? In a screenshot?

Each of those boundaries is a separate problem with separate mitigations. “Privacy” in the marketing sense usually means the tool has done something about one of them and is hoping nobody asks about the others.

l0l1’s posture is to be explicit about each boundary. Here’s how.

Boundary 1: what the user types

This one is mostly social. People paste things they shouldn’t. They include specific customer emails in prompts because they’re debugging that customer. They paste rows from a sample query into a chat to ask why the join looks weird, not noticing that one of those rows includes a credit card number.

l0l1 does PII detection at the prompt layer specifically because of this. Before any text leaves the local process and goes to a model provider, the PII detector — built on Microsoft Presidio plus targeted regex rules — scans for SSN, email, phone number, credit card, and a few other categories. If any of those are found, the tool can either anonymize the literals (replacing 'john@example.com' with <EMAIL>) or refuse to forward the prompt, depending on configuration.
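To make the two modes concrete, here is a minimal, stdlib-only sketch of the regex half of such a detector. It is not l0l1’s code and not Presidio’s API: `PATTERNS`, `scrub_prompt`, its `mode` values, and `PIIRefused` are hypothetical names chosen for illustration, and the patterns are far cruder than Presidio’s recognizers (no checksums, no context words, no NER).

```python
import re

# Hypothetical, deliberately simple rules. Order matters: longer formats
# (credit cards) are checked before shorter ones that could match a fragment.
PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "CREDIT_CARD": re.compile(r"\b\d(?:[ -]?\d){12,15}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


class PIIRefused(Exception):
    """Raised in 'refuse' mode when a prompt contains detectable PII."""


def scrub_prompt(text: str, mode: str = "anonymize") -> str:
    """Return text that is safe to forward, or raise PIIRefused.

    mode="anonymize" replaces each match with a typed placeholder such as <EMAIL>;
    mode="refuse" blocks the whole prompt if anything matches.
    """
    if mode not in ("anonymize", "refuse"):
        raise ValueError(f"unknown mode: {mode!r}")
    found = [label for label, rx in PATTERNS.items() if rx.search(text)]
    if not found:
        return text
    if mode == "refuse":
        raise PIIRefused("prompt not forwarded; detected: " + ", ".join(found))
    for label, rx in PATTERNS.items():
        text = rx.sub(f"<{label}>", text)
    return text


print(scrub_prompt("Why is john@example.com missing from the join?"))
# Why is <EMAIL> missing from the join?
```

Checking the credit-card rule before the phone rule keeps a shorter pattern from claiming part of a longer number; a production detector needs a more principled way to resolve overlapping matches.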

This is best-effort. PII detection is a probabilistic exercise and always will be. The detector could miss an internal employee ID format we’ve never seen. It could miss a non-Western phone format. It could miss a national ID from a country we haven’t added rules for. We say “best-effort” because it is, and any tool that promises more is lying.

What l0l1 does promise is that the detector runs by default, that the rules are open source and visible, and that you can extend the rule set to cover your organization’s specific patterns.
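Extending the rule set is what closes gaps like the employee-ID case. The sketch below is again hypothetical (the `RuleSet` class, `add_rule`, and the `EMP-` ID format are invented for illustration, not l0l1’s configuration interface); it shows the behavior described above: an unknown internal format passes through until someone registers a rule for it. Presidio’s own mechanism for this is custom recognizers; check its documentation and l0l1’s for the supported way to plug one in.

```python
import re


class RuleSet:
    """Hypothetical minimal stand-in for an extensible PII rule registry."""

    def __init__(self) -> None:
        # Default rules shipped with the tool (simplified).
        self._rules: dict[str, re.Pattern] = {
            "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
            "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        }

    def add_rule(self, label: str, pattern: str) -> None:
        """Register an organization-specific format alongside the defaults."""
        self._rules[label] = re.compile(pattern)

    def anonymize(self, text: str) -> str:
        for label, rx in self._rules.items():
            text = rx.sub(f"<{label}>", text)
        return text


rules = RuleSet()
prompt = "Why did EMP-004211 lose access?"
print(rules.anonymize(prompt))  # Why did EMP-004211 lose access?  (not detected)

rules.add_rule("EMPLOYEE_ID", r"\bEMP-\d{6}\b")
print(rules.anonymize(prompt))  # Why did <EMPLOYEE_ID> lose access?
```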

Boundary 2: what gets sent to the model provider

This is the boundary most people are worried about, and it’s the one most tools hand-wave past. The honest framing is: as long as you’re using OpenAI, Anthropic, or anyone else as the SQL-validating brain, something has to go over the wire. The question is what.

l0l1’s design tries to minimize that surface in three ways:

Schema context, not data. When the tool sends a query to an LLM for validation or explanation, it sends the SQL and the schema context — column names, types, table relationships. It does not send sample rows. The model sees that users has a column called email of type VARCHAR. It does not see any actual email addresses.
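As an illustration of “schema context, not data,” here is a sketch against SQLite using only Python’s standard library. The function name `schema_context` and its output format are assumptions; the point is that everything it emits comes from catalog metadata (`sqlite_master`, `pragma_table_info`, `pragma_foreign_key_list`), so no table rows are ever read.

```python
import sqlite3


def schema_context(conn: sqlite3.Connection) -> str:
    """Describe tables, column types, and foreign keys without reading any rows."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Catalog metadata only: column name and declared type.
        cols = conn.execute(
            "SELECT name, type FROM pragma_table_info(?)", (table,)
        ).fetchall()
        lines.append(f"{table}(" + ", ".join(f"{n} {t}" for n, t in cols) + ")")
        # Relationships, so the model can reason about joins.
        fks = conn.execute(
            'SELECT "from", "table", "to" FROM pragma_foreign_key_list(?)', (table,)
        ).fetchall()
        for src, ref_table, ref_col in fks:
            lines.append(f"  {table}.{src} -> {ref_table}.{ref_col}")
    return "\n".join(lines)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email VARCHAR(255));
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         user_id INTEGER REFERENCES users(id),
                         total REAL);
    INSERT INTO users VALUES (1, 'john@example.com');
""")
print(schema_context(conn))
```

The output names the `email` column of `users` but contains no email address. Column names themselves still travel, so a sensitive column name is exposed even though its values are not. PostgreSQL and MySQL expose the same kind of metadata through `information_schema`.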

Anonymized literals. If the user’s query itself contains literals that look like PII, those literals are anonymized before the query text is sent. This is the same PII detector that runs at the prompt layer, applied to the query payload.
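Applied to query text, the same idea means finding string literals and checking each one against the PII rules. A hedged sketch: `anonymize_sql_literals` and `PII_RULES` are hypothetical, and the literal-matching regex handles only standard single-quoted strings (with `''` escapes), not comments, dollar quoting, or other dialect-specific syntax, which is where a real SQL tokenizer earns its keep.

```python
import re

# Standard SQL single-quoted string literal; '' is an escaped quote inside it.
SQL_STRING = re.compile(r"'(?:[^']|'')*'")

# Hypothetical rules; each must match the *entire* literal value.
PII_RULES = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
}


def anonymize_sql_literals(sql: str) -> str:
    """Replace string literals that look like PII with typed placeholders."""

    def replace(match: re.Match) -> str:
        value = match.group(0)[1:-1].replace("''", "'")
        for label, rx in PII_RULES.items():
            if rx.fullmatch(value):
                return f"'<{label}>'"
        return match.group(0)  # unrecognized literal: passes through unchanged

    return SQL_STRING.sub(replace, sql)


print(anonymize_sql_literals("SELECT id FROM users WHERE email = 'john@example.com'"))
# SELECT id FROM users WHERE email = '<EMAIL>'

print(anonymize_sql_literals(
    "SELECT * FROM accounts WHERE customer_secret_internal_code = 'ZX-88-QQ'"))
# unchanged: no rule describes this format
```

The second call is the limitation in miniature: a sensitive value in an unfamiliar format is indistinguishable from an ordinary string.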

No raw row data, ever. l0l1 does not, as part of any normal flow, send query result rows to a model provider. Result rows go back to the user. The model is asked to reason about query structure, not query output. If you explicitly opt in to “explain this result” workflows, the data being sent should be obvious to you, but that path is opt-in, not default.
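One way to make “no rows by default” structural rather than a matter of discipline is to give the default request builder no parameter that could carry rows at all. The sketch below uses hypothetical names (`ModelRequest`, `build_validation_request`, `build_explain_request`, `user_opted_in`); it is not l0l1’s internal API.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRequest:
    """Everything that would leave the process for the model provider."""

    sql: str
    schema: str
    rows: tuple[tuple, ...] = ()


def build_validation_request(sql: str, schema: str) -> ModelRequest:
    # Default path: no parameter exists through which result rows could enter.
    return ModelRequest(sql=sql, schema=schema)


def build_explain_request(
    sql: str, schema: str, rows: Sequence[tuple], *, user_opted_in: bool
) -> ModelRequest:
    # Opt-in path: refuses unless the caller passes an explicit user decision.
    if not user_opted_in:
        raise PermissionError("sending result rows requires an explicit opt-in")
    return ModelRequest(sql=sql, schema=schema, rows=tuple(tuple(r) for r in rows))
```

Keeping the opt-in as a separate function with a required keyword argument means a code review can find every place rows might leave the process by searching for one name.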

The honest caveat: if you write a query that contains a literal value, say in a WHERE clause, and that value happens to be PII, and the validator sends the query text to the model, then yes, that value leaves your environment. The anonymizer catches obvious patterns. It cannot catch arbitrary semantic sensitivity. A value compared against a column named customer_secret_internal_code might pass through if the detector doesn’t recognize its format.

Boundary 3: what the tool can see in the warehouse

The most underappreciated boundary. People think a lot about what goes to OpenAI and very little about the access scope of the tool that talks to their warehouse.

l0l1 connects to PostgreSQL, MySQL, SQLite, and DuckDB. It does so with whatever credentials you give it. If those credentials are the same ones your nightly ETL uses, then l0l1 can see everything your nightly ETL can see, which is usually everything.

We strongly recommend giving l0l1 a dedicated database role with the narrowest possible permissions. Read-only on the tables you actually want it to reason about. No access to tables you’ve designated as off-limits. Row-level security policies enforced server-side, not client-side. This is standard data governance. l0l1 cannot enforce it for you; it can only respect what’s already there.

The same applies to schema introspection. When the tool reads schema to give the model context, it reads only the schema the role can see. If the role can see a column called dob, then dob will appear in the context sent to the LLM as a column name. That’s a real consideration if your column names are themselves sensitive — and if they are, the access boundary needs to be tightened on the database side first.

Boundary 4: where the output ends up

The fourth boundary is the one tools almost never address, because it happens after the tool’s job is done. Query results land in a terminal, a notebook, a CSV export, a screenshot in a Slack thread. None of that is under the tool’s control.

What l0l1 does try to do, on the output side, is annotate. If a query result contains columns the schema metadata flagged as PII, the result is presented with those columns marked. The CLI uses color. The API includes a pii_columns field. The Jupyter integration adds a small warning header. This doesn’t stop anyone from copying the data into a screenshot — nothing can — but it makes “this column is sensitive” hard to miss.
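The `pii_columns` field is the easiest part of this to show. In the sketch below, everything except that field name is an assumption: the `annotate_result` and `render_header` helpers, the shape of the surrounding dictionary, and a plain set of column names standing in for the schema metadata.

```python
from typing import Any

RED = "\033[31m"
RESET = "\033[0m"


def annotate_result(
    columns: list[str], rows: list[tuple], pii_metadata: set[str]
) -> dict[str, Any]:
    """Wrap a result set with a pii_columns field naming the flagged columns."""
    return {
        "columns": columns,
        "rows": rows,
        "pii_columns": [c for c in columns if c in pii_metadata],
    }


def render_header(result: dict[str, Any]) -> str:
    """CLI-style header: flagged column names are colored red with ANSI escapes."""
    flagged = set(result["pii_columns"])
    return " | ".join(
        f"{RED}{c}{RESET}" if c in flagged else c for c in result["columns"]
    )


result = annotate_result(
    ["id", "email", "plan"], [(1, "a@example.com", "pro")], {"email", "dob"}
)
print(result["pii_columns"])  # ['email']
print(render_header(result))
```

Annotation changes nothing about the data itself: the rows are returned exactly as queried.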

The pattern store, where l0l1 records successful queries for future suggestion, sanitizes literals before storing. Stored patterns include the shape of the query, not its specific values. A pattern remembers “we filter users by email,” not “we filter users by someone@example.com.”
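A rough sketch of that sanitization step, assuming regex-level literal replacement (the real pattern store may well use a proper SQL parser; `query_shape` is a hypothetical name): string and numeric literals become `?`, so the stored pattern keeps tables, columns, and predicates but no values.

```python
import re

# Single-quoted string literals ('' escapes a quote) and standalone numbers.
_STRING = re.compile(r"'(?:[^']|'')*'")
_NUMBER = re.compile(r"\b\d+(?:\.\d+)?\b")


def query_shape(sql: str) -> str:
    """Reduce a query to its shape by replacing literal values with '?'."""
    shape = _STRING.sub("?", sql)   # strings first, so digits inside them vanish too
    shape = _NUMBER.sub("?", shape)
    return " ".join(shape.split())  # collapse whitespace so equal shapes compare equal


print(query_shape("SELECT * FROM users WHERE email = 'someone@example.com'"))
# SELECT * FROM users WHERE email = ?
```

Two queries that differ only in their values collapse to the same stored pattern, which is also what makes the pattern reusable as a suggestion.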

What l0l1 explicitly does not do

In keeping with being specific:

No differential privacy. l0l1 does not add noise to results. Output rows are exact.
No homomorphic encryption. Queries run on the actual warehouse against the actual data.
No local-only LLM by default. If you point l0l1 at OpenAI or Anthropic, your validated query text goes to OpenAI or Anthropic. We support routing to local models for users who need it; we do not pretend the default configuration is local.
No compliance certifications by transitivity. l0l1 is a tool. Your HIPAA, SOC 2, or GDPR posture depends on how you deploy it and what data you grant it. The tool helps; it does not certify.

Trust is specific

Privacy is not a feature you can put on a marketing page. It is a series of boundaries, and being trustworthy means being specific about which ones you defend, how, and with what limits. The worst thing a tool in this space can do is gesture vaguely at “privacy-first” while leaving the hard questions unanswered.

We’d rather be boring and clear. l0l1 detects PII in prompts and queries, anonymizes literals before model calls, never sends raw rows to a provider in normal use, respects whatever database access boundary you set, and sanitizes the patterns it learns from. That’s the list. There are real threats it does not address, and we’d rather you knew that than assumed otherwise.

If you want the details (the config flags, the detector output, the way the pattern store sanitizes input), they’re in the documentation. Read it before granting l0l1 production access.
