← back to blog

Validation as a first-class layer in AI data tools

If LLMs are going to draft SQL, validation has to stop being a UI warning and start being an architectural surface. Here is how we think about it inside l0l1.

· #validation#architecture

Most AI data tools treat validation the way a website treats a cookie banner. It is something you have to put on the page, it shouldn’t be too noisy, and the goal is for the user to click through it as quickly as possible.

This is wrong, and it’s wrong in a way that gets more wrong as models get better. As long as LLMs produced obviously bad SQL, validation was decorative. You could see the problem with your own eyes. Now that LLMs produce SQL that is locally indistinguishable from what a competent human would write, the validator is the only thing standing between the model and the dashboard. It deserves to be designed as a real surface.

Here is what that means in practice, and how l0l1 organizes the layer internally.

Validation has a contract, not a vibe

The first thing that has to happen is that “validation” stops being a single function that returns a confidence score and starts being a small, stable contract with named outputs. A useful validator answers at least four questions, and answers them separately:

  1. Does this query bind? Every table, column, and function referenced resolves against the schema you provided. This is the easy one, and it’s also the one most LLM tools skip because they don’t have schema context.
  2. What is the shape of the result? Number of rows roughly, number of columns exactly, types of those columns, and whether the row count could exceed expectations due to fan-out joins.
  3. What are the privacy implications? Does this query touch columns tagged as PII? Does it return literal sensitive values? Is the output destined for somewhere it shouldn’t go?
  4. Does this match an approved pattern? Has the team already validated a query that does this exact thing, on these exact tables, with this exact filter shape? If so, the new query should look more like the old one, and divergence is worth surfacing.

Each of these has a yes/no answer plus a structured explanation. None of them is a confidence score. None of them is a stoplight icon. A consumer of the validator — a CLI, an IDE, a CI check, a Jupyter cell — can use the structured output to decide how to behave. A CLI might print warnings; CI might fail the build; the IDE might draw a squiggle.

Validation is multi-pass

A common mistake is to treat the LLM as the slow expensive step and the validator as the cheap fast step. In practice, the validator should run multiple times, in different contexts, with different cost profiles.

Pre-prompt validation runs before the model is even called. Is the question well-formed? Is the schema context coherent? Are there obvious red flags — does the question literally ask for someone’s social security number, for example, in which case maybe we don’t send anything to a model provider at all? This pass is essentially free and catches the most embarrassing failures.

Post-generation validation runs the moment the model returns text. Is it actually SQL? Does it bind? Does it use tables that exist? This is the pass most “AI SQL” tools call “validation” and is, on its own, insufficient.

Pre-execution validation runs against the live database. Explain plans, estimated row counts, dry-run results. The query binds — but does the database agree it will return a sane number of rows? This is the pass that catches fan-out joins.

Post-execution validation runs after the query has produced results. The result shape matched expectations. The row count is within the predicted band. No literal PII values appeared in the output. The numbers are consistent with prior runs of similar queries. This is the pass that catches the silent eight percent drift you wouldn’t see otherwise.

In an LLM-assisted analytics workflow, validation that only happens at one of these points is not enough. The validator that l0l1 ships is built around these four phases, not as a single function call.

Validation needs to be debuggable

A validator that says “this query looks suspicious” without explaining why is a validator that gets ignored. The output of every check should be addressable: a specific clause in the SQL, a specific column in the schema, a specific past query that disagreed with this one.

This sounds obvious, and it is, but a surprising number of AI data tools present validation as a single confidence percentage. Ninety-three percent confident in what, exactly? Compared to what? The author of the query has no way to act on that number, so the rational behavior is to ignore it.

l0l1’s validation output is intentionally verbose. Each finding includes:

  • The exact substring of the SQL the finding refers to.
  • The schema object (or absence of one) that triggered the finding.
  • A specific recommendation, not a vibe.
  • A pointer to a prior pattern in the workspace, if one is relevant.

This is much more useful than a percentage. It is also much harder to game.

Validation should learn, but carefully

A first-class validation layer is also a learning layer, but the learning needs to be honest about what it’s doing. l0l1’s pattern store records queries that the team has explicitly approved — queries that ran, returned plausible results, and were not subsequently rolled back. Those become the reference set for future validation.

The careful part is what gets learned. The pattern store does not record literals. It records query shape, table references, join structure, and the columns the team treats as authoritative. A query that passes validation because “we always join orders to users on user_id and the team has approved this 47 times” is a useful signal. A query that passes validation because “we always filter by email = 'specific.person@company.com'” is not — and storing that literal would be a privacy bug.

This is also where the line between learning and over-fitting matters. If the team has approved a particular join pattern 47 times and you are about to do something different, the validator should say so, but it should not automatically fail. Sometimes the new way is correct and the old way was a mistake nobody noticed. The validator surfaces the divergence; the human decides what to do about it.

Validation needs a UI that respects engineers

Engineers stop reading warnings the moment warnings outnumber actionable output. A validation layer that says everything is fine ninety percent of the time and says something useful ten percent of the time is a layer engineers trust. A layer that yells on every query is a layer engineers mute.

In practice this means:

  • Default to silent on green.
  • Group findings by severity and surface the highest first.
  • Suppress repeats; if the same warning fires three times in one session, collapse it.
  • Make it trivial to mark a finding as “intentional, don’t warn again in this workspace.”

l0l1 exposes validation through the CLI, REST API, Jupyter magic, and LSP-based IDE integration, and each surface implements those rules in its own idiomatic way. The CLI uses colors and exit codes. Jupyter uses cell output. The IDE uses squiggles. The API returns structured JSON. The point is that the validator’s job is to be a useful colleague, not a noisy alarm.

Validation is the product, not a feature

The biggest shift in mindset is treating the validation layer as the actual thing you’re shipping. The LLM is a commodity. Anyone can wire an OpenAI key into a chat box. What’s hard, and what’s defensible, is the layer that turns the model’s draft into a query you’d put your name on.

If you’re building an AI data tool right now, the question to ask isn’t “which model are we using?” but “what does our validator catch that the others don’t?” That’s the layer that determines whether your tool is a productivity multiplier or a quietly expensive way to put wrong numbers on the company dashboard.

The next post in this series gets into the part of validation that most teams hope they can avoid thinking about: privacy. What it actually means for a copilot to “see” your warehouse, and what l0l1 does and does not promise about that.

Want this in your stack? Read the docs or browse more posts.