How Text-to-SQL is Democratizing Data

In the information era, companies are storing ever-increasing amounts of data: sales data, user and demographic information, social platform usage events, logs, everything. Executives and leaders use this data to understand customer behavior, product performance, and market trends. These leaders need to understand their company's information, but most of them are not database specialists.

Today, the primary interface to this data is still SQL. As a result, companies depend heavily on data engineers and analysts to translate business questions into queries and dashboards. This dependency slows decisions, creates bottlenecks, and keeps valuable insights just out of reach.

To overcome this challenge, companies are looking for ways to enable their executives, leaders and managers to use their data without requiring SQL proficiency.

The Challenge of Talking to Databases

Understanding what’s inside a database is one of the most complex parts of working with data.

You need to know which tables hold what, how they’re related, and how to join them correctly, just to answer a simple question.

Imagine a stakeholder asking:

> “Show me sales by region in 2023.”

To get that answer, someone needs to write SQL, query the right tables, validate the results, and present them in a readable format.

For nontechnical teams, that process is anything but simple.

This isn't a new problem. Researchers have been working on it since 1961, starting with the BASEBALL system, with the goal of making data analysis approachable without writing code.

Image 1: Introduction from the 1961 BASEBALL paper establishing the goal of enabling users to query computers directly in natural language

Source: BASEBALL: AN AUTOMATIC QUESTION-ANSWERER (1961) https://web.stanford.edu/class/linguist289/p219-green.pdf

However, the introduction of Large Language Models (LLMs) changed the game. Unlike previous attempts, LLMs don't just match keywords; they capture the meaning of a question and can translate that intent into SQL syntax. This advance gave rise to modern systems in which users ask questions in plain English (or any other language) and get data-backed answers in seconds.

How the System Works

A Text-to-SQL system needs to understand the user's question, transform this question into a SQL query, get the data from the database, and then answer the question.

A Text-to-SQL system rests on four pillars:

- Understanding the user’s question
- Linking the database schema to the user’s question
- Generating the correct SQL
- Executing the query and returning the information to the user

Image 2: From question to query: how natural language interface to database (NLIDB) systems understand and answer user requests.

Understanding the user’s question

LLMs are well suited to understanding the user’s question. They can infer the user's intent and interpret it using information specific to the user’s domain and company. In this step, we augment the user's question with information about the company, business rules for that domain, formulas, and more. Any information that even a human would need in order to understand the question should be added as context to the prompt.

Example of business context augmentation

User question:

"Show me sales by region in 2023."

Business context:

- Returned purchases should not be counted

- Time aggregation should be monthly

- Year refers to the calendar year
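
The augmentation step above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the `BUSINESS_RULES` dictionary, the `augment_question` helper, and the keyword-based matching are all assumptions made for the example, and the "churn" rule is invented purely to show that unrelated rules are left out.

```python
# Hypothetical rule store: topic keyword -> rules that apply to that topic.
# In practice these would come from a maintained glossary or config.
BUSINESS_RULES = {
    "sales": [
        "Returned purchases should not be counted",
        "Time aggregation should be monthly",
        "Year refers to the calendar year",
    ],
    # Made-up rule, included only to show that unrelated topics are skipped.
    "churn": ["A customer is churned after 90 days without a purchase"],
}


def augment_question(question: str, rules: dict[str, list[str]]) -> str:
    """Append the rules whose topic keyword appears in the question."""
    lowered = question.lower()
    matched = [
        rule
        for topic, topic_rules in rules.items()
        if topic in lowered
        for rule in topic_rules
    ]
    if not matched:
        return question
    bullet_list = "\n".join(f"- {rule}" for rule in matched)
    return f"{question}\n\nBusiness context:\n{bullet_list}"


augmented = augment_question("Show me sales by region in 2023.", BUSINESS_RULES)
print(augmented)
```

Simple keyword matching is used here only to keep the sketch short; a real system might select rules with a retrieval step or attach a fixed set of rules per business domain.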

Linking the database to the user’s question (Schema Linking)

In schema linking, the model needs to know which tables in your database are relevant to the specific question.

One of an LLM’s strengths is its ability to learn new information from the prompt’s context, known as in-context learning. This means the model doesn’t need to be trained on each new database schema: it can pick up the schema from the prompt, so all we need to do is provide it. However, a massive enterprise schema may not fit into a single prompt.

To solve this, we build a Retrieval-Augmented Generation (RAG) system over the table schemas and their metadata (descriptions of tables and columns, relationships, and usage notes). When a question is asked, the system retrieves only the relevant tables and adds them to the prompt.
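
As a rough sketch of the retrieval idea, the example below ranks tables by how many words their description shares with the question. Word overlap is a deliberately simple stand-in for the embedding-based similarity search that RAG systems typically use, and the `TABLE_DOCS` metadata, table names, and `retrieve_tables` helper are hypothetical.

```python
import re

# Hypothetical table metadata: table name -> description used for retrieval.
TABLE_DOCS = {
    "orders": "Sales orders with order date, amount, status and customer id",
    "customers": "Customer profile with name, signup date and region id",
    "regions": "Sales region names and countries",
    "web_logs": "Raw page view events from the website",
}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_tables(question: str, docs: dict[str, str], top_k: int = 3) -> list[str]:
    """Return up to top_k tables that share at least one word with the question."""
    question_words = tokenize(question)
    scores = {
        name: len(question_words & (tokenize(name) | tokenize(description)))
        for name, description in docs.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return [name for name in ranked[:top_k] if scores[name] > 0]


relevant = retrieve_tables("Show me sales by region in 2023.", TABLE_DOCS)
print(relevant)
```

Only the retrieved tables (here, everything except `web_logs`) would then have their schema and metadata placed in the prompt.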

Generating the correct SQL

With the relevant schema and the contextualized question, the model links the information together to create the correct SQL. We insert all this information in a prompt template, with instructions to the LLM on how to create the SQL query.

A typical prompt template looks like this:

```
You are an expert data analyst.

Given:

# The user question
{question}

# Business rules
{context}

# Database schema
{schema}

Generate a valid SQL query that answers the question. The SQL will run in a Postgres database.
Return only SQL. Do not explain.
```
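
One way to fill such a template is plain Python string formatting, sketched below. The `build_prompt` helper and the example `CREATE TABLE` statements are assumptions for illustration; the placeholders `{question}`, `{context}`, and `{schema}` are the ones from the template.

```python
PROMPT_TEMPLATE = """You are an expert data analyst.

Given:

# The user question
{question}

# Business rules
{context}

# Database schema
{schema}

Generate a valid SQL query that answers the question. The SQL will run in a Postgres database.
Return only SQL. Do not explain.
"""


def build_prompt(question: str, rules: list[str], schema_ddl: list[str]) -> str:
    """Fill the template with the question, the rules, and the retrieved schema."""
    return PROMPT_TEMPLATE.format(
        question=question.strip(),
        context="\n".join(f"- {rule}" for rule in rules),
        schema="\n\n".join(schema_ddl),
    )


prompt = build_prompt(
    "Show me sales by region in 2023.",
    ["Returned purchases should not be counted", "Year refers to the calendar year"],
    # Hypothetical table definitions, as they might come out of schema linking.
    [
        "CREATE TABLE regions (id INT PRIMARY KEY, name TEXT);",
        "CREATE TABLE orders (id INT, region_id INT, amount NUMERIC, order_date DATE);",
    ],
)
print(prompt)
```

The resulting string is what gets sent to the LLM; how it is sent depends on the model provider and is left out here.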

After generating the query, the system runs a validation step that checks whether the query executes without errors and whether it answers the question. If validation fails, we pass the failure details back to the LLM and ask it to fix the query.
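
The validate-and-repair loop could look roughly like the sketch below. Several things here are assumptions: an in-memory SQLite database stands in for the Postgres database so the example runs anywhere, `fake_llm` is a stub in place of a real model call, and the helper names are hypothetical. The sketch covers only the "does the query fail" check; judging whether a query truly answers the question needs a separate check, such as a second LLM pass or human review.

```python
import sqlite3
from typing import Callable, Optional


def validate_sql(conn: sqlite3.Connection, sql: str) -> Optional[str]:
    """Return None if the database can plan the query, else the error message."""
    try:
        # EXPLAIN QUERY PLAN compiles the statement without running it.
        conn.execute(f"EXPLAIN QUERY PLAN {sql}")
        return None
    except sqlite3.Error as exc:
        return str(exc)


def generate_with_repair(
    conn: sqlite3.Connection,
    question: str,
    generate: Callable[[str], str],
    max_attempts: int = 3,
) -> str:
    """Ask for SQL, feeding validation errors back until the query is valid."""
    prompt = question
    for _ in range(max_attempts):
        sql = generate(prompt)
        error = validate_sql(conn, sql)
        if error is None:
            return sql
        prompt = (
            f"{question}\n\nThe previous query failed:\n{sql}\n"
            f"Error: {error}\nFix the query."
        )
    raise RuntimeError(f"No valid SQL after {max_attempts} attempts")


def fake_llm(prompt: str) -> str:
    """Stub model: wrong column on the first try, corrected after feedback."""
    if "Error:" in prompt:
        return "SELECT region, SUM(amount) FROM sales GROUP BY region"
    return "SELECT region, SUM(total) FROM sales GROUP BY region"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL, sold_on TEXT)")
fixed_sql = generate_with_repair(conn, "Show me sales by region in 2023.", fake_llm)
print(fixed_sql)
```

Capping the number of attempts keeps a persistently failing question from looping forever; at that point the system can report the failure instead of returning a wrong answer.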

Executing and retrieving the information for the user

Finally, the valid SQL is executed against the database. The results can be returned as a raw table for analysis, a generated graph, or fed back into the LLM to generate a natural language summary explaining the insights.
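
A minimal sketch of the execution step follows, again using an in-memory SQLite database as a stand-in for the production database. The `sales` table, its columns, the sample rows, and the helpers `run_query` and `to_text_table` are all invented for illustration, and the monthly breakdown from the earlier business context is omitted for brevity.

```python
import sqlite3


def run_query(conn: sqlite3.Connection, sql: str, max_rows: int = 100):
    """Execute the query and return column names plus at most max_rows rows."""
    cursor = conn.execute(sql)
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchmany(max_rows)


def to_text_table(columns, rows) -> str:
    """Render the result as a plain-text table, one row per line."""
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(value) for value in row) for row in rows]
    return "\n".join(lines)


# Toy data, made up for the example.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (region TEXT, amount REAL, sold_on TEXT, returned INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Europe", 120.0, "2023-01-15", 0),
        ("Europe", 80.0, "2023-02-03", 1),     # returned, so excluded
        ("Americas", 200.0, "2023-01-20", 0),
        ("Americas", 50.0, "2022-12-30", 0),   # outside calendar year 2023
    ],
)

sql = """
SELECT region, SUM(amount) AS total_sales
FROM sales
WHERE returned = 0 AND sold_on >= '2023-01-01' AND sold_on < '2024-01-01'
GROUP BY region
ORDER BY region
"""
columns, rows = run_query(conn, sql)
table = to_text_table(columns, rows)
print(table)
```

The same `columns` and `rows` could instead be handed to a charting library or placed in a follow-up prompt that asks the LLM for a summary. Because the SQL is machine-generated, a production system would typically run it with a read-only database role and a row limit like the one shown.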

Why It Matters

Natural-language-to-SQL systems remove the dependency on technical intermediaries between a business question and its answer. Decision-makers can explore data freely, asking complex questions like:

> “Which products had the highest churn rate in Europe last year?”

Instead of waiting days for a report, they get insights that are immediate and interactive.

Bridging the Gap

We are moving past the era where data was locked behind a gatekeeper. By leveraging modern Text-to-SQL systems, we aren't just making querying easier; we are fundamentally changing the speed of business intelligence.

At Deverr, we are helping clients interact with data without barriers. We turn complex databases into conversational partners, ensuring that whether you are a startup or an enterprise, your data speaks your language.
