dbt · Analytics Engineering · SQL · Data Engineering

What is dbt and Why Every Data Team Is Adopting It

5 May 2026 · 8 min read

SQL has been around for over 50 years. Every analyst at every company already knows it. So why did the data industry suddenly need a new tool to run SQL? The answer isn't about the SQL — it's about everything that happened to the SQL after someone wrote it. Where it lived. Whether it was tested. Whether anyone could understand it six months later. dbt solves the problem that comes after you write the query.

The analogy: version control for your SQL

Before dbt, a typical data team’s transformation logic lived in a mess of stored procedures in the database, Airflow tasks with embedded SQL, Python scripts on someone’s laptop, and views that nobody dared to change.

No git history. No tests. No documentation. No way to know what depended on what. If the analyst who built revenue_by_channel left the company, that query was a black box.

dbt is what happens when you apply software engineering discipline to SQL. Version control, automated testing, documentation, dependency management — the same rigour developers have applied to application code for decades, now applied to the queries that power your dashboards.

What dbt actually does

At its core, dbt is surprisingly simple. You write .sql files. Each file is a model — a SELECT statement that defines a table or view in your warehouse. dbt compiles those files, figures out the correct order to run them (based on dependencies you declare with ref()), runs them against your warehouse, and optionally runs tests on the results.

That’s it. dbt doesn’t move data. It doesn’t connect to source systems. It doesn’t store data. It just takes your SQL, understands the dependencies between your models, and runs them in the right sequence inside BigQuery, Snowflake, or Redshift.
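A model file is nothing more than a SELECT statement. Here is a minimal sketch — the model name, table, and columns are illustrative, not from any real project:

```sql
-- models/customer_orders.sql
-- dbt takes this SELECT and materialises it as a table or view
-- named customer_orders in the warehouse; you never write
-- CREATE TABLE yourself.
SELECT
    customer_id,
    COUNT(*)    AS order_count,
    SUM(amount) AS total_spent
FROM raw.orders
GROUP BY customer_id
```

In a real project the hardcoded raw.orders reference would be replaced with a ref() or source() call, which is what lets dbt build the dependency graph.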

The ref() function — the key concept

The most important thing in dbt is a function called ref(). Instead of hardcoding a table name in your SQL, you write:

SELECT * FROM {{ ref('stg_orders') }}

dbt sees this and knows: “this model depends on stg_orders. Run stg_orders first, then this model.” Build a chain of ref() calls and dbt automatically figures out the entire dependency graph — the lineage. Change one model and dbt knows exactly which downstream models are affected.
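A two-model chain might look like this (the source name and columns are made up for illustration):

```sql
-- models/stg_orders.sql
SELECT
    id AS order_id,
    status,
    created_at
FROM {{ source('app', 'orders') }}
```

```sql
-- models/fct_orders.sql
-- The ref() call tells dbt: build stg_orders before this model.
SELECT
    order_id,
    status
FROM {{ ref('stg_orders') }}
```

From these two files alone, dbt derives the lineage app.orders → stg_orders → fct_orders without any explicit ordering configuration.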

The three-layer structure

Most dbt projects follow a three-layer pattern:

  • Sources — raw tables loaded from source systems (your app database, Stripe, Salesforce). Defined in schema.yml but not transformed. Never modified.
  • Staging models — one model per source table. Rename columns to consistent conventions, cast data types, add basic cleaning. Nothing complex. Example: stg_orders.sql renames o_id to order_id and casts created_at to a proper timestamp.
  • Mart models — business-logic-heavy models that analytics teams and BI tools query directly. Fact tables, dimension tables, and aggregated metrics. Example: fct_orders.sql joins staging orders with staging customers and calculates revenue after discounts.
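The staging example from the list above, sketched as a model file (the source name app_db and the status column are assumptions):

```sql
-- models/staging/stg_orders.sql
-- Staging layer: rename and cast only; no joins, no business logic.
SELECT
    o_id                          AS order_id,
    CAST(created_at AS timestamp) AS created_at,
    status
FROM {{ source('app_db', 'orders') }}
```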

This structure means any analyst can open the project and immediately understand how data flows from source to report. The layers are a map.

Testing — the underrated feature

dbt ships with four built-in tests you can apply to any column in a schema.yml file:

  • not_null — this column should never be null
  • unique — every value in this column should be distinct
  • accepted_values — values must be from a defined list
  • relationships — every value must exist in another table (referential integrity)

Run dbt test after every build. If order_id suddenly has duplicates or status contains an unexpected value, you find out immediately — not when an analyst notices a wrong number in a dashboard two weeks later.
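All four tests declared together in a schema.yml entry might look like this — the accepted status values and the stg_customers model are illustrative:

```yaml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'delivered', 'returned']
      - name: customer_id
        tests:
          - relationships:
              to: ref('stg_customers')
              field: customer_id
```

Running dbt test compiles each declaration into a SQL query that selects failing rows; zero rows returned means the test passes.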

What dbt is NOT

It’s easy to overstate what dbt does. To be clear:

  • dbt is not an orchestrator. It doesn’t schedule jobs. You still need Airflow, Prefect, or dbt Cloud’s scheduler to run dbt on a schedule.
  • dbt is not a data loader. It can’t pull data from Stripe or your app database. That’s Fivetran, Airbyte, or a custom ingestion pipeline.
  • dbt is not a BI tool. It doesn’t build dashboards or charts. Looker, Metabase, or Tableau connects to the tables dbt creates.

dbt’s job is the T in ELT — and only the T. It does that one job exceptionally well.

The modern data stack: Fivetran loads raw data → dbt transforms it → Metabase or Looker visualises it → Airflow or dbt Cloud schedules the whole thing. dbt is the transformation layer and nothing else.

The analytics engineer role

dbt is largely responsible for a job title that didn’t exist ten years ago: the analytics engineer. Not a data engineer (who builds ingestion pipelines), not a data analyst (who queries finished data), but someone in between — someone who writes production-quality SQL transformations that analysts rely on.

The role exists because dbt made it possible to apply engineering rigour to what was previously ad-hoc analyst work. When your SQL has tests, version control, and documentation, it becomes infrastructure — not just a query.

For growing companies, the practical implication is this: if you’re building a data warehouse and you’re writing more than a handful of SQL transformations, you should be using dbt. The overhead of setting it up is small. The benefit of having documented, tested, version-controlled transformations compounds every time someone new joins your data team.
