How to Use — dbt & BigQuery¶

This guide walks through using droughty end-to-end with a dbt project on Google BigQuery. It covers everything from initial setup through to an ongoing development workflow.

Note

This guide is specific to dbt + BigQuery. For Snowflake usage see Warehouse Support, and for the general command reference see Commands.

Prerequisites¶

Before you start, you need:

A Google Cloud project with BigQuery enabled
gcloud CLI installed — install guide
dbt installed (pip install dbt-bigquery)
droughty installed (pip install droughty)
A dbt project with at least a staging schema deployed to BigQuery

Your BigQuery project should have at least two datasets. Three is the full setup droughty is built for:

myproject_staging         ← raw source data, lightly transformed
myproject_integration     ← cleaned and joined data
myproject_analytics       ← final analytics layer (what Looker/Cube reads)

Step 1 — Authenticate with BigQuery¶

The simplest local setup uses Application Default Credentials:

gcloud auth application-default login

Follow the browser prompts. Once complete, droughty will use these credentials automatically — no key file needed.

For CI/CD or shared environments, use a service account instead. See Authentication for full details.

Step 2 — Set up your dbt project for droughty¶

Droughty reads INFORMATION_SCHEMA metadata, including column descriptions. To make sure your dbt column descriptions flow through to the warehouse (and back into droughty), add persist_docs to your dbt_project.yml:

# dbt_project.yml
models:
  +persist_docs:
    relation: true
    columns: true

  my_project:
    staging:
      materialized: view
      schema: staging
    integration:
      materialized: view
      schema: integration
    warehouse:
      materialized: table

With persist_docs enabled, any description you write in your dbt YAML files gets written into BigQuery’s table and column metadata. Droughty picks this up and includes it in generated LookML views, Cube cubes, and docs — so your descriptions only need to live in one place.

Tip

Run dbt run after enabling persist_docs to push your existing descriptions into BigQuery before running droughty.

Step 3 — Create your droughty profile¶

Create the file ~/.droughty/profile.yaml:

mkdir -p ~/.droughty
touch ~/.droughty/profile.yaml

Add your project configuration:

my_profile:
  warehouse_name: big_query
  project_name: my-gcp-project           # your GCP project ID
  schema_name: myproject_analytics        # your analytics / warehouse dataset

  openai_secret: sk-...                   # only needed for `droughty docs` and `droughty qa`

  test_schemas:                           # datasets droughty dbt will scan
    - myproject_staging
    - myproject_integration
    - myproject_analytics

  dbml_schemas:                           # datasets droughty dbml will scan
    - myproject_staging
    - myproject_integration
    - myproject_analytics

Tip

test_schemas controls which BigQuery datasets get scanned for dbt test generation. Include every dataset you want test coverage on — typically all three layers.

Step 4 — Create your project file¶

Create droughty_project.yaml at the root of your dbt git repository:

profile: my_profile

# dbt test output
dbt_path: models/
dbt_tests_filename: schema

# LookML output
lookml_path: lookml/
lookml_base_filename: base
lookml_explore_filename: explores
lookml_measures_filename: measures
lookml_base_path: views/

# DBML / ERD output
dbml_path: db_docs/
dbml_filenames:
  - analytics_erd

# Cube output (if using Cube)
cube_path: schema/
cube_base_filename: base
cube_integration_filename: integrations
cube_measures_filename: measures

# Staging model output
stage_path: models/staging/

# AI field descriptions
openai_field_descriptions_path: warehouse_docs
openai_field_descriptions_filename: openai_field_descriptions

Step 5 — Generate your staging layer¶

When you add a new raw data source to BigQuery, use droughty stage to generate the dbt staging boilerplate automatically instead of writing it by hand.

droughty stage -p my-gcp-project -d raw_source_dataset

This reads every table in raw_source_dataset and generates:

models/staging/sources.yml           ← BigQuery source definitions
models/staging/stg_orders.sql        ← one staging model per table
models/staging/stg_customers.sql
models/staging/stg_products.sql
...

The staging SQL models select all columns from the source and cast types where appropriate. From here, you refine the models — renaming columns, adding business logic, defining surrogate keys — rather than starting from a blank file.

To target a single table instead of the whole dataset:

droughty stage -p my-gcp-project -d raw_source_dataset -t orders

Step 6 — Generate dbt tests¶

After running your dbt models (dbt run), generate schema tests for all your schemas:

droughty dbt

This scans every table in your test_schemas and writes models/schema.yml with:

A unique test on every column ending in _pk
A not_null test on every column
An expression_is_true: valid_from <= valid_to test where those columns exist

Example output (excerpt from models/schema.yml):

version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_pk
        tests:
          - unique
          - not_null
      - name: customer_name
        tests:
          - not_null
      - name: email
        tests:
          - not_null

  - name: fct_orders
    columns:
      - name: order_pk
        tests:
          - unique
          - not_null
      - name: customer_fk
        tests:
          - not_null

Run the tests with:

dbt test

Customising what gets tested

To override the tests on a specific column:

# droughty_project.yaml
test_overwrite:
  models:
    fct_orders:
      order_pk:
        - not_null
        - unique
        - dbt_utils.at_least_one

To skip test generation for a table entirely:

test_ignore:
  models:
    - raw_events_log
    - tmp_dedupe_working

Step 7 — Generate LookML¶

Once your analytics layer is deployed in BigQuery, generate the Looker base layer:

droughty lookml

This reads your schema_name dataset (myproject_analytics) from BigQuery and generates:

lookml/views/base.lkml       ← one LookML view per table
lookml/explores.lkml         ← explore definitions
lookml/measures.lkml         ← measure definitions

What the generated views look like (excerpt):

view: dim_customers {
  sql_table_name: `my-gcp-project.myproject_analytics.dim_customers` ;;

  dimension: customer_pk {
    type: number
    sql: ${TABLE}.customer_pk ;;
    hidden: yes
  }

  dimension: customer_name {
    type: string
    sql: ${TABLE}.customer_name ;;
    label: "Customer Name"
  }

  dimension_group: created {
    type: time
    timeframes: [raw, date, week, month, quarter, year]
    sql: ${TABLE}.created_at ;;
  }
}

PKs and FKs are hidden automatically. Timestamps get dimension_group with standard timeframes. If your dbt column descriptions flowed through via persist_docs, they appear as description: fields in the view.

The right way to extend the base layer

Don’t edit the generated files directly — droughty will overwrite them next time you run. Instead, use Looker refinements in a separate file:

lookml/
├── views/
│   └── base.lkml              ← generated — do not edit
├── explores.lkml               ← generated — do not edit
├── measures.lkml               ← generated — do not edit
└── refinements/
    └── dim_customers.lkml     ← your customisations go here

Step 8 — Generate your ERD¶

Before raising a pull request, regenerate the ERD:

droughty dbml

This generates db_docs/analytics_erd.dbml containing all tables across your dbml_schemas, with primary key and foreign key relationships inferred from _pk / _fk column suffixes.

Upload it to dbdocs.io for a browsable ERD:

npx dbdocs build db_docs/analytics_erd.dbml

Or paste it directly into dbdiagram.io for a quick visual.

Step 9 — Generate AI field descriptions (optional)¶

If your warehouse has undocumented fields, droughty can use OpenAI to infer a first-pass description for every column that doesn’t already have one:

droughty docs

This writes warehouse_docs/openai_field_descriptions.md. Review the output, then copy approved descriptions back into your dbt column YAML. With persist_docs enabled, they’ll flow back into BigQuery INFORMATION_SCHEMA and be picked up by droughty on the next run.

Requires openai_secret in your profile.yaml.

Warning

Always review AI-generated descriptions before adding them to production documentation. The model makes reasonable guesses but can be wrong about domain-specific terminology.

Step 10 — Run QA validation (optional)¶

Before a data release or stakeholder demo, run the QA agent to validate your warehouse data against a set of assumptions:

droughty qa --assumptions-dir ./droughty_qa_assumptions.yaml

Requires openai_secret in your profile.yaml. Optionally add langsmith_secret and langsmith_project to trace all LLM calls through LangSmith.

Putting it all together — Recommended workflow¶

Here is how droughty fits into a day-to-day dbt + BigQuery development cycle:

When onboarding a new source:

# 1. Generate staging boilerplate from the raw BigQuery dataset
droughty stage -p my-gcp-project -d new_source_dataset

# 2. Refine the generated staging models in your editor

# 3. Run dbt to deploy the new models
dbt run --select staging.new_source+

When building or modifying models:

# 1. Build your models
dbt run

# 2. Regenerate schema tests to cover new tables/columns
droughty dbt

# 3. Run tests
dbt test

Before committing:

# Regenerate the Looker base layer so it reflects current warehouse
droughty lookml

# Regenerate the ERD for the PR description
droughty dbml

Before a release or QA checkpoint:

droughty qa --assumptions-dir ./droughty_qa_assumptions.yaml

As a pre-commit hook (keep Looker permanently in sync):

# .git/hooks/pre-commit
#!/bin/bash
set -e
droughty lookml
git add lookml/

CI/CD with BigQuery¶

In a CI/CD pipeline, use environment variables instead of local files:

# .github/workflows/droughty.yml (example)
name: Droughty

on:
  push:
    branches: [main]

jobs:
  generate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Set up Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'

      - name: Install droughty
        run: pip install droughty

      - name: Generate dbt tests
        env:
          PROFILE_NAME: my_profile
          WAREHOUSE_NAME: big_query
          PROJECT_NAME: ${{ secrets.GCP_PROJECT_ID }}
          SCHEMA_NAME: myproject_analytics
          KEY_FILE: /tmp/key.json
        run: |
          echo '${{ secrets.GCP_SA_KEY }}' > /tmp/key.json
          droughty dbt --env-vars enabled
          droughty lookml --env-vars enabled

      - name: Commit generated files
        run: |
          git config user.name "droughty-bot"
          git config user.email "bot@example.com"
          git add models/schema.yml lookml/
          git diff --staged --quiet || git commit -m "chore: regenerate droughty outputs"
          git push

Naming conventions reference¶

Droughty’s BigQuery integration depends on these column naming conventions:

Convention	Effect
Column ends in `_pk`	Treated as primary key — hidden in LookML, tested with `unique` + `not_null` in dbt, adds `Ref` source in DBML
Column ends in `_fk`	Treated as foreign key — hidden in LookML, adds `Ref` target in DBML
Column named `valid_from` + `valid_to`	Adds an `expression_is_true: valid_from <= valid_to` test in dbt
Column type `TIMESTAMP` / `DATETIME`	Generates a `dimension_group` with time, date, week, month, quarter, year timeframes in LookML
Column description in BigQuery (via `persist_docs`)	Included as `description:` in LookML views and Cube cubes