Commands¶

All commands follow the pattern droughty <command> [options] and must be run from inside a git repository.

Global Options¶

These options work on every command:

Option	Description
`--profile-dir <path>`	Path to `profile.yaml`. Defaults to `~/.droughty/profile.yaml`
`--project-dir <path>`	Path to `droughty_project.yaml`. Defaults to the git repository root
`--env-vars enabled`	Read all config from environment variables instead of local YAML files (useful for CI/CD)

lookml¶

droughty lookml

Why this exists

Your Looker semantic layer is just a reflection of your warehouse schema. But every time a column changes, gets added, or gets removed, someone has to go and manually update the LookML view. In a project with 30+ views across 3 schemas, that’s hours of mechanical work per sprint — and the work that doesn’t get done turns into stale dimensions, missing fields, and Looker views that silently reference columns that no longer exist.

Droughty removes this entirely. Run it as a pre-commit hook and your LookML base layer stays permanently in sync with the warehouse, with zero manual effort.

What it does

Reads every column from your analytics schema via INFORMATION_SCHEMA and generates:

A base views file with a LookML view for each table, containing dimensions for every column
An explores file with explore definitions and joins (if the explores section is configured)
A measures file with measure definitions

Data type mapping from warehouse → LookML:

Warehouse type	BigQuery	Snowflake
`number`	`INT64`, `NUMERIC`, `FLOAT64`	`NUMBER`, `INTEGER`, `FLOAT`
`string`	`STRING`	`TEXT`, `VARCHAR`
`time`	`TIMESTAMP`, `DATETIME`	`TIMESTAMP`, `TIMESTAMP_TZ`
`date`	`DATE`	`DATE`
`yesno`	`BOOL`	`BOOLEAN`

Primary key fields (_pk suffix) and foreign key fields (_fk suffix) are hidden by default. Column descriptions from the warehouse are included where available.

Output

lookml/base.lkml        # Views with all dimensions
lookml/explores.lkml    # Explore definitions and joins
lookml/measures.lkml    # Measure definitions

Output paths are configured via lookml_path, lookml_base_filename, lookml_explore_filename, lookml_measures_filename in droughty_project.yaml.

Examples

droughty lookml

# Custom config paths
droughty lookml --profile-dir /custom/path --project-dir ./

# In CI/CD with environment variables
droughty lookml --env-vars enabled

Optional config in droughty_project.yaml

Add explores with dimensional inference:

explores:
  parent_table:
    - my_fact_table
  dimensions:
    - my_dim_table
  facts:
    - my_fact_table

Add Period-over-Period dashboard parameters:

lookml_pop:
  views:
    orders_view:
      - orders_daily_fact

dbt¶

droughty dbt

Why this exists

Writing schema.yml tests is the task that always gets pushed to “later” and then never happens. Without baseline tests, you don’t find out about broken primary keys or unexpected nulls until a dashboard shows a problem to a stakeholder — at which point the data has been wrong for weeks.

Droughty generates test coverage for every single table in your schemas instantly. There’s no longer any reason not to have it.

What it does

Scans all tables in your test_schemas (defined in profile.yaml) and generates a schema.yml file with:

A model definition for every table
Column definitions with types and descriptions
unique test on every primary key column (_pk suffix)
not_null test on every column
expression_is_true: valid_from <= valid_to where both columns exist

Output

models/schema.yml

Path is configured via dbt_path and dbt_tests_filename in droughty_project.yaml.

Examples

droughty dbt

# CI/CD
droughty dbt --env-vars enabled

Customising tests

Override the generated tests for specific columns:

# droughty_project.yaml
test_overwrite:
  models:
    my_table_name:
      my_pk_field:
        - not_null
        - unique

Skip all test generation for specific tables entirely:

test_ignore:
  models:
    - raw_events_table
    - temp_staging_table

dbml¶

droughty dbml

Why this exists

Nobody updates the ERD. It was accurate when the project kicked off and it’s been wrong ever since. When an engineer needs to understand how tables relate to each other — which happens constantly — they either dig through dbt YAML files, ask someone, or guess. All three options waste time.

Droughty generates the ERD directly from the live warehouse on demand. Run it before raising a PR and your documentation is always current.

What it does

Reads all tables and columns from your dbml_schemas (defined in profile.yaml) and generates DBML files that:

Define a Table for every table in the schema
Add a Column for every field, with correct data types
Add Ref relationships by matching _pk and _fk column suffixes across tables

The output is ready to use with dbdocs.io or dbdiagram.io.

Output

db_docs/<filename>.dbml

Path is configured via dbml_path and dbml_filenames in droughty_project.yaml. Multiple filenames generate multiple files.

Example

droughty dbml

cube¶

droughty cube

Why this exists

The same schema drift problem that affects Looker affects Cube. When your warehouse schema changes, your Cube cubes need updating. Without automation, that work gets skipped and the Cube semantic layer quietly falls out of sync with what’s actually in the warehouse. Droughty keeps it in sync the same way it keeps LookML in sync.

What it does

Reads your analytics schema and generates Cube.js schema files with:

A cube for every table, with dimensions for every column
Data type mappings from warehouse types to Cube types
Explore and join definitions (based on the explores config)
Measure definitions

Output

schema/base.yml             # Cube definitions with dimensions
schema/integrations.yml     # Join/integration definitions
schema/measures.yml         # Measure definitions

Paths configured via cube_path, cube_base_filename, cube_integration_filename, cube_measures_filename in droughty_project.yaml.

Example

droughty cube

docs¶

droughty docs

Why this exists

Undocumented fields are the silent enemy of self-service analytics. When a column has no description, analysts either guess what it means, file a Slack question, or just avoid using it altogether. The reason fields go undocumented isn’t that people don’t care — it’s that writing a description for every single column in a 300-column warehouse is mind-numbing work.

Droughty uses OpenAI to infer a reasonable first-pass description for every undocumented field in seconds. It won’t be perfect, but “probably correct and worth reviewing” is infinitely better than nothing.

What it does

Reads existing field descriptions from your warehouse and from a local field_descriptions.md file
For every field that does not already have a description, calls the OpenAI API to infer one from the column name and data type
Writes the results to a markdown file for review

Note

Always review AI-generated descriptions before using them in production documentation. The model makes reasonable guesses but can be wrong about domain-specific terminology.

Requirements

openai_secret must be set in ~/.droughty/profile.yaml

Output

warehouse_docs/openai_field_descriptions.md

Path configured via openai_field_descriptions_path and openai_field_descriptions_filename in droughty_project.yaml.

Example

droughty docs

qa¶

droughty qa

# With a custom assumptions file
droughty qa --assumptions-dir ./droughty_qa_assumptions.yaml

Why this exists

Data quality issues are almost always found by end users, not by the team that built the pipeline. By the time someone files a ticket saying “these numbers look wrong”, the bad data has usually been sitting in the warehouse for days or weeks.

The QA agent runs a set of validation checks against your actual warehouse data, using an LLM to interpret results and flag anomalies. It catches problems proactively — before they become stakeholder complaints.

What it does

Builds a LangGraph state machine that:

Validates your warehouse data against a set of assumptions you define
Uses the OpenAI API to interpret results and identify anomalies
Outputs a Mermaid diagram of the validation DAG for traceability
(Optional) traces all LLM calls through LangSmith for observability

Additional option

Option	Description
`--assumptions-dir <path>`	Path to a YAML file defining your QA validation rules and assumptions

Requirements

openai_secret in ~/.droughty/profile.yaml
langsmith_secret and langsmith_project (optional — enables LangSmith tracing)

Output

Mermaid DAG files in the langgraph/ directory.

Example

droughty qa --assumptions-dir ./droughty_qa_assumptions.yaml

stage¶

droughty stage -p <gcp_project_id> -d <dataset1> [dataset2 ...] [-t <table>]

Why this exists

Onboarding a new data source in dbt means writing the same boilerplate every single time: a sources.yml block, a stg_ SQL model that selects and renames columns, type casts, maybe a _pk surrogate key. It’s completely mechanical. For a source with 20 tables you’re looking at an afternoon of copy-paste work.

Droughty reads the source schema from BigQuery and generates all of it in one command.

Note

This command is BigQuery only and currently requires macOS or Linux.

What it does

Reads the specified BigQuery datasets and generates:

A sources.yml with source definitions for every table
A stg_<table_name>.sql dbt model for each table

Options

Option	Required	Description
`-p <project_id>`	Yes	GCP project ID
`-d <datasets...>`	Yes	One or more BigQuery dataset names to generate staging models for
`-t <table>`	No	Target a single table within the dataset instead of all tables

Output

models/staging/sources.yml
models/staging/stg_<table_name>.sql

Path configured via stage_path in droughty_project.yaml.

Examples

# Stage all tables in a dataset
droughty stage -p my-gcp-project -d raw_data

# Stage tables across multiple datasets
droughty stage -p my-gcp-project -d raw_data events_data

# Stage a single table
droughty stage -p my-gcp-project -d raw_data -t orders

Command Quick Reference¶

Command	The problem it solves	Run it when…	Primary output
lookml	Looker drifting from the warehouse	Schema changes, pre-commit	`lookml/base.lkml`
dbt	No test coverage on warehouse tables	After building models	`models/schema.yml`
dbml	ERD is always out of date	Before raising a PR	`db_docs/*.dbml`
cube	Cube drifting from the warehouse	Schema changes	`schema/base.yml`
docs	Fields have no descriptions	When documentation is lacking	`warehouse_docs/*.md`
qa	Data issues found by stakeholders	Before data releases	`langgraph/` Mermaid DAG
stage	Boilerplate staging models	Onboarding a new source	`models/staging/`