Commands¶
All commands follow the pattern droughty <command> [options] and must be run from inside a git repository.
Global Options¶
These options work on every command:
Option |
Description |
|---|---|
|
Path to |
|
Path to |
|
Read all config from environment variables instead of local YAML files (useful for CI/CD) |
lookml¶
droughty lookml
Why this exists
Your Looker semantic layer is just a reflection of your warehouse schema. But every time a column changes, gets added, or gets removed, someone has to go and manually update the LookML view. In a project with 30+ views across 3 schemas, that’s hours of mechanical work per sprint — and the work that doesn’t get done turns into stale dimensions, missing fields, and Looker views that silently reference columns that no longer exist.
Droughty removes this entirely. Run it as a pre-commit hook and your LookML base layer stays permanently in sync with the warehouse, with zero manual effort.
What it does
Reads every column from your analytics schema via INFORMATION_SCHEMA and generates:
A base views file with a LookML view for each table, containing dimensions for every column
An explores file with explore definitions and joins (if the
exploressection is configured)A measures file with measure definitions
Data type mapping from warehouse → LookML:
Warehouse type |
BigQuery |
Snowflake |
|---|---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
Primary key fields (_pk suffix) and foreign key fields (_fk suffix) are hidden by default. Column descriptions from the warehouse are included where available.
Output
lookml/base.lkml # Views with all dimensions
lookml/explores.lkml # Explore definitions and joins
lookml/measures.lkml # Measure definitions
Output paths are configured via lookml_path, lookml_base_filename, lookml_explore_filename, lookml_measures_filename in droughty_project.yaml.
Examples
droughty lookml
# Custom config paths
droughty lookml --profile-dir /custom/path --project-dir ./
# In CI/CD with environment variables
droughty lookml --env-vars enabled
Optional config in droughty_project.yaml
Add explores with dimensional inference:
explores:
parent_table:
- my_fact_table
dimensions:
- my_dim_table
facts:
- my_fact_table
Add Period-over-Period dashboard parameters:
lookml_pop:
views:
orders_view:
- orders_daily_fact
dbt¶
droughty dbt
Why this exists
Writing schema.yml tests is the task that always gets pushed to “later” and then never happens. Without baseline tests, you don’t find out about broken primary keys or unexpected nulls until a dashboard shows a problem to a stakeholder — at which point the data has been wrong for weeks.
Droughty generates test coverage for every single table in your schemas instantly. There’s no longer any reason not to have it.
What it does
Scans all tables in your test_schemas (defined in profile.yaml) and generates a schema.yml file with:
A model definition for every table
Column definitions with types and descriptions
uniquetest on every primary key column (_pksuffix)not_nulltest on every columnexpression_is_true: valid_from <= valid_towhere both columns exist
Output
models/schema.yml
Path is configured via dbt_path and dbt_tests_filename in droughty_project.yaml.
Examples
droughty dbt
# CI/CD
droughty dbt --env-vars enabled
Customising tests
Override the generated tests for specific columns:
# droughty_project.yaml
test_overwrite:
models:
my_table_name:
my_pk_field:
- not_null
- unique
Skip all test generation for specific tables entirely:
test_ignore:
models:
- raw_events_table
- temp_staging_table
dbml¶
droughty dbml
Why this exists
Nobody updates the ERD. It was accurate when the project kicked off and it’s been wrong ever since. When an engineer needs to understand how tables relate to each other — which happens constantly — they either dig through dbt YAML files, ask someone, or guess. All three options waste time.
Droughty generates the ERD directly from the live warehouse on demand. Run it before raising a PR and your documentation is always current.
What it does
Reads all tables and columns from your dbml_schemas (defined in profile.yaml) and generates DBML files that:
Define a
Tablefor every table in the schemaAdd a
Columnfor every field, with correct data typesAdd
Refrelationships by matching_pkand_fkcolumn suffixes across tables
The output is ready to use with dbdocs.io or dbdiagram.io.
Output
db_docs/<filename>.dbml
Path is configured via dbml_path and dbml_filenames in droughty_project.yaml. Multiple filenames generate multiple files.
Example
droughty dbml
cube¶
droughty cube
Why this exists
The same schema drift problem that affects Looker affects Cube. When your warehouse schema changes, your Cube cubes need updating. Without automation, that work gets skipped and the Cube semantic layer quietly falls out of sync with what’s actually in the warehouse. Droughty keeps it in sync the same way it keeps LookML in sync.
What it does
Reads your analytics schema and generates Cube.js schema files with:
A cube for every table, with dimensions for every column
Data type mappings from warehouse types to Cube types
Explore and join definitions (based on the
exploresconfig)Measure definitions
Output
schema/base.yml # Cube definitions with dimensions
schema/integrations.yml # Join/integration definitions
schema/measures.yml # Measure definitions
Paths configured via cube_path, cube_base_filename, cube_integration_filename, cube_measures_filename in droughty_project.yaml.
Example
droughty cube
docs¶
droughty docs
Why this exists
Undocumented fields are the silent enemy of self-service analytics. When a column has no description, analysts either guess what it means, file a Slack question, or just avoid using it altogether. The reason fields go undocumented isn’t that people don’t care — it’s that writing a description for every single column in a 300-column warehouse is mind-numbing work.
Droughty uses OpenAI to infer a reasonable first-pass description for every undocumented field in seconds. It won’t be perfect, but “probably correct and worth reviewing” is infinitely better than nothing.
What it does
Reads existing field descriptions from your warehouse and from a local
field_descriptions.mdfileFor every field that does not already have a description, calls the OpenAI API to infer one from the column name and data type
Writes the results to a markdown file for review
Note
Always review AI-generated descriptions before using them in production documentation. The model makes reasonable guesses but can be wrong about domain-specific terminology.
Requirements
openai_secretmust be set in~/.droughty/profile.yaml
Output
warehouse_docs/openai_field_descriptions.md
Path configured via openai_field_descriptions_path and openai_field_descriptions_filename in droughty_project.yaml.
Example
droughty docs
qa¶
droughty qa
# With a custom assumptions file
droughty qa --assumptions-dir ./droughty_qa_assumptions.yaml
Why this exists
Data quality issues are almost always found by end users, not by the team that built the pipeline. By the time someone files a ticket saying “these numbers look wrong”, the bad data has usually been sitting in the warehouse for days or weeks.
The QA agent runs a set of validation checks against your actual warehouse data, using an LLM to interpret results and flag anomalies. It catches problems proactively — before they become stakeholder complaints.
What it does
Builds a LangGraph state machine that:
Validates your warehouse data against a set of assumptions you define
Uses the OpenAI API to interpret results and identify anomalies
Outputs a Mermaid diagram of the validation DAG for traceability
(Optional) traces all LLM calls through LangSmith for observability
Additional option
Option |
Description |
|---|---|
|
Path to a YAML file defining your QA validation rules and assumptions |
Requirements
openai_secretin~/.droughty/profile.yamllangsmith_secretandlangsmith_project(optional — enables LangSmith tracing)
Output
Mermaid DAG files in the langgraph/ directory.
Example
droughty qa --assumptions-dir ./droughty_qa_assumptions.yaml
stage¶
droughty stage -p <gcp_project_id> -d <dataset1> [dataset2 ...] [-t <table>]
Why this exists
Onboarding a new data source in dbt means writing the same boilerplate every single time: a sources.yml block, a stg_ SQL model that selects and renames columns, type casts, maybe a _pk surrogate key. It’s completely mechanical. For a source with 20 tables you’re looking at an afternoon of copy-paste work.
Droughty reads the source schema from BigQuery and generates all of it in one command.
Note
This command is BigQuery only and currently requires macOS or Linux.
What it does
Reads the specified BigQuery datasets and generates:
A
sources.ymlwith source definitions for every tableA
stg_<table_name>.sqldbt model for each table
Options
Option |
Required |
Description |
|---|---|---|
|
Yes |
GCP project ID |
|
Yes |
One or more BigQuery dataset names to generate staging models for |
|
No |
Target a single table within the dataset instead of all tables |
Output
models/staging/sources.yml
models/staging/stg_<table_name>.sql
Path configured via stage_path in droughty_project.yaml.
Examples
# Stage all tables in a dataset
droughty stage -p my-gcp-project -d raw_data
# Stage tables across multiple datasets
droughty stage -p my-gcp-project -d raw_data events_data
# Stage a single table
droughty stage -p my-gcp-project -d raw_data -t orders
Command Quick Reference¶
Command |
The problem it solves |
Run it when… |
Primary output |
|---|---|---|---|
Looker drifting from the warehouse |
Schema changes, pre-commit |
|
|
No test coverage on warehouse tables |
After building models |
|
|
ERD is always out of date |
Before raising a PR |
|
|
Cube drifting from the warehouse |
Schema changes |
|
|
Fields have no descriptions |
When documentation is lacking |
|
|
Data issues found by stakeholders |
Before data releases |
|
|
Boilerplate staging models |
Onboarding a new source |
|