Connecting to Databricks

The CloudZero Databricks Adaptor

The CloudZero Databricks adaptor is containerized software that can run on any container hosting platform. The adaptor pulls your Databricks spend data, converts it to the CloudZero Common Bill Format (CBF), and creates the data drop that the CloudZero platform can then ingest and process.

How to set up the Databricks Adaptor

Step 1: Set up your ECR storage

You will need to pull the CloudZero image into your own ECR instance. To do this, follow the steps below.

  1. Upload and register the CloudZero Databricks Adaptor for Lambda container to your AWS ECR storage.

For more information, refer to the Push your image to Amazon Elastic Container Registry section of this AWS document.

  2. Copy the URI of the ECR container image you created for use later in the setup.
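As a rough sketch, the push-to-ECR flow might look like the following AWS CLI and Docker commands. The account ID, region, repository name, and local image tag are all placeholders; substitute your own values and the image name CloudZero provides.

```
# Placeholders -- substitute your own account, region, and image names.
ACCOUNT_ID=123456789012
REGION=us-east-1
REPO=cloudzero-databricks-adaptor

# Create a repository, authenticate Docker to ECR, then tag and push.
aws ecr create-repository --repository-name "$REPO" --region "$REGION"
aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com"
docker tag cloudzero-databricks-adaptor:latest \
  "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
docker push "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
```

The final pushed URI (the `…amazonaws.com/$REPO:latest` string) is the value you copy for later steps.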

Step 2: Set up your billable usage logs delivery

For the adaptor to access the data it needs from your Databricks instance, you must configure billable usage log delivery.

Refer to the Databricks billable usage logs delivery documentation to export your logs into the appropriate S3 bucket for your adaptor to consume and convert to the CloudZero format.

📘

Please Note

This should be a different S3 path from the one you configure to store your adaptor data exports.

Step 3: Set up the AWS Parameter Store

Your adaptor configuration settings are stored in the AWS Systems Manager Parameter Store.

  1. Create the parameters for your CloudZero Databricks adaptor configuration values using the table below. You can use the encrypted (SecureString) storage option for any of the values as necessary.

    Please Note: When adding these parameters, ensure they are all located on the same path, such as /CloudZero/Databricks_Adaptor/, and make note of that path for use later in the setup process.

    Each of the rate parameters below should match your Databricks pricing model; if a rate is not known, use the standard rate listed.

    Configuration | Description
    --- | ---
    S3_BUCKET | The name of the S3 bucket you will create in Step 5 below.
    S3_BUCKET_FOLDER | The folder (key prefix) within that S3 bucket where the adaptor will drop its files.
    DATABRICKS_USAGE_LOGS_S3_BUCKET | The bucket where your Databricks billable usage logs are exported (Step 2).
    STANDARD_ALL_PURPOSE_COMPUTE | Per-DBU rate; standard rate is 0.40.
    STANDARD_ALL_PURPOSE_COMPUTE_(DLT) | Per-DBU rate; standard rate is 0.40.
    STANDARD_ALL_PURPOSE_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.40.
    STANDARD_JOBS_COMPUTE | Per-DBU rate; standard rate is 0.10.
    STANDARD_JOBS_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.10.
    STANDARD_JOBS_LIGHT_COMPUTE | Per-DBU rate; standard rate is 0.07.
    PREMIUM_ALL_PURPOSE_COMPUTE | Per-DBU rate; standard rate is 0.55.
    PREMIUM_ALL_PURPOSE_COMPUTE_(DLT) | Per-DBU rate; standard rate is 0.55.
    PREMIUM_ALL_PURPOSE_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.55.
    PREMIUM_JOBS_COMPUTE | Per-DBU rate; standard rate is 0.15.
    PREMIUM_JOBS_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.15.
    PREMIUM_JOBS_LIGHT_COMPUTE | Per-DBU rate; standard rate is 0.10.
    PREMIUM_SQL_COMPUTE | Per-DBU rate; standard rate is 0.22.
    PREMIUM_SERVERLESS_SQL_COMPUTE | Per-DBU rate; standard rate is 0.55.
    ENTERPRISE_ALL_PURPOSE_COMPUTE | Per-DBU rate; standard rate is 0.65.
    ENTERPRISE_ALL_PURPOSE_COMPUTE_(DLT) | Per-DBU rate; standard rate is 0.65.
    ENTERPRISE_ALL_PURPOSE_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.65.
    ENTERPRISE_JOBS_COMPUTE | Per-DBU rate; standard rate is 0.20.
    ENTERPRISE_JOBS_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.20.
    ENTERPRISE_JOBS_LIGHT_COMPUTE | Per-DBU rate; standard rate is 0.13.
    ENTERPRISE_SQL_COMPUTE | Per-DBU rate; standard rate is 0.22.
    ENTERPRISE_SERVERLESS_SQL_COMPUTE | Per-DBU rate; standard rate is 0.55.
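The rate parameters are plain dollars-per-DBU multipliers. The sketch below illustrates how they translate usage into cost; the `DEFAULT_DBU_RATES` dict mirrors the standard rates in the table (the `(DLT)`/`(PHOTON)` variants default to the same rate as their base SKU), and `estimate_cost` is an illustrative helper, not part of the adaptor itself.

```python
# Standard per-DBU rates from the table above; (DLT)/(PHOTON) variants
# share their base SKU's default rate.
DEFAULT_DBU_RATES = {
    "STANDARD_ALL_PURPOSE_COMPUTE": 0.40,    # also (DLT) and (PHOTON)
    "STANDARD_JOBS_COMPUTE": 0.10,           # also (PHOTON)
    "STANDARD_JOBS_LIGHT_COMPUTE": 0.07,
    "PREMIUM_ALL_PURPOSE_COMPUTE": 0.55,     # also (DLT) and (PHOTON)
    "PREMIUM_JOBS_COMPUTE": 0.15,            # also (PHOTON)
    "PREMIUM_JOBS_LIGHT_COMPUTE": 0.10,
    "PREMIUM_SQL_COMPUTE": 0.22,
    "PREMIUM_SERVERLESS_SQL_COMPUTE": 0.55,
    "ENTERPRISE_ALL_PURPOSE_COMPUTE": 0.65,  # also (DLT) and (PHOTON)
    "ENTERPRISE_JOBS_COMPUTE": 0.20,         # also (PHOTON)
    "ENTERPRISE_JOBS_LIGHT_COMPUTE": 0.13,
    "ENTERPRISE_SQL_COMPUTE": 0.22,
    "ENTERPRISE_SERVERLESS_SQL_COMPUTE": 0.55,
}

def estimate_cost(sku: str, dbus: float, rates=DEFAULT_DBU_RATES) -> float:
    """Estimated spend in USD for `dbus` DBUs billed under `sku`."""
    return dbus * rates[sku]

print(estimate_cost("PREMIUM_JOBS_COMPUTE", 100))  # 100 DBUs at 0.15/DBU
```

If your Databricks contract uses negotiated rates, set the corresponding parameters to those values instead of the standards.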

Step 4: Create and configure your AWS Lambda

You need to create and configure the Lambda function that will execute the adaptor from your ECR instance.

To do this, follow the steps below.

  1. Create an AWS Lambda using the Container Image template.
  2. Input the URI of your ECR instance created in Step 1.
  3. By default, the Lambda creation process creates an execution role for you. This setup uses that role, so note its name.
  4. In the Configuration tab of your Lambda settings, edit your General Configuration section.
  5. Set your Timeout value to 15 minutes and your Memory to 4 GB.
  6. Edit your Environment Variables section.
  7. Add the variable SSM_PARAMETER_STORE_FOLDER_PATH with the value of the location of your configuration variables in AWS Parameter Store. Be sure to include leading and trailing slashes (i.e., /CloudZero/Databricks_Adaptor/).
  8. Access the execution role that was auto-created with the Lambda, and add a policy allowing the following IAM actions.
"ssm:GetParametersByPath",
"ssm:GetParameters",
"ssm:GetParameter",
"kms:Decrypt"
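Assembled into a minimal inline policy, the actions above might look like the following sketch. The wildcard `Resource` is shown only for brevity; in practice, scope it to your parameter path ARN and the KMS key used for encrypted parameters.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParametersByPath",
        "ssm:GetParameters",
        "ssm:GetParameter",
        "kms:Decrypt"
      ],
      "Resource": "*"
    }
  ]
}
```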

Step 5: Set up your AWS S3 bucket

You will need to set up an S3 bucket where your adaptor will drop its files and where the CloudZero platform will pull them from.

To do this, follow the steps below.

  1. Create an AWS S3 bucket where the files will be stored.
  2. Grant the auto-created Lambda execution role full access rights to this S3 bucket. For more information, see the AWS S3 User Policy Examples documentation.
  3. Update your S3_BUCKET and S3_BUCKET_FOLDER values in your AWS Parameter Store (Step 3) to match this S3 bucket.
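As a sketch, the bucket grant from step 2 could be an identity policy like the one below attached to the execution role. The bucket name is a placeholder, and `s3:*` reflects the "full access" wording above; you may prefer to narrow it to the specific read/write actions the adaptor needs.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your-adaptor-bucket",
        "arn:aws:s3:::your-adaptor-bucket/*"
      ]
    }
  ]
}
```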

Step 6: Schedule your Databricks adaptor

Once all the parts are in place, follow the steps below to set up the run schedule of your Databricks adaptor.

  1. Access your Lambda settings, and select the Add Trigger button.
  2. Choose the EventBridge (CloudWatch Events) trigger type, and select Create New Rule.
  3. Set the Schedule Expression to run on a regular cadence. For optimal efficiency, we recommend matching the cadence of your Databricks billable usage log delivery.
  4. Once added, your Lambda will begin executing at the set time, and you will begin seeing data drops in the S3 bucket you configured.
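EventBridge accepts either rate or cron schedule expressions; for example, either of the following would run the adaptor daily (the 06:00 UTC time in the cron form is an arbitrary example, not a recommendation):

```
rate(1 day)
cron(0 6 * * ? *)
```

Note that EventBridge cron expressions use a six-field format (minutes, hours, day-of-month, month, day-of-week, year) and evaluate in UTC.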

Step 7: Create a CloudZero billing connection

Once your files are successfully landing in your S3 bucket, set up a CloudZero custom connection to begin ingesting the data into the platform.

For more information on how to do this, see Connecting Custom Data from AnyCost Adaptors.