Connecting to Databricks

Connect your Databricks account to CloudZero to bring your Databricks cost and usage data into a unified view of all your cloud and SaaS spend. CloudZero organizes your Databricks costs, along with your other costs, into the categories (called Dimensions) that matter most to your business, such as business unit, team, product, feature, environment, or customer. Once your account is connected and your costs are organized, you can quickly answer your stakeholders' questions about spend.

This guide covers connecting a Databricks account purchased directly from Databricks on AWS. CloudZero queries your Databricks system tables through a service principal to pull billing, pricing, and compute data across all workspaces in the account.

ℹ️

If you purchased Databricks through the Azure or GCP Marketplace, your Databricks costs are included in your Azure or GCP billing connection. You can add the Databricks connector for more granular data, but this introduces duplicate cost data into CloudZero.

What you need

  • CloudZero user with data configuration permissions
  • A Databricks workspace with Unity Catalog enabled
  • The Databricks CLI installed and configured
  • A SQL warehouse in the workspace (CloudZero does not require a dedicated warehouse)

Overview

Connecting Databricks takes three steps:

  1. Enable the billing and compute system schemas
  2. Create and configure a service principal
  3. Configure the connection in CloudZero

Step 1: Enable billing and compute system schemas

CloudZero reads from the system.billing and system.compute schemas in your Unity Catalog workspace. For full details, see Databricks' system tables documentation.

  1. Find your workspace ID:

    databricks account workspaces list

  2. Get the metastore ID for your workspace:

    databricks account metastore-assignments get <workspace-id>

  3. Enable the billing and compute schemas:

    databricks system-schemas enable <metastore-id> billing
    databricks system-schemas enable <metastore-id> compute

ℹ️

If you have not used the Databricks CLI before, run databricks auth login to set up a profile. You need your Databricks host (https://accounts.cloud.databricks.com) and your account ID.
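
If you script this step, the commands above can be assembled programmatically. The following is a minimal Python sketch, not an official tool: the function names are hypothetical, and it assumes the Databricks CLI is on your PATH and already authenticated.

```python
import subprocess

# The two system schemas this guide enables.
SCHEMAS = ("billing", "compute")


def metastore_command(workspace_id: str) -> list[str]:
    """Build the command that looks up a workspace's metastore assignment."""
    return ["databricks", "account", "metastore-assignments", "get", workspace_id]


def enable_commands(metastore_id: str) -> list[list[str]]:
    """Build one `databricks system-schemas enable` command per schema."""
    return [
        ["databricks", "system-schemas", "enable", metastore_id, schema]
        for schema in SCHEMAS
    ]


def run(command: list[str]) -> str:
    """Run a CLI command and return its stdout; raises if the CLI exits non-zero."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

For example, `for command in enable_commands("<metastore-id>"): run(command)` runs both enable commands in order and stops at the first failure.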

Step 2: Create and configure a service principal

CloudZero connects through a Databricks service principal with read access to the billing and compute system tables. For full details on authorization, see Databricks' authorization documentation.

2a. Create the service principal

  1. In the Databricks account console, navigate to User Management > Service principals.
  2. Select Add Service principal.
  3. Enter a name and select Add.
  4. Click the new service principal in the list.
  5. Select Generate Secret. Copy and save both the Secret and Client ID. The secret is not shown again.

2b. Grant workspace access

  1. In the Databricks account console, navigate to Workspaces.
  2. Find the workspace where you enabled the system schemas. Click the menu on the far right and select Update.
  3. Select Permissions > Add permissions.
  4. Add the service principal by its Client ID. It needs only User permissions.

2c. Grant warehouse access

  1. Log in to the workspace.
  2. Select the SQL warehouse you want CloudZero to use.
  3. Select Permissions (requires admin access to the workspace).
  4. Grant the service principal the Can Use permission.

2d. Grant SQL and system table access

  1. In the workspace, ensure the service principal has the Databricks SQL access entitlement. See Databricks' entitlement documentation for details.

  2. Open a SQL editor in the workspace and run the following commands:

    GRANT USE SCHEMA ON SCHEMA system.compute TO `<service-principal-client-id>`;
    GRANT SELECT ON TABLE system.compute.clusters TO `<service-principal-client-id>`;
    GRANT USE SCHEMA ON SCHEMA system.billing TO `<service-principal-client-id>`;
    GRANT SELECT ON TABLE system.billing.list_prices TO `<service-principal-client-id>`;
    GRANT SELECT ON TABLE system.billing.account_prices TO `<service-principal-client-id>`;
    GRANT SELECT ON TABLE system.billing.usage TO `<service-principal-client-id>`;

The service principal now has the permissions CloudZero needs.
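
If you manage several service principals, you may prefer to generate the six statements rather than edit them by hand. This is a minimal sketch; the `grant_statements` helper is hypothetical, and it only builds the SQL text shown above for you to paste into the SQL editor.

```python
# Schemas and tables taken from the GRANT statements in this guide.
SCHEMA_TABLES = {
    "compute": ["clusters"],
    "billing": ["list_prices", "account_prices", "usage"],
}


def grant_statements(client_id: str) -> list[str]:
    """Return the GRANT statements for one service principal Client ID."""
    if "`" in client_id:
        # The Client ID is wrapped in backticks, so it must not contain one.
        raise ValueError("client_id must not contain a backtick")
    principal = f"`{client_id}`"
    statements = []
    for schema, tables in SCHEMA_TABLES.items():
        statements.append(f"GRANT USE SCHEMA ON SCHEMA system.{schema} TO {principal};")
        for table in tables:
            statements.append(
                f"GRANT SELECT ON TABLE system.{schema}.{table} TO {principal};"
            )
    return statements
```

Calling `print("\n".join(grant_statements("<service-principal-client-id>")))` prints the statements in the same order as listed in this step.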

Step 3: Configure the connection in CloudZero

In CloudZero, go to Settings > Cloud Connections. Select Create Connection +, then select the Databricks tile and complete the following fields:

Field               | Description
Connection Name     | A display name for this connection in CloudZero
Billing Account ID  | Your Databricks account ID (how to find it)
Workspace URL       | The URL of the workspace where you enabled the system schemas
Warehouse ID        | The ID of the SQL warehouse CloudZero uses for queries
Client ID           | The service principal Client ID from Step 2
Client Secret       | The service principal secret from Step 2
Use Fixed IP Egress | Enable if your Databricks account restricts access by IP address. See Fixed IP egress.

Select Create Connection to create the connection.
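
If the connection does not validate, one way to check the Workspace URL, Warehouse ID, Client ID, and Client Secret outside CloudZero is to request a token as the service principal and run a one-row query. The sketch below only builds the two HTTP requests. It assumes the Databricks OAuth client-credentials endpoint (`/oidc/v1/token`) and the SQL Statement Execution API (`/api/2.0/sql/statements`); confirm both against the Databricks documentation for your account. The function names are hypothetical, and this is not how CloudZero itself connects.

```python
import base64
import json
import urllib.parse
import urllib.request


def token_request(workspace_url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build the OAuth client-credentials request for the service principal."""
    credentials = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    body = urllib.parse.urlencode(
        {"grant_type": "client_credentials", "scope": "all-apis"}
    ).encode()
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/oidc/v1/token",
        data=body,
        headers={"Authorization": f"Basic {credentials}"},
        method="POST",
    )


def statement_request(
    workspace_url: str,
    warehouse_id: str,
    access_token: str,
    statement: str = "SELECT 1 FROM system.billing.usage LIMIT 1",
) -> urllib.request.Request:
    """Build a request that runs one statement on the chosen SQL warehouse."""
    body = json.dumps(
        {"warehouse_id": warehouse_id, "statement": statement, "wait_timeout": "30s"}
    ).encode()
    return urllib.request.Request(
        f"{workspace_url.rstrip('/')}/api/2.0/sql/statements",
        data=body,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Send each request with `urllib.request.urlopen(request)`. The token response is JSON with an `access_token` field, which you pass to `statement_request`. A permission error on the query usually points back to one of the grants in Step 2.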

What to expect

After you create the connection, CloudZero begins pulling your Databricks cost and usage data:

  • New connection: CloudZero pulls up to 12 months of historical billing data if available.
  • Re-enabled connection: CloudZero pulls up to 24 months of billing data, working backward from the current period to the last previously ingested period.
  • Steady state: CloudZero pulls the current and previous billing periods.

Cost data appears in the Explorer within 24 hours. One connection covers all workspaces in the account; you do not need a separate connection per workspace.

Connection details

Tags

The Databricks connection provides tags with the dbx_cz prefix for use in Dimensions. Customer-created tags pass through exactly as they appear in Databricks.

Tag                             | Description
dbx_cz:cluster_name             | Cluster name
dbx_cz:cluster_id               | Cluster ID
dbx_cz:cluster_source           | Cluster source
dbx_cz:dbr_version              | Databricks Runtime version
dbx_cz:dlt_pipeline_id          | Delta Live Tables pipeline ID
dbx_cz:driver_instance_pool_id  | Driver instance pool ID
dbx_cz:driver_node_type         | Driver node type
dbx_cz:instance_pool_id         | Instance pool ID
dbx_cz:job_id                   | Job ID
dbx_cz:job_run_id               | Job run ID
dbx_cz:notebook_id              | Notebook ID
dbx_cz:owned_by                 | Owner
dbx_cz:warehouse_id             | Warehouse ID
dbx_cz:worker_instance_pool_id  | Worker instance pool ID
dbx_cz:worker_node_type         | Worker node type
dbx_cz:workspace_id             | Workspace ID
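
When you work with exported tag data, the prefix makes it easy to tell CloudZero-provided tags from customer-created ones. The sketch below assumes tags arrive as a flat key-to-value mapping, which is an illustration rather than a documented export format; `split_tags` is a hypothetical helper.

```python
# Prefix carried by the tags CloudZero derives from Databricks metadata.
CZ_PREFIX = "dbx_cz:"


def split_tags(tags: dict[str, str]) -> tuple[dict[str, str], dict[str, str]]:
    """Separate CloudZero-provided tags from customer-created tags."""
    provided = {key: value for key, value in tags.items() if key.startswith(CZ_PREFIX)}
    custom = {key: value for key, value in tags.items() if not key.startswith(CZ_PREFIX)}
    return provided, custom
```

For instance, `split_tags({"dbx_cz:job_id": "42", "team": "data"})` places `dbx_cz:job_id` in the first mapping and `team` in the second.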

Pricing and discounts

CloudZero uses default Databricks SKU pricing. Two override options are available:

Method | Details
Automated (Databricks Private Preview) | If your account has access to the account_prices system table, CloudZero automatically applies your negotiated rates and discounts. The SQL GRANT statements in Step 2d already include this table.
Manual | For accounts without account_prices access, contact your account manager to configure SKU rate overrides.
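
Conceptually, an override replaces the default rate for a SKU before usage is priced. The sketch below illustrates that fallback with hypothetical helpers and made-up rates. It assumes overrides apply per SKU and is not a description of CloudZero's actual pricing logic.

```python
from decimal import Decimal
from typing import Mapping, Optional


def effective_rate(
    sku: str,
    list_prices: Mapping[str, Decimal],
    account_prices: Optional[Mapping[str, Decimal]] = None,
) -> Decimal:
    """Prefer the negotiated rate for a SKU; otherwise use its list price."""
    if account_prices is not None and sku in account_prices:
        return account_prices[sku]
    return list_prices[sku]


def usage_cost(quantity: Decimal, rate: Decimal) -> Decimal:
    """Price a usage quantity at the given rate."""
    return quantity * rate
```

With a hypothetical list price of 0.70 and a negotiated rate of 0.56 for the same SKU, `effective_rate` returns 0.56 when the negotiated rates are supplied and 0.70 when they are not.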

Region and service mapping

CloudZero splits the Databricks sku_name_with_region field into separate Service and Region values:

Databricks sku_name_with_region                      | CloudZero Service                 | CloudZero Region
ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA | ENTERPRISE_SERVERLESS_SQL_COMPUTE | US_EAST_N_VIRGINIA
INTER_AVAILABILITY_ZONE_EGRESS                       | INTER_AVAILABILITY_ZONE_EGRESS    | None

Grouping by Service in the Explorer combines costs across regions. Add a Region filter for region-specific breakdowns.
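
To reproduce this split when reconciling against raw Databricks data, one approach is to strip a known region suffix. The sketch below is an assumption about how such a split could work, not CloudZero's implementation. The region list contains only the region named in this guide, so extend it with the regions you use.

```python
from typing import Optional, Sequence, Tuple

# Only the region that appears in this guide; add the others you need.
KNOWN_REGIONS = ("US_EAST_N_VIRGINIA",)


def split_sku(
    sku_name_with_region: str, regions: Sequence[str] = KNOWN_REGIONS
) -> Tuple[str, Optional[str]]:
    """Split a SKU name into (service, region); region is None if no suffix matches."""
    # Try longer region names first so a short name cannot shadow a longer one.
    for region in sorted(regions, key=len, reverse=True):
        suffix = "_" + region
        if sku_name_with_region.endswith(suffix):
            return sku_name_with_region[: -len(suffix)], region
    return sku_name_with_region, None
```

With the default region list, `split_sku("ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA")` returns `("ENTERPRISE_SERVERLESS_SQL_COMPUTE", "US_EAST_N_VIRGINIA")`, and `split_sku("INTER_AVAILABILITY_ZONE_EGRESS")` returns the SKU unchanged with a region of `None`.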

Maintenance and configuration

Fixed IP egress

If your Databricks account restricts access by IP address:

  1. Enable Use Fixed IP Egress in the CloudZero connection settings.
  2. In Databricks, go to Account Console > Settings > Security > IP Access List.
  3. Add 52.0.118.180 and 52.0.33.111.

If your organization also restricts IPs at the workspace level, add the same addresses to the workspace IP Access List.
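
After updating an IP Access List, you can check that both CloudZero addresses are covered, including by a broader CIDR range. This minimal sketch uses Python's standard `ipaddress` module. The `missing_egress_ips` helper is hypothetical and expects the list entries as plain strings that you have copied from Databricks.

```python
import ipaddress

# The fixed egress addresses listed in this guide.
CLOUDZERO_EGRESS_IPS = ("52.0.118.180", "52.0.33.111")


def missing_egress_ips(allow_list: list[str]) -> list[str]:
    """Return the CloudZero egress IPs not covered by any allow-list entry."""
    # Entries may be single addresses or CIDR ranges; a bare address becomes a /32.
    networks = [ipaddress.ip_network(entry, strict=False) for entry in allow_list]
    return [
        ip
        for ip in CLOUDZERO_EGRESS_IPS
        if not any(ipaddress.ip_address(ip) in network for network in networks)
    ]
```

An empty result means both addresses are allowed. For example, `missing_egress_ips(["52.0.118.180/32", "10.0.0.0/8"])` returns `["52.0.33.111"]`.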

ℹ️

Have questions or feedback? Reach out to your account manager.