Connecting to Databricks
Connections are how CloudZero manages the various Cost Sources that bring Billing, Resource, and other types of data into the platform.
How the Databricks Connection Works
The CloudZero Databricks connection uses API access to a single Databricks workspace to gather consumption and pricing data for all workspaces in the account by querying the Billable Usage, Pricing, and Compute system tables.
This billing connection is needed for Databricks on AWS purchased directly from Databricks. For Databricks purchased through the Azure or GCP Marketplace, CloudZero gathers usage and cost information directly from the billing connections to those cloud providers.
Connection Prerequisites
Databricks settings
Unity Catalog
To access Databricks system tables, you must have a workspace enabled for Unity Catalog. For details, see Databricks Unity Catalog.
Billing and Compute Schemas
The system.billing and system.compute system schemas must then be enabled in that workspace.
Service Principal
CloudZero requires credentials for a Databricks service principal that can access tables in those schemas in that workspace. CloudZero recommends you create a new service principal with narrowly scoped permissions for this purpose, as explained further on in these instructions.
Warehouse
CloudZero requires the warehouse ID of a warehouse to use while querying billing and usage information. CloudZero does not require a dedicated warehouse.
If you are creating a new warehouse for CloudZero billing queries, CloudZero recommends specifying a serverless warehouse with the lowest Auto Stop, Scaling, and Cluster Size settings possible.
Enabling the Billing and Compute Schemas
The goal is to make available the billing and usage data CloudZero needs to query. The data will be made available in the Unity Catalog-enabled workspace identified in the prerequisites section. This can be done through the Databricks CLI.
Installing Databricks CLI
If you have not used it before, download and install the Databricks CLI.
You can set up a Databricks CLI profile that connects to your account with the command databricks auth login.
This command prompts for the following information:
- Databricks Profile Name: account
- Databricks Host: https://accounts.cloud.databricks.com
- Databricks Account ID: <Account ID> (Locate your account ID)
Commands to Enable
To get the Metastore-ID, first make sure you have the ID of the workspace. Use the following command to list all workspaces: databricks account workspaces list.
Then you can list the metastore assigned to that workspace: databricks account metastore-assignments get <workspace-id>.
When you have the Metastore-ID you can enable the system-schemas for that metastore:
databricks system-schemas enable <METASTORE-ID> compute
databricks system-schemas enable <METASTORE-ID> billing
For more information about system tables, see the Databricks documentation.
Configuring a Databricks Service Principal
You must have the following information to create the Databricks connection:
- Databricks host: URL for the workspace
- Client ID: UUID for the service principal
- Client secret: Secret that allows CloudZero to use the Databricks API as the service principal
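As a rough illustration of how these three pieces of information fit together, the sketch below builds (but does not send) the OAuth client-credentials token request a client would make with the service principal's ID and secret. The workspace URL and IDs are placeholders, and the exact request CloudZero issues is internal to the platform.

```python
# Sketch only: assemble the parts of an OAuth client-credentials token request
# for a Databricks service principal. All values shown are placeholders.
from urllib.parse import urlencode

def build_token_request(workspace_url: str, client_id: str, client_secret: str) -> dict:
    """Return the URL, basic-auth pair, and form body for the token request."""
    return {
        "url": f"{workspace_url.rstrip('/')}/oidc/v1/token",
        "auth": (client_id, client_secret),  # sent as HTTP basic auth
        "body": urlencode({"grant_type": "client_credentials",
                           "scope": "all-apis"}),
    }

req = build_token_request("https://example-workspace.cloud.databricks.com",
                          "11111111-2222-3333-4444-555555555555",
                          "example-secret")
print(req["url"])
```

A real client would POST that body to the URL with the basic-auth credentials and read the access token from the JSON response.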
Create the Service Principal and Secret
- Log in to the Databricks account console and navigate to User Management (https://accounts.cloud.databricks.com/users).
- Click Service principals.
- Click Add Service principal.
- Enter a name and click Add.
- Click the new Principal in the list of Service Principals.
- Click Generate Secret. Note the Secret and Client ID for later. You can always view the Client ID, which is the UUID for the service principal.
Be sure to generate an OAuth secret so that the Service principal functions correctly. For more information, refer to Databricks authorization methods.
Give the Service Principal access to the workspace
- Log in to the Databricks account console and navigate to Workspaces (https://accounts.cloud.databricks.com/workspaces).
- Find the workspace that has the billing and compute schemas enabled, click the kebab menu on the far right, and select Update.
- Click Permissions > Add permissions.
- Add the Service Principal by its Client ID (the UUID). It needs only User permissions in the workspace.
Ensure the Service Principal has warehouse access
- Log in to the workspace.
- Select the warehouse provided in the connection configuration.
- Click Permissions. You must have admin access to the workspace to see this.
- Ensure the Service Principal has the Can Use permission. If you enabled the permission after it was previously disabled, it may take a while for the Databricks connection to read from the warehouse.
Ensure the Service Principal has SQL access
- Follow the Databricks documentation to find entitlement management for the workspace.
- Ensure the Service Principal has the Databricks SQL access entitlement enabled. Alternatively, you can manage the Service Principal's entitlements through its group membership.
Give the Service Principal access to the system tables
- Log in to the workspace.
- Open an SQL editor and issue the following commands:
GRANT USE SCHEMA ON SCHEMA system.compute TO `<service principal client id>`;
GRANT SELECT ON TABLE system.compute.clusters TO `<service principal client id>`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `<service principal client id>`;
GRANT SELECT ON TABLE system.billing.list_prices TO `<service principal client id>`;
GRANT SELECT ON TABLE system.billing.usage TO `<service principal client id>`;
The Service Principal now has permission to query tables in the compute and billing schemas.
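If you are granting access for several environments, the five statements above can be templated rather than edited by hand. The helper below is a hypothetical convenience, not part of CloudZero or Databricks tooling; it simply substitutes a client ID into the grants listed in this section.

```python
# Hypothetical helper: render the GRANT statements from this section for a
# given service principal client ID. The privilege/object pairs mirror the
# documented grants exactly.
GRANTS = [
    ("USE SCHEMA", "SCHEMA system.compute"),
    ("SELECT", "TABLE system.compute.clusters"),
    ("USE SCHEMA", "SCHEMA system.billing"),
    ("SELECT", "TABLE system.billing.list_prices"),
    ("SELECT", "TABLE system.billing.usage"),
]

def grant_statements(client_id: str) -> list:
    """Return one GRANT ... TO `<client id>`; statement per entry."""
    return [f"GRANT {priv} ON {obj} TO `{client_id}`;" for priv, obj in GRANTS]

for stmt in grant_statements("11111111-2222-3333-4444-555555555555"):
    print(stmt)
```

Paste the printed statements into the workspace SQL editor as described above.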
Create a Databricks Connection
Select the gear icon from the sidebar and select Connections or navigate to https://app.cloudzero.com/organization/connections.

On the Databricks Connection Creation page, select the Add New Connection button. Then select the Databricks tile.

Enter the connection metadata:
- Connection Name: A connection name that will appear in the CloudZero UI.
- Billing Account ID: Your Databricks parent account ID (Locate your account ID).
- Workspace URL: The URL to access the workspace where you have enabled the billing and compute system schemas that CloudZero will use to pull your cost and usage data.
- Warehouse ID: ID of the warehouse to use to query for billing and usage information.
- Client ID: ID of the Service Principal created for CloudZero to access billing and compute data.
- Client Secret: Secret for that Service Principal.
- Use Fixed IP Egress: Enable to use Databricks fixed IP egress functionality. See the Fixed IP Egress section.
To save the connection, select the Save button. You will return to the Connection Details page in the CloudZero platform, where you should see your newly created connection.
Databricks Connection Notes
Billing Period Ingest Windows
- Newly Created Connection: CloudZero will ingest the most recent 12 months of billing periods if available.
- Re-enabled Connection: CloudZero will attempt to ingest up to 24 months of billing periods starting from the current billing period and going back to the most recent billing period ingested.
- Steady State: CloudZero will ingest the current billing period and the previous billing period if it is likely to have changed.
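As a rough illustration of the newly-created-connection case (the exact windowing logic is internal to CloudZero), the most recent 12 monthly billing periods can be enumerated like this:

```python
# Sketch only: enumerate the most recent N monthly billing periods, newest
# first, as an illustration of the 12-month ingest window described above.
from datetime import date

def recent_billing_periods(today: date, months: int = 12) -> list:
    """Return the most recent `months` billing periods as YYYY-MM strings."""
    periods = []
    year, month = today.year, today.month
    for _ in range(months):
        periods.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:           # wrap from January back to December
            year, month = year - 1, 12
    return periods

print(recent_billing_periods(date(2024, 3, 15))[:3])
```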
Tag Prefix
Some information from the Databricks platform will be provided in CloudZero as tags with a prefix of dbx_cz. For example, cluster name is available when you use the CloudZero tag dbx_cz:cluster_name.
Customer-created tags will be passed through exactly as they appear in Databricks.
The following Databricks fields can be assigned a dbx_cz tag:
- cluster_name
- cluster_id
- cluster_source
- dbr_version
- dlt_pipeline_id
- driver_instance_pool_id
- driver_node_type
- instance_pool_id
- job_id
- job_run_id
- notebook_id
- owned_by
- warehouse_id
- worker_instance_pool_id
- worker_node_type
- workspace_id
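The tagging convention above can be sketched as a small mapping function. This is an illustration of the documented behavior, not CloudZero's actual implementation, and the field set shown is only a subset of the list above:

```python
# Sketch: Databricks-provided fields receive the dbx_cz prefix, while
# customer-created tags pass through exactly as they appear in Databricks.
DBX_FIELDS = {"cluster_name", "cluster_id", "job_id",
              "warehouse_id", "workspace_id"}  # subset of the documented list

def to_cloudzero_tags(dbx_record: dict, customer_tags: dict) -> dict:
    """Combine prefixed Databricks fields with pass-through customer tags."""
    tags = {f"dbx_cz:{k}": v for k, v in dbx_record.items() if k in DBX_FIELDS}
    tags.update(customer_tags)  # customer tags are not prefixed
    return tags

print(to_cloudzero_tags({"cluster_name": "etl", "job_id": "42"},
                        {"team": "data"}))
```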
Multiple Workspaces in an Account
Access to one workspace as described in this document will provide CloudZero with data for all spend associated with the Databricks account. It is not necessary to set up a connection for each workspace.
Default Pricing
The Databricks cost adaptor uses default pricing for Databricks SKUs.
Overrides for SKU rates can be configured upon request.
Fixed IP Egress
Databricks allows access to be restricted to specific IP addresses at both the account and workspace level. If your organization restricts IP access at the account level, you can configure access as follows:
- Enable Use Fixed IP Address for the CloudZero managed Databricks connection.
- In your Databricks account, navigate to Account Console > Settings > Security tab > IP Access List.
- Add a rule that allows the following IP addresses:
52.0.118.180, 52.0.33.111
If your organization also restricts access to specific IP addresses at the workspace level, you must add the same IP addresses to the workspace IP Access List.
Databricks Region and Service Details
You may see a discrepancy between raw data in Databricks and CloudZero data in the Explorer.
This happens because when Databricks data is ingested into CloudZero, the sku_name_with_region field is split into two separate fields: one for the SKU name, displayed as Service in CloudZero, and one for the Region.
For example, sku_name_with_region = ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA is split into Service = ENTERPRISE_SERVERLESS_SQL_COMPUTE and Region = US_EAST_N_VIRGINIA.
This results in slightly different behavior when you filter and group spend in CloudZero compared to Databricks.
For example, in Databricks, entries like ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_WEST_OREGON and ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA appear as separate entities. In CloudZero, grouping by Service combines costs from different regions, such as US-West and US-East, into one service titled ENTERPRISE_SERVERLESS_SQL_COMPUTE.
To replicate a Databricks view in CloudZero, you can group by Service and add a filter for Region.
Note that some Databricks SKUs do not contain any region information, for example INTER_AVAILABILITY_ZONE_EGRESS. In these cases, the Region in CloudZero will be set to None.
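The split described above can be sketched as a suffix match against known region names. This is only an illustration of the behavior the examples show; CloudZero's actual parsing logic is internal, and the region list here is a hypothetical partial set:

```python
# Sketch: split sku_name_with_region into (Service, Region) by matching a
# known-region suffix; SKUs with no region suffix get Region = None.
KNOWN_REGIONS = {"US_EAST_N_VIRGINIA", "US_WEST_OREGON"}  # partial, illustrative

def split_sku(sku_name_with_region: str):
    for region in KNOWN_REGIONS:
        if sku_name_with_region.endswith("_" + region):
            service = sku_name_with_region[: -len(region) - 1]
            return service, region
    return sku_name_with_region, None  # e.g. INTER_AVAILABILITY_ZONE_EGRESS

print(split_sku("ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA"))
print(split_sku("INTER_AVAILABILITY_ZONE_EGRESS"))
```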