Connecting to Databricks

Connections are how CloudZero manages the various Cost Sources that bring Billing, Resource, and other types of data into the platform.

How the Databricks Connection Works

The CloudZero Databricks connection uses API access to a single Databricks workspace to gather consumption and pricing data for all workspaces in the account. It does so by querying the billable usage, pricing, and compute system tables.

This billing connection is needed for Databricks on AWS purchased directly from Databricks. For Databricks purchased through the Azure or GCP Marketplace, CloudZero gathers usage and cost information directly from the billing connections to those cloud providers.

Connection Prerequisites

Databricks settings

Unity Catalog

To access Databricks system tables, you must have a workspace enabled for Unity Catalog. For details, see Databricks Unity Catalog.

Billing and Compute Schemas

The system.billing and system.compute "system schemas" must then be enabled in that workspace.

Service Principal

CloudZero requires credentials for a Databricks service principal that can access tables in those schemas in that workspace. CloudZero recommends you create a new service principal with narrowly scoped permissions for this purpose, as explained later in these instructions.

Warehouse

CloudZero requires the ID of a warehouse to use when querying billing and usage information. CloudZero does not require a dedicated warehouse.

ℹ️

If you are creating a new warehouse for CloudZero billing queries, CloudZero recommends specifying a serverless warehouse with the lowest Auto Stop, Scaling, and Cluster Size settings possible.

Enabling the Billing and Compute Schemas

The goal is to make the billing and usage data CloudZero needs available for querying. The data will be made available in the Unity Catalog-enabled workspace identified in the prerequisites section. You can enable the schemas through the Databricks CLI.

Installing Databricks CLI

If you have not used it before, download and install the Databricks CLI. You can then set up a Databricks CLI profile that connects to your account with the command databricks auth login.

This command prompts for the following information:

  • Databricks Profile Name: account

  • Databricks Host: https://accounts.cloud.databricks.com

  • Databricks Account ID: <Account ID> (Locate your account ID)

Commands to Enable

To enable the system schemas, you first need the metastore ID. Start by finding the ID of the workspace; use the following command to see all the workspaces: databricks account workspaces list.

Then list the metastore assigned to that workspace: databricks account metastore-assignments get <workspace-id>.

When you have the metastore ID, enable the system schemas for that metastore:

databricks system-schemas enable <METASTORE-ID> compute
databricks system-schemas enable <METASTORE-ID> billing

For more information about system tables, see the Databricks documentation.

Configuring a Databricks Service Principal

You must have the following information to create the Databricks connection:

  • Databricks host: URL for the workspace
  • Client ID: UUID for the service principal
  • Client secret: secret that allows CloudZero to use the Databricks API as the service principal

Create the Service Principal and Secret

  • Log in to the Databricks account console and navigate to User Management (https://accounts.cloud.databricks.com/users).
  • Click Service principals.
  • Click Add Service principal.
  • Enter a name and click Add.
  • Click the new principal in the list of Service Principals.
  • Click Generate Secret. Note the Secret and Client ID for later. You can always view the Client ID, which is the UUID for the service principal.
ℹ️

Be sure to generate an OAuth secret; the Service Principal will not function correctly without one. For more information, refer to Databricks authorization methods.

Give the Service Principal access to the workspace

  • Log in to the Databricks account console and navigate to Workspaces https://accounts.cloud.databricks.com/workspaces.
  • Find the workspace that has the billing and compute schemas enabled, click the kebab menu on the far right, and select Update.
  • Click Permissions > Add permissions.
  • Add the Service Principal by its Client ID (UUID). It needs only User permissions in the workspace.

Ensure the Service Principal has warehouse access

  • Log in to the workspace.
  • Select the warehouse provided in the connection configuration.
  • Click Permissions. You must have admin access to the workspace to see this.
  • Ensure the Service Principal has the Can Use permission. If you enabled the permission after it was previously disabled, it may take a while for the Databricks connection to read from the warehouse.

Give the Service Principal access to the system tables

  • Log in to the workspace.
  • Open a SQL editor and issue the following commands:
GRANT USE SCHEMA ON SCHEMA system.compute TO `<service principal client id>`;
GRANT SELECT ON TABLE system.compute.clusters TO `<service principal client id>`;
GRANT USE SCHEMA ON SCHEMA system.billing TO `<service principal client id>`;
GRANT SELECT ON TABLE system.billing.list_prices TO `<service principal client id>`;
GRANT SELECT ON TABLE system.billing.usage TO `<service principal client id>`;

The Service Principal now has permission to query tables in the compute and billing schemas.
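The grant statements above follow one pattern with the service principal's client ID substituted in. As a sketch only (this helper is illustrative and not part of any CloudZero or Databricks tooling), they can be rendered for pasting into a SQL editor like this:

```python
# Illustrative helper: render the five GRANT statements for a given
# service principal client ID, ready to paste into a SQL editor.

GRANTS = [
    ("USE SCHEMA", "SCHEMA system.compute"),
    ("SELECT", "TABLE system.compute.clusters"),
    ("USE SCHEMA", "SCHEMA system.billing"),
    ("SELECT", "TABLE system.billing.list_prices"),
    ("SELECT", "TABLE system.billing.usage"),
]

def render_grants(client_id: str) -> list[str]:
    """Return the GRANT statements with the client ID substituted in."""
    return [f"GRANT {priv} ON {obj} TO `{client_id}`;" for priv, obj in GRANTS]

# Example with a placeholder UUID:
for stmt in render_grants("aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"):
    print(stmt)
```

This keeps the client ID in one place, which helps avoid the common mistake of updating it in some statements but not others.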

Create a Databricks Connection

Select the gear icon from the sidebar and select Connections or navigate to https://app.cloudzero.com/organization/connections.

Settings

On the Connections page, select the Add New Connection button. Then select the Databricks tile.

Enter the connection metadata:

  • Connection Name: The name that will appear for this connection in the CloudZero UI.
  • Billing Account ID: Your Databricks parent account ID (Locate your account ID).
  • Workspace URL: The URL of the workspace where you enabled the billing and compute system schemas; CloudZero will query these schemas for your cost and usage data.
  • Warehouse ID: ID of the warehouse to use to query for billing and usage information.
  • Client ID: ID of the Service Principal created for CloudZero to access billing and compute data.
  • Client Secret: Secret for that Service Principal.
  • Use Fixed IP Egress: Enable to use Databricks fixed IP egress functionality. See the Fixed IP Egress section.

To save the connection, select the Save button. You will return to the Connection Details page in the CloudZero platform, where you should see your newly created connection.

Databricks Connection Notes

Billing Period Ingest Windows

  • Newly Created Connection: CloudZero will ingest the most recent 12 months of billing periods if available.
  • Re-enabled Connection: CloudZero will attempt to ingest up to 24 months of billing periods starting from the current billing period and going back to the most recent billing period ingested.
  • Steady State: CloudZero will ingest the current billing period and the previous billing period if it is likely to have changed.
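The "newly created connection" window above can be sketched as monthly billing-period labels ending with the current month. This is illustrative only; CloudZero's actual ingest scheduling is internal to the platform:

```python
# Sketch of the newly-created-connection ingest window: the most recent
# 12 monthly billing periods, ending with the current month.
# Illustrative only -- not CloudZero's actual scheduling logic.
from datetime import date

def recent_billing_periods(today: date, months: int = 12) -> list[str]:
    """Return YYYY-MM labels for the last `months` periods, oldest first."""
    periods = []
    year, month = today.year, today.month
    for _ in range(months):
        periods.append(f"{year:04d}-{month:02d}")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return list(reversed(periods))

print(recent_billing_periods(date(2024, 3, 15)))
# 12 labels, "2023-04" through "2024-03"
```

The re-enabled-connection case uses the same idea with a window of up to 24 periods, bounded by the most recent billing period already ingested.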

Tag Prefix

Some information from the Databricks platform will be provided in CloudZero as tags with a prefix of dbx_cz. For example, cluster name is available when you use the CloudZero tag dbx_cz:cluster_name.

Customer-created tags will be passed through exactly as they appear in Databricks.

A list of Databricks information that can be assigned a dbx_cz tag follows:

  • cluster_name
  • cluster_id
  • cluster_source
  • dbr_version
  • dlt_pipeline_id
  • driver_instance_pool_id
  • driver_node_type
  • instance_pool_id
  • job_id
  • job_run_id
  • notebook_id
  • owned_by
  • warehouse_id
  • worker_instance_pool_id
  • worker_node_type
  • workspace_id
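The naming rule above, platform-provided fields gaining the dbx_cz prefix while customer tags pass through unchanged, can be sketched as follows (the field set here is a small illustrative subset, and this helper is not part of CloudZero's actual ingestion code):

```python
# Sketch: map Databricks metadata keys to the tag names they appear
# under in CloudZero. Platform-provided fields get the dbx_cz prefix;
# customer-created tags pass through unchanged. Illustrative only.

# Subset of the platform-provided fields listed above, for illustration:
DBX_FIELDS = {"cluster_name", "cluster_id", "job_id", "warehouse_id", "workspace_id"}

def cloudzero_tag(key: str) -> str:
    """Return the tag name as it appears in CloudZero."""
    return f"dbx_cz:{key}" if key in DBX_FIELDS else key

print(cloudzero_tag("cluster_name"))  # dbx_cz:cluster_name
print(cloudzero_tag("team"))          # team (customer tag, unchanged)
```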

Multiple Workspaces in an Account

Access to one workspace as described in this document will provide CloudZero with data for all spend associated with the Databricks account. It is not necessary to set up a connection for each workspace.

Default Pricing

The Databricks cost adaptor uses default pricing for Databricks SKUs.

Overrides for SKU rates can be configured upon request.

Fixed IP Egress

Databricks allows access to be restricted to specific IP addresses at both the account and workspace level. If your organization restricts IP access at the account level, you can configure access as follows:

  1. Enable Use Fixed IP Address for the CloudZero managed Databricks connection.
  2. In your Databricks account, navigate to Account Console > Settings > Security tab > IP Access List.
  3. Add a rule that allows the following IP addresses: 52.0.118.180, 52.0.33.111

If your organization also restricts access to specific IP addresses at the workspace level, you must add the same IP addresses to the workspace IP Access List.

Databricks Region and Service Details

You may see a discrepancy between raw data in Databricks and CloudZero data in the Explorer.

This happens because when Databricks data is ingested into CloudZero, the sku_name_with_region field is split into two separate fields, one for the SKU name, displayed as Service in CloudZero, and one for the Region.

For example:

sku_name_with_region = ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA is split into

Service = ENTERPRISE_SERVERLESS_SQL_COMPUTE and Region = US_EAST_N_VIRGINIA.

This results in slightly different behavior when you filter and group spend in CloudZero compared to Databricks.

For example, in Databricks, entries like ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_WEST_OREGON and ENTERPRISE_SERVERLESS_SQL_COMPUTE_US_EAST_N_VIRGINIA appear as separate entities. In CloudZero, grouping by Service combines costs from different regions, such as US-West and US-East, into one service titled ENTERPRISE_SERVERLESS_SQL_COMPUTE.

To replicate a Databricks view in CloudZero, group by Service and add a filter for Region.

Note that some Databricks SKUs do not contain any region information, for example: INTER_AVAILABILITY_ZONE_EGRESS. In these cases, the Region in CloudZero will be set to None.