Connecting to Databricks

The CloudZero Databricks Adaptor

The CloudZero Databricks adaptor is containerized software that can run on any container hosting platform. The adaptor pulls your Databricks spend data, converts it to the CloudZero Common Bill Format (CBF), and creates the data drop that the CloudZero platform can then ingest and process.

How to set up the Databricks Adaptor

Step 1: Set up your ECR storage

You will need to pull the CloudZero image into your own ECR instance. To do this, follow the steps below.

  1. Upload and register the CloudZero Databricks Adaptor for Lambda container to your AWS ECR storage.

For more information, refer to the Push your image to Amazon Elastic Container Registry section of this AWS document.

  2. Copy the URI of the ECR container image you created for use later in the setup.
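As a rough sketch, the push-to-ECR flow might look like the following AWS CLI and Docker commands. The account ID, region, repository name, and local image tag are all placeholders; substitute your own values and the image name CloudZero provides.

```
# Placeholders -- substitute your own account, region, and image names.
ACCOUNT_ID=123456789012
REGION=us-east-1
REPO=cloudzero-databricks-adaptor

# Create a repository, authenticate Docker to ECR, then tag and push.
aws ecr create-repository --repository-name "$REPO" --region "$REGION"
aws ecr get-login-password --region "$REGION" \
  | docker login --username AWS --password-stdin "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com"
docker tag cloudzero-databricks-adaptor:latest \
  "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
docker push "$ACCOUNT_ID.dkr.ecr.$REGION.amazonaws.com/$REPO:latest"
```

The final pushed URI (the `…amazonaws.com/$REPO:latest` string) is the value you copy for later steps.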

Step 2: Set up your billable usage logs delivery

For the adaptor to access the data it needs from your Databricks instance, you must configure billable usage log delivery.

Refer to the Databricks billable usage logs delivery documentation to export your logs into the appropriate S3 bucket for your adaptor to consume and convert to the CloudZero format.

📘

Please Note

This should be a different S3 path from the one you configure to store your adaptor data exports.

Step 3: Set up the AWS Parameter Store

Your adaptor configuration settings are stored in the AWS Systems Manager Parameter Store.

  1. Create the parameters for your CloudZero Databricks adaptor configuration values using the table below. You can use the encrypted (SecureString) storage option for any of the values as necessary.

    Please Note: When adding these parameters, ensure they are all located on the same path, such as /CloudZero/Databricks_Adaptor/, and make note of that path for use later in the setup process.

    Each of the rate parameters below should match your Databricks pricing model; if a rate is not known, use the standard rate listed.

    Configuration | Description
    --- | ---
    S3_BUCKET | The name of the S3 bucket you will create in Step 5 below.
    S3_BUCKET_FOLDER | The folder (key prefix) within that S3 bucket where the adaptor will drop its files.
    DATABRICKS_USAGE_LOGS_S3_BUCKET | The bucket where your Databricks billable usage logs are exported (Step 2).
    STANDARD_ALL_PURPOSE_COMPUTE | Per-DBU rate; standard rate is 0.40.
    STANDARD_ALL_PURPOSE_COMPUTE_(DLT) | Per-DBU rate; standard rate is 0.40.
    STANDARD_ALL_PURPOSE_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.40.
    STANDARD_JOBS_COMPUTE | Per-DBU rate; standard rate is 0.10.
    STANDARD_JOBS_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.10.
    STANDARD_JOBS_LIGHT_COMPUTE | Per-DBU rate; standard rate is 0.07.
    PREMIUM_ALL_PURPOSE_COMPUTE | Per-DBU rate; standard rate is 0.55.
    PREMIUM_ALL_PURPOSE_COMPUTE_(DLT) | Per-DBU rate; standard rate is 0.55.
    PREMIUM_ALL_PURPOSE_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.55.
    PREMIUM_JOBS_COMPUTE | Per-DBU rate; standard rate is 0.15.
    PREMIUM_JOBS_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.15.
    PREMIUM_JOBS_LIGHT_COMPUTE | Per-DBU rate; standard rate is 0.10.
    PREMIUM_SQL_COMPUTE | Per-DBU rate; standard rate is 0.22.
    PREMIUM_SERVERLESS_SQL_COMPUTE | Per-DBU rate; standard rate is 0.55.
    ENTERPRISE_ALL_PURPOSE_COMPUTE | Per-DBU rate; standard rate is 0.65.
    ENTERPRISE_ALL_PURPOSE_COMPUTE_(DLT) | Per-DBU rate; standard rate is 0.65.
    ENTERPRISE_ALL_PURPOSE_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.65.
    ENTERPRISE_JOBS_COMPUTE | Per-DBU rate; standard rate is 0.20.
    ENTERPRISE_JOBS_COMPUTE_(PHOTON) | Per-DBU rate; standard rate is 0.20.
    ENTERPRISE_JOBS_LIGHT_COMPUTE | Per-DBU rate; standard rate is 0.13.
    ENTERPRISE_SQL_COMPUTE | Per-DBU rate; standard rate is 0.22.
    ENTERPRISE_SERVERLESS_SQL_COMPUTE | Per-DBU rate; standard rate is 0.55.
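The rate parameters are plain dollars-per-DBU multipliers. The sketch below illustrates how they translate usage into cost; the `DEFAULT_DBU_RATES` dict mirrors the standard rates in the table (the `(DLT)`/`(PHOTON)` variants default to the same rate as their base SKU), and `estimate_cost` is an illustrative helper, not part of the adaptor itself.

```python
# Standard per-DBU rates from the table above; (DLT)/(PHOTON) variants
# share their base SKU's default rate.
DEFAULT_DBU_RATES = {
    "STANDARD_ALL_PURPOSE_COMPUTE": 0.40,    # also (DLT) and (PHOTON)
    "STANDARD_JOBS_COMPUTE": 0.10,           # also (PHOTON)
    "STANDARD_JOBS_LIGHT_COMPUTE": 0.07,
    "PREMIUM_ALL_PURPOSE_COMPUTE": 0.55,     # also (DLT) and (PHOTON)
    "PREMIUM_JOBS_COMPUTE": 0.15,            # also (PHOTON)
    "PREMIUM_JOBS_LIGHT_COMPUTE": 0.10,
    "PREMIUM_SQL_COMPUTE": 0.22,
    "PREMIUM_SERVERLESS_SQL_COMPUTE": 0.55,
    "ENTERPRISE_ALL_PURPOSE_COMPUTE": 0.65,  # also (DLT) and (PHOTON)
    "ENTERPRISE_JOBS_COMPUTE": 0.20,         # also (PHOTON)
    "ENTERPRISE_JOBS_LIGHT_COMPUTE": 0.13,
    "ENTERPRISE_SQL_COMPUTE": 0.22,
    "ENTERPRISE_SERVERLESS_SQL_COMPUTE": 0.55,
}

def estimate_cost(sku: str, dbus: float, rates=DEFAULT_DBU_RATES) -> float:
    """Estimated spend in USD for `dbus` DBUs billed under `sku`."""
    return dbus * rates[sku]

print(estimate_cost("PREMIUM_JOBS_COMPUTE", 100))  # 100 DBUs at 0.15/DBU
```

If your Databricks contract uses negotiated rates, set the corresponding parameters to those values instead of the standards.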

Step 4: Create and configure your AWS Lambda

You need to create and configure the Lambda function that will execute the adaptor from your ECR instance.

To do this, follow the steps below.

  1. Create an AWS Lambda using the Container Image template.
  2. Input the URI of your ECR instance created in Step 1.
  3. By default, the Lambda creation process creates an execution role for you. This setup uses that role, so note its name.
  4. In the Configuration tab of your Lambda settings, edit your General Configuration section.
  5. Set your Timeout value to 15 minutes and your Memory to 4 GB.
  6. Edit your Environment Variables section.
  7. Add the variable SSM_PARAMETER_STORE_FOLDER_PATH with the value of the location of your configuration variables in AWS Parameter Store. Be sure to include leading and trailing slashes (i.e., /CloudZero/Databricks_Adaptor/).
  8. Access the execution role that was auto-created with the Lambda, and add a policy allowing the following IAM actions.
"ssm:GetParametersByPath",
"ssm:GetParameters",
"ssm:GetParameter",
"kms:Decrypt"
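Assembled into a minimal inline policy, the actions above might look like the following sketch. The wildcard `Resource` is shown only for brevity; in practice, scope it to your parameter path ARN and the KMS key used for encrypted parameters.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParametersByPath",
        "ssm:GetParameters",
        "ssm:GetParameter",
        "kms:Decrypt"
      ],
      "Resource": "*"
    }
  ]
}
```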

Step 5: Set up your AWS S3 bucket

You will need to set up an S3 bucket where your adaptor will drop its files and where the CloudZero platform will pull them from.

To do this, follow the steps below.

  1. Create an AWS S3 bucket where the files will be stored.
  2. Grant the auto-created Lambda execution role full access rights to this S3 bucket. For more information, see the AWS S3 User Policy Examples documentation.
  3. Update your S3_BUCKET and S3_BUCKET_FOLDER values in your AWS Parameter Store (Step 3) to match this S3 bucket.
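As a sketch, the bucket grant from step 2 could be an identity policy like the one below attached to the execution role. The bucket name is a placeholder, and `s3:*` reflects the "full access" wording above; you may prefer to narrow it to the specific read/write actions the adaptor needs.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::your-adaptor-bucket",
        "arn:aws:s3:::your-adaptor-bucket/*"
      ]
    }
  ]
}
```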

Step 6: Schedule your Databricks adaptor

Once all the parts are in place, follow the steps below to set up the run schedule of your Databricks adaptor.

  1. Access your Lambda settings, and select the Add Trigger button.
  2. Choose the EventBridge (CloudWatch Events) trigger type, and select Create New Rule.
  3. Set the Schedule Expression to run on a regular cadence. For optimal efficiency, we recommend matching the cadence of your Databricks billable usage log delivery.
  4. Once added, your Lambda will begin executing at the set time, and you will begin seeing data drops in the S3 bucket you configured.
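EventBridge accepts either rate or cron schedule expressions; for example, either of the following would run the adaptor daily (the 06:00 UTC time in the cron form is an arbitrary example, not a recommendation):

```
rate(1 day)
cron(0 6 * * ? *)
```

Note that EventBridge cron expressions use a six-field format (minutes, hours, day-of-month, month, day-of-week, year) and evaluate in UTC.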

Step 7: Create a CloudZero billing connection

Once your files are successfully landing in your S3 bucket, set up a CloudZero custom connection to begin ingesting the data into the platform.

For more information on how to do this, see Connecting Custom Data from AnyCost Adaptors.