Connect via S3 Bucket
With the S3 Bucket method, you upload cost data as CSV files to an AWS S3 bucket that CloudZero reads from. This method is available for organizations that need to transfer large volumes of cost data or prefer file-based delivery through AWS.
This guide covers setting up the S3 bucket, creating the connection in CloudZero, and writing the Adaptor that converts and uploads your data. Each upload is called a drop.
What you need
- Permission to create connections in CloudZero
- An existing AWS connection in CloudZero for the AWS account where the bucket will live
- An AWS account with permission to create S3 buckets and manage IAM policies
Step 1: Create an S3 bucket
- Create an S3 bucket in your AWS account. See the AWS documentation.
- Create a key prefix (displayed as a folder in the S3 console) inside the bucket for your cost data (for example, simple-cloud-cost-data). The prefix can be nested (for example, cloudzero/simple-cloud-cost-data).
A key prefix is required. Cost data cannot be stored at the root of the bucket.
You will need the Bucket Name and Bucket Path (the key prefix) in Step 2.
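If you prefer to script Step 1 rather than use the S3 console, the following is a minimal sketch. It assumes `s3_client` behaves like a boto3 S3 client (`create_bucket` and `put_object`); the function name, bucket name, and default region are hypothetical and only illustrate the idea.

```python
def create_bucket_with_prefix(s3_client, bucket_name, bucket_path, region="us-east-1"):
    """Create a bucket plus a zero-byte "folder" object for the key prefix.

    s3_client is assumed to behave like boto3.client("s3").
    Returns the normalized prefix (no leading or trailing slashes).
    """
    prefix = bucket_path.strip("/")
    if not prefix:
        # The guide requires a key prefix; data cannot live at the bucket root.
        raise ValueError("A key prefix is required")

    params = {"Bucket": bucket_name}
    # S3 rejects an explicit LocationConstraint of us-east-1, so set it only elsewhere.
    if region != "us-east-1":
        params["CreateBucketConfiguration"] = {"LocationConstraint": region}
    s3_client.create_bucket(**params)

    # The S3 console displays a zero-byte object whose key ends in "/" as a folder.
    s3_client.put_object(Bucket=bucket_name, Key=prefix + "/", Body=b"")
    return prefix


# Example (requires boto3 and AWS credentials; names are placeholders):
#   import boto3
#   create_bucket_with_prefix(boto3.client("s3"), "my-cost-bucket",
#                             "cloudzero/simple-cloud-cost-data")
```

The returned prefix is the value to enter as the Bucket Path in Step 2.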
Step 2: Create the connection in CloudZero
- In CloudZero, go to Settings > Cloud Connections.
- Select Create Connection +.
- Select Custom, then select the S3 Bucket tile.
- Enter a Connection Name to identify the billing source. This is typically the main account or billing account that the service sends the bill from (for example, simple-cloud-main). The name cannot contain spaces, periods, or special characters except hyphens and underscores.
- Enter the Cloud Provider name. This label appears in the Cloud Provider Dimension and throughout the CloudZero UI (for example, Simple Cloud).
- (Optional) Enter the Health Check Interval, in Hours (whole number). This sets how long CloudZero waits between drops before flagging the connection as unhealthy. For recommended values, see Monitor connection status with Health Check Interval.
- Select the AWS Connection that corresponds to the AWS account where your S3 bucket lives. This must be an account already connected to CloudZero.
- Enter the Bucket Name from Step 1.
- Enter the Bucket Path (the key prefix from Step 1). Do not include leading or trailing slashes (for example, my/file/path).
- Select Generate IAM Policy. This generates a policy definition using your bucket name.
- Attach the generated policy to the CloudZero resource owner IAM role in your AWS account (with a name similar to cloudzero-connected-account-live-ResourceOwner-Role-<ID>). This grants CloudZero read-only access to your bucket (s3:Get* and s3:List*). To attach a policy, see the AWS documentation.
If your bucket is in a region where STS is not active by default (such as ap-east-1 or eu-south-1), activate that region first. See Managing AWS STS in an AWS Region.
- Select Save.
Your connection appears in the Billing Connections table with a status of Pending Data until CloudZero receives and processes your first drop.
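If you keep the connection settings in a script or configuration file, you can check them against the naming rules in this step before entering them. This is a rough sketch: the function names are hypothetical, and it assumes that letters and digits are the only characters allowed besides hyphens and underscores, and that empty path segments (`a//b`) are not wanted.

```python
import re

# Assumption: only letters, digits, hyphens, and underscores are accepted.
_CONNECTION_NAME = re.compile(r"[A-Za-z0-9_-]+")


def is_valid_connection_name(name: str) -> bool:
    """True if the name has no spaces, periods, or other special characters."""
    return _CONNECTION_NAME.fullmatch(name) is not None


def is_valid_bucket_path(path: str) -> bool:
    """True if the path is non-empty with no leading, trailing, or doubled slashes."""
    return bool(path) and all(segment for segment in path.split("/"))
```

For example, `is_valid_bucket_path("my/file/path")` passes, while `"/my/file/path/"` does not.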
Step 3: Write your Adaptor
An Adaptor is a script that retrieves cost data from your source, converts it to Common Bill Format (CBF), and uploads it to your S3 bucket. You can write it in any programming language.
Your Adaptor needs to handle five things:
- Identify your source's cost data. Refer to your vendor's documentation or support to find where they provide billing or usage data. You need either a total cost for a given period, or usage data you can convert into a cost based on your rate.
- Identify how to retrieve that data. Determine how your source lets you export or access the data: an API, a CSV download, a database query, or another method. This determines how your script retrieves data for each billing month. Your Adaptor should also support reprocessing previous months, since your source can retroactively adjust billing data through reconciliations, credits, or corrections.
- Convert the data to Common Bill Format as CSV. Map each record from your source's format to CBF fields and write the output as a gzipped CSV file (.csv.gz), encoded in UTF-8:
  - At minimum, each record needs a cost amount (cost/cost) and a date (time/usage_start).
  - Add context fields (service, account, region) for richer breakdowns in CloudZero.
  - Preserve the granularity of your source data. If your source provides hourly data, do not aggregate it to daily.
  - Amortize costs to hourly granularity when possible. Data is viewable in CloudZero at the lowest granularity you provide.
  - Include all charge types your source provides: usage, taxes, discounts, and committed use charges.
  - Use appropriate numeric data types for cost values (for example, decimal.Decimal instead of float in Python).
  - Write at most 1 million rows per CSV file. For larger data sets, split across multiple files in the same drop.

  For example, a CBF CSV file looks like this:

      lineitem/type,resource/service,resource/id,time/usage_start,cost/cost,resource/account
      Usage,Compute,instance-0000,2024-08-16T13:00:00Z,12,prod-001
      Usage,Compute,instance-0001,2024-08-16T13:00:00Z,20,prod-001
      Usage,Storage,bucket-main,2024-08-16T13:00:00Z,5.30,prod-001

- Create or update the manifest file. The manifest tells CloudZero which drop to read. Your Adaptor should generate or update this file with each drop. See The manifest file for the format.
- Upload to your S3 bucket. Upload the CSV files and manifest to your S3 bucket following this path structure:

      <bucket_path>/<billing_data_id>/<drop_id>/<data_file>.csv.gz
      <bucket_path>/<billing_data_id>/manifest.json

  | Component | Description |
  | --- | --- |
  | bucket_path | The Bucket Path from Step 1. |
  | billing_data_id | A single month, formatted as YYYYMMDD-YYYYMMDD. For example, August 2024 is 20240801-20240901. |
  | drop_id | A unique identifier for this drop. Best practice is a timestamp without special characters (for example, 20240816T150000Z). |
  | data_file.csv.gz | One or more gzipped CBF CSV files. |

  For example, here is what a complete bucket structure looks like for a source called "Simple Cloud" with two drops for March 2022:

      simple-cloud-cost-data/
        20220301-20220401/
          20220314T100216Z/
            data_export-0001.csv.gz
          20220317T171218Z/
            data_export-0001.csv.gz
          manifest.json

  A downloadable version of this example is available at Simple Cloud billing example.
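The conversion and path-building parts of the list above can be sketched in Python as follows. This is an illustration under stated assumptions, not a reference implementation: the function names are hypothetical, the column list is taken from the example CSV, the file naming follows the `data_export-0001.csv.gz` pattern from the example bucket, and the 1 million row limit is assumed to count data rows (not the header).

```python
import csv
import gzip
from datetime import date, datetime, timezone
from decimal import Decimal
from pathlib import Path

MAX_ROWS_PER_FILE = 1_000_000  # assumed to count data rows only

# Columns from the example CBF file; extend with any other CBF fields you map.
CBF_COLUMNS = [
    "lineitem/type", "resource/service", "resource/id",
    "time/usage_start", "cost/cost", "resource/account",
]


def billing_data_id(year, month):
    """Return YYYYMMDD-YYYYMMDD: first of the month to first of the next month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return f"{start:%Y%m%d}-{end:%Y%m%d}"


def new_drop_id(now=None):
    """Return a UTC timestamp with no special characters, e.g. 20240816T150000Z."""
    now = now or datetime.now(timezone.utc)  # pass a timezone-aware datetime
    return now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")


def write_cbf_files(records, out_dir, max_rows=MAX_ROWS_PER_FILE):
    """Write dict records as gzipped UTF-8 CSV files of at most max_rows rows each."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    handle = writer = None
    for index, record in enumerate(records):
        if index % max_rows == 0:  # start a new file, each with its own header
            if handle:
                handle.close()
            path = out_dir / f"data_export-{len(paths) + 1:04d}.csv.gz"
            handle = gzip.open(path, "wt", encoding="utf-8", newline="")
            writer = csv.DictWriter(handle, fieldnames=CBF_COLUMNS)
            writer.writeheader()
            paths.append(path)
        writer.writerow(record)  # Decimal values are written via str(), e.g. 5.30
    if handle:
        handle.close()
    return paths


def data_file_key(bucket_path, billing_id, drop_id, file_name):
    """Build the S3 key <bucket_path>/<billing_data_id>/<drop_id>/<data_file>."""
    return f"{bucket_path}/{billing_id}/{drop_id}/{file_name}"
```

Keeping `cost/cost` as `Decimal` until it is written avoids the rounding noise that `float` arithmetic can introduce when costs are summed or amortized.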
If your source provides only incremental data (for example, just today's records), your Adaptor must accumulate all previous records for the month and include them in every drop. CloudZero replaces all existing data for a billing period each time the manifest is updated.
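One way to handle an incremental source is to keep the month's records somewhere your Adaptor controls and merge each new batch into them before writing a drop. The sketch below assumes records are dicts keyed by CBF column and that a record is uniquely identified by its line item type, resource ID, and usage start time; both the function name and that choice of key are assumptions to adapt to your data.

```python
def merge_month_to_date(previous, incoming,
                        key_fields=("lineitem/type", "resource/id", "time/usage_start")):
    """Return every record for the month: earlier records plus the new batch.

    An incoming record with the same key as an earlier one replaces it, so
    re-fetched or corrected rows are not duplicated in the next drop.
    """
    merged = {tuple(record[f] for f in key_fields): record for record in previous}
    for record in incoming:
        merged[tuple(record[f] for f in key_fields)] = record
    return list(merged.values())
```

The merged list, not just the new batch, is what goes into the next drop's CSV files.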
The manifest file
After each upload, your Adaptor updates manifest.json to point to the new drop. This is the only action that triggers CloudZero to ingest your data.
    {
      "version": "1.3.0",
      "current_drop_id": "20240816T150000Z"
    }

For details on how the manifest works, delivery patterns, and managing previous drops, see S3 Bucket Delivery Reference.
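Because updating the manifest is what triggers ingestion, a reasonable ordering is to upload the data files first and write manifest.json last. The following sketch assumes `s3_client` behaves like a boto3 S3 client (`put_object`), that `files` maps each file name to its gzipped bytes, and uses the manifest version shown in the example above; `build_manifest` and `publish_drop` are hypothetical names.

```python
import json

MANIFEST_VERSION = "1.3.0"  # value shown in the manifest example


def build_manifest(drop_id):
    """Return the manifest JSON text that points at the given drop."""
    return json.dumps({"version": MANIFEST_VERSION, "current_drop_id": drop_id}, indent=2)


def publish_drop(s3_client, bucket, bucket_path, billing_id, drop_id, files):
    """Upload the drop's data files, then the manifest that points at them.

    s3_client is assumed to behave like boto3.client("s3").
    files maps file name -> bytes (gzipped CSV content).
    """
    prefix = f"{bucket_path}/{billing_id}"
    for name, body in files.items():
        s3_client.put_object(Bucket=bucket, Key=f"{prefix}/{drop_id}/{name}", Body=body)

    # The manifest goes last so the files it references are already in place.
    manifest_key = f"{prefix}/manifest.json"
    s3_client.put_object(
        Bucket=bucket,
        Key=manifest_key,
        Body=build_manifest(drop_id).encode("utf-8"),
    )
    return manifest_key
```

If any data file upload raises an exception, the manifest is never written, so CloudZero keeps reading the previous drop.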
What to expect
After your Adaptor uploads its first drop, CloudZero validates and processes the data. The connection status changes from Pending Data to Healthy, and your cost data appears in the Explorer. This can take up to 24 hours.
If CloudZero cannot connect to the bucket or your Adaptor is not writing data correctly, the connection status updates with details about the issue.
Maintaining your data
Each drop for a billing period replaces all previous data for that period. To update, correct, or clear your data, see S3 Bucket Delivery Reference for the full replacement model, managing old drops, and clearing a billing period.
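Since a new drop replaces a period's data, and sources can adjust past months retroactively, one possible pattern is for each Adaptor run to rebuild the current month plus a few earlier ones. The helper below lists the billing period IDs such a run would cover. The function name and the default look-back of two months are assumptions; choose a window that matches how late your source issues corrections.

```python
from datetime import date


def recent_billing_periods(today, months_back=2):
    """Return billing_data_id values for the current month and the
    months_back months before it, oldest first."""
    periods = []
    year, month = today.year, today.month
    for _ in range(months_back + 1):
        start = date(year, month, 1)
        end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
        periods.append(f"{start:%Y%m%d}-{end:%Y%m%d}")
        # Step back one month, wrapping from January to the previous December.
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    return periods[::-1]
```

Each returned ID corresponds to one `<billing_data_id>` folder with its own drops and its own manifest.json.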
Have questions or feedback? Reach out to your account manager.
