Published on: March 25, 2025
4 min read
Learn how to create repeatable, auditable, and efficient processes for automating and securing BigQuery data exports.
GitLab offers a powerful solution for automating and securing BigQuery data exports. This integration transforms manual exports into repeatable, auditable processes that can eliminate security vulnerabilities while saving valuable time. This tutorial explains how to implement this solution so you can quickly reduce manual operations, permission issues, and security concerns with just a few lines of GitLab YAML code.
Follow along with this step-by-step video, or continue with the written walkthrough below.
Our solution leverages GitLab CI/CD pipelines to automate the secure export of data from BigQuery to Google Cloud Storage. At a high level, a pipeline job authenticates to Google Cloud with a service account, runs a SQL query against BigQuery, writes the results to a CSV file, uploads that file to a Cloud Storage bucket, and prints a public URL for the exported data.
Before we begin, ensure you have:
- A GitLab project where you can commit files and configure CI/CD variables
- A Google Cloud project with BigQuery and Cloud Storage available
- A Google Cloud service account (with a JSON key) that can run BigQuery jobs and write to Cloud Storage
1. Configure Google Cloud credentials.
First, set up the necessary CI/CD variables in your GitLab project (Settings > CI/CD > Variables):
- CLOUD_STORAGE_BUCKET: Your Google Cloud Storage bucket name
- PROJECT_ID: Your Google Cloud project ID
- SERVICE_ACCOUNT_KEY: Base64-encoded service account key (mark it as masked)
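If you do not yet have a service account for this, the sketch below shows one way to create one, grant it the permissions this pipeline needs, and produce the Base64 value for SERVICE_ACCOUNT_KEY. The project ID (my-gcp-project), the account name (bq-exporter), and the role choices (BigQuery Job User plus Storage Admin) are placeholders and assumptions, so adjust them to your environment:

# Placeholder project and service account names: adjust to your own setup
gcloud iam service-accounts create bq-exporter --project=my-gcp-project

# Grant roles this pipeline relies on: running BigQuery jobs and managing the export bucket
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:bq-exporter@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"
gcloud projects add-iam-policy-binding my-gcp-project \
  --member="serviceAccount:bq-exporter@my-gcp-project.iam.gserviceaccount.com" \
  --role="roles/storage.admin"

# Create a JSON key and Base64-encode it for the masked SERVICE_ACCOUNT_KEY variable
gcloud iam service-accounts keys create key.json \
  --iam-account="bq-exporter@my-gcp-project.iam.gserviceaccount.com"
base64 key.json | tr -d '\n'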
2. Create your SQL query.
Create a file named query.sql in your GitLab repository with your BigQuery SQL query. The query looks like this:
-- This query shows a list of the daily top Google Search terms.
SELECT
  refresh_date AS Day,
  term AS Top_Term,
  -- These search terms are in the top 25 in the US each day.
  rank
FROM `bigquery-public-data.google_trends.top_terms`
WHERE
  rank = 1
  -- Choose only the top term each day.
  AND refresh_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 2 WEEK)
  -- Filter to the last 2 weeks.
GROUP BY Day, Top_Term, rank
ORDER BY Day DESC
-- Show the days in reverse chronological order.
This query pulls the daily top Google Trends search term (rank = 1) for each of the last two weeks, ordered from the most recent day to the oldest.
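If you have the Google Cloud SDK installed locally, you can sanity-check query.sql before committing it; a quick sketch, assuming you are already authenticated with gcloud:

# Validate the query and estimate bytes scanned without running it
bq query --use_legacy_sql=false --dry_run < query.sql

# Run it and preview the first few rows as CSV
bq query --format=csv --use_legacy_sql=false < query.sql | head -n 5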
3. Configure the GitLab CI/CD pipeline.
Create a .gitlab-ci.yml file in your repository root:
image: google/cloud-sdk:alpine

include:
  - template: Jobs/Secret-Detection.gitlab-ci.yml # https://gitlab.com/gitlab-org/gitlab/blob/master/lib/gitlab/ci/templates/Jobs/Secret-Detection.gitlab-ci.yml

execute:
  stage: deploy
  script:
    # Set up Google Cloud authentication and install necessary components
    - export GOOGLE_CLOUD_CREDENTIALS=$(echo $SERVICE_ACCOUNT_KEY | base64 -d)
    - echo $GOOGLE_CLOUD_CREDENTIALS > service-account-key.json
    - gcloud auth activate-service-account --key-file service-account-key.json
    - gcloud components install gsutil
    # Set the active Google Cloud project
    - gcloud config set project $PROJECT_ID
    # Run the BigQuery query and export the results to a CSV file
    - bq query --format=csv --use_legacy_sql=false < query.sql > results.csv
    # Create a Google Cloud Storage bucket if it doesn't exist
    - gsutil ls gs://${CLOUD_STORAGE_BUCKET} || gsutil mb gs://${CLOUD_STORAGE_BUCKET}
    # Upload the CSV file to the storage bucket
    - gsutil cp results.csv gs://${CLOUD_STORAGE_BUCKET}/results.csv
    # Set the access control list (ACL) to make the CSV file publicly readable
    - gsutil acl ch -u AllUsers:R gs://${CLOUD_STORAGE_BUCKET}/results.csv
    # Define the static URL for the CSV file
    - export STATIC_URL="https://storage.googleapis.com/${CLOUD_STORAGE_BUCKET}/results.csv"
    # Display the static URL for the CSV file
    - echo "File URL = $STATIC_URL"
4. Run the pipeline.
Now, whenever changes are merged to your main branch, the pipeline runs the query, uploads the results, and prints a link to the CSV file stored in the Google Cloud Storage bucket. Because the Secret Detection template is included, every run also scans the repository for leaked credentials before the export job publishes the data.
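To confirm the export worked end to end, you can fetch the published file from anywhere, substituting your bucket name for ${CLOUD_STORAGE_BUCKET}:

# Check that the object is publicly readable and preview the first rows
curl -sf "https://storage.googleapis.com/${CLOUD_STORAGE_BUCKET}/results.csv" | head -n 5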
By combining GitLab's DevSecOps capabilities with Google Cloud's BigQuery and Cloud Storage, you've now automated and secured your data publishing workflow. This approach reduces manual operations, resolves permission headaches, and addresses security concerns – all achieved with just a few lines of GitLab CI code.
Use this tutorial's complete code example to get started now.