BigQuery Utils
Useful scripts, UDFs, views, and other utilities for migration and data warehouse operations in BigQuery.
BigQuery is a serverless, highly scalable, and cost-effective cloud data warehouse with an in-memory BI Engine and machine learning built in. This repository provides utilities to assist you in migrating to and using BigQuery. The project is written primarily in Jupyter Notebook, is distributed under the Apache License 2.0, and was first published in 2019. It has 1,303 stars and 331 forks on GitHub. Key topics include: bigquery, data-warehouse, google-cloud-platform, sql, utilities.
Getting Started
This repository is broken up into:
- Dashboards - Pre-built dashboards for common use cases
  - System Tables - Looker Studio dashboards built on BigQuery's INFORMATION_SCHEMA metadata views to understand your organization's slot and reservation utilization, job execution, and job errors.
- Notebooks - Colab notebooks for various BigQuery related use cases.
  - BigQuery Frequent Items Sketches Demo - Calculating TopN file extensions in Git repos from the public GitHub dataset.
  - BigQuery KLL Sketches Demo - Calculating quantiles over user-defined time windows.
  - BigQuery Theta Sketches Demo - Calculating distinct user logins over different months.
  - Geospatial - Colab notebooks for geospatial analytics.
- Performance Testing - Examples for doing performance testing
  - JMeter - Examples for using JMeter to test BigQuery performance
- Scripts - Python, Shell, & SQL scripts
  - billing - Example queries over the GCP billing export
  - optimization - Scripts to help identify areas for optimization in your BigQuery warehouse.
- Stored Procedures - Example stored procedures
- Third Party - Relevant third party libraries for BigQuery
  - compilerworks - BigQuery UDFs which mimic the behavior of proprietary functions in other databases
- Tools - Custom tooling for working with BigQuery
  - Cloud Functions - Cloud Functions to automate common use cases
- UDFs - User-defined functions for common usage as well as migration
  - community - Community contributed user-defined functions
  - datasketches - UDFs deployed from the latest release of Apache Datasketches for BigQuery
  - migration - UDFs which mimic the behavior of proprietary functions in other databases
- Views - Views over system tables such as audit logs or the INFORMATION_SCHEMA
  - query_audit - View to simplify querying the audit logs which can be used to power dashboards (example).
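The System Tables dashboards listed above read from BigQuery's INFORMATION_SCHEMA metadata views. As a rough illustration of the kind of query involved, the sketch below builds a SQL statement that lists recently failed jobs together with their slot usage. It is not the SQL used by the dashboards in this repository: the helper name `job_errors_query`, the default region `region-us`, and the choice of columns are assumptions made for the example.

```python
def job_errors_query(region: str = "region-us", days: int = 7) -> str:
    """Build a query listing recent failed jobs with their slot usage.

    `region` is the region qualifier for INFORMATION_SCHEMA (assumed here
    to be `region-us`); `days` is how far back to look.
    """
    if days < 1:
        raise ValueError("days must be >= 1")
    return f"""
SELECT
  job_id,
  user_email,
  creation_time,
  total_slot_ms,
  error_result.reason AS error_reason
FROM `{region}`.INFORMATION_SCHEMA.JOBS_BY_PROJECT
WHERE creation_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL {days} DAY)
  AND error_result IS NOT NULL
ORDER BY creation_time DESC
""".strip()


# Print the generated SQL; paste it into the BigQuery console to run it.
print(job_errors_query(days=3))
```

Filtering on `creation_time` keeps the scan limited to recent jobs, which is usually what a utilization or error dashboard needs.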
Public UDFs
For more information on UDFs and using those provided in the repository with
BigQuery, see the README in the udfs folder.
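As a minimal sketch of what calling one of these UDFs can look like from Python, the example below composes a query that invokes a UDF by its dataset-qualified name. The dataset `bqutil.fn` and the function `int` are assumptions for illustration (check the README in the udfs folder for the names actually published), and the helpers `udf_call`, `build_query`, and `run_query` are hypothetical names introduced here.

```python
def udf_call(dataset: str, name: str, *args: str) -> str:
    """Return a SQL expression that calls `dataset.name(args...)`.

    Arguments are passed through verbatim, so each one must already be a
    valid SQL expression (a literal, a column name, and so on).
    """
    return f"{dataset}.{name}({', '.join(args)})"


def build_query(expression: str, alias: str = "result") -> str:
    """Wrap a SQL expression in a one-row SELECT statement."""
    return f"SELECT {expression} AS {alias}"


def run_query(sql: str):
    """Run `sql` with the google-cloud-bigquery client and return the rows.

    Requires `pip install google-cloud-bigquery` and default credentials.
    """
    from google.cloud import bigquery  # imported lazily: optional dependency

    client = bigquery.Client()
    return list(client.query(sql).result())


# Assumed names: dataset `bqutil.fn` and function `int`.
sql = build_query(udf_call("bqutil.fn", "int", "1.684"))
print(sql)  # SELECT bqutil.fn.int(1.684) AS result
```

Only `run_query` needs network access and credentials; the other two helpers build the SQL string, so the query can be inspected before it is sent.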
Contributing
See the contributing instructions to get started
contributing.
To contribute UDFs to this repository, see the
instructions in the udfs folder.
License
Except as otherwise noted, the solutions within this repository are provided under the
Apache 2.0 license. Please see
the LICENSE file for more detailed terms and conditions.
Disclaimer
This repository and its contents are not an official Google Product.
