Unlocking Nested Fields: Accessing STRUCT Data in BigQuery with Lightdash’s Additional Dimensions

Yu Ishikawa
5 min readFeb 15, 2024

Introduction

In the dynamic realm of data analytics, the ability to efficiently access and interpret complex data structures is crucial. BigQuery, Google’s enterprise data warehouse, offers robust capabilities for handling large datasets, notably through its STRUCT data type. This data type facilitates the storage of nested, hierarchical data, a common occurrence in contemporary data landscapes. However, navigating these nested fields can pose significant challenges, often leading to complexities in data analysis and reporting.

Lightdash, emerging as a formidable tool in the analytics toolkit, introduces an elegant solution to this conundrum with its additional_dimensions feature. This functionality allows analysts to define multiple dimensions from a single column, simplifying the process of accessing and analyzing nested fields within the STRUCT data type. The implication of this is profound, offering a streamlined approach to managing and interpreting nested data, a task that has traditionally been cumbersome and time-intensive.

This article aims to demystify the process of accessing nested fields in BigQuery’s STRUCT data type using Lightdash’s additional dimensions. We will explore the practicalities of implementing this feature, underscored by a step-by-step guide and real-world examples. Whether you’re a seasoned data professional or new to the field, the insights provided here will equip you with the knowledge to leverage Lightdash’s capabilities, enhancing your data analytics and reporting efficiency.

Current Limitations of Nested Fields in Lightdash

Lightdash, as a powerful and user-friendly analytics tool, aims to simplify the process of data analysis and reporting. However, like any tool, it has its limitations, particularly when dealing with nested fields in data structures like BigQuery’s STRUCT data type. Understanding these limitations is crucial for data analysts who seek to maximize the potential of their data.

One of the primary challenges in Lightdash pertains to the direct access and utilization of nested fields within a STRUCT column. Consider the following example:

columns:
- name: struct.id
meta:
dimension:
type: number

In this scenario, the intention is to access the id field nested within a STRUCT named struct. However, Lightdash does not inherently support this direct referencing of nested fields in its current configuration. As a result, this method yields no effective way to extract and analyze the nested data, leading to significant roadblocks in data processing.

This limitation poses a notable challenge. Nested fields are a common feature in modern data sets, especially those that include complex JSON objects or hierarchically structured data. The inability to directly access these nested elements means analysts might miss out on valuable insights hidden within these layers. Furthermore, it leads to a reliance on more complex and time-consuming methods to restructure or flatten the data outside of Lightdash, which can be inefficient and error-prone.

The good news is that Lightdash offers alternative ways to access these nested fields effectively, maintaining the integrity and depth of the data. The next section, “Additional Dimensions in Lightdash,” will delve into how Lightdash’s additional_dimensions feature provides a robust solution to this limitation, enabling users to unlock the full potential of their nested data structures.

Additional Dimensions in Lightdash

Overcoming the limitations of accessing nested fields in Lightdash is made possible through the innovative use of the additional_dimensions feature. This functionality provides a flexible and powerful means to define and access nested fields within a STRUCT type column, particularly when used in conjunction with the sql property. Let's explore how this feature can be effectively implemented, using the provided dbt model as our example.

Harnessing the Power of additional_dimensions

The additional_dimensions property in Lightdash allows us to define custom dimensions that can directly access nested fields within a STRUCT column. This is achieved by specifying a custom SQL expression for each nested field we want to expose as a dimension.

In the provided dbt model, we have a STRUCT column named struct_col which itself contains nested fields. The challenge is to access these nested fields like id and string_col at various levels of nesting. By utilizing the additional_dimensions property, we can create separate, easily accessible dimensions for each of these nested fields.

Example Implementation

In the dbt model:

WITH test_additional_dimensions AS (
SELECT
1 AS id,
"x" AS string_col,
STRUCT(
1 AS id,
"x" AS string_col,
STRUCT(
1 AS id,
"x" AS string_col
) AS struct_col
) AS struct_col
)

SELECT
*,
FROM test_additional_dimensions

We have a nested structure within struct_col. To access these nested fields in Lightdash, the dbt schema YAML is defined as follows:

version: 2

models:
- name: test_additional_dimensions
columns:
- name: id
meta:
dimension:
type: number
- name: string_col
meta:
dimension:
type: string
- name: struct_col
meta:
dimension:
type: string
sql: "TO_JSON_STRING(${TABLE}.struct_col)"
hidden: true
additional_dimensions:
struct_col__id:
type: number
sql: "${TABLE}.struct_col.id"
struct_col__string_col:
type: string
sql: "${TABLE}.struct_col.string_col"
struct_col__struct_col__id:
type: number
sql: "${TABLE}.struct_col.struct_col.id"
struct_col__struct_col__string_col:
type: string
sql: "${TABLE}.struct_col.struct_col.string_col"

I made sure it is possible to use the additional dimensions of the nested fields on Lightdash.

Practical Benefits

This configuration allows Lightdash users to easily query and analyze data from nested fields. Each additional_dimension acts like a bridge, connecting the user directly to the nested data points, bypassing the complexity of navigating through the STRUCT layers. This setup enhances the flexibility and depth of data analysis, allowing for more nuanced and detailed insights. It also significantly simplifies the SQL queries required to access this data, improving both efficiency and readability.

In conclusion, additional_dimensions in Lightdash offers a practical and efficient solution to the challenge of accessing nested fields within STRUCT data types. By leveraging this feature, along with the strategic use of the sql property, data analysts can unlock the full potential of their complex datasets, leading to more comprehensive and insightful data analysis.

Conclusion

The exploration of utilizing Lightdash’s additional_dimensions to access nested fields in BigQuery's STRUCT data type highlights a significant stride in data analytics. This approach, while addressing the limitations of direct access, not only simplifies the querying process but also enriches data analysis. It exemplifies how adaptability and innovative features in analytics tools can transform complex data challenges into insightful opportunities.

Looking ahead, there is an anticipation that Lightdash will further evolve to support nested fields by default, augmenting its already robust capabilities. Such advancements would streamline data analysis even more, allowing users to delve deeper into their data with greater ease and efficiency. The integration of these features would mark a notable milestone in making complex data structures more accessible and insightful for analysts and decision-makers.

--

--

Yu Ishikawa

Data Engineering / Machine Learning / MLOps / Data Governance / Privacy Engineering