Lightdash at Ubie Part 2: Governance at Scale

Yu Ishikawa
12 min read · Oct 19, 2023

This is part 2 of the blog post series about Lightdash at Ubie. Ubie automatically generates medical records using an AI-powered patient questionnaire that helps save time and provide better patient care. As you can imagine, data engineering, data management, and data governance are essential to building a high-quality AI-powered system.

Brief Recap of Part 1

In the first part of this series, we introduced the powerful analytics capabilities of Lightdash. We discussed how Lightdash, when integrated with dbt, offers a robust solution for Business Intelligence (BI). We also covered the initial setup of Lightdash projects and the use of GitHub Actions for deploying dbt models to Lightdash.

Objectives of Part 2

As we move into the second part of this series, our focus will shift to some of the more nuanced aspects of Lightdash. Specifically, we aim to cover:

  1. Challenges in BI Tool Governance: We’ll delve into the complexities of managing data models and permissions in a BI environment, focusing on the need for seamless integration between data warehouses and BI tools.
  2. Deploying Lightdash Projects with Confidence: We’ll explore how to use the Lightdash CLI in conjunction with GitHub Actions to ensure smooth and reliable deployments.
  3. Access Control in Lightdash: This section will discuss the structure of Lightdash projects and how to manage permissions and resources at different levels — organization, project, and space.
  4. Lightdash APIs and Terraform: We’ll take an in-depth look at how to programmatically manage Lightdash resources and permissions using its APIs and a custom Terraform provider.

By the end of this article, you should have a well-rounded understanding of how to deploy and manage Lightdash projects confidently, with a focus on fine-grained access control and resource management with Terraform.

Challenges in BI Tool Governance

Managing Business Intelligence tools like Lightdash involves more than just setting up analytics dashboards. It requires careful consideration of various factors, from data model management to intricate permission controls. In this section, we’ll delve into some of the key challenges that organizations often encounter when governing BI tools.

Seamless Integration with Data Warehouses

One of the foremost challenges in BI governance is ensuring a seamless integration between data warehouses and BI tools. Traditional BI tools allow us to create data models on top of data warehouses but often lack the capability to validate if changes in the data warehouse will adversely affect the BI tool.

In an ideal world, modifications to the data warehouse should automatically update in the BI tool without causing errors or inconsistencies. This seamless linkage is vital for maintaining the integrity of analytics and reports.

Fine-grained Permission Controls

Regional and Product-based Segregation

A significant challenge in BI governance is the need for fine-grained permission controls. For global organizations operating in multiple countries and offering various products, the ability to segregate data and analytics becomes crucial. For example, a company with operations in both Japan and the United States may need to keep data and analytics separate for each region. Similarly, different products within the same region may require distinct analytics spaces.

Types of Employment Status

Adding to the complexity is the varying employment statuses within an organization. Permanent employees, temporary contract employees, and subcontractors may all require different levels of access to the BI tool. The need to manage these granular permissions effectively adds another layer of requirements to fine-grained permission controls. Balancing these permissions without compromising on security or functionality is a significant governance challenge.

By tackling these challenges, organizations can build a more robust, secure, and efficient BI environment. In the upcoming sections, we’ll look at how Lightdash addresses these challenges, with a special focus on deployment confidence and access control.

Deploying Lightdash Projects with Confidence

One of the most significant challenges in BI governance is ensuring a seamless link between data warehouses and BI tools. Lightdash offers a set of features designed to address this challenge, enabling us to deploy projects with greater confidence. In this section, we’ll explore these features and how they contribute to a more robust and reliable deployment process.

Using GitHub Actions for Deployment and Validation

Automating the deployment process is crucial for minimizing errors and ensuring consistency. Lightdash allows us to use GitHub Actions to automate the deployment of tables and models defined by dbt. This setup not only automates the deployment but also validates our changes against our Lightdash projects.

By using GitHub Actions, we can set up workflows that automatically run the lightdash validate command provided by the Lightdash CLI. This command checks for any inconsistencies or errors in our Lightdash charts and dashboards that might occur due to changes in our dbt models within the data warehouse. If any validation errors are detected, the workflow will halt, allowing us to fix the issues before they affect our production environment.
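The halt-on-failure behavior described above can be sketched as a small Python gate that a CI step might call. This is only an illustration, not the workflow we run: the function names `run_gate` and `deploy_if_valid` are hypothetical, and it assumes the validation command signals errors through a non-zero exit code.

```python
import subprocess
import sys
from typing import Sequence


def run_gate(command: Sequence[str]) -> int:
    """Run a command and return its exit code without raising on failure."""
    completed = subprocess.run(list(command), check=False)
    return completed.returncode


def deploy_if_valid(validate_cmd: Sequence[str], deploy_cmd: Sequence[str]) -> bool:
    """Run the deploy command only when the validation command exits with 0.

    Returns True when both commands succeed, False otherwise.
    """
    if run_gate(validate_cmd) != 0:
        print("Validation failed; skipping deployment.", file=sys.stderr)
        return False
    return run_gate(deploy_cmd) == 0


# Example call (assumed invocations of the Lightdash CLI; adjust to your setup):
#   deploy_if_valid(["lightdash", "validate"], ["lightdash", "deploy"])
```

Keeping the commands as parameters makes the gate easy to exercise locally with any pair of commands before wiring it into a workflow.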

Solving the Seamless Link Challenge

The ability to validate changes before they are deployed addresses one of the most pressing challenges in BI tool governance: creating a seamless link between data warehouses and BI tools. By catching validation errors early in the deployment process, we can ensure that changes to our data warehouse, which is managed by dbt, do not break our Lightdash charts and dashboards. This feature is crucial for maintaining the integrity of our analytics and reports.

Lightdash CLI for Experimental Deployments

For further confidence, the Lightdash CLI offers a lightdash start-preview command. This allows us to create a temporary Lightdash project where we can experiment with metrics, dimensions, and charts without affecting our main project. This sandbox environment is well suited to testing new dbt models and their impact on our Lightdash setup before making changes to the production environment.
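One way to make sure a sandbox project never outlives an experiment is to wrap it in a context manager. The sketch below is hypothetical: `preview_project` and `Runner` are names made up for illustration, and it assumes the CLI also provides a matching `stop-preview` command and that both commands accept a `--name` flag, which you should confirm against the CLI reference.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def _run(argv: List[str]) -> None:
    """Default runner: execute the command and raise if it fails."""
    subprocess.run(argv, check=True)


@contextmanager
def preview_project(name: str, run: Runner = _run) -> Iterator[str]:
    """Start a preview project, yield its name, and always stop it afterwards."""
    # Assumption: `start-preview` and `stop-preview` take a `--name` flag.
    run(["lightdash", "start-preview", "--name", name])
    try:
        yield name
    finally:
        # Runs even if the body raises, so the sandbox is cleaned up.
        run(["lightdash", "stop-preview", "--name", name])
```

Passing a custom `run` callable lets us check the command sequence without touching a real Lightdash instance.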

By utilizing GitHub Actions for automated deployments and validations, along with the Lightdash CLI for experimental setups, we can significantly mitigate the risks associated with BI deployments. These features ensure that our Lightdash projects remain consistent and error-free, even when there are changes to the underlying data warehouse managed by dbt.

By incorporating these features into our deployment process, we can address the challenge of seamlessly linking our data warehouse and Lightdash, thereby enhancing the governance and reliability of our BI tools.

Access Control in Lightdash

Managing access effectively is a cornerstone of robust BI governance. Lightdash offers a nuanced approach to access control, incorporating both role-based and attribute-based access control mechanisms. In this section, we’ll explore the hierarchical structure of Lightdash and how it facilitates fine-grained permission controls.

Lightdash Project Structure

Organization

Lightdash allows us to establish an Organization, serving as the overarching entity under which multiple projects and spaces can exist. This level is where we can set global governance policies affecting all projects and spaces.

Projects

Within an organization, we can create multiple Projects, each acting as a separate container for analytics related to specific business units, products, or services. Projects are the primary units where dbt models are deployed and where we can set project-specific governance policies.

Spaces

Each project can be further segmented into Spaces, essentially folders containing our charts and dashboards. Spaces allow for more granular organization of our analytics content, facilitating easier access management for different teams or departments.

Fine-grained Permission Controls

Lightdash offers two types of access control mechanisms:

Role-based Access Control

  • Roles at the Organization Level: Lightdash allows us to define roles at the organization level, providing a way to set broad permissions applicable across all projects and spaces within the organization.
  • Roles at the Project Level: Lightdash also allows roles at the project level, which are more specific and can be tailored to individual projects.
  • Space Access: Space permissions are inherited from a user’s project permissions, allowing for even more granular control.
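To make the inheritance idea concrete, here is a deliberately simplified Python model of how space access could be decided. It is an assumption-laden sketch rather than Lightdash's actual logic: the `Space` class and `can_access_space` function are hypothetical, a public space is assumed to be visible to anyone with a project role, a private space is assumed to need an explicit grant, and any special handling for administrators is ignored.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Space:
    name: str
    is_private: bool
    # UUIDs of users who were explicitly granted access to this space.
    granted_user_uuids: Set[str] = field(default_factory=set)


def can_access_space(user_uuid: str, project_role: Optional[str], space: Space) -> bool:
    """Decide access under the simplified rules described in the lead-in."""
    if project_role is None:
        # No role in the project means no access to any of its spaces.
        return False
    if not space.is_private:
        # Public spaces inherit access from the project role.
        return True
    # Private spaces require an explicit grant in this sketch.
    return user_uuid in space.granted_user_uuids
```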

Attribute-based Access Control

  • User Attributes: Lightdash allows us to define user attributes at the organization level, such as ‘Sales Region’ or ‘Department’. These attributes can be used to customize charts, dashboards, and even SQL queries, providing another layer of personalized access control.
  • SQL Filtering with User Attributes: We can use user attributes to filter the rows returned by a query or a join, adding an extra layer of security and personalization.
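The row-filtering idea can be illustrated with a small Python function that substitutes a user's attribute values into a filter template. This is not how Lightdash implements it; the `${lightdash.attributes.<name>}` placeholder style is an assumption modeled on Lightdash's user-attribute references, and `render_sql_filter` is a hypothetical helper.

```python
import re
from typing import Dict, List

# Matches placeholders such as ${lightdash.attributes.sales_region}.
_PLACEHOLDER = re.compile(r"\$\{lightdash\.attributes\.(\w+)\}")


def _quote(value: str) -> str:
    """Quote a value as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def render_sql_filter(template: str, attributes: Dict[str, List[str]]) -> str:
    """Replace each attribute placeholder with the user's quoted values."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        values = attributes.get(name)
        if not values:
            # Deny by default: a user without the attribute gets no rows.
            raise PermissionError(f"missing user attribute: {name}")
        return ", ".join(_quote(v) for v in values)

    return _PLACEHOLDER.sub(substitute, template)


template = "${TABLE}.sales_region IN (${lightdash.attributes.sales_region})"
print(render_sql_filter(template, {"sales_region": ["JP", "US"]}))
# prints: ${TABLE}.sales_region IN ('JP', 'US')
```

Raising an error when an attribute is missing is a design choice in this sketch: it fails closed, so a misconfigured user sees nothing rather than everything.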

Solving the Challenge of Managing Complicated, Separated Resources

The role-based and attribute-based access control mechanisms in Lightdash offer a comprehensive solution to one of the most significant challenges in BI governance: managing access to a myriad of separated resources. The ability to define roles and attributes at various levels — organization, project, and space — allows us to tailor access permissions to the specific complexities of our organization. This multi-layered approach ensures that we can segregate data and analytics based on business units, geographical locations, or even employment statuses, making our BI governance both robust and flexible.

By understanding the hierarchical structure of Lightdash and the dual mechanisms of role-based and attribute-based access control, we can manage access to our BI tools effectively. This multi-layered approach ensures that the right people have the right level of access, enhancing both security and functionality.

Lightdash APIs

Lightdash offers a robust set of APIs that allow us to programmatically interact with its platform, further enhancing its governance capabilities. These APIs are particularly useful for managing roles and space access, among other functionalities. Below, we’ll delve into some of the key APIs that facilitate these operations.

Role Management APIs

  • Create Role: This API allows us to create a new role within an organization or project. We can specify the permissions associated with this role, making it easier to manage access at scale.
  • Update Role: If we need to modify the permissions of an existing role, this API provides the flexibility to do so.
  • Delete Role: Should a role become obsolete or require removal, this API enables us to delete it from the system.
  • List Roles: This API provides a list of all roles within a

Space Access APIs

  • Create Space: This API allows us to create a new space within a project, specifying which roles or users have access to it.
  • Delete Space: Similar to role deletion, this API allows us to remove a space if it’s no longer needed.
  • Grant Space Access: This API lets us grant a user access to an existing space.
  • Revoke Space Access: This API lets us revoke a user's access to an existing space.
  • List Spaces: Provides a list of all spaces within a project, along with their associated access permissions, making it easier to manage and audit.
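As a rough illustration of calling such an API from Python, the sketch below only builds the HTTP request for listing spaces and does not send it. The endpoint path, the `ApiKey` authorization scheme, and the helper name `build_list_spaces_request` are assumptions for illustration; check the Lightdash API reference for the exact contract.

```python
import urllib.request


def build_list_spaces_request(
    base_url: str, api_key: str, project_uuid: str
) -> urllib.request.Request:
    """Build a GET request for the spaces of one project (not sent here)."""
    # Assumption: this path and the "ApiKey" scheme match your Lightdash version.
    url = f"{base_url.rstrip('/')}/api/v1/projects/{project_uuid}/spaces"
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Accept": "application/json",
        },
        method="GET",
    )


# To send it: urllib.request.urlopen(request), then parse the JSON response body.
```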

The availability of these APIs significantly enhances Lightdash’s governance capabilities. By using these APIs, we can automate the process of role and space management, ensuring that access controls are consistently applied across our organization. This is particularly beneficial for complex organizations with multiple projects, spaces, and varying levels of access requirements.

Custom Terraform Provider

Managing permissions and resources in a BI tool can be a cumbersome task, especially when done manually. To automate this process and make it more efficient, we have implemented a custom Terraform provider for Lightdash. This provider leverages Lightdash’s APIs to programmatically manage spaces, roles at the organization and project levels, and space access.

How It Works

The custom Terraform provider is designed to interact with Lightdash’s APIs, allowing us to define our Lightdash resources as code. Here’s a brief overview of how we can use this provider:

Managing roles at the project level

We can manage roles at both the organization and project levels. For instance, to grant an ‘editor’ role at the project level to a user, we can use the following code:

resource "lightdash_project_role_member" "test" {
  project_uuid = data.lightdash_project.jaffle_shop.project_uuid
  user_uuid    = data.lightdash_organization_member.test_user.user_uuid
  role         = "editor"
}

Managing spaces and access

Similarly, we can manage spaces and their access permissions. While the specific API calls for this are not detailed in the documentation, the Terraform provider is designed to handle these tasks seamlessly. To create a new space within a project, we can use the following Terraform code:


resource "lightdash_space" "test_private" {
  project_uuid = "xxxxxxxx-xxxxxxxxxx-xxxxxxxxx"
  name         = "zzz_test_private_space"
  is_private   = true

  deletion_protection = false
}

To manage access to the newly created space, we can specify which roles or users have access to it:

resource "lightdash_space_access_member" "example" {
  project_uuid = "xxxxx-xxxxx-xxxx"
  space_uuid   = "yyyy-yyyy-yyy"
  user_uuid    = data.lightdash_organization_member.example.user_uuid
}

Benefits for Governance

The custom Terraform provider offers several advantages:

  • Automated Management: By defining our Lightdash resources as code, we can automate the management process, making it more efficient and less error-prone.
  • Version Control: Since the configurations are stored as code, they can be version-controlled, providing an audit trail and making it easier to roll back changes if needed.
  • Scalability: As our organization grows, managing resources manually becomes increasingly challenging. The Terraform provider scales with our needs, allowing us to manage complex setups with ease.
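To give a feel for the scalability point, here is a hedged Python sketch that generates many project role assignments in Terraform's JSON configuration syntax from a single mapping. The resource type and its three attributes are taken from the example earlier in this article; the function name, the resource labels, and the UUID values are hypothetical placeholders.

```python
import json
from typing import Dict, Tuple


def project_role_members(project_uuid: str, members: Dict[str, Tuple[str, str]]) -> dict:
    """Build a Terraform JSON document with one role resource per member.

    `members` maps a Terraform resource label to (user_uuid, role).
    """
    return {
        "resource": {
            "lightdash_project_role_member": {
                label: {
                    "project_uuid": project_uuid,
                    "user_uuid": user_uuid,
                    "role": role,
                }
                for label, (user_uuid, role) in sorted(members.items())
            }
        }
    }


config = project_role_members(
    "hypothetical-project-uuid",
    {
        "alice": ("hypothetical-user-uuid-1", "editor"),
        "bob": ("hypothetical-user-uuid-2", "viewer"),
    },
)
# Writing this string to a file such as roles.tf.json lets Terraform load it.
rendered = json.dumps(config, indent=2)
```

Generating the configuration from one reviewed mapping keeps the role assignments in version control while avoiding hand-written, repetitive resource blocks.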

By using this custom Terraform provider, we can programmatically manage all aspects of our Lightdash setup, from roles and permissions to spaces and their access controls. This not only makes the governance process more efficient but also ensures that it is consistent and secure.

Python-based Command-Line Tool for Lightdash

In addition to the custom Terraform provider, we’ve also developed a Python-based command-line tool called lightdash-ops to interact with Lightdash. This tool focuses on operating resources like user roles and spaces on Lightdash, similar to what the official Lightdash CLI offers. For example, we can easily retrieve all members in an organization by running a simple command like lightdash-ops organization get-members.

$ export LIGHTDASH_URL="https://localhost:8000"
$ export LIGHTDASH_API_KEY="YOUR-LIGHTDASH-PERSONAL-ACCESS-TOKEN"
$ lightdash-ops organization get-members
[
  {
    "member_uuid": "ade0aef5-bca8-4cbe-819b-07803390ffb0",
    "email": "lightdash-member@example.com",
    "role": "member"
  },
  {
    "member_uuid": "d7ee948b-26d6-461a-b289-906cc7bb0c73",
    "email": "lightdash-admin@example.com",
    "role": "admin"
  }
]
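Because the output is JSON, it is easy to post-process for audits. The following Python sketch groups members by role; it assumes the output shape shown in the sample above, and `members_by_role` is a hypothetical helper rather than part of lightdash-ops.

```python
import json
from typing import Dict, List


def members_by_role(raw_json: str) -> Dict[str, List[str]]:
    """Group member emails by their organization role."""
    grouped: Dict[str, List[str]] = {}
    for member in json.loads(raw_json):
        grouped.setdefault(member["role"], []).append(member["email"])
    return grouped


sample_output = """
[
  {"member_uuid": "ade0aef5-bca8-4cbe-819b-07803390ffb0",
   "email": "lightdash-member@example.com", "role": "member"},
  {"member_uuid": "d7ee948b-26d6-461a-b289-906cc7bb0c73",
   "email": "lightdash-admin@example.com", "role": "admin"}
]
"""
print(members_by_role(sample_output)["admin"])
# prints: ['lightdash-admin@example.com']
```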

Contributions to Lightdash: Leveraging Open-Source Collaboration

Thanks to the dedication of the Lightdash team and contributors, I was able to enhance the APIs needed to implement the custom Terraform provider. We definitely benefit from Lightdash being an open-source project: we can openly discuss new features and implement them. This is one of the main reasons why we chose Lightdash.

Summary

In this article, we’ve explored the challenges of BI governance and how Lightdash addresses these issues with its robust features and extensible APIs. We delved into the hierarchical structure of Lightdash, which allows for fine-grained permission controls at the organization, project, and space levels. We also discussed the dual mechanisms of role-based and attribute-based access control, which offer a comprehensive solution to managing complicated, separated resources.

We took a closer look at how to deploy Lightdash projects with confidence, leveraging features like GitHub Actions and the Lightdash CLI. These tools enable seamless integration with data warehouses and provide a safety net to catch validation errors, ensuring that changes in data models do not break our analytics setup.

To further enhance governance capabilities, we introduced a custom Terraform provider that allows for programmatic management of Lightdash resources. This provider makes it easier to manage roles, spaces, and their respective access permissions, all defined as code for better governance and scalability.

Lastly, we touched upon a Python-based command-line tool, lightdash-ops, which offers a convenient and flexible way to interact with Lightdash's APIs. This tool serves as a valuable addition to Lightdash's governance toolkit, providing quick and efficient ways to manage resources.

By leveraging these features and tools, Lightdash not only simplifies the complexities of BI governance but also provides a scalable and secure environment for analytics. Whether you’re a small team or a large organization, Lightdash offers a comprehensive set of tools to meet your BI governance needs.

In part 3, we will discuss potential future directions for leveraging AI and LLMs in data analytics with Lightdash as a BI tool.

