Lightdash at Ubie Part 1: Introduction

Yu Ishikawa
12 min readOct 18, 2023

--

This is part 1 of the blog post series about Lightdash at Ubie. Ubie automatically generates medical records using an AI-powered patient questionnaire that helps save time and provide better patient care. As you can imagine, data engineering, data management and data governance are very significant to build the high-quality AI-powered AI system.

I plan to post three blogs in the series. In part 1, I would like to introduce to Lightdash by describing the key features and how we deploy tables to Lightdash, an open-source BI tool that integrates seamlessly with modern data stacks. Built on top of dbt (Data Build Tool), it leverages the power of dbt’s semantic layer to provide a consistent and reliable data model. This ensures that you have a single source of truth for your analytics, solving one of the most significant pain points we faced with traditional BI tools.

The goals of part 1 are:

  1. Describe our data stack and the challenges before introducing Lightdash on the previous BI tool.
  2. Understand Key Features: We’ll delve into the core functionalities of Lightdash, from data exploration to sharing insights, and explain why they are game-changers in the BI landscape.
  3. Develop and Deploy with Lightdash: Learn how to seamlessly add dimensions and metrics to your tables and deploy them in Lightdash. We’ll also discuss the advantages of using the Cloud Pro Tier of Lightdash for scalable and secure data operations.

By the end of this article, you’ll have a comprehensive understanding of what makes Lightdash a standout choice for modern data stacks, and how you can leverage its features for more efficient and insightful data operations. So, let’s dive in and explore the world of Lightdash!

Data Stack at Ubie

Understanding the technical context of our data stack is crucial for appreciating the value that Lightdash brings to our operations. Let’s take a closer look at the components of our data ecosystem.

BigQuery

Our primary data storage and query engine is Google’s BigQuery. This fully-managed, serverless data warehouse is designed for speed and scalability, allowing us to run SQL queries efficiently using Google’s robust infrastructure.

dbt Core

For data transformation and modeling, we use dbt Core which is the open-sourced edition of dbt. It enables us to write, document, and execute SQL-based data workflows. dbt also acts as a semantic layer, ensuring that our metrics and dimensions are consistent across all analytical tools.

Redash: Our Previous BI Tool

Before transitioning to Lightdash, we used Redash as our Business Intelligence tool. Redash allowed us to connect and query various data sources and build dashboards for data visualization.

Challenges on the traditional BI tool

While traditional Business Intelligence tools like Redash have served us well in the past, we’ve encountered several limitations that hindered our data operations. Let’s delve into these challenges:

Inefficiency in Custom SQL Queries

Writing custom SQL queries in Redash is flexible but not necessarily productive. This limitation hampers our data democratization efforts. We aim to empower not just data analysts who are proficient in SQL, but also team members who may not be as familiar with SQL to analyze data quickly and efficiently.

Lack of a Single Source of Truth

Custom SQL queries in Redash also lead to another significant issue: the absence of a single source of truth. Without a centralized way to manage data definitions, it becomes challenging to maintain consistency across various data models and reports.

No Alerts for Broken Data Models

Traditional BI tools, including Redash, primarily focus on data modeling features that sit on top of data warehouses like BigQuery. However, they fall short in alerting us when changes in the data warehouse break existing charts and dashboards. This lack of proactive warning can lead to significant issues down the line.

Limited Permission Control

Redash offers basic user group management by allowing us to attach different data sources to specific user groups. However, this approach lacks comprehensive permission control, making it difficult to fine-tune access at a granular level.

Understanding these challenges is crucial as we explore how Lightdash addresses each of these pain points, offering a more robust and user-friendly solution for modern data stacks.

Key features of Lightdash

Explore: The Data Exploration Interface

The Explore feature serves as the primary interface for data exploration in Lightdash. It offers a list of metrics and dimensions available on the Table, a filters panel to restrict data, a chart panel for visualization, and a results panel to explore query results. You can start by selecting a Table and running your query, which can be as simple as selecting a metric and one or more dimensions to split the metric into groups. The Explore view is highly customizable and allows you to save your charts for future reference or sharing.

Interacting with Dashboards

Dashboards in Lightdash allow you to arrange multiple related charts into a single view. You can create a new dashboard or add existing charts to it. The dashboards are flexible; you can resize and reposition charts as needed. Filters can also be added to dashboards, allowing for more interactive data exploration.

SQL Runner: Direct SQL Queries

For those who prefer writing SQL queries, Lightdash offers the SQL Runner. This feature allows you to run direct SQL queries and visualize the results, offering a balance between flexibility and control.

Share Insights

Lightdash provides multiple ways to share your insights, including sharing URLs to ‘work-in-progress’ data explorations or saved charts. You can also download results as a .csv or various image formats, making it easy to share your findings with your team.

Scheduled Delivery & Slack Integration

While this feature is not explicitly mentioned in the documentation, scheduled delivery can be a crucial aspect of any BI tool. It allows you to schedule the delivery of reports and dashboards at regular intervals, ensuring that your team is always up-to-date with the latest insights.

Lightdash offers Slack integration, allowing you to receive real-time updates and alerts directly in your Slack channels. This feature enhances collaboration and ensures that your team is immediately notified of any critical data changes or insights.

By offering these advanced features, Lightdash not only addresses the limitations of traditional BI tools but also adds layers of functionality and user-friendliness that are essential for modern data analytics.

Deploying a Lightdash Project

Setting up and deploying a Lightdash project involves a few key steps that integrate closely with your dbt project. Here’s how you can get your Lightdash project up and running:

Preparing Your dbt Project

Before you can use Lightdash, you need to prepare your dbt project. Lightdash uses Tables, which are essentially dbt models defined in your dbt project’s .yml files. To make your dbt project Lightdash-ready, you need to define all the columns in your dbt models that you want to explore in Lightdash. This can be easily done using the Lightdash CLI.

Deploying Your dbt Project

Once you’ve defined your dimensions and metrics in your dbt project, the next step is to deploy these changes to your Lightdash project. There are two primary ways to do this: using the Lightdash UI and using the Lightdash CLI. Please

Implementing Dimensions and Metrics in Lightdash

One of the most powerful features of Lightdash is its seamless integration with dbt, allowing you to define dimensions and metrics directly in your dbt project. Here’s how to go about it:

Defining Dimensions

In Lightdash, dimensions are more than just columns in your data tables; they are highly customizable attributes that can be fine-tuned to meet your specific analytical needs. Here’s a closer look at the specifications you can set for each dimension in Lightdash.

Basic configuration

At its core, a dimension in Lightdash corresponds 1:1 with a column in your dbt model. To make a column appear as a dimension in Lightdash, you simply declare it in your dbt model’s YAML file. For example:

# schema.yml
version: 2
models:
- name: "orders"
description: "A table of all orders."
columns:
- name: "status"
description: "Status of an order: ordered/processed/complete"

Advanced Customization

Lightdash allows you to go beyond basic configuration by offering a range of properties that can be customized:

  • Label: You can set a custom label for the dimension, which will be displayed in Lightdash instead of the dimension name.
  • Type: While Lightdash automatically pulls the dimension type from your table schema, you can override it using this property.
  • Description: This property allows you to set or override the description of the dimension in Lightdash.
  • SQL: You can apply custom SQL logic to the column used to define the dimension.
  • Time Intervals: For date and timestamp dimensions, Lightdash automatically adds time intervals like ‘DAY’, ‘WEEK’, ‘MONTH’, etc. You can customize these or even turn them off.
  • Hidden: If set to true, the dimension will be hidden from Lightdash.
  • Round: This property rounds a number to a specified number of digits.
  • Format: You can format the output value on the result table and CSV export. Supported formats include ‘km’, ‘mi’, ‘usd’, ‘gbp’, ‘eur’, etc.
  • URLs: You can add URLs to a dimension, allowing users to click dimension values in the UI to take specific actions, like opening an external tool.

Metrics in Lightdash

Metrics in Lightdash serve as the backbone for data summarization and analysis. They are essentially values that describe or summarize features from a collection of data points. For instance, “Num unique user ids” could be a metric that counts the unique number of user IDs in a dataset.

Types of Metrics

Metrics in Lightdash fall into two main categories:

Aggregate Metrics: These metrics perform aggregations on your data. Examples include count, count_distinct, sum, average, max, and min. Aggregate metrics can only reference dimensions, not other metrics.

  • count and count_distinct: These are used for counting the total and unique number of values in a dimension, respectively.
  • sum: Adds up the values in a given field.
  • average: Takes the average of the values in a given field.
  • max and min: These metrics find the largest and smallest values in a numeric field, respectively.

Non-Aggregate Metrics: These metrics do not perform aggregations and are generally used for boolean conditions or simple transformations. Examples include boolean, number, and date.

  • boolean: Used to show if something is true or false based on other aggregate metrics.
  • number: Used for simple transformations on other metrics.
  • date: Used to derive date values from expressions.

Metric Configuration

Metrics in Lightdash can be highly customized. You can define metrics using the meta tag in your dbt project. Here's a simplified example:

version: 2

models:
- name: sales_stats
meta:
joins:
- join: web_sessions
sql_on: ${web_sessions.date} = ${sales_stats.date}
columns:
- name: revenue
description: "Total estimated revenue in GBP based on forecasting done by the finance team."
meta:
metrics:
total_revenue:
label: 'Total revenue GBP'
type: SUM
description: "Total revenue in GBP"
sql: "IF(${revenue} IS NULL, 10, ${revenue})"
hidden: false
round: 0
format: 'gbp'
show_underlying_values:
- revenue
- forecast_date
- web_sessions.session_id # field from joined table
filters:
- is_adjusted: true

Please take a look at the official documentation. It describes how we can configure metrics in detail.

Leveraging Lightdash Cloud Pro for Scalable Analytics

As our data analytics needs grew, we found it increasingly challenging to manage and upgrade Lightdash on our own. That’s when we turned to Lightdash Cloud Pro, a fully-managed, scalable solution that comes with a host of advanced features tailored for businesses like ours. Here’s why Lightdash Cloud Pro is a game-changer for us:

Follow the Rapid Growth of Lightdash

One of the main reasons we opted for Lightdash Cloud Pro is its ability to keep pace with the rapid growth and updates of Lightdash itself. Managing frequent updates in-house can be time-consuming and prone to errors. With Cloud Pro, we can focus on analytics while Lightdash takes care of the upgrades.

API & Webhooks

Lightdash Cloud Pro offers robust API and webhook functionalities. This allows us to integrate Lightdash seamlessly into our existing tech stack, automating data workflows and triggering real-time alerts based on our analytics.

Unlimited Seats

The Cloud Pro tier comes with unlimited seats, providing the flexibility to scale our analytics team without worrying about additional costs per user. This is particularly beneficial as our team grows and more stakeholders need access to data insights.

HIPAA BAA

For businesses in healthcare or those dealing with sensitive data, Lightdash Cloud Pro offers HIPAA Business Associate Agreements (BAAs), ensuring that your data is handled in compliance with industry regulations.

Data Protection Agreement

Security is a top priority for us, and Lightdash Cloud Pro doesn’t disappoint. Hosted in Google Cloud with servers in Europe and North America, it offers robust data protection agreements, ensuring that our data remains safe and secure.

By opting for Lightdash Cloud Pro, we’ve been able to focus more on deriving insights from our data and less on managing the tool itself, making it an invaluable asset in our analytics journey.

Upcoming Features: The Future of Lightdash

As we continue to leverage Lightdash for our analytics needs, we’re particularly excited about some of the upcoming features that promise to make our data exploration even more powerful and intuitive.

Integration with dbt Semantic Layer

The integration between Lightdash and dbt’s new Semantic Layer is a significant milestone in the data analytics community. This feature allows you to define your core business metrics in one place within dbt and consume them in various downstream tools, including Lightdash. This is a game-changer for maintaining a single source of truth for your metrics across different platforms.

The integration is already live for Lightdash Cloud users and involves three simple steps:

  1. Get Connected: The first step is to connect with the new dbt Semantic Layer. If you haven’t already, you can follow dbt’s getting started guide to define your first metrics.
  2. Start Querying: Once connected, you can head over to Lightdash’s semantic layer explorer, where you’ll find a new sidebar displaying your metrics and dimensions. Simply click on any metric to start visualizing it.
  3. Visualize and Collaborate: Lightdash will automatically pick a suitable visualization for your metric, which you can then customize using Lightdash’s extensive visualization library.

Charts from SQL Runner

Another upcoming feature that has caught our attention is the ability to save charts directly from SQL Runner. This feature will be particularly useful for edge cases where SQL-only charts make more sense. For example, if you want to create a complex matrix to track user transitions between different product tiers, this can be easily achieved using SQL Runner.

Both of these features align perfectly with our goal to make data analytics more accessible, reliable, and actionable. We’re eagerly looking forward to incorporating these new capabilities into our analytics workflow.

Summary

In part 1 of the blog serise, we’ve taken a deep dive into Lightdash, a data-friendly Business Intelligence tool that seamlessly integrates with dbt. We started by outlining our data stack and the challenges we faced with traditional BI tools like Redash. Then, we explored the key features of Lightdash, from its powerful data exploration interface to its scheduled delivery.

We also discussed how to implement dimensions and metrics in Lightdash, providing a step-by-step guide to make your dbt project Lightdash-ready. For those looking for a fully-managed solution, we shared our experience with Lightdash Cloud Pro, highlighting its advanced features like API & webhooks, unlimited seats, and robust data protection agreements.

Lastly, we touched upon some exciting upcoming features, including the integration with dbt’s new Semantic Layer and the ability to save charts directly from SQL Runner. Both of these features promise to make Lightdash even more powerful and user-friendly.

As we continue to leverage Lightdash for our analytics needs, we’re excited about its potential to revolutionize the way businesses handle data. Whether you’re a head of data, an analytics engineer, or a data engineer planning to build a modern data stack, Lightdash offers a robust, scalable, and intuitive solution that can meet your needs.

In the part 2, we will discuss how we govern Lightdash at scale.

--

--

Yu Ishikawa

Data Engineering / Machine Learning / MLOps / Data Governance / Privacy Engineering