Version: v3

ClickHouse

This is a deep dive into ClickHouse configuration. Follow one of the deployment guides to get started.

ClickHouse is the main OLAP storage solution within Langfuse for our Trace, Observation, and Score entities. It is optimized for high write throughput and fast analytical queries. This guide covers how to configure ClickHouse within Langfuse and what to keep in mind when (optionally) bringing your own ClickHouse.

Langfuse supports ClickHouse versions >= 24.3.
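The version requirement can be checked before pointing Langfuse at an existing instance. The following is a minimal sketch, not part of Langfuse: the function name clickhouse_version_supported is hypothetical, and it assumes the usual dotted version string that ClickHouse returns from SELECT version().

```shell
# Returns 0 if the given ClickHouse version string (e.g. "24.8.4.13")
# is at least 24.3, and non-zero otherwise.
# Hypothetical helper for illustration; not shipped with Langfuse.
clickhouse_version_supported() {
  version="$1"
  major="${version%%.*}"   # text before the first dot
  rest="${version#*.}"     # text after the first dot
  minor="${rest%%.*}"      # text before the next dot
  [ "$major" -gt 24 ] || { [ "$major" -eq 24 ] && [ "$minor" -ge 3 ]; }
}

# Example usage against a reachable server (URL and credentials are placeholders):
# version="$(curl -s -u "$CLICKHOUSE_USER:$CLICKHOUSE_PASSWORD" \
#   "$CLICKHOUSE_URL/?query=SELECT%20version()")"
# clickhouse_version_supported "$version" && echo "ClickHouse $version is supported"
```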

Configuration

Langfuse accepts the following environment variables to fine-tune your ClickHouse usage. They need to be provided for the Langfuse Web and Langfuse Worker containers.

| Variable | Required / Default | Description |
| --- | --- | --- |
| CLICKHOUSE_MIGRATION_URL | Required | Migration URL (TCP protocol) for the ClickHouse instance. Pattern: clickhouse://<hostname>:(9000/9440) |
| CLICKHOUSE_MIGRATION_SSL | false | Set to true to establish an SSL connection to ClickHouse for the database migration. |
| CLICKHOUSE_URL | Required | Hostname of the ClickHouse instance. Pattern: http(s)://<hostname>:(8123/8443) |
| CLICKHOUSE_USER | Required | Username of the ClickHouse database. Needs SELECT, ALTER, INSERT, CREATE, DELETE grants. |
| CLICKHOUSE_PASSWORD | Required | Password of the ClickHouse user. |
| CLICKHOUSE_DB | default | Name of the ClickHouse database to use. |
| CLICKHOUSE_CLUSTER_ENABLED | true | Whether to run ClickHouse commands ON CLUSTER. Set to false for single-container setups. |
| LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED | false | Whether to disable automatic ClickHouse migrations. |
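A quick sanity check of the required variables can catch a swapped URL scheme before the containers start. This is a sketch under assumptions: check_clickhouse_env is a hypothetical helper, and it only checks the URL schemes and the presence of the credentials, not whether the server is reachable.

```shell
# Prints one line per problem found and returns non-zero if any exist.
# Hypothetical helper for illustration; it reads the variables from the
# current shell environment.
check_clickhouse_env() {
  problems=0
  case "${CLICKHOUSE_URL:-}" in
    http://*|https://*) ;;
    *) echo "CLICKHOUSE_URL must look like http(s)://<hostname>:<port>"; problems=1 ;;
  esac
  case "${CLICKHOUSE_MIGRATION_URL:-}" in
    clickhouse://*) ;;
    *) echo "CLICKHOUSE_MIGRATION_URL must look like clickhouse://<hostname>:<port>"; problems=1 ;;
  esac
  [ -n "${CLICKHOUSE_USER:-}" ] || { echo "CLICKHOUSE_USER is not set"; problems=1; }
  [ -n "${CLICKHOUSE_PASSWORD:-}" ] || { echo "CLICKHOUSE_PASSWORD is not set"; problems=1; }
  return "$problems"
}
```

Running check_clickhouse_env in the shell that holds the exported variables prints nothing and returns 0 when all four values pass these checks.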

Langfuse uses default as the cluster name if CLICKHOUSE_CLUSTER_ENABLED is set to true. You can override this by setting CLICKHOUSE_CLUSTER_NAME to a different value. In that case, the automatic database migrations will not apply correctly because they cannot be adapted dynamically to different cluster names. You must set LANGFUSE_AUTO_CLICKHOUSE_MIGRATION_DISABLED = true and run the ClickHouse migrations manually: clone the Langfuse repository, adjust the cluster name in ./packages/shared/clickhouse/migrations/clustered/*.sql, and run cd ./packages/shared && sh ./clickhouse/scripts/up.sh.
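The cluster-name adjustment in the migration files can be scripted. The sketch below is an illustration only: rename_cluster_in_migrations is a hypothetical helper, and it assumes that the migration files reference the cluster as the literal text ON CLUSTER default. Inspect the files first and adapt the pattern if they differ.

```shell
# Rewrites "ON CLUSTER default" to "ON CLUSTER <name>" in every .sql file
# of the given directory. The search pattern is an assumption about the
# migration files; verify it before running this on a real checkout.
# The cluster name must not contain "/" because it is passed to sed as-is.
rename_cluster_in_migrations() {
  dir="$1"
  name="$2"
  for file in "$dir"/*.sql; do
    [ -f "$file" ] || continue
    sed "s/ON CLUSTER default/ON CLUSTER $name/g" "$file" > "$file.tmp"
    mv "$file.tmp" "$file"
  done
}

# Example usage (my_cluster is a placeholder):
# git clone https://github.com/langfuse/langfuse.git && cd langfuse
# rename_cluster_in_migrations ./packages/shared/clickhouse/migrations/clustered my_cluster
# cd ./packages/shared && sh ./clickhouse/scripts/up.sh
```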

Deployment Options

This section covers different deployment options and provides example environment variables.

ClickHouse Cloud

ClickHouse Cloud is a scalable and fully managed deployment option for ClickHouse. You can provision it directly from ClickHouse or through one of the cloud provider marketplaces.

ClickHouse Cloud clusters are provisioned outside your cloud environment and your VPC, but ClickHouse offers private links for AWS, GCP, and Azure.

Example Configuration

Set the following environment variables to connect to your ClickHouse instance:

CLICKHOUSE_URL=https://<identifier>.<region>.aws.clickhouse.cloud:8443
CLICKHOUSE_MIGRATION_URL=clickhouse://<identifier>.<region>.aws.clickhouse.cloud:9440
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=changeme
CLICKHOUSE_MIGRATION_SSL=true

Troubleshooting

  • ‘error: driver: bad connection in line 0’ during migration: If you see this error message during startup of your web container, ensure that CLICKHOUSE_MIGRATION_SSL is set to true and that Langfuse Web can reach your ClickHouse environment. Review the IP allowlist, if applicable, and check whether the instance has access to the Private Link.
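To narrow down a connection error like the one above, it can help to probe both ClickHouse ports from the machine or pod that runs Langfuse Web. This is a sketch under assumptions: migration_host_port and probe_clickhouse are hypothetical helpers, the migration URL is assumed to have the plain form clickhouse://<host>:<port> without embedded credentials, and curl and openssl are assumed to be installed.

```shell
# Prints "<host> <port>" for a URL of the form clickhouse://<host>:<port>.
migration_host_port() {
  hostport="${1#clickhouse://}"   # strip the scheme
  hostport="${hostport%%/*}"      # strip any trailing path
  echo "${hostport%:*} ${hostport##*:}"
}

# Probes the HTTP interface and the native TLS port.
# Not called automatically; run it manually where Langfuse Web runs.
probe_clickhouse() {
  set -- $(migration_host_port "$CLICKHOUSE_MIGRATION_URL")
  # ClickHouse's HTTP interface replies "Ok." on /ping when it is reachable.
  curl -sS --max-time 5 "$CLICKHOUSE_URL/ping"
  # A completed handshake suggests that the TLS port is reachable.
  if openssl s_client -connect "$1:$2" </dev/null >/dev/null 2>&1; then
    echo "TLS handshake on $1:$2 succeeded"
  else
    echo "TLS handshake on $1:$2 failed"
  fi
}
```

If the /ping request works but the handshake fails, the allowlist or Private Link configuration for the native port is a likely place to look next.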

ClickHouse on Kubernetes (Helm)

The Bitnami ClickHouse Helm Chart provides a production-ready deployment of ClickHouse on an existing Kubernetes cluster. We use it as a dependency for Langfuse K8s. See Langfuse on Kubernetes (Helm) for more details on how to deploy Langfuse on Kubernetes.

Example Configuration

For a minimal production setup, we recommend using the following values.yaml overrides when deploying the ClickHouse Helm chart:

clickhouse:
  deploy: true
  shards: 1 # Fixed: Langfuse does not support multi-shard clusters
  replicaCount: 3
  resourcesPreset: large # or more
  auth:
    username: default
    password: changeme
  • shards: Shards are used for horizontally scaling ClickHouse. A single ClickHouse shard can handle multiple terabytes of data. Today, Langfuse does not support multi-shard clusters, i.e. this value must be set to 1. Please get in touch with us if you hit the scaling limits of a single-shard cluster.
  • replicaCount: The number of replicas for each shard. ClickHouse counts all instances towards the number of replicas, i.e. a replica count of 1 means no redundancy at all. We recommend a minimum of 3 replicas for production setups. The number of replicas cannot be increased at runtime without manual intervention or downtime.
  • resourcesPreset: ClickHouse is CPU and memory intensive for analytical and highly concurrent requests. We recommend at least the large resourcesPreset and more for larger deployments.
  • auth: The username and password for the ClickHouse database. Overwrite those values according to your preferences, or mount them from a secret.
  • disk: The ClickHouse Helm chart uses the default storage class to create volumes for each replica. Ensure that the storage class has allowVolumeExpansion = true, as observability workloads tend to be very disk-heavy. For cloud providers like AWS, GCP, and Azure, this should be the default.

Langfuse assumes that certain parameters are set in the ClickHouse configurations. To perform our database migrations, the following values must be provided:

<!--
    Substitutions for parameters of replicated tables.
     Optional. If you don't use replicated tables, you could omit that.
     See https://clickhouse.com/docs/en/engines/table-engines/mergetree-family/replication/#creating-replicated-tables
-->
<!--
    <macros>
        <shard>01</shard>
        <replica>example01-01-1</replica>
    </macros>
-->
<!--
    <default_replica_path>/clickhouse/tables/{database}/{table}</default_replica_path>
    <default_replica_name>{replica}</default_replica_name>
-->

macros and default_replica_* configuration should be covered by the Helm chart without any further configuration.

Set the following environment variables to connect to your ClickHouse instance assuming that Langfuse runs within the same Cluster and Namespace:

CLICKHOUSE_URL=http://<chart-name>-clickhouse:8123
CLICKHOUSE_MIGRATION_URL=clickhouse://<chart-name>-clickhouse:9000
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=changeme

Docker

You can run ClickHouse in a single Docker container for development purposes. As there is no redundancy, this is not recommended for production workloads.

Example Configuration

Start the container with

docker run --name clickhouse-server \
  -e CLICKHOUSE_DB=default \
  -e CLICKHOUSE_USER=clickhouse \
  -e CLICKHOUSE_PASSWORD=clickhouse \
  -d --ulimit nofile=262144:262144 \
  -p 8123:8123 \
  -p 9000:9000 \
  -p 9009:9009 \
  clickhouse/clickhouse-server

Set the following environment variables to connect to your ClickHouse instance:

CLICKHOUSE_URL=http://localhost:8123
CLICKHOUSE_MIGRATION_URL=clickhouse://localhost:9000
CLICKHOUSE_USER=clickhouse
CLICKHOUSE_PASSWORD=clickhouse
CLICKHOUSE_CLUSTER_ENABLED=false
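The container may need a few seconds before it accepts connections, so it can be useful to wait for it before starting Langfuse. The following is a minimal sketch: retry is a hypothetical generic helper, and the example relies on the /ping endpoint of ClickHouse's HTTP interface on the port mapped above.

```shell
# Runs a command until it succeeds, at most <attempts> times,
# sleeping <delay> seconds between attempts.
# Usage: retry <attempts> <delay> <command> [args...]
retry() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while ! "$@"; do
    [ "$i" -lt "$attempts" ] || return 1
    i=$((i + 1))
    sleep "$delay"
  done
  return 0
}

# Example usage: wait up to roughly 30 seconds for the local container.
# retry 30 1 curl -sf http://localhost:8123/ping
```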

Blob Storage as Disk

ClickHouse supports blob stores (AWS S3, Azure Blob Storage, Google Cloud Storage) as disks. This is useful for auto-scaling storage that lives outside the container orchestrator, and it increases the availability and durability of the data. For a full overview of the feature, see the ClickHouse External Disks documentation.

Below, we give config.xml examples for using S3 and Azure Blob Storage as disks for ClickHouse Docker containers with Docker Compose. Keep in mind that metadata is still stored on local disk, i.e. you need to use a persistent volume for the ClickHouse container or risk losing access to your tables.

S3 Example

Create a config.xml file with the following contents in your local working directory:

<clickhouse>
    <merge_tree>
        <storage_policy>s3</storage_policy>
    </merge_tree>
    <storage_configuration>
        <disks>
            <s3>
                <type>object_storage</type>
                <object_storage_type>s3</object_storage_type>
                <metadata_type>local</metadata_type>
                <endpoint>https://s3.eu-central-1.amazonaws.com/example-bucket-name/data/</endpoint>
                <access_key_id>ACCESS_KEY</access_key_id>
                <secret_access_key>ACCESS_KEY_SECRET</secret_access_key>
            </s3>
        </disks>
        <policies>
            <s3>
                <volumes>
                    <main>
                        <disk>s3</disk>
                    </main>
                </volumes>
            </s3>
        </policies>
    </storage_configuration>
</clickhouse>

Replace the access key ID and secret access key with appropriate AWS credentials and change the bucket name within the endpoint element. Alternatively, you can replace the credentials with <use_environment_credentials>1</use_environment_credentials> to automatically retrieve AWS credentials from environment variables.

Now, you can start ClickHouse with the following Docker Compose file:

services:
  clickhouse:
    image: clickhouse/clickhouse-server
    user: "101:101"
    container_name: clickhouse
    hostname: clickhouse
    environment:
      CLICKHOUSE_DB: default
      CLICKHOUSE_USER: clickhouse
      CLICKHOUSE_PASSWORD: clickhouse
    volumes:
      - ./config.xml:/etc/clickhouse-server/config.d/s3disk.xml:ro
      - langfuse_clickhouse_data:/var/lib/clickhouse
      - langfuse_clickhouse_logs:/var/log/clickhouse-server
    ports:
      - "8123:8123"
      - "9000:9000"
 
volumes:
  langfuse_clickhouse_data:
    driver: local
  langfuse_clickhouse_logs:
    driver: local

Azure Blob Storage Example

Create a config.xml file with the following contents in your local working directory. The credentials below are the default Azurite credentials and considered public.

<clickhouse>
    <merge_tree>
        <storage_policy>blob_storage_disk</storage_policy>
    </merge_tree>
    <storage_configuration>
        <disks>
            <blob_storage_disk>
                <type>object_storage</type>
                <object_storage_type>azure_blob_storage</object_storage_type>
                <metadata_type>local</metadata_type>
                <storage_account_url>http://azurite:10000/devstoreaccount1</storage_account_url>
                <container_name>langfuse</container_name>
                <account_name>devstoreaccount1</account_name>
                <account_key>Eby8vdM02xNOcqFlqUwJPLlmEtlCDXJ1OUzFT50uSRZ6IFsuFq2UVErCz4I6tq/K1SZFPTOtr/KBHBeksoGMGw==</account_key>
            </blob_storage_disk>
        </disks>
        <policies>
            <blob_storage_disk>
                <volumes>
                    <main>
                        <disk>blob_storage_disk</disk>
                    </main>
                </volumes>
            </blob_storage_disk>
        </policies>
    </storage_configuration>
</clickhouse>

You can start ClickHouse together with an Azurite service using the following Docker Compose file:

services:
  clickhouse:
    image: clickhouse/clickhouse-server
    user: "101:101"
    container_name: clickhouse
    hostname: clickhouse
    environment:
      CLICKHOUSE_DB: default
      CLICKHOUSE_USER: clickhouse
      CLICKHOUSE_PASSWORD: clickhouse
    volumes:
      - ./config.xml:/etc/clickhouse-server/config.d/azuredisk.xml:ro
      - langfuse_clickhouse_data:/var/lib/clickhouse
      - langfuse_clickhouse_logs:/var/log/clickhouse-server
    ports:
      - "8123:8123"
      - "9000:9000"
    depends_on:
      - azurite
 
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    container_name: azurite
    command: azurite-blob --blobHost 0.0.0.0
    ports:
      - "10000:10000"
    volumes:
      - langfuse_azurite_data:/data
 
volumes:
  langfuse_clickhouse_data:
    driver: local
  langfuse_clickhouse_logs:
    driver: local
  langfuse_azurite_data:
    driver: local

This will store ClickHouse data within the Azurite blob container.

Backups

ClickHouse Cloud manages backups automatically for you. For self-hosted ClickHouse instances, you need to create backups on your own. We recommend following the ClickHouse backup guide for more details. In addition, the ClickHouse state can be restored based on the data in S3. While this is computationally expensive and may take a long time, it is an alternative safety measure to prevent data loss.
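For self-hosted instances, a backup to S3 can be triggered with ClickHouse's BACKUP statement. The sketch below only builds the statement: backup_statement is a hypothetical helper, the bucket path and credentials are placeholders, and the exact options should be checked against the ClickHouse backup guide for your version and cluster setup.

```shell
# Prints a BACKUP statement that writes the given database to S3.
# Usage: backup_statement <database> <s3-endpoint-prefix> <backup-name> <key-id> <secret>
# Hypothetical helper for illustration; the arguments are inserted as-is,
# so they must not contain single quotes.
backup_statement() {
  printf "BACKUP DATABASE %s TO S3('%s/%s', '%s', '%s')\n" "$1" "$2" "$3" "$4" "$5"
}

# Example usage (host, bucket, and credentials are placeholders):
# backup_statement default \
#   "https://s3.eu-central-1.amazonaws.com/example-bucket-name/backups" \
#   "$(date -u +%Y-%m-%d)" "$AWS_ACCESS_KEY_ID" "$AWS_SECRET_ACCESS_KEY" \
#   | clickhouse-client --host localhost --user clickhouse --password clickhouse
```

Using a dated backup name keeps each run in its own path, which makes it easier to pick a restore point later.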
