ChainLaunch

External Observability (remote_write)


ChainLaunch ships an embedded Prometheus instance that scrapes every managed node (see Configure Monitoring). For production you usually want those metrics in a central, long-term, multi-tenant store — Grafana Mimir, Grafana Cloud, AWS Managed Prometheus (AMP), Datadog, Thanos, or Cortex — rather than querying each ChainLaunch host individually.

The supported way to do this is Prometheus remote_write: ChainLaunch keeps scraping locally and forwards a copy of every sample to one or more external endpoints.

The Prometheus endpoint is loopback-by-default — `remote_write` is the right way to centralize

The managed Prometheus instance binds to 127.0.0.1 (loopback) by default in all three deployment modes (Docker, systemd, launchd), and it has no built-in authentication. Do not open the Prometheus port to a remote scraper or federation peer just to centralize metrics — that exposes every metric ChainLaunch collects to anyone who can reach the port.

Instead, configure remote_write so ChainLaunch pushes metrics outbound to your authenticated central store. The Prometheus port stays on loopback, your credentials live only in the ChainLaunch config (write-only, never returned in API responses), and nothing inbound is exposed. If you genuinely must expose Prometheus directly, read Monitoring & Metrics Hardening first.


How it works

When you configure one or more remote_write endpoints, ChainLaunch renders a remote_write: block into the generated prometheus.yml. Prometheus then streams samples to each endpoint as it scrapes. You can configure:

  • Multiple endpoints — fan out to several stores at once (e.g. a local Thanos and Grafana Cloud).
  • Authentication — HTTP basic auth, bearer token, or TLS client certificates per endpoint.
  • TLS — custom CA, client cert/key, server name, and (for testing only) insecure_skip_verify.
  • External labels — labels attached to every series so the central store can tell ChainLaunch instances apart (e.g. cluster, region, env).
  • TSDB retention (retention_time / retention_size) to bound the local on-disk window once the central store is the system of record.
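As a rough illustration of where each setting ends up, the sketch below maps a settings object of the shape described on this page onto the structure Prometheus itself expects: external labels under the top-level global key, endpoints under remote_write, and retention as command-line flags. This is not ChainLaunch's actual renderer; the function names to_prometheus_config and retention_flags are hypothetical, and the result is a plain Python dict rather than YAML to keep the example dependency-free.

```python
def to_prometheus_config(settings):
    """Hypothetical helper: arrange settings the way prometheus.yml is laid out."""
    config = {}
    labels = settings.get("external_labels") or {}
    if labels:
        # In prometheus.yml, external_labels sit under the top-level `global:` key.
        config["global"] = {"external_labels": dict(labels)}
    endpoints = []
    for entry in settings.get("remote_write") or []:
        # Each endpoint becomes one item of the top-level `remote_write:` list.
        # Fields are copied through unchanged; `url` is the only required one.
        rendered = {"url": entry["url"]}
        rendered.update({k: v for k, v in entry.items() if k != "url"})
        endpoints.append(rendered)
    if endpoints:
        config["remote_write"] = endpoints
    return config


def retention_flags(settings):
    """Retention is passed to Prometheus as CLI flags, not via prometheus.yml."""
    flags = []
    if settings.get("retention_time"):
        flags.append("--storage.tsdb.retention.time=" + settings["retention_time"])
    if settings.get("retention_size"):
        flags.append("--storage.tsdb.retention.size=" + settings["retention_size"])
    return flags
```

Because retention is set through flags rather than the config file, it makes sense that retention changes need a Prometheus restart while remote_write and label changes can be picked up by a reload.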

Credentials are write-only

basic_auth.password and bearer_token are accepted on write but are never returned in API responses or read back into Terraform state. Re-applying a config without re-specifying a secret keeps the stored value intact. See Notifications for the same secret-redaction pattern.
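The write-only behaviour can be pictured as two small operations: redact secrets on the way out, and keep the stored secret when an incoming entry omits it. The sketch below models that for a single endpoint entry. It is an illustration of the semantics described above, not ChainLaunch's implementation; the REDACTED placeholder, the function names, and the rule for when a stored bearer token is kept are all assumptions.

```python
import copy

REDACTED = "***"  # placeholder chosen for this sketch; the real API may differ


def redact(entry):
    """Return a copy of an endpoint entry that is safe to echo in a response."""
    out = copy.deepcopy(entry)
    if out.get("bearer_token"):
        out["bearer_token"] = REDACTED
    auth = out.get("basic_auth")
    if isinstance(auth, dict) and auth.get("password"):
        auth["password"] = REDACTED
    return out


def merge_preserving_secrets(stored, incoming):
    """Apply `incoming` over `stored`, keeping secrets the caller did not resend."""
    merged = copy.deepcopy(incoming)
    new_auth = merged.get("basic_auth")
    old_auth = stored.get("basic_auth") or {}
    if isinstance(new_auth, dict) and not new_auth.get("password") and old_auth.get("password"):
        new_auth["password"] = old_auth["password"]
    # Assumption: a stored bearer token is kept only when the caller did not
    # switch the endpoint to basic auth.
    if not merged.get("bearer_token") and "basic_auth" not in merged and stored.get("bearer_token"):
        merged["bearer_token"] = stored["bearer_token"]
    return merged
```

To rotate a credential under this model, send the new value explicitly; omitting the field leaves the old one in place.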


Configuring via the API

There are two ways to set remote_write and retention through the REST API:

  • POST /api/v1/metrics/deploy (or POST /api/v1/metrics/refresh) — set them as part of a full deploy/redeploy.
  • PATCH /api/v1/metrics/config — update just the observability settings (remote_write + retention + external_labels) without redeploying. This endpoint is ADMIN-only and triggers a live config reload when Prometheus is running (otherwise the changes apply on the next start).

Fields

| Field | Type | Description |
|---|---|---|
| remote_write | array | One or more remote_write endpoints (see below). Send [] (on PATCH) to clear all endpoints. |
| external_labels | object | Key/value labels attached to every series and alert sent to external systems. Send {} (on PATCH) to clear. |
| retention_time | string | --storage.tsdb.retention.time (e.g. "30d", "90d"). Omitted keeps Prometheus' implicit 15d default. |
| retention_size | string | --storage.tsdb.retention.size (e.g. "50GB", "512MB"). Omitted means no size cap. |

Each remote_write entry:

| Field | Type | Description |
|---|---|---|
| url | string (required) | The remote_write receive endpoint. |
| name | string | Optional name for the endpoint. |
| remote_timeout | string | Per-request timeout (e.g. "30s"). |
| basic_auth | object | { "username": "...", "password": "..." }. Password is write-only. |
| bearer_token | string | Bearer token for the endpoint. Write-only. |
| tls_config | object | ca_file, cert_file, key_file, server_name, insecure_skip_verify. |
| headers | object | Extra HTTP headers (e.g. a tenant ID header). |
| queue_config | object | Queue/shard tuning: capacity, max_shards, min_shards, max_samples_per_send, batch_send_deadline, min_backoff, max_backoff. |
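Before sending an entry, it can help to sanity-check it on the client side. The sketch below is a hypothetical pre-flight check, not part of ChainLaunch: the function name and messages are made up for illustration. The rules it applies come from the table above (url is required) and from how Prometheus treats remote_write settings (only one authentication mechanism per endpoint, and a client certificate and key must be supplied together).

```python
from urllib.parse import urlparse


def validate_remote_write_entry(entry):
    """Hypothetical pre-flight check; returns a list of human-readable problems."""
    problems = []
    url = entry.get("url")
    if not url:
        problems.append("url is required")
    else:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append("url must be an absolute http(s) URL")
    if entry.get("basic_auth") and entry.get("bearer_token"):
        # Prometheus accepts only one authentication mechanism per endpoint.
        problems.append("set either basic_auth or bearer_token, not both")
    tls = entry.get("tls_config") or {}
    if bool(tls.get("cert_file")) != bool(tls.get("key_file")):
        problems.append("tls_config.cert_file and tls_config.key_file must be set together")
    if tls.get("insecure_skip_verify"):
        problems.append("insecure_skip_verify disables certificate checks; use for testing only")
    return problems
```

An empty list means the entry passed these checks; the server may still apply its own validation.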

`tls_config` vs `tls`

The REST API uses tls_config for the TLS block. The Terraform provider uses the nested attribute name tls for the same settings. The examples below use the right name for each surface.

Example: PATCH remote_write + retention

curl -X PATCH -u "$CHAINLAUNCH_USER:$CHAINLAUNCH_PASSWORD" \
  "$CHAINLAUNCH_API_URL/metrics/config" \
  -H "Content-Type: application/json" \
  -d '{
    "retention_time": "30d",
    "retention_size": "50GB",
    "external_labels": {
      "cluster": "fabric-prod-eu",
      "region": "eu-central-1"
    },
    "remote_write": [
      {
        "url": "https://prometheus-prod.grafana.net/api/prom/push",
        "name": "grafana-cloud",
        "basic_auth": {
          "username": "123456",
          "password": "glc_eyJ...your-grafana-cloud-token..."
        }
      }
    ]
  }'

The response echoes the persisted settings with credentials redacted, plus a reload_triggered flag indicating whether a live reload was issued.
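The same PATCH can be issued from a script. The following is a minimal sketch using only the Python standard library; build_patch_request and send are hypothetical helper names, and it assumes, like the curl example, that the API base URL already includes the /api/v1 prefix and that HTTP basic auth is used.

```python
import base64
import json
import urllib.request


def build_patch_request(api_url, user, password, settings):
    """Build (but do not send) a PATCH request for the metrics config endpoint."""
    credentials = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url=api_url.rstrip("/") + "/metrics/config",
        data=json.dumps(settings).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": "Basic " + credentials,
        },
    )


def send(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Example (not executed here):
#   req = build_patch_request(os.environ["CHAINLAUNCH_API_URL"],
#                             os.environ["CHAINLAUNCH_USER"],
#                             os.environ["CHAINLAUNCH_PASSWORD"],
#                             {"retention_time": "30d", "remote_write": [...]})
#   result = send(req)
```

After a successful call, the reload_triggered flag in the returned body tells you whether the change is already live or will apply on the next Prometheus start.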


Configuring via Terraform

The chainlaunch_metrics_prometheus resource exposes remote_write, retention_time, and retention_size. See the resource reference for the full schema.

resource "chainlaunch_metrics_prometheus" "external_obs" {
  scrape_interval = 15
 
  # Bound the local on-disk window once the central store is authoritative
  retention_time = "30d"
  retention_size = "50GB"
 
  remote_write = [
    {
      url  = "https://prometheus-prod.grafana.net/api/prom/push"
      name = "grafana-cloud"
      basic_auth = {
        username = "123456"
        password = var.grafana_cloud_api_key # write-only, never read back
      }
    },
  ]
}

Terraform uses `tls`, the API uses `tls_config`

In Terraform the TLS block is the nested attribute tls (ca_file, cert_file, key_file, server_name, insecure_skip_verify). Credentials (basic_auth.password, bearer_token) are marked sensitive and are never read back from the API, so they are preserved from prior state on read.


Integration recipes

The following recipes show the remote_write payload for the most common external stacks. Each can be applied via PATCH /metrics/config, POST /metrics/deploy, or the Terraform resource. Only the remote_write[*] object is shown — wrap it in the request body or the Terraform remote_write = [ ... ] list as above.

Grafana Mimir (self-hosted)

Mimir is Grafana's horizontally scalable, multi-tenant long-term store. The receive endpoint is /api/v1/push on the Mimir gateway/distributor. Multi-tenancy is selected with the X-Scope-OrgID header.

API (tls_config):

{
  "url": "https://mimir.internal.example.com/api/v1/push",
  "name": "mimir",
  "headers": {
    "X-Scope-OrgID": "fabric-prod"
  },
  "tls_config": {
    "ca_file": "/etc/prometheus/certs/mimir-ca.pem"
  }
}

Terraform (tls):

remote_write = [
  {
    url  = "https://mimir.internal.example.com/api/v1/push"
    name = "mimir"
    tls = {
      ca_file = "/etc/prometheus/certs/mimir-ca.pem"
    }
    # Note: arbitrary headers (e.g. X-Scope-OrgID) and queue_config are
    # configurable via the REST API. For per-tenant headers, set them with
    # PATCH /metrics/config.
  },
]

If Mimir sits behind a gateway that requires basic auth, add a basic_auth block with the tenant credentials. The X-Scope-OrgID header is set through the headers map on the REST API (PATCH /metrics/config or POST /metrics/deploy).

Grafana Cloud

Grafana Cloud's hosted Prometheus exposes a remote_write URL like https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom/push. Authenticate with HTTP basic auth where the username is your stack/instance ID and the password is a Grafana Cloud access policy token (glc_...).

API:

{
  "url": "https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push",
  "name": "grafana-cloud",
  "basic_auth": {
    "username": "1234567",
    "password": "glc_eyJ...token..."
  }
}

Terraform:

remote_write = [
  {
    url  = "https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push"
    name = "grafana-cloud"
    basic_auth = {
      username = "1234567"
      password = var.grafana_cloud_api_key
    }
  },
]

Find the exact URL, instance ID, and token under Grafana Cloud → Connections → Hosted Prometheus → Sending metrics with Prometheus.

AWS Managed Service for Prometheus (AMP)

AMP uses an AWS-signed (SigV4) remote_write endpoint: https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write.

AMP requires AWS SigV4 signing, which the standard Prometheus basic_auth/bearer_token fields cannot provide on their own. Run a small signing proxy (aws-sigv4-proxy) on the ChainLaunch host (so the Prometheus endpoint stays on loopback) and point remote_write at the proxy, which adds SigV4 headers and forwards to AMP:

API:

{
  "url": "http://127.0.0.1:8005/workspaces/ws-12345678-90ab-cdef-1234-567890abcdef/api/v1/remote_write",
  "name": "aws-amp"
}

Terraform:

remote_write = [
  {
    url  = "http://127.0.0.1:8005/workspaces/ws-12345678-90ab-cdef-1234-567890abcdef/api/v1/remote_write"
    name = "aws-amp"
  },
]

The proxy is invoked with the AMP host and region, e.g.:

aws-sigv4-proxy \
  --name aps \
  --region eu-west-1 \
  --host aps-workspaces.eu-west-1.amazonaws.com \
  --port :8005

Give the host an IAM role (instance profile / IRSA) with aps:RemoteWrite on the workspace. Keep the proxy on 127.0.0.1 so only ChainLaunch's local Prometheus can use it.

Datadog

Datadog accepts Prometheus remote_write at https://api.<datadog-site>/api/v2/series (e.g. api.datadoghq.com, api.datadoghq.eu). Authenticate with your Datadog API key sent as the DD-API-KEY header.

API:

{
  "url": "https://api.datadoghq.com/api/v2/series",
  "name": "datadog",
  "headers": {
    "DD-API-KEY": "<your-datadog-api-key>"
  }
}

The DD-API-KEY header is configured via the headers map on the REST API (PATCH /metrics/config or POST /metrics/deploy). Because the API key travels in a header, set it through the API so it is stored as a write-only secret and is not echoed back.

Confirm the exact intake URL for your Datadog site under Datadog → Integrations → APIs / Remote Write; intake paths occasionally change between agent/intake versions.


Choosing a local retention window

Once a central store is the system of record, you usually shrink the local Prometheus retention so the ChainLaunch host isn't holding months of data twice:

  • A 7–30 day local window covers in-app dashboards and short-term debugging.
  • The central store keeps the long-term, downsampled history.

Set this with retention_time (and optionally retention_size as a disk safety cap):

curl -X PATCH -u "$CHAINLAUNCH_USER:$CHAINLAUNCH_PASSWORD" \
  "$CHAINLAUNCH_API_URL/metrics/config" \
  -H "Content-Type: application/json" \
  -d '{"retention_time": "15d", "retention_size": "20GB"}'

Retention changes apply on the next Prometheus start/restart; remote_write and external_labels changes are applied via a live reload when Prometheus is running.
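To pick a sensible retention_size, you can estimate disk usage with the rule of thumb from the Prometheus storage documentation: required space is roughly retention in seconds times ingested samples per second times bytes per sample, where a sample typically costs 1 to 2 bytes after compression. The sketch below applies that formula; the function name and the example ingestion rate are illustrative assumptions, and your real rate can be read from rate(prometheus_tsdb_head_samples_appended_total[5m]).

```python
SECONDS_PER_DAY = 86_400


def estimate_tsdb_bytes(retention_days, samples_per_second, bytes_per_sample=2.0):
    """Rule-of-thumb estimate of local TSDB disk usage for a retention window."""
    return retention_days * SECONDS_PER_DAY * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Illustrative numbers only: 15 days of retention at 10,000 samples/s.
    estimate = estimate_tsdb_bytes(retention_days=15, samples_per_second=10_000)
    print(f"approx. {estimate / 1024**3:.1f} GiB")
```

Treat the result as a lower bound and leave headroom when choosing retention_size, since the write-ahead log and compaction need space beyond the stored blocks.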


Verifying the pipeline

  1. Apply the config (PATCH /metrics/config, deploy, or terraform apply).

  2. Check the persisted config is reflected (credentials will show redacted):

    curl -s -u "$CHAINLAUNCH_USER:$CHAINLAUNCH_PASSWORD" \
      "$CHAINLAUNCH_API_URL/metrics/defaults" | jq '{remote_write, external_labels, retention_time, retention_size}'

  3. In your external store, query for the ChainLaunch up metric filtered by your external_labels (e.g. up{cluster="fabric-prod-eu"}). Samples should start arriving within a scrape interval or two.

  4. If nothing arrives, check the ChainLaunch host can reach the endpoint outbound, and look at Prometheus' own prometheus_remote_storage_* metrics for send failures.
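Steps 3 and 4 can be scripted against any store that exposes the Prometheus HTTP query API. The sketch below builds the instant-query URL for the up check and parses a response body in the standard Prometheus format. The helper names are hypothetical, and base_url stands for whatever prefix your store serves the query API under, which differs between products.

```python
import json
from urllib.parse import urlencode

# PromQL to run against ChainLaunch's local Prometheus when nothing arrives.
SEND_FAILURES_QUERY = "rate(prometheus_remote_storage_samples_failed_total[5m])"


def up_selector(labels):
    """Build an `up{...}` selector from external labels."""
    matchers = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    return f"up{{{matchers}}}"


def instant_query_url(base_url, promql):
    """URL for GET /api/v1/query with the expression URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def targets_up(response_body):
    """Map each instance label to True/False from an instant-query response."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        return {}
    status = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "<unknown>")
        # Instant-vector values are [timestamp, "<value as string>"].
        status[instance] = series["value"][1] == "1"
    return status
```

An empty result from targets_up means no series matched the labels, which usually points at an unreachable endpoint, rejected credentials, or external_labels that differ from what you queried.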

