External Observability (remote_write)

ChainLaunch ships an embedded Prometheus instance that scrapes every managed node (see Configure Monitoring). In production you usually want those metrics in a central, long-term, multi-tenant store such as Grafana Mimir, Grafana Cloud, Amazon Managed Service for Prometheus (AMP), Datadog, Thanos, or Cortex, rather than querying each ChainLaunch host individually.

The supported way to do this is Prometheus remote_write: ChainLaunch keeps scraping locally and forwards a copy of every sample to one or more external endpoints.

The Prometheus endpoint is loopback-by-default — `remote_write` is the right way to centralize

The managed Prometheus instance binds to 127.0.0.1 (loopback) by default in all three deployment modes (Docker, systemd, launchd), and it has no built-in authentication. Do not open the Prometheus port to a remote scraper or federation peer just to centralize metrics — that exposes every metric ChainLaunch collects to anyone who can reach the port.

Instead, configure remote_write so ChainLaunch pushes metrics outbound to your authenticated central store. The Prometheus port stays on loopback, your credentials live only in the ChainLaunch config (write-only, never returned in API responses), and nothing inbound is exposed. If you genuinely must expose Prometheus directly, read Monitoring & Metrics Hardening first.

How it works

When you configure one or more remote_write endpoints, ChainLaunch renders a remote_write: block into the generated prometheus.yml. Prometheus then streams samples to each endpoint as it scrapes. You can configure:

Multiple endpoints — fan out to several stores at once (e.g. a local Thanos and Grafana Cloud).
Authentication — HTTP basic auth, bearer token, or TLS client certificates per endpoint.
TLS — custom CA, client cert/key, server name, and (for testing only) insecure_skip_verify.
External labels — labels attached to every series so the central store can tell ChainLaunch instances apart (e.g. cluster, region, env).
TSDB retention — retention_time / retention_size to bound the local on-disk window once the central store is the system of record.
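
To make the rendering step concrete, here is a minimal Python sketch of how API-style settings could map onto a prometheus.yml fragment. It is not ChainLaunch's renderer (the source does not show it); `to_yaml` and `render_fragment` are hypothetical helpers. The only facts relied on are Prometheus' own config layout: `external_labels` lives under `global`, and each endpoint is an item in the top-level `remote_write` list.

```python
import json


def to_yaml(value, indent=0):
    """Render nested dicts/lists of scalars as block-style YAML lines."""
    pad = "  " * indent
    if isinstance(value, dict) and value:
        lines = []
        for key, item in value.items():
            if isinstance(item, (dict, list)) and item:
                lines.append(f"{pad}{key}:")
                lines += to_yaml(item, indent + 1)
            else:
                # JSON scalars (quoted strings, numbers, true/false) are valid YAML.
                lines.append(f"{pad}{key}: {json.dumps(item)}")
        return lines
    if isinstance(value, list) and value:
        lines = []
        for item in value:
            rendered = to_yaml(item, indent + 1)
            lines.append(f"{pad}- {rendered[0].lstrip()}")
            lines += rendered[1:]
        return lines
    return [f"{pad}{json.dumps(value)}"]


def render_fragment(external_labels, remote_write):
    """Build the parts of prometheus.yml that the settings above affect."""
    doc = {}
    if external_labels:
        doc["global"] = {"external_labels": external_labels}
    if remote_write:
        doc["remote_write"] = remote_write
    return "\n".join(to_yaml(doc))


if __name__ == "__main__":
    print(render_fragment(
        {"cluster": "fabric-prod-eu"},
        [{
            "url": "https://mimir.internal.example.com/api/v1/push",
            "name": "mimir",
            "headers": {"X-Scope-OrgID": "fabric-prod"},
        }],
    ))
```

Retention is deliberately absent from the fragment: retention_time and retention_size map to Prometheus command-line flags rather than to prometheus.yml.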

Credentials are write-only

basic_auth.password and bearer_token are accepted on write but are never returned in API responses or read back into Terraform state. Re-applying a config without re-specifying a secret keeps the stored value intact. See Notifications for the same secret-redaction pattern.
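
The write-only behaviour can be pictured with a small Python sketch. This is an illustration of the pattern only: the `REDACTED` placeholder, the function names, and the exact merge rule are assumptions, since the source does not say how ChainLaunch represents a redacted value or matches entries on re-apply.

```python
import copy

REDACTED = "<redacted>"  # hypothetical placeholder, not the documented wire value


def redact(entry):
    """Return a copy of a remote_write entry that is safe to echo in a response."""
    out = copy.deepcopy(entry)
    if out.get("bearer_token"):
        out["bearer_token"] = REDACTED
    if out.get("basic_auth", {}).get("password"):
        out["basic_auth"]["password"] = REDACTED
    return out


def merge_secrets(stored, incoming):
    """Carry stored secrets forward when the new config does not re-specify them."""
    merged = copy.deepcopy(incoming)
    if "bearer_token" not in merged and stored.get("bearer_token"):
        merged["bearer_token"] = stored["bearer_token"]
    stored_password = stored.get("basic_auth", {}).get("password")
    if stored_password and "basic_auth" in merged:
        merged["basic_auth"].setdefault("password", stored_password)
    return merged


if __name__ == "__main__":
    stored = {"url": "https://example.invalid/push",
              "basic_auth": {"username": "123456", "password": "s3cret"}}
    reapplied = {"url": "https://example.invalid/push",
                 "basic_auth": {"username": "123456"}}
    print(redact(merge_secrets(stored, reapplied)))
```

The point of the merge step is that a client which only ever sees redacted responses can still re-apply its config without wiping the stored secret.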

Configuring via the API

There are two ways to set remote_write and retention through the REST API:

POST /api/v1/metrics/deploy (or POST /api/v1/metrics/refresh): set them as part of a full deploy/redeploy.
PATCH /api/v1/metrics/config: update just the observability settings (remote_write, retention, and external_labels) without redeploying. This endpoint is ADMIN-only. When Prometheus is running, remote_write and external_labels changes trigger a live config reload; retention changes, and any change made while Prometheus is stopped, apply on the next start.

Fields

Field	Type	Description
`remote_write`	array	One or more remote_write endpoints (see below). Send `[]` (on PATCH) to clear all endpoints.
`external_labels`	object	Key/value labels attached to every series and alert sent to external systems. Send `{}` (on PATCH) to clear.
`retention_time`	string	`--storage.tsdb.retention.time` (e.g. `"30d"`, `"90d"`). Omitted keeps Prometheus' implicit 15d default.
`retention_size`	string	`--storage.tsdb.retention.size` (e.g. `"50GB"`, `"512MB"`). Omitted means no size cap.

Each remote_write entry:

Field	Type	Description
`url`	string (required)	The remote_write receive endpoint.
`name`	string	Optional name for the endpoint.
`remote_timeout`	string	Per-request timeout (e.g. `"30s"`).
`basic_auth`	object	`{ "username": "...", "password": "..." }`. Password is write-only.
`bearer_token`	string	Bearer token for the endpoint. Write-only.
`tls_config`	object	`ca_file`, `cert_file`, `key_file`, `server_name`, `insecure_skip_verify`.
`headers`	object	Extra HTTP headers (e.g. a tenant ID header).
`queue_config`	object	Queue/shard tuning: `capacity`, `max_shards`, `min_shards`, `max_samples_per_send`, `batch_send_deadline`, `min_backoff`, `max_backoff`.
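
A client-side pre-flight check can catch common payload mistakes before a request is sent. The Python sketch below is an assumption-laden illustration, not the server's validation: the field names come from the tables above, while the duration and size patterns are loose approximations of the formats Prometheus accepts for its retention flags.

```python
import re

# Loose approximations of Prometheus duration ("30s", "15d") and size ("50GB") strings.
DURATION = re.compile(r"^(\d+(ms|s|m|h|d|w|y))+$")
SIZE = re.compile(r"^\d+(B|KB|MB|GB|TB|PB|EB)$")

ENTRY_FIELDS = {
    "url", "name", "remote_timeout", "basic_auth",
    "bearer_token", "tls_config", "headers", "queue_config",
}


def validate(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    for field, pattern in (("retention_time", DURATION), ("retention_size", SIZE)):
        value = config.get(field)
        if value is not None and not pattern.match(value):
            problems.append(f"{field}: unexpected format {value!r}")
    for i, entry in enumerate(config.get("remote_write", [])):
        if not entry.get("url"):
            problems.append(f"remote_write[{i}]: url is required")
        unknown = sorted(set(entry) - ENTRY_FIELDS)
        if unknown:
            problems.append(f"remote_write[{i}]: unknown fields {unknown}")
        timeout = entry.get("remote_timeout")
        if timeout is not None and not DURATION.match(timeout):
            problems.append(f"remote_write[{i}].remote_timeout: unexpected format {timeout!r}")
    return problems


if __name__ == "__main__":
    print(validate({"retention_time": "30", "remote_write": [{"name": "x", "tls": {}}]}))
```

One practical use: a payload copied from Terraform with `tls` instead of `tls_config` shows up as an unknown field.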

`tls_config` vs `tls`

The REST API uses tls_config for the TLS block. The Terraform provider uses the nested attribute name tls for the same settings. The examples below use the right name for each surface.

Example: PATCH `remote_write` + retention

curl -X PATCH -u "$CHAINLAUNCH_USER:$CHAINLAUNCH_PASSWORD" \
  "$CHAINLAUNCH_API_URL/metrics/config" \
  -H "Content-Type: application/json" \
  -d '{
    "retention_time": "30d",
    "retention_size": "50GB",
    "external_labels": {
      "cluster": "fabric-prod-eu",
      "region": "eu-central-1"
    },
    "remote_write": [
      {
        "url": "https://prometheus-prod.grafana.net/api/prom/push",
        "name": "grafana-cloud",
        "basic_auth": {
          "username": "123456",
          "password": "glc_eyJ...your-grafana-cloud-token..."
        }
      }
    ]
  }'

The response echoes the persisted settings with credentials redacted, plus a reload_triggered flag indicating whether a live reload was issued.

Configuring via Terraform

The chainlaunch_metrics_prometheus resource exposes remote_write, retention_time, and retention_size. See the resource reference for the full schema.

resource "chainlaunch_metrics_prometheus" "external_obs" {
  scrape_interval = 15
 
  # Bound the local on-disk window once the central store is authoritative
  retention_time = "30d"
  retention_size = "50GB"
 
  remote_write = [
    {
      url  = "https://prometheus-prod.grafana.net/api/prom/push"
      name = "grafana-cloud"
      basic_auth = {
        username = "123456"
        password = var.grafana_cloud_api_key # write-only, never read back
      }
    },
  ]
}

Terraform uses `tls`, the API uses `tls_config`

In Terraform the TLS block is the nested attribute tls (ca_file, cert_file, key_file, server_name, insecure_skip_verify). Credentials (basic_auth.password, bearer_token) are marked sensitive and are never read back from the API, so they are preserved from prior state on read.

Integration recipes

The following recipes show the remote_write payload for the most common external stacks. Each can be applied via PATCH /metrics/config, POST /metrics/deploy, or the Terraform resource. Only the remote_write[*] object is shown — wrap it in the request body or the Terraform remote_write = [ ... ] list as above.
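
As a rough illustration of "wrap it in the request body", the Python sketch below wraps one or more recipe objects and builds the PATCH request without sending it. `build_patch` is a hypothetical helper, and `api_url` is assumed to be the same base URL the curl examples read from `$CHAINLAUNCH_API_URL`. Because sending `[]` clears all endpoints, the sketch assumes the list you send is the full desired set rather than an addition to it.

```python
import base64
import json
import urllib.request


def build_patch(api_url, user, password, *entries):
    """Wrap recipe objects in a request body and prepare (not send) the PATCH."""
    body = json.dumps({"remote_write": list(entries)}).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        f"{api_url}/metrics/config",
        data=body,
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


if __name__ == "__main__":
    mimir = {
        "url": "https://mimir.internal.example.com/api/v1/push",
        "name": "mimir",
        "headers": {"X-Scope-OrgID": "fabric-prod"},
    }
    req = build_patch("https://chainlaunch.example.com/api/v1", "admin", "secret", mimir)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the prepared request to `urllib.request.urlopen(req)` would send it; that step is left out here so the sketch has no network side effects.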

Grafana Mimir (self-hosted)

Mimir is Grafana's horizontally scalable, multi-tenant long-term store. The receive endpoint is /api/v1/push on the Mimir gateway/distributor. Multi-tenancy is selected with the X-Scope-OrgID header.

API (tls_config):

{
  "url": "https://mimir.internal.example.com/api/v1/push",
  "name": "mimir",
  "headers": {
    "X-Scope-OrgID": "fabric-prod"
  },
  "tls_config": {
    "ca_file": "/etc/prometheus/certs/mimir-ca.pem"
  }
}

Terraform (tls):

remote_write = [
  {
    url  = "https://mimir.internal.example.com/api/v1/push"
    name = "mimir"
    tls = {
      ca_file = "/etc/prometheus/certs/mimir-ca.pem"
    }
    # Note: arbitrary headers (e.g. X-Scope-OrgID) and queue_config are
    # configurable via the REST API. For per-tenant headers, set them with
    # PATCH /metrics/config.
  },
]

If Mimir auth is fronted by a gateway requiring basic auth, add a basic_auth block with the tenant credentials. The X-Scope-OrgID header is set via the headers map on the API (PATCH /metrics/config / POST /metrics/deploy).

Grafana Cloud

Grafana Cloud's hosted Prometheus exposes a remote_write URL like https://prometheus-prod-XX-prod-REGION.grafana.net/api/prom/push. Authenticate with HTTP basic auth where the username is your stack/instance ID and the password is a Grafana Cloud access policy token (glc_...).

API:

{
  "url": "https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push",
  "name": "grafana-cloud",
  "basic_auth": {
    "username": "1234567",
    "password": "glc_eyJ...token..."
  }
}

Terraform:

remote_write = [
  {
    url  = "https://prometheus-prod-13-prod-us-east-0.grafana.net/api/prom/push"
    name = "grafana-cloud"
    basic_auth = {
      username = "1234567"
      password = var.grafana_cloud_api_key
    }
  },
]

Find the exact URL, instance ID, and token under Grafana Cloud → Connections → Hosted Prometheus → Sending metrics with Prometheus.

Amazon Managed Service for Prometheus (AMP)

AMP uses an AWS-signed (SigV4) remote_write endpoint: https://aps-workspaces.<region>.amazonaws.com/workspaces/<workspace-id>/api/v1/remote_write.

AMP requires AWS SigV4 signing, which the standard Prometheus basic_auth/bearer_token fields cannot provide on their own. Run a small signing proxy (aws-sigv4-proxy) on the ChainLaunch host (so the Prometheus endpoint stays on loopback) and point remote_write at the proxy, which adds SigV4 headers and forwards to AMP:

API:

{
  "url": "http://127.0.0.1:8005/workspaces/ws-12345678-90ab-cdef-1234-567890abcdef/api/v1/remote_write",
  "name": "aws-amp"
}

Terraform:

remote_write = [
  {
    url  = "http://127.0.0.1:8005/workspaces/ws-12345678-90ab-cdef-1234-567890abcdef/api/v1/remote_write"
    name = "aws-amp"
  },
]

The proxy is invoked with the AMP host and region, e.g.:

aws-sigv4-proxy \
  --name aps \
  --region eu-west-1 \
  --host aps-workspaces.eu-west-1.amazonaws.com \
  --port 127.0.0.1:8005

Give the host an IAM role (instance profile / IRSA) with aps:RemoteWrite on the workspace. Keep the proxy on 127.0.0.1 so only ChainLaunch's local Prometheus can use it.
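
The relationship between the real AMP endpoint and the URL you give ChainLaunch is a simple rewrite: same path, but scheme and host swapped for the local proxy. A small Python sketch, with hypothetical helper names and the proxy address taken from the example above:

```python
from urllib.parse import urlsplit, urlunsplit


def amp_endpoint(region, workspace_id):
    """The SigV4-protected AMP remote_write URL for a workspace."""
    return (
        f"https://aps-workspaces.{region}.amazonaws.com"
        f"/workspaces/{workspace_id}/api/v1/remote_write"
    )


def via_local_proxy(amp_url, proxy="127.0.0.1:8005"):
    """Keep the AMP path but target the local signing proxy over plain HTTP."""
    parts = urlsplit(amp_url)
    return urlunsplit(("http", proxy, parts.path, parts.query, ""))


if __name__ == "__main__":
    direct = amp_endpoint("eu-west-1", "ws-12345678-90ab-cdef-1234-567890abcdef")
    print("proxy --host:", urlsplit(direct).netloc)
    print("remote_write url:", via_local_proxy(direct))
```

Plain HTTP is acceptable on this hop only because the traffic never leaves the host; the proxy makes the HTTPS connection to AMP.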

Datadog

Datadog accepts Prometheus remote_write at https://api.<datadog-site>/api/v2/series (e.g. api.datadoghq.com, api.datadoghq.eu). Authenticate with your Datadog API key sent as the DD-API-KEY header.

API:

{
  "url": "https://api.datadoghq.com/api/v2/series",
  "name": "datadog",
  "headers": {
    "DD-API-KEY": "<your-datadog-api-key>"
  }
}

The DD-API-KEY header is configured via the headers map on the REST API (PATCH /metrics/config or POST /metrics/deploy). Because the API key travels in a header, set it through the API so it is stored as a write-only secret and is not echoed back.

Confirm the exact intake URL for your Datadog site under Datadog → Integrations → APIs / Remote Write; intake paths occasionally change between agent/intake versions.

Choosing a local retention window

Once a central store is the system of record, you usually shrink the local Prometheus retention so the ChainLaunch host isn't holding months of data twice:

A 7–30 day local window covers in-app dashboards and short-term debugging.
The central store keeps the long-term, downsampled history.

Set this with retention_time (and optionally retention_size as a disk safety cap):

curl -X PATCH -u "$CHAINLAUNCH_USER:$CHAINLAUNCH_PASSWORD" \
  "$CHAINLAUNCH_API_URL/metrics/config" \
  -H "Content-Type: application/json" \
  -d '{"retention_time": "15d", "retention_size": "20GB"}'

Retention changes apply on the next Prometheus start/restart; remote_write and external_labels changes are applied via a live reload when Prometheus is running.
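
To sanity-check a retention_size cap against a retention_time window, the Prometheus storage documentation gives a rule of thumb: needed disk ≈ retention seconds × ingested samples per second × bytes per sample, with roughly 1 to 2 bytes per sample. The Python sketch below applies it; the series count is a made-up figure, so substitute your own.

```python
def estimate_disk_bytes(retention_days, active_series, scrape_interval_s,
                        bytes_per_sample=2.0):
    """Rule-of-thumb local TSDB size for a given retention window."""
    samples_per_second = active_series / scrape_interval_s
    return retention_days * 86_400 * samples_per_second * bytes_per_sample


if __name__ == "__main__":
    # Hypothetical load: 100,000 active series scraped every 15 seconds.
    needed = estimate_disk_bytes(retention_days=15, active_series=100_000,
                                 scrape_interval_s=15)
    print(f"{needed / 1e9:.2f} GB")
```

With these inputs the estimate comes out at about 17.3 GB, which sits under the 20GB cap used in the example above. Treat the result as an order-of-magnitude guide, since compression varies with the data.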

Verifying the pipeline

Apply the config (PATCH /metrics/config, deploy, or terraform apply).

Check the persisted config is reflected (credentials will show redacted):

curl -s -u "$CHAINLAUNCH_USER:$CHAINLAUNCH_PASSWORD" \
  "$CHAINLAUNCH_API_URL/metrics/defaults" | jq '{remote_write, external_labels, retention_time, retention_size}'

In your external store, query for the ChainLaunch up metric filtered by your external_labels (e.g. up{cluster="fabric-prod-eu"}) — samples should start arriving within a scrape interval or two.
If nothing arrives, check the ChainLaunch host can reach the endpoint outbound, and look at Prometheus' own prometheus_remote_storage_* metrics for send failures.
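
For the last step, a quick way to read the prometheus_remote_storage_* counters is to parse the text that Prometheus serves on its own /metrics page. The Python sketch below works on a pasted sample rather than a live endpoint. The two metric names shown are ones recent Prometheus releases export, but names have changed across versions, so treat them as assumptions and check your own output. The parser also assumes lines carry no trailing timestamp.

```python
SAMPLE = """\
# HELP prometheus_remote_storage_samples_failed_total Samples that failed to send.
# TYPE prometheus_remote_storage_samples_failed_total counter
prometheus_remote_storage_samples_failed_total{remote_name="grafana-cloud",url="https://example.invalid/push"} 12
prometheus_remote_storage_samples_total{remote_name="grafana-cloud",url="https://example.invalid/push"} 4800
go_goroutines 42
"""


def remote_write_counters(exposition_text):
    """Sum every prometheus_remote_storage_* series by metric name."""
    totals = {}
    for line in exposition_text.splitlines():
        if not line.startswith("prometheus_remote_storage_"):
            continue  # skips comments and unrelated metrics
        series, _, value = line.rpartition(" ")
        name = series.split("{", 1)[0]
        totals[name] = totals.get(name, 0.0) + float(value)
    return totals


def failure_ratio(totals):
    """Fraction of samples that failed to send, or 0.0 if nothing was sent."""
    sent = totals.get("prometheus_remote_storage_samples_total", 0.0)
    failed = totals.get("prometheus_remote_storage_samples_failed_total", 0.0)
    return failed / sent if sent else 0.0


if __name__ == "__main__":
    counters = remote_write_counters(SAMPLE)
    print(counters)
    print(f"failure ratio: {failure_ratio(counters):.4f}")
```

A ratio that keeps climbing between two readings points at the endpoint, credentials, or network path rather than at scraping.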
