ChainLaunch

Troubleshooting Guide

This guide helps you diagnose and resolve common issues with ChainLaunch.

Common Issues

Node Won't Start

Symptoms:

  • Node status remains "Stopped" after clicking start
  • Error message in logs

Diagnosis:

  1. Check node logs: In the nodes list, open the node that is failing, then check the logs at the bottom of the page.

  2. Check system resources:

    • Verify that enough CPU and memory are available
    • Check disk space: df -h
  3. Check port availability:

    # Linux/macOS
    lsof -i :30303  # P2P port
    lsof -i :8545   # JSON-RPC port
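
The disk check above can be turned into a quick threshold test. This is a minimal sketch, not a ChainLaunch feature: `disk_use_percent` and `disk_nearly_full` are hypothetical helper names, and the 90% threshold and the `.` path are placeholders to replace with the node's data directory.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the
# use percentage (digits only) of the first listed filesystem.
disk_use_percent() {
  awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical helper: succeed when usage ($1) is at or above the threshold ($2).
disk_nearly_full() {
  [ "$1" -ge "$2" ]
}

# Usage:
#   used=$(df -P . | disk_use_percent)
#   if disk_nearly_full "$used" 90; then echo "disk ${used}% full"; fi
```

`df -P` is used instead of `df -h` because its POSIX output format keeps one filesystem per line, which makes the fifth column safe to parse.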

Solutions:

| Error Message | Solution |
|---|---|
| Port already in use | Change the node port in the configuration, or stop the process using the port |
| Insufficient disk space | Free up disk space or mount a larger volume |
| Permission denied | Check file permissions on the node data directory |
| Connection refused | Check firewall rules and network connectivity |
| Out of memory | Increase the memory allocation or reduce the node count |

Nodes Not Discovering Each Other

Symptoms:

  • Block height not increasing
  • Peer count = 0
  • Consensus not starting (Fabric/Besu)

Diagnosis:

  1. Check peer count:

    # For Besu via RPC
    curl -X POST http://localhost:8545 \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "net_peerCount",
        "params": [],
        "id": 1
      }'
  2. Check network connectivity between nodes:

    # Test if node A can reach node B
    ping node-b-ip
    telnet node-b-ip 30303
  3. Check enode URLs (Besu):

    # Get node's enode
    curl -X POST http://localhost:8545 \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "admin_nodeInfo",
        "params": [],
        "id": 1
      }'
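
JSON-RPC quantities such as the peer count come back hex-encoded (for example `"result": "0x2"`), which is easy to misread. The sketch below extracts and decodes that value. It assumes the response carries a string `result` field and uses `sed` so that no JSON tool is required; `rpc_result` and `hex_to_dec` are hypothetical helper names.

```shell
# Hypothetical helper: print the string value of "result" from a
# JSON-RPC response read on stdin.
rpc_result() {
  sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: convert a hex quantity such as 0x1a to decimal.
hex_to_dec() {
  printf '%d\n' "$1"
}

# Usage against a live node (assumes JSON-RPC on localhost:8545):
#   peers_hex=$(curl -s -X POST http://localhost:8545 \
#     -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' | rpc_result)
#   echo "peers: $(hex_to_dec "$peers_hex")"
```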

Solutions:

| Issue | Solution |
|---|---|
| Firewall blocking ports | Open P2P port (30303) and RPC port (8545) in firewall |
| Nodes on different networks | Verify genesis block hash matches across nodes |
| Bootnode not running | Start bootnode and configure its address in other nodes |
| DNS not resolving | Use IP addresses instead of hostnames |
| Network policy restricting traffic | Review Kubernetes network policies or security groups |
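
For Besu nodes, one way to verify that the genesis block hash matches is to request block `0x0` from each node with the standard `eth_getBlockByNumber` method and compare the `hash` fields. The sketch below assumes that approach; `block_hash` and `same_genesis` are hypothetical helper names, and `node-a-ip` and `node-b-ip` are placeholders.

```shell
# Hypothetical helper: print the "hash" field of an
# eth_getBlockByNumber response read on stdin.
block_hash() {
  sed -n 's/.*"hash"[[:space:]]*:[[:space:]]*"\(0x[0-9a-fA-F]*\)".*/\1/p'
}

# Hypothetical helper: compare two hashes case-insensitively.
# Prints "match" and succeeds, or prints "mismatch" and fails.
same_genesis() {
  a=$(printf '%s' "$1" | tr 'A-F' 'a-f')
  b=$(printf '%s' "$2" | tr 'A-F' 'a-f')
  if [ -n "$a" ] && [ "$a" = "$b" ]; then
    echo "match"
  else
    echo "mismatch"
    return 1
  fi
}

# Usage (assumes JSON-RPC is reachable on port 8545 of both nodes):
#   req='{"jsonrpc":"2.0","method":"eth_getBlockByNumber","params":["0x0",false],"id":1}'
#   ha=$(curl -s -X POST http://node-a-ip:8545 -H "Content-Type: application/json" -d "$req" | block_hash)
#   hb=$(curl -s -X POST http://node-b-ip:8545 -H "Content-Type: application/json" -d "$req" | block_hash)
#   same_genesis "$ha" "$hb"
```

An empty hash (for example, when a node is unreachable) is reported as a mismatch rather than silently passing.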

Backup Issues

Symptoms:

  • Backup job fails with "connection refused" or "access denied"
  • Scheduled backups not running
  • Restore operation fails

Diagnosis:

  1. Check backup target configuration:

    curl http://localhost:8100/api/v1/backups/targets
  2. Verify S3 connectivity:

    # Test S3 access with AWS CLI
    aws s3 ls s3://your-backup-bucket/
  3. Check backup status:

    curl http://localhost:8100/api/v1/backups
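
Listing a bucket only proves list permission, while backups also need to write and read objects. The sketch below is one way to test that round trip with the AWS CLI, which accepts `-` as stdin or stdout for `aws s3 cp`. The function name `s3_roundtrip` and the probe key name are hypothetical, and the final cleanup step needs delete permission, which the backup target itself may not require.

```shell
# Hypothetical helper: write, read back, and delete a small probe
# object in the given bucket, reporting the first step that fails.
s3_roundtrip() {
  bucket=$1
  key="chainlaunch-probe-$$.txt"   # hypothetical key name
  echo "probe" | aws s3 cp - "s3://$bucket/$key" >/dev/null \
    || { echo "write failed"; return 1; }
  aws s3 cp "s3://$bucket/$key" - >/dev/null \
    || { echo "read failed"; return 1; }
  aws s3 rm "s3://$bucket/$key" >/dev/null \
    || { echo "delete failed"; return 1; }
  echo "read/write ok"
}

# Usage:
#   s3_roundtrip your-backup-bucket
```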

Solutions:

| Issue | Solution |
|---|---|
| Access Denied on S3 | Verify IAM credentials and bucket policy allow read/write access |
| NoSuchBucket | Ensure the S3 bucket exists and the region is correct |
| RequestTimeTooSkewed | Sync the system clock (ntpdate or timedatectl) |
| Scheduled backup not triggered | Verify the backup schedule is active and the cron expression is valid |
| Restore fails with checksum error | The backup may be corrupted; try restoring from a different snapshot |
| credential not found | Re-configure the backup target with valid credentials (static, instance role, or named profile) |

Monitoring Issues

Symptoms:

  • Prometheus metrics endpoint not responding
  • Grafana dashboards show "No data"
  • Node metrics not updating

Diagnosis:

  1. Check if monitoring is enabled:

    curl http://localhost:8100/api/v1/settings
  2. Test the Prometheus metrics endpoint directly:

    curl http://localhost:9090/metrics
  3. Verify node metrics are exposed:

    # For Besu nodes
    curl http://localhost:9545/metrics
     
    # For Fabric peers
    curl http://localhost:9443/metrics
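
The probes above can be combined into a loop that reports the HTTP status of each endpoint instead of dumping the metrics body. This is a minimal sketch: `metrics_hint` and `probe_metrics` are hypothetical helper names, the hints are rough guesses rather than ChainLaunch diagnostics, and the 5-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: map the HTTP status of a /metrics probe to a hint.
metrics_hint() {
  case "$1" in
    200) echo "ok: endpoint is serving metrics" ;;
    404) echo "not found: monitoring may not be enabled for this node" ;;
    000) echo "no response: process down or port closed" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Hypothetical helper: probe one URL and print its hint.
# curl reports 000 when it receives no HTTP response at all.
probe_metrics() {
  code=$(curl -s -o /dev/null -m 5 -w '%{http_code}' "$1")
  echo "$1 -> $(metrics_hint "$code")"
}

# Usage:
#   for url in http://localhost:9090/metrics \
#              http://localhost:9545/metrics \
#              http://localhost:9443/metrics; do
#     probe_metrics "$url"
#   done
```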

Solutions:

| Issue | Solution |
|---|---|
| Prometheus not starting | Check that port 9090 is not in use; verify Prometheus configuration file is valid |
| Metrics endpoint returns 404 | Ensure monitoring was enabled when creating the node |
| Dashboards show "No data" | Verify Prometheus is scraping the correct targets; check the time range in Grafana |
| High cardinality warnings | Reduce the number of custom labels or increase Prometheus memory |
| Metrics stop updating | Restart the node; check if the process is still running |

Authentication Issues

Symptoms:

  • Login returns 401 Unauthorized
  • Session expires unexpectedly
  • API key not accepted

Diagnosis:

  1. Test authentication:

    # Basic auth
    curl -u admin:password http://localhost:8100/api/v1/nodes
     
    # API key
    curl -H "X-API-Key: clpro_..." http://localhost:8100/api/v1/nodes
  2. Check server logs for auth errors: Look for "authentication failed" or "token expired" messages in the ChainLaunch server logs.
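
When testing authentication, the status code is usually more telling than the response body: 401 points at the credentials or the authentication method, while 403 means the request was authenticated but not permitted. The sketch below prints only the status code and explains it; `auth_hint` is a hypothetical helper name and `admin:password` is a placeholder.

```shell
# Hypothetical helper: map an HTTP status code to a likely auth problem.
auth_hint() {
  case "$1" in
    2??) echo "ok: request was authenticated and authorized" ;;
    401) echo "unauthorized: wrong credentials or wrong authentication method" ;;
    403) echo "forbidden: authenticated, but the user lacks the required role" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Usage:
#   code=$(curl -s -o /dev/null -w '%{http_code}' -u admin:password \
#     http://localhost:8100/api/v1/nodes)
#   auth_hint "$code"
```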

Solutions:

| Issue | Solution |
|---|---|
| 401 Unauthorized with correct credentials | Ensure you are using the correct authentication method (Basic Auth vs API Key) |
| Session expires too quickly | Check session timeout configuration in settings |
| API key rejected | Verify the key has not been revoked; regenerate if necessary |
| OIDC login fails | Verify the OIDC provider configuration (issuer URL, client ID, client secret) |
| 403 Forbidden | The authenticated user lacks the required RBAC role (ADMIN, OPERATOR, or VIEWER) for the requested resource |

Besu Node Issues

Symptoms:

  • Besu validator not producing blocks
  • Consensus stalled across the network
  • Peers not connecting to the network

Diagnosis:

  1. Check peer count:

    curl -X POST http://localhost:8545 \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "net_peerCount",
        "params": [],
        "id": 1
      }'
  2. Check sync status:

    curl -X POST http://localhost:8545 \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "eth_syncing",
        "params": [],
        "id": 1
      }'
  3. Check latest block number:

    curl -X POST http://localhost:8545 \
      -H "Content-Type: application/json" \
      -d '{
        "jsonrpc": "2.0",
        "method": "eth_blockNumber",
        "params": [],
        "id": 1
      }'
  4. Check node status via ChainLaunch API:

    curl http://localhost:8100/api/v1/nodes
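
A single `eth_blockNumber` reading does not show whether the chain is moving; two readings taken some seconds apart do. The sketch below compares two hex block numbers. `height_advanced` is a hypothetical helper name, and the 10-second wait in the usage example assumes a block period shorter than that.

```shell
# Hypothetical helper: succeed when the second hex block number is
# strictly greater than the first; return 2 if either is not a number.
height_advanced() {
  first=$(printf '%d' "$1") || return 2
  second=$(printf '%d' "$2") || return 2
  [ "$second" -gt "$first" ]
}

# Usage against a live node (assumes JSON-RPC on localhost:8545):
#   get() {
#     curl -s -X POST http://localhost:8545 -H "Content-Type: application/json" \
#       -d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
#       | sed -n 's/.*"result"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
#   }
#   a=$(get); sleep 10; b=$(get)
#   if height_advanced "$a" "$b"; then echo "chain is advancing"; else echo "chain is stalled"; fi
```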

Solutions:

| Issue | Solution |
|---|---|
| Validator not producing blocks | Ensure the validator key is correctly configured and the node is part of the validator set |
| Consensus stalled | Check that enough validators are online; IBFT 2.0 and QBFT need at least two-thirds of the validators participating to produce blocks |
| Genesis block mismatch | All nodes must use the same genesis file; re-initialize nodes with the correct genesis |
| Peer discovery not working | Verify the bootnode enode URL is correct and reachable; check that the P2P port (30303) is open |
| Invalid block errors | Check that all validators are running the same Besu version |
| High memory usage | Increase the JVM heap size (-Xmx) or enable pruning to reduce state storage |
| RPC endpoint not responding | Verify RPC is enabled (--rpc-http-enabled) and the host/port settings are correct |
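
For the stalled-consensus case, it helps to know how many validators must be online for a given validator set. The sketch below assumes the quorum is the ceiling of two-thirds of the validator count, which is how IBFT 2.0 and QBFT quorums are commonly computed; treat it as an estimate and confirm against the Besu documentation for your version. The function names are hypothetical.

```shell
# Hypothetical helper: validators required for a quorum, assuming
# quorum = ceil(2N/3). Integer form: (2N + 2) / 3.
required_quorum() {
  echo $(( (2 * $1 + 2) / 3 ))
}

# Hypothetical helper: validators that can be offline while a quorum remains.
max_offline() {
  echo $(( $1 - $(required_quorum "$1") ))
}

# Usage:
#   required_quorum 4   # validators needed in a 4-validator network
#   max_offline 4       # validators that may be down at the same time
```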

Fabric-X Issues

Fabric-X has a few platform-specific failure modes that don't apply to classic Fabric. The Fabric-X Quickstart troubleshooting table covers the most common ones; here's the summary:

| Symptom | Likely cause | Fix |
|---|---|---|
| Quickstart phase 5 (join) times out on the first node | Cold Docker Desktop bind-mount cache | Retry the failing node individually; subsequent ones will be fast once the cache is warm. The default per-node timeout is 240s. |
| dial ... context deadline exceeded on namespace create | Fabric-X local-dev mode not enabled (macOS / Windows Docker Desktop) | Set CHAINLAUNCH_FABRICX_LOCAL_DEV=true on the server process and recreate the network. Addresses get rewritten to host.docker.internal so containers can dial the host. |
| invalid mount config ... bind source path does not exist | Cold Docker Desktop bind-mount cache | Same as the join timeout: retry. |
| Port already in use on join | Another Fabric-X network or service on the same host | Run with a different --base-port band, or --clean to wipe the prior bundle. |
| Network status stuck at genesis_block_created | Normal: the network row's status field doesn't transition to ACTIVE. Container readiness is reported on each node row instead. | No action needed. Check Nodes filtered by platform FABRICX for per-node status. |
| Stale TLS certs after re-running --clean | Bind-mount data outlives the container | Re-run with --data-path /path/to/server/data so --clean also purges fabricx-orderers/ and fabricx-committers/ directories. |
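
Several of the fixes above amount to retrying a step once the Docker Desktop cache is warm. A generic retry wrapper along the following lines can automate that. This is only a sketch: `retry` is a hypothetical helper, and `join_node` in the usage comment is a placeholder for whatever command or API call you use to join a single node, not a real ChainLaunch command.

```shell
# Hypothetical helper: run a command up to $1 times, sleeping $2
# seconds between attempts. Succeeds as soon as the command succeeds.
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}

# Usage (join_node is a placeholder, not a real command):
#   retry 3 10 join_node my-node
```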

For deeper Fabric-X diagnostics see Fabric-X Architecture (port layout and component data flow) and Fabric-X Monitoring (Prometheus /metrics endpoints per role).
