Sometimes a project needs a focused push. Not a new feature, not a rewrite -- just a week of relentless quality work. That's what happened this past week on the Bevel Operator Fabric. Between March 27 and March 30, we merged 12 pull requests covering security, correctness, testing, and infrastructure. Here's what changed and why it matters.
I created the Bevel Operator Fabric under the Hyperledger Foundation and have maintained it for over six years. This hardening sprint was the most concentrated quality push the project has seen -- and the codebase is measurably better for it.
TL;DR: We resolved 12 CVEs in Go and npm dependencies, fixed 4 Gateway API namespace bugs that broke routing for peers and orderers, patched resource leaks in long-running operator pods, added 54 new unit tests across 7 files, and removed the deprecated kube-rbac-proxy sidecar -- all in 4 days across PRs #299-#310.
Dependabot had been flagging vulnerabilities for weeks. A few community-reported bugs sat open. And the codebase had zero test coverage on several controllers. None of these were emergencies on their own, but together they painted a picture: the project needed a hardening pass.
We blocked out four days, triaged the backlog, and went after it systematically. Security first, then correctness bugs, then testing, then infrastructure.
This is the same approach we use at ChainLaunch for our own infrastructure -- prioritize security, then correctness, then coverage. The order matters because each phase builds confidence for the next.
The dependency audit revealed 12 CVEs across Go modules and npm packages. The most critical was an authorization bypass in google.golang.org/grpc (upgraded from v1.67.0 to v1.79.3). We also patched arbitrary code execution via PATH hijacking in the OpenTelemetry SDK, privilege escalation in Docker and containerd, and memory exhaustion vectors in Helm.
On the npm side, the docs website had accumulated 290 vulnerabilities in transitive dependencies. We upgraded Docusaurus from 3.8.1 to 3.9.2 and added yarn resolutions for node-forge, minimatch, webpack, and serialize-javascript. The audit went from 290 vulnerabilities to zero.
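Yarn resolutions pin transitive dependencies to patched versions regardless of what intermediate packages request. A hypothetical fragment of what the docs site's `package.json` might look like (the exact version ranges used in the PR are not shown here; these are illustrative patched releases of each package):

```json
{
  "resolutions": {
    "node-forge": "^1.3.1",
    "minimatch": "^3.1.2",
    "webpack": "^5.94.0",
    "serialize-javascript": "^6.0.2"
  }
}
```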
Beyond dependencies, we ran gosec across the entire codebase. File permissions were tightened from 0644/0777 to 0600/0750 wherever secrets or sensitive output was written. Integer conversion overflows were addressed with bounds-checked helper functions. These changes landed in PRs #301, #302, and #305.
Our approach: We didn't just bump versions and hope. Each CVE was mapped to its advisory, the upgrade was tested against the operator's integration suite, and gosec was configured to lint only new issues so we could ship incrementally without drowning in pre-existing findings.
| Category | CVEs Fixed | Key Upgrades |
| --- | --- | --- |
| gRPC | 1 (CRITICAL) | v1.67.0 to v1.79.3 (authorization bypass) |
| OpenTelemetry | 1 (HIGH) | PATH hijacking in SDK |
| Docker/containerd | 2 | Privilege escalation vectors |
| Helm | 1 | Memory exhaustion |
| npm (docs site) | 7 | Docusaurus 3.8.1 to 3.9.2, node-forge, webpack |
| File permissions | -- | 0644/0777 tightened to 0600/0750 |
Deploy a Fabric network in minutes
Skip the manual setup. ChainLaunch handles peer, orderer, and CA provisioning — launch in the cloud or self-host for free.
Four separate namespace assignment bugs were silently breaking Gateway API routing. They'd gone unnoticed because most users set namespaces explicitly -- the bugs only surfaced when relying on defaults.
The peer controller (#309) had a copy-paste error: when GatewayApiNamespace was empty, the fallback assigned to gatewayApiName instead of gatewayApiNamespace. This overwrote the gateway name with "default" and left the namespace empty.
The ordnode controller (#308) had the same bug plus a second one: AdminGatewayApi was reading its namespace from spec.GatewayApi instead of spec.AdminGatewayApi.
The admin TLSRoute template (#310) referenced gatewayApi.gatewayNamespace when it should have used adminGatewayApi.gatewayNamespace.
The CA controller (#307) never propagated the IngressGateway field to Helm chart values, so setting --istio-ingressgateway on CA creation had no effect.
All four were one-line or two-line fixes. They're the kind of bugs that make you appreciate the value of default-path testing.
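The namespace-fallback bug is easiest to see side by side. This is a simplified, hypothetical reduction of the pattern in #309 (field and function names are illustrative, not the operator's actual code):

```go
package main

import "fmt"

// GatewaySpec is a simplified stand-in for the peer spec fields involved.
type GatewaySpec struct {
	GatewayApiName      string
	GatewayApiNamespace string
}

// resolveBuggy reproduces the copy-paste error: when the namespace was
// empty, the fallback assigned to the *name*, clobbering it with "default"
// and leaving the namespace empty.
func resolveBuggy(s GatewaySpec) (name, namespace string) {
	name, namespace = s.GatewayApiName, s.GatewayApiNamespace
	if namespace == "" {
		name = "default" // wrong variable
	}
	return name, namespace
}

// resolveFixed is the corrected fallback: it targets the namespace.
func resolveFixed(s GatewaySpec) (name, namespace string) {
	name, namespace = s.GatewayApiName, s.GatewayApiNamespace
	if namespace == "" {
		namespace = "default"
	}
	return name, namespace
}

func main() {
	spec := GatewaySpec{GatewayApiName: "hlf-gateway"} // namespace left to default
	fmt.Println(resolveBuggy(spec)) // name clobbered, namespace empty
	fmt.Println(resolveFixed(spec)) // name preserved, namespace defaulted
}
```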
PR #303 tackled two resource leaks and a namespace hardcoding issue that had been open for months:
Temp file leak (issue #252): The GetClient function created temporary directories for certificate files but never cleaned them up. In long-running operator pods, this caused filesystem bloat. The fix adds a cleanup function that callers must defer.
HTTP connection leak (issue #141): The fop export command wasn't closing HTTP response bodies, leading to "Unexpected EOF" errors under sustained load. A defer res.Body.Close() fixed it.
Hardcoded namespace (issues #296, #297): The mainchannel controller was falling back to "default" namespace for configmaps instead of using the resource's own namespace. We replaced the hardcoded string with ObjectMeta.Namespace -- except for FabricMainChannel, which is cluster-scoped and correctly needs the "default" fallback.
Each fix came with a regression test. That's non-negotiable for bugs that stayed open this long.
PR #304 fixed a paper-cut that frustrated every new user: re-running a CA user registration command would fail if the user already existed. Tutorials broke on retry. Scripts needed manual cleanup.
The fix is simple -- check if the user exists, print a message, and succeed. It closes issue #284 and makes the operator significantly more script-friendly.
Why this matters: Idempotency isn't a feature, it's a requirement for any Kubernetes operator. If your custom resources can be reconciled multiple times, every side effect in the reconciliation loop must handle being called again without error. This registration command was the last non-idempotent operation in the CA workflow.
This is something we enforce strictly in ChainLaunch as well -- every API operation and infrastructure action must be safely retriable. In blockchain infrastructure, where certificate authorities and cryptographic materials are involved, a failed-then-retried registration that creates duplicate identities can cause real production incidents.
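The registration fix boils down to treating "already exists" as success. A minimal sketch of the pattern, using an in-memory fake in place of a real Fabric CA (all names here are hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

var errAlreadyRegistered = errors.New("identity already registered")

// fakeCA stands in for a Fabric CA server for illustration.
type fakeCA struct{ users map[string]bool }

func (c *fakeCA) Register(user string) error {
	if c.users[user] {
		return errAlreadyRegistered
	}
	c.users[user] = true
	return nil
}

// registerUser succeeds whether or not the user already exists, which is
// what makes the command safe to re-run from tutorials and scripts.
func registerUser(ca *fakeCA, user string) error {
	err := ca.Register(user)
	if errors.Is(err, errAlreadyRegistered) {
		fmt.Printf("user %q already registered, nothing to do\n", user)
		return nil
	}
	return err
}

func main() {
	ca := &fakeCA{users: map[string]bool{}}
	for i := 0; i < 2; i++ { // the second call must not fail
		if err := registerUser(ca, "peer0-admin"); err != nil {
			panic(err)
		}
	}
}
```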
We added 54 new test cases across 7 controller files that previously had zero coverage:
| Controller | Tests Added | What's Covered |
| --- | --- | --- |
| followerchannel | 6 | Input validation |
| console | 10 | Validation + GetConfig mapping |
| chaincode/install | 3 | CCAAS package generation |
| chaincode/deploy | 8 | Resource naming |
| mainchannel | 10 | Validation + ConfigValidator refactor |
| CA (regression) | 17 | DNS names, IP extraction, cert renewal |
| Integration suite | 2 | GinkgoRecover + metrics server conflict |
The CA regression tests are particularly valuable -- they cover GetDNSNames (extracting DNS names from mixed host/IP lists), GetIPAddresses (ensuring localhost is always included), and DoesCertNeedsToBeRenewed (certificate renewal detection). These were the exact functions involved in past production incidents.
The integration suite also got a fix: GinkgoRecover was added to the manager start goroutine so test failures report the actual error instead of panicking the whole process, and the metrics server bind address was set to "0" to prevent port 8080 conflicts.
To run the new test suite:
```shell
# Unit tests
go test ./controllers/... -short

# Integration suite (requires KIND cluster)
make test-e2e
```
PR #299 -- a community contribution from @Mau-MR -- removed the deprecated kube-rbac-proxy sidecar container. The metrics server now uses controller-runtime's native authentication and authorization for incoming scrape requests. This simplifies the deployment spec, reduces the container count, and eliminates a dependency that was no longer maintained upstream.
We also overhauled the CI pipeline in PR #300: golangci-lint was upgraded to v1.64.8, configured to lint only new issues, and stripped down to essential linters (errcheck, govet, staticcheck, gofmt, gosec, misspell). The previous config had over 1,000 pre-existing lint findings that made CI useless -- now it catches real issues without noise.
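For reference, the incremental-linting setup can be expressed in a few lines when using the official golangci-lint GitHub Action; this is a sketch under that assumption, not the project's exact workflow file:

```yaml
# Hypothetical CI step using golangci/golangci-lint-action
- uses: golangci/golangci-lint-action@v6
  with:
    version: v1.64.8
    only-new-issues: true  # report findings introduced by the PR, not the backlog
```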
Lesson learned: Don't try to fix every existing lint issue before enabling a linter. Use only-new-issues: true and tighten incrementally. A linter that's always red gets ignored. A linter that's green and catches new regressions gets respected.
This same principle applies to any blockchain deployment tooling -- incremental improvement beats perfection-or-nothing.
- 54 new test cases across 7 previously untested controller files
- 1 deprecated sidecar removed (kube-rbac-proxy)
- 290 to 0 npm vulnerabilities in the docs site
The codebase is measurably more secure, more correct, and more tested than it was a week ago. Next up: expanding integration test coverage for the Gateway API paths, adding cert-manager integration tests, and working toward the next stable release.
None of the 12 CVEs had known active exploits against the Bevel Operator specifically. However, the gRPC authorization bypass (CRITICAL) and the OpenTelemetry PATH hijacking (HIGH) both had public proof-of-concept code, making patching urgent regardless of whether exploitation had been observed in the wild.
All changes in PRs #299-#310 are backward-compatible. The kube-rbac-proxy removal (#299) requires updating RBAC manifests if you deploy manually, but the Helm chart handles this automatically. The Gateway API fixes only correct behavior that was already broken.
Run go test ./controllers/... -short for unit tests. The integration suite requires a running KIND cluster and is triggered via make test-e2e. All 54 new tests are included in the standard go test run.
ChainLaunch uses the Bevel Operator Fabric as one of its supported deployment backends for Kubernetes-based Fabric networks. These fixes improve reliability for anyone using the operator directly or through ChainLaunch's Kubernetes deployment mode. The security and correctness improvements flow through to all downstream consumers.
If you're using the Bevel Operator Fabric, update to the latest main branch to pick up all of these fixes. And if you've been hitting any of the bugs mentioned here -- they're gone now.
For teams that want managed Fabric infrastructure without maintaining operators themselves, ChainLaunch handles all of this -- node provisioning, key management, monitoring, and upgrades -- through a single control plane with a free self-hosted tier.
David Viejo is the founder of ChainLaunch and a Hyperledger Foundation contributor. He created the Bevel Operator Fabric project and has been building blockchain infrastructure tooling since 2020.