In my homelab experiments, I wanted to build something closer to production-grade high availability (HA) using nothing but tiny NanoPi NEOs and the travel router.

The question was simple: Could I make services survive reboots, crashes, or node failures?
The answer: yes, with Corosync and Pacemaker.


What Is Corosync?

Corosync is the messaging layer of the cluster. It's responsible for cluster membership: who's online, who's offline, and whether the surviving nodes still have quorum. It uses a reliable group communication protocol (totem) so all nodes can agree on the current state of the cluster.

Think of Corosync as the heartbeat and gossip system — constantly checking if everyone is alive and spreading that information quickly.
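To make that concrete, here's a minimal sketch of what the corosync.conf for a cluster like this could look like. The cluster name, node names, and addresses are placeholders, not my actual configuration:

    totem {
        version: 2
        cluster_name: homelab
        transport: knet
    }

    nodelist {
        node {
            name: nanopi1
            nodeid: 1
            ring0_addr: 192.168.1.11
        }
        node {
            name: nanopi2
            nodeid: 2
            ring0_addr: 192.168.1.12
        }
        node {
            name: nanopi3
            nodeid: 3
            ring0_addr: 192.168.1.13
        }
    }

    # Three voting nodes; votequorum needs a majority (2 of 3) to stay quorate
    quorum {
        provider: corosync_votequorum
    }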


What Is Pacemaker?

Pacemaker is the cluster resource manager. It decides what runs where, ensures resources are only active on one node at a time, and manages failover when something goes wrong.

If Corosync is the cluster’s nervous system, Pacemaker is the brain that makes decisions.


The Lab Cluster

I built a three-node cluster on NanoPi NEOs running Debian Bookworm. Each node ran Corosync, Pacemaker, and the cluster management service.

Together, they formed a small but fully functional HA cluster.
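If you want to reproduce something similar, the bootstrap looks roughly like this. I'm showing the pcs toolchain here (crmsh is a common alternative on Debian), with hypothetical node names:

    # Authenticate the nodes to each other as the hacluster user
    pcs host auth nanopi1 nanopi2 nanopi3 -u hacluster

    # Create the three-node cluster, then start it everywhere
    pcs cluster setup homelab nanopi1 nanopi2 nanopi3
    pcs cluster start --all

    # Start the cluster stack automatically on boot
    pcs cluster enable --all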


The Resources

I kept it minimal but practical:
- A floating IP address so clients always connect to the active node.
- An nginx service colocated with the VIP.

When you connect to the cluster’s IP, Pacemaker ensures nginx is serving you, no matter which NanoPi is actually hosting it.
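In pcs terms, those two resources can be defined something like this; the address and resource names are examples, not my exact values:

    # Floating IP (VIP) via the IPaddr2 resource agent
    pcs resource create vip ocf:heartbeat:IPaddr2 \
        ip=192.168.1.100 cidr_netmask=24 op monitor interval=10s

    # nginx managed through its OCF resource agent
    pcs resource create web ocf:heartbeat:nginx op monitor interval=30s

The colocation and ordering rules that tie the two together are covered under Constraints below.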


The First Success

After the cluster came online, I was able to see the VIP and nginx both active on one node. Pacemaker had elected a designated controller (DC), confirmed quorum, and placed the resources where it saw fit.
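That view comes straight from Pacemaker's own tooling; either of these shows the resources, quorum state, and current DC:

    # One-shot snapshot of cluster state
    crm_mon -1

    # Equivalent summary via pcs
    pcs status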

Seeing the cluster make decisions on its own was the “aha!” moment that made the work worthwhile.


Lessons Learned

Fencing and STONITH

In production, fencing is essential — it guarantees that only one node ever controls a resource by powering off misbehaving nodes. In my homelab, I chose to disable it, since I don’t have IPMI or PDUs to manage power remotely.

That means I give up protection against split-brain, but for a simple VIP and nginx service the risk is acceptable.
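For reference, turning fencing off is a single cluster property (pcs syntax again):

    # Lab use only: tell Pacemaker not to require a fencing device
    pcs property set stonith-enabled=false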


Constraints

I learned that constraints are what make resource placement predictable: nginx must always run alongside the VIP, and the VIP should come up before nginx starts. Defining these rules explicitly keeps the cluster from surprising you.
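Using the example resource names from earlier, the two rules look like this in pcs:

    # web must run on the same node as the VIP
    pcs constraint colocation add web with vip INFINITY

    # ...and may only start once the VIP is up
    pcs constraint order vip then web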


Testing Failover

Putting a node into standby instantly moved nginx and the VIP to another host. It was a simple but powerful way to watch high availability in action.
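The whole test is two commands (node name hypothetical):

    # Drain the node; Pacemaker migrates its resources elsewhere
    pcs node standby nanopi1

    # Bring it back into the cluster
    pcs node unstandby nanopi1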


Split-Brain Concerns

Without fencing, a network partition could leave two nodes both claiming the VIP. For a stateless web server, that mostly means a duplicate IP and confused ARP caches on the network. For a database or shared storage, it would be a disaster.

This highlighted why fencing is so critical in production environments.
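Quorum does soften the risk in a three-node cluster, since the minority side of a partition is supposed to stop its resources, but a hung node still holding the VIP can't be trusted to do that without fencing. The current vote state is easy to inspect:

    # Show expected votes, total votes, and whether this partition is quorate
    corosync-quorumtool -s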


Homelab Takeaways

- Corosync provides the cluster heartbeat and membership.
- Pacemaker is the brain that manages resources and failover.
- NanoPi NEOs are surprisingly capable of running a proper HA stack.
- Disabling fencing makes things easier in a lab, but it does mean living without guaranteed consistency.

Next Steps

- Explore watchdog-based fencing, so the NanoPis can reboot themselves if needed.
- Try adding a stateful resource, like Valkey or a lightweight database, to push the cluster harder.
- Continue building out the Survival Computing lab to prove that redundancy and resilience can exist even on small, low-power devices.

Corosync and Pacemaker turned three little NanoPis into a reliable HA cluster — and that’s exactly the kind of survival computing I set out to prove possible.