Recent Power Outage in My Homelab

Due to recent snow storms, I had a power outage at home. How did my servers handle it?

Posted by Muddy on January 09, 2024

Recent Weather in New England

As some of you may have heard, here in New England we have been getting some snow and, more recently today, some rain, and this has caused some power flickers here in my homelab. Of course, I have a UPS for my servers and network. And yes, I do have an EcoFlow Delta Max if I get really desperate for power. But overall, how did everything work out?

What UPS units do I have?

I have a total of 3 UPS units that run my critical workloads. Two are located in my server rack, and one is housed in my electric/network closet across the basement. This allows me to keep the network online even when the server rack goes down. The closet unit also keeps my most critical workloads running, using the Beelink box I picked up last month :)

Here is what each unit powers:

  1. CyberPower 2200VA Rack Mount UPS
    1. All three 4U Proxmox nodes in the rack, including my GPU VM for gaming
    2. Aruba 24-port (10Gb) switch
  2. APC 700VA Rack Mount UPS
    1. Synology NAS (8-Bay, runs my Plex and backups for the whole k3s and Proxmox clusters)
    2. Unifi 16-port POE switch (used for management and essential services like IoT)
  3. APC 750VA (old style, still works great!)
    1. Router (pfsense, running on a mini PC)
    2. Fiber converter (I have fiber to the home)
    3. Beelink Mini PC (Home assistant, GitLab server and runner, Proxmox)
    4. Unifi 8-Port POE (Provides connectivity for the server rack and rest of network)

How did my servers handle it?

Overall, great! I was actually sleeping, but I went back through the logs and saw it was about a 1.5-hour outage. Not a ton of time, but it was enough to drain the UPS units in the rack.

The network closet stayed on the whole time! Amazing! So I had wifi, even during the power outage! And my Beelink box was included in that, which is crazy considering how much it runs... Home Assistant was available the whole time (which is good for keeping an eye on the door locks, etc.)

The CyberPower UPS (with the most load on it) ran for about 45 mins, no problems at all. After that, it informed each Proxmox node of the shutdown and all VMs were gracefully stopped (even the k3s nodes!) Nice! NUT (Network UPS Tools) is to thank for this.
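For anyone curious what that shutdown decision looks like, here is a rough Python sketch. To be clear, this is not my actual config: in a real NUT setup, upsmon makes this call for you. The sketch only parses the "key: value" text that NUT's upsc client prints and applies a simple rule to the standard ups.status and battery.runtime variables. The function names, the UPS name rackups@localhost, and the 300-second threshold are all made up for illustration.

```python
import subprocess


def parse_upsc(text: str) -> dict[str, str]:
    """Parse upsc output (one "key: value" pair per line) into a dict."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip anything that is not a key/value line
            values[key.strip()] = value.strip()
    return values


def should_shut_down(values: dict[str, str], min_runtime_s: float = 300) -> bool:
    """Return True when the UPS is on battery and nearly out of runtime.

    The 300 second default is an arbitrary illustrative threshold.
    """
    flags = values.get("ups.status", "").split()
    if "OB" not in flags:  # OB = on battery; OL = on line power
        return False
    if "LB" in flags:  # LB = the UPS itself reports low battery
        return True
    runtime = values.get("battery.runtime")  # seconds remaining, if reported
    return runtime is not None and float(runtime) < min_runtime_s


def read_ups(name: str = "rackups@localhost") -> dict[str, str]:
    """Query a UPS through upsc. The default UPS name is hypothetical."""
    result = subprocess.run(
        ["upsc", name], capture_output=True, text=True, check=True
    )
    return parse_upsc(result.stdout)


# Example with hand-written sample text (not real output from my units):
sample = "battery.charge: 38\nbattery.runtime: 240\nups.status: OB DISCHRG\n"
print(should_shut_down(parse_upsc(sample)))
```

On a machine with NUT installed, should_shut_down(read_ups()) would run the same check against a live unit, though the real thing is better left to upsmon.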

The APC 700VA did OK as well! It was barely loaded (around 10% load with the Synology and switch, which is nothing at all), and it ran for about 45 mins as well. After that, the Synology went down gracefully.

What happened when power came back?

Everything started right back up! Proxmox came online, the k3s nodes came online, Longhorn started a repair, and the Synology came online.

Thanks to the network closet staying online, the network was ready to go when the servers came back up. Most of them use static IPs, but even just being able to get out to the internet was super helpful: I could pull my Docker images right away, no waiting, and everything came right back up!

What do I need to improve?

Honestly, I don't think anything... Right now, this does what I want it to do.

The only complaint I have, which is not specific to the UPS or power outages, is that I wish Longhorn restored faster... I am certain it is just my hardware or config, but it is something I need to poke at.

Overall, I was very impressed and happy I didn't have to spend the day rebuilding the k3s cluster and redeploying everything, not that it takes a long time :D (I automated this using GitLab CI/CD). Still, having solid software is key here, and I think k3s has shown why it was worth the switch from MicroK8s, despite MicroK8s being named "Production Grade".