Fedora 41 Upgrade - Gone bad
Thursday, October 31. 2024
As scheduled for the end of October 2024: Announcing Fedora 41 from Red Hat.
The distro has a mechanism for in-place upgrades: Upgrading Fedora Linux Using DNF System Plugin. It is based on DNF System Upgrade.
The principle is to run dnf system-upgrade download --releasever=41, answer Yes a couple of times and then run dnf system-upgrade reboot. It works (mostly), and I have used this in-place upgrade many times on a VM running in Hetzner.
If you haven't read between the lines yet, let's state the obvious: I'm posting this because not everything went as planned.
Imagine a virtual machine running in a data center far, far away. Interaction happens over SSH or, in dire need, through a browser-based console. A failed upgrade was indeed such a need.
41 Upgrade Begins - And Fails
Simultaneously with dnf system-upgrade reboot, I started sending ICMP echo requests to my VM to see the point in time when it would begin pinging me back. That is a clear indication of the upgrade being finished. I waited roughly 20 minutes without a response. Such a long time is an obvious indicator of a problem. Subsequently I logged in to Hetzner's portal to pop open a console. The console showed me an upgraded system in the middle of a reboot cycle. Stuck doing nothing.
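The waiting game above can be scripted. A minimal sketch, assuming a POSIX shell and the iputils ping; wait_for_host, PROBE_CMD and the timings are hypothetical names and values, and vm.example.com stands in for the real host.

```shell
# Sketch: poll a host until it answers, or give up after a deadline.
# wait_for_host and PROBE_CMD are hypothetical names, not part of any tool.

# Default probe: one ICMP echo request, waiting at most 2 seconds for the reply.
PROBE_CMD=${PROBE_CMD:-"ping -c 1 -W 2"}

wait_for_host() {
    host=$1
    max_wait=${2:-1200}   # seconds to keep trying; 1200 s = 20 minutes
    interval=${3:-5}      # pause between probes, in seconds
    waited=0              # counts only the pauses, not the probe timeouts

    while [ "$waited" -lt "$max_wait" ]; do
        if $PROBE_CMD "$host" >/dev/null 2>&1; then
            echo "$host answered after roughly ${waited}s"
            return 0
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done

    echo "$host silent for ${max_wait}s, time to open the console" >&2
    return 1
}

# Usage (not executed here):
#   wait_for_host vm.example.com 1200 5
```

Because the counter ignores the time spent waiting for each unanswered probe, the real deadline is somewhat longer than max_wait. For a rough "is it back yet" check that is good enough.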
That being unexpected, I chose to go for a Ctrl-Alt-Del. My wish came true: the request to reboot nicely recycled the stuck system and a login prompt greeted me on the console. Still, ping didn't. On the console, the only keyboard layout available is a hard-coded ANSI US. On hardware, all my keyboards have the ISO Finnish layout. That puts those elusive, often-used characters like dash (-), slash (/), pipe (|) and colon (:) in very, very different places, slowing the entire process.
On the Console - Missing Package
Poking around the system on the console indicated an upgraded VM. Everything on the system was nice & peachy except networking. There were no IP addresses assigned. Actually, the entire NetworkManager was missing from the system. It did not exist. At all! Also, every single bit of configuration at /etc/NetworkManager/ was absent.
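A quick way to confirm which packages went missing is to query the RPM database. A minimal sketch, assuming rpm itself survived the upgrade; missing_packages and the package list are illustrative only, not what was actually typed on the console.

```shell
# Hypothetical helper: print every named package that rpm does not know about.
missing_packages() {
    for pkg in "$@"; do
        # rpm -q exits non-zero when the package is not installed.
        rpm -q "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}

# Illustrative list; adjust to whatever the box cannot live without.
missing_packages NetworkManager systemd openssh-server
```

An empty output means all listed packages are installed. Anything printed is a candidate for reinstalling once the network is back.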
Transferring the much-needed RPM package NetworkManager-1.50.0-1.fc41 by eyeballing a rather dumb virtual console is fruitless. A quick analysis of the console ended with a harsh conclusion: it doesn't support any sensible means of transmitting files. There was no way to receive data via copy/paste or any other low-level means. Something else was needed.
The Fix - Scraping
I opted to fix the network by hand. The ip command was installed on the system and it worked perfectly. That's all I needed. Or, almost all.
In my infinite wisdom, I didn't have any of the IP details at hand. I had reasoned to myself that the system upgrade had worked flawlessly multiple times before this, so I didn't NEED to save the IPv4 or IPv6 addresses, their routing setup or the DNS resolvers. I knew from tinkering with these boxes that on x86-64 Hetzner VMs all those details are static, manually set to Hetzner-assigned values. Their modern setup on Arm v8 does utilize DHCP for IPv4. My box was on a traditional rack and I couldn't count on automation to assist on this one.
Scraping together all the bits and pieces of information was surprisingly easy. My own DNS records helped. After the fact, I realized a shortcoming: had I looked at the bottom of the web console, those IP addresses would have been shown there. At the time I didn't. Routing defaults can be found in documentation such as Static IP configuration.
Now I knew what to set for the values.
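For next time, the scraping could be avoided by saving the details before the upgrade. A minimal sketch, assuming the ip and resolvectl commands are available; snapshot_network and the file name are made up for illustration.

```shell
# Sketch: dump the current network settings into one file before upgrading.
# snapshot_network and the default path are hypothetical.
snapshot_network() {
    out=${1:-"$HOME/net-before-upgrade.txt"}
    {
        # "|| true" keeps the dump going even if one tool is missing or fails.
        echo "== IPv4 addresses =="
        ip -4 addr show 2>&1 || true
        echo "== IPv6 addresses =="
        ip -6 addr show 2>&1 || true
        echo "== IPv4 routes =="
        ip -4 route show 2>&1 || true
        echo "== IPv6 routes =="
        ip -6 route show 2>&1 || true
        echo "== DNS resolvers =="
        resolvectl dns 2>&1 || true
    } > "$out"
    echo "saved to $out"
}

# Usage (not executed here):
#   snapshot_network /root/net-before-upgrade.txt
```

A file on the VM's own disk stays readable from the web console even when the network is gone, which is exactly the situation where it is needed.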
The Fix - Manual Labor
Now the "fun" began. I needed to set up an IPv4 address complete with routing to restore the functionality of the dnf command. This would make it possible to install NetworkManager and get the nmcli command back.
The sequence is as follows:
ip addr add 192.0.2.1/32 dev eth0
ip route add 172.31.1.1 dev eth0 src 192.0.2.1
ip route add default via 172.31.1.1 src 192.0.2.1
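Typing this on a console with the wrong keyboard layout invites mistakes, so it can help to spell the commands out first. A minimal sketch that only prints the sequence for a given address, gateway and interface; static_ipv4_cmds is a hypothetical helper, and the values are the documentation addresses used in this post.

```shell
# Sketch: print the ip commands for a static /32 setup. Nothing is executed.
# static_ipv4_cmds is a hypothetical helper name.
static_ipv4_cmds() {
    addr=$1
    gw=$2
    dev=${3:-eth0}
    # Host address with a /32 mask: no other host is on-link by default.
    echo "ip addr add $addr/32 dev $dev"
    # Make the gateway reachable directly through the interface.
    echo "ip route add $gw dev $dev src $addr"
    # Send everything else through that gateway.
    echo "ip route add default via $gw src $addr"
}

static_ipv4_cmds 192.0.2.1 172.31.1.1 eth0
```

The order matters: with a /32 address the gateway is not on-link, so the default route can only be added after the host route to the gateway exists.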
By the way, see RFC 5737 for IPv4 addresses and RFC 3849 for IPv6 addresses reserved for documentation. I'm not going to post my box's coordinates here.
Fedora's DNS setup is via systemd-resolved, so I checked the file /etc/systemd/resolved.conf. The file had survived the upgrade intact. It still had the content of:
DNS=185.12.64.1 185.12.64.2 2a01:4ff:ff00::add:1
A perfect & valid solution.
The Fix - Managing Network
Ta-daaa! Ping worked. dnf worked! Everything worked! The joy!
At this point, running dnf install NetworkManager wasn't much of an effort. Trying to figure out what was wrong proved to be a headache.
At first glance, nmcli conn show:
NAME  UUID                                  TYPE      DEVICE
eth0  12345678-1234-1234-1234-123456789abc  ethernet  --
What!? Why isn't my eth0 connection associated with a device? No amount of attempts, tinkering, cursing or yelling helped. I could not associate a device with the connection. My only option was to drop the hammer on the thing: nmcli conn del eth0
Now my eth0 didn't work as it didn't exist. A delete made sure of it. Next, getting it back:
nmcli conn add type ethernet ifname eth0 con-name eth0 ipv4.method manual ipv4.addr 192.0.2.1
nmcli conn modify eth0 ipv4.gateway 172.31.1.1
nmcli conn modify eth0 ipv6.addr 2001:db8::1/64
nmcli conn modify eth0 ipv6.gateway fe80::1
The final twist was to make those changes effective: nmcli device reapply eth0
IPv6 began operating, IPv4 was unchanged. Everything was where it should have been after the upgrade.
That was it for NetworkManager, moving on.
Outcome
The only true measure of a possible success is a system reboot. If my tinkering survived a normal cycle, then all was good. Nothing special to report on that. Everything stuck and survived a rinse cycle. The file eth0.nmconnection stored the NetworkManager configs as expected.
Why this thing exploded remains unknown. Missing any critical pieces of a system is always a disaster. Surviving this one with very little damage was lucky. I may need to increase my odds and re-image the VM. My guess is, this box survives only so-and-so many upgrades. I may have used all of the lives it has.
Azure and Friends Tampere #T07
Friday, October 18. 2024
My employer opted to host a meetup. As they needed somebody to give a presentation there, obviously, I stepped up.
Thanks to all the participants!
For those interested, my presentation on Microsoft Fabric Real-Time Intelligence.
PostgreSQL 17 upgraded into Blog
Monday, October 14. 2024
On the 26th of September, the PostgreSQL Global Development Group announced the release of version 17.
Here is an easy one: Can you guess at which point I made the upgrade?
The slope is a maintenance break. Datadog wasn't measuring HTTP-performance while I was tinkering to make the actual upgrade.
What worries me is the performance being itsy-bitsy worse with version 17. Graph is smooth as silk. However, crunching the numbers to smooth the zig-zag, 16 seems to have better performance on average. Difference isn't big, but it is there. Maybe I'm missing a new setting to improve cache performance or something?