Trouble

Replacement

Finally

Details of the trouble

Data Recovery

Going Mechanical

New SMART

Requirements

Scope / Target daemon

Addressing the list of requirements

Final words

SSD Trouble - Replacement of a tired unit

Arch Linux 6.15.5 upgrade fail

Wifi-6 USB on a Linux - Working!

Confessions of a Server Hugger - Fixing a RAID Array

Fedora 41 Upgrade - Gone bad

41 Upgrade Begins - And Fails

On the Console - Missing Package

The Fix - Scraping

The Fix - Manual Labor

The Fix - Managing Network

Outcome

Wifi-6 USB on a Linux - BrosTrend AX1

Broadcom Wi-Fi in a Linux - Fedora 40 case

System update of 2024

Fedora Linux 39 - Upgrade

Nuvoton NCT6793D lm_sensors output

File 'repomd.xml' from repository is unsigned

Writing a secure Systemd daemon with Python

Systemd daemon

Python venv isolation

Asynchronous code

Security: Capabilities

Security: SElinux

RPM Packaging

MacBook Pro - Fedora 36 sleep wake - part 2

MacBook Pro - Fedora 36 sleep wake

ArchLinux - Pacman - GnuPG - Signature trust fail

Sunday, August 31. 2025

Thursday, July 31. 2025

Sunday, January 26. 2025

Sunday, January 12. 2025

Thursday, October 31. 2024

Thursday, August 29. 2024

Tuesday, August 27. 2024

Thursday, February 29. 2024

Sunday, November 12. 2023

Monday, July 3. 2023

Thursday, March 23. 2023

Sunday, March 5. 2023

Friday, September 30. 2022

Thursday, August 25. 2022

Wednesday, July 27. 2022

Operating multiple physical computers is a chore. Things do happen, especially at times when you don't expect any trouble. On a random Saturday morning, an email sent by a system daemon during early hours would look something like this:

The following warning/error was logged by the smartd daemon:

Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!

Device info:
SAMSUNG MZ7PC128HAFU-000L1, S/N:S0U8NSAC900712, FW:CXM06L1Q, 128 GB

For details see host's SYSLOG.

Aow crap! I'm about to lose data unless rapid action is taken.

Details from journalctl -u smartd:

Aug 30 00:27:40 smartd[1258]: Device: /dev/sda [SAT], FAILED SMART self-check. BACK UP DATA NOW!
Aug 30 00:27:40 smartd[1258]: Sending warning via /usr/libexec/smartmontools/smartdnotify to root ...
Aug 30 00:27:40 smartd[1258]: Warning via /usr/libexec/smartmontools/smartdnotify to root: successful

Then it hit me: My M.2 SSD is a WD. What is this Samsung I'm getting alerted about? Its this one:

Oh. THAT one! It's just a 2.5" S-ATA SSD used for testing stuff. I think I have a Windows VM running on it. If you look closely, there is word "FRU P/N" written in block letters. Also under the barcode there is "Lenovo PN" and "Lenovo C PN". Right, this unit manufactured in September 2012 was liberated from a Laptop needing more capacity. Then it ran one Linux box for a while and after I upgraded that box, drive ended up gathering dust to one of my shelves. Then I popped it back into another server and used it for testing.
It all starts coming back to me.

More details with parted /dev/sda print:

Model: ATA SAMSUNG MZ7PC128 (scsi)
Disk /dev/sda: 128GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End    Size    File system  Name                          Flags
 1      1049kB  106MB  105MB   fat32        EFI system partition          boot, esp, no_automount
 2      106MB   123MB  16.8MB               Microsoft reserved partition  msftres, no_automount
 3      123MB   127GB  127GB   ntfs         Basic data partition          msftdata, no_automount
 4      127GB   128GB  633MB   ntfs                                       hidden, diag, no_automount

Oh yes, Definitely a Windows-drive. Further troubleshooting with smartctl /dev/sda -x:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  9 Power_On_Hours          -O--CK   090   090   000    -    47985
 12 Power_Cycle_Count       -O--CK   095   095   000    -    4057
177 Wear_Leveling_Count     PO--C-   017   017   017    NOW  2998
178 Used_Rsvd_Blk_Cnt_Chip  PO--C-   093   093   010    -    126
179 Used_Rsvd_Blk_Cnt_Tot   PO--C-   094   094   010    -    244
180 Unused_Rsvd_Blk_Cnt_Tot PO--C-   094   094   010    -    3788
190 Airflow_Temperature_Cel -O--CK   073   039   000    -    27
195 Hardware_ECC_Recovered  -O-RC-   200   200   000    -    0
198 Offline_Uncorrectable   ----CK   100   100   000    -    0
199 UDMA_CRC_Error_Count    -OSRCK   253   253   000    -    0
233 Media_Wearout_Indicator -O-RCK   198   198   000    -    195

Just to keep this blog post brief, above is a shortened list of the good bits. Running the command spits out ~150 lines of information on the drive. Walking through what we see:

Power on hours: ~48.000 is roughly 5,5 years.
- Since the unit manufacture of Sep -12 it has been powered on for over 40% of the time.
- Thank you for your service!
Power cycle count: ~4000, well ... that's a few
Wear level: ~3000. Or when processed 17. I have no idea what the unit of this would be or the meaning of this reading.
Reserve blocks: 126 reserve used, still 3788 unused.
- That's good. Drive's internal diagnostics has found unreliable storage and moved my precious data out of it into reserve area.
- There is still plenty of reserve remaining.
- The worrying bit is obvious: bad blocks do exist in the drive.
ECC & CRC errors: 0. Reading and writing still works, no hiccups there.
Media wear: 195. Again, no idea of the unit nor meaning. Maybe a downwards counter?

Yeah. Let's state the obvious. Going for the cheapest available unit is perfectly ok in this scenario. The data I'm about to lose won't be the most precious one. However, every single time I lose data, that's a tiny chunk stripped directly from my soul. I don't want any of that to happen.

A simple transfer time dd if=/dev/sda of=/dev/sdd:

250069680+0 records in
250069680+0 records out
128035676160 bytes (128 GB, 119 GiB) copied, 4586.76 s, 27.9 MB/s

real    76m26.771s
user    4m30.605s
sys     14m49.729s

Hour and 16 minutes later my Windows-image was on a new drive. I/O-speed of 30 MB/second isn't much. With M.2 I'm used to a whole different readings. Do note, the replacement drive has twice the capacity. As it stands, 120 GB is plenty for the use-ase.

Some assembly with Fractal case:

Four phillips screws to the bottom of the drive. Plugging cables back. That's a solid 10 minute job. Closing the side cover of the case and booting the server to validate everything still working as expected.

Doing a 2nd round of smartctl /dev/sda -x on the new drive:

Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAGS    VALUE WORST THRESH FAIL RAW_VALUE
  1 Raw_Read_Error_Rate     -O--CK   100   100   000    -    0
  9 Power_On_Hours          -O--CK   100   100   000    -    1
 12 Power_Cycle_Count       -O--CK   100   100   000    -    4
148 Unknown_Attribute       ------   100   100   000    -    0
149 Unknown_Attribute       ------   100   100   000    -    0
167 Write_Protect_Mode      ------   100   100   000    -    0
168 SATA_Phy_Error_Count    -O--C-   100   100   000    -    0
169 Bad_Block_Rate          ------   100   100   000    -    54
170 Bad_Blk_Ct_Lat/Erl      ------   100   100   010    -    0/47
172 Erase_Fail_Count        -O--CK   100   100   000    -    0
173 MaxAvgErase_Ct          ------   100   100   000    -    2 (Average 1)
181 Program_Fail_Count      -O--CK   100   100   000    -    0
182 Erase_Fail_Count        ------   100   100   000    -    0
187 Reported_Uncorrect      -O--CK   100   100   000    -    0
192 Unsafe_Shutdown_Count   -O--C-   100   100   000    -    3
194 Temperature_Celsius     -O---K   026   035   000    -    26 (Min/Max 23/35)
196 Reallocated_Event_Count -O--CK   100   100   000    -    0
199 SATA_CRC_Error_Count    -O--CK   100   100   000    -    131093
218 CRC_Error_Count         -O--CK   100   100   000    -    0
231 SSD_Life_Left           ------   099   099   000    -    99
233 Flash_Writes_GiB        -O--CK   100   100   000    -    173
241 Lifetime_Writes_GiB     -O--CK   100   100   000    -    119
242 Lifetime_Reads_GiB      -O--CK   100   100   000    -    1
244 Average_Erase_Count     ------   100   100   000    -    1
245 Max_Erase_Count         ------   100   100   000    -    2
246 Total_Erase_Count       ------   100   100   000    -    10512
                            ||||||_ K auto-keep
                            |||||__ C event count
                            ||||___ R error rate
                            |||____ S speed/performance
                            ||_____ O updated online
                            |______ P prefailure warning

Whoa! That's the fourth power on to a drive unboxed from a retail packaking. Three of them had to be in the manufacturing plant. Power on hours reads 1, that's not much. SSD life left 99 (I'm guessing %).

All's well. No data lost. Just my stress level jumping up.

My thinking is: If that new drive survives next 3 years running a Windows on top of a Linux, then it has served its purpose.

by Jari Turkia in Hardware, Linux at 13:22 | Comments (0) | Share in LinkedIn

On my Arch, I was doing the basic update with pacman -Syu. There was an announcement on linux-6.15.5.arch1-1. Nice!

Aaaaaand it failed.

(75/75) checking for file conflicts                [####################] 100%
error: failed to commit transaction (conflicting files)
linux-firmware-nvidia: /usr/lib/firmware/nvidia/ad103 exists in filesystem
linux-firmware-nvidia: /usr/lib/firmware/nvidia/ad104 exists in filesystem
linux-firmware-nvidia: /usr/lib/firmware/nvidia/ad106 exists in filesystem
linux-firmware-nvidia: /usr/lib/firmware/nvidia/ad107 exists in filesystem
Errors occurred, no packages were upgraded.

Oh. How unfortunate, that. How to get past that obstacle? I tried all kinds of pacman -S --overwrite "*" linux-firmware-nvidia and such, but kept failing. I was just getting error messages spat at me. That's a weird package as it contains number of subpackages, which in reality don't exist at all. Confusing!

The winning sequence was to first let the thing go pacman --remove linux-firmware and follow that with install pacman -S linux-firmware.

Before: Linux version 6.14.4-arch1-2
A reboot later: Linux version 6.15.5-arch1-1

Maybe that's why people think Arch isn't for regular users. Its only for nerds.

by Jari Turkia in Linux at 15:58 | Comments (2) | Share in LinkedIn

Last summer I wrote about an attempt to get 802.11ax / Wifi 6 to work on a Linux. Spoiler: It didn't.

A week ago, the author of many Realtek-drivers, Nick Morrow contacted me to inform of a new driver version for RTL8832BU and RTL8852BU Chipsets.

After ./install-driver.sh, the kernel module 8852bu is installed. Dmesg will still display product as 802.11ac WLAN Adapter, however incorrect information that will be. After couple of retries, I managed to get WPA3 authentication working.

Ah joy. The USB-stick works! Performs quite fast also.
Very quirky driver, still. I can't seem to get the thing working on every plugin. Need to try multiple times. Typical failure is "No secrets were provided" -error with "state change: need-auth -> failed (reason 'no-secrets', managed-type: 'full')" in message log. I have absolutely no idea why this is happening, the built-in Realtek works every time.

by Jari Turkia in Hardware, Linux at 17:52 | Comments (0) | Share in LinkedIn

I have to confess: I'm a server hugger. Everything is in cloud or going there. My work is 100% in the clouds, my home pretty much is not.

There are drawbacks.

5.58am, fast asleep, there is a faint beeping waking you. It's relentless and won't go way. Not loud one to alarm you on fire, but not silent one to convince you to go back to sleep. Yup. RAID-controller.

What I have is a LSI MegaRAID SAS 9260-4i. The controller is from 2013. Year later LSI ceased to exist by aquisition. Also the product is rather extinct, Broadcom isn't known for their end user support. As there is proper Linux-driver and tooling after 11 years, I'm still running the thing.

A trivial MegaCli64 -AdpSetProp -AlarmSilence -aALL makes the annoying beep go silent. Next, status of the volume: MegaCli64 -LDInfo -Lall -aALL reveals the source for alarm:

Adapter 0 -- Virtual Drive Information:
Virtual Drive: 0 (Target Id: 0)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 7.276 TB
Sector Size         : 512
Mirror Data         : 7.276 TB
State               : Degraded
Strip Size          : 64 KB
Number Of Drives    : 2

Darn! Degraded. Uh/oh. One out of two drives in a RAID-1 mirror is gone.

In detail, drive list MegaCli64 -PDList -a0 (for clarity, I'm omitting a LOT of details here):

Adapter #0

Enclosure Device ID: 252
Slot Number: 0
Drive's position: DiskGroup: 0, Span: 0, Arm: 1
Device Id: 7
PD Type: SATA

Raw Size: 7.277 TB [0x3a3812ab0 Sectors]
Firmware state: Online, Spun Up
Connected Port Number: 1(path0) 
Inquiry Data:             ZR14F8DXST8000DM004-2U9188                      0001    
Port status: Active
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No


Enclosure Device ID: 252
Slot Number: 1
Drive's position: DiskGroup: 0, Span: 0, Arm: 0
Device Id: 6
PD Type: SATA

Raw Size: 7.277 TB [0x3a3812ab0 Sectors]
Firmware state: Failed
Connected Port Number: 0(path0) 
Inquiry Data:             ZR14F8PSST8000DM004-2U9188                      0001    
Port's Linkspeed: 6.0Gb/s 
Drive has flagged a S.M.A.R.T alert : No

For slots 0-3, the one connected to cable #1 is off-line. I've never go the idea why ports have different numbering to slots. When doing the mechanical installation with physical devices, it is easy to verify cables matching the slot numbers, not port numbers.

From this point on, everything became clear. Need to replace the 8 TB Seagate BarraCudas with a fresh pair of drives. Time was of the essence, and 6 TB WD Reds were instantly available.

To recovery!

New Reds where in their allotted trays. BarraCudas where on my floor hanging from the cables.
Btw. for those interested, case is Fractal Define R6. Rack servers are NOISY! and I really cannot have them inside the house.

Creating a new array: MegaCli64 -CfgLdAdd -r1 [252:2,252:3] WT RA Direct NoCachedBadBBU -a0. Verify the result: MegaCli64 -LDInfo -L1 -a0

Virtual Drive: 1 (Target Id: 1)
Name                :
RAID Level          : Primary-1, Secondary-0, RAID Level Qualifier-0
Size                : 5.457 TB
Sector Size         : 512
Mirror Data         : 5.457 TB
State               : Optimal
Strip Size          : 64 KB
Number Of Drives    : 2
Span Depth          : 1
Default Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Current Cache Policy: WriteThrough, ReadAhead, Direct, No Write Cache if Bad BBU
Default Access Policy: Read/Write
Current Access Policy: Read/Write
Disk Cache Policy   : Disk's Default
Encryption Type     : None
Is VD Cached: No

To my surprise, the RAID-volume hot-plugged into Linux also! ls -l /dev/sdd resulted in a happy:

brw-rw----. 1 root disk 8, 48 Jan  5 09:32 /dev/sdd

Hot-plug was also visible in dmesg:

kernel: scsi 6:2:1:0: Direct-Access     LSI      MR9260-4i        2.13 PQ: 0 ANSI: 5
kernel: sd 6:2:1:0: [sdd] 11719933952 512-byte logical blocks: (6.00 TB/5.46 TiB)
kernel: sd 6:2:1:0: Attached scsi generic sg4 type 0
kernel: sd 6:2:1:0: [sdd] Write Protect is off
kernel: sd 6:2:1:0: [sdd] Write cache: disabled, read cache: enabled, supports DPO and FUA
kernel: sd 6:2:1:0: [sdd] Attached SCSI disk

Next up: Onboarding the new capacity while transferring data out of the old one. With Linux's Logical Volume Manager, or LVM, this is surprisingly easy. Solaris/BSD people are screaming: "It's sooooo much easier with ZFS!" and they would be right. Its capabilities are 2nd to none. However, what I have is Linux, a Fedora Linux, so LVM it is.

Creating LVM partition: parted /dev/sdd

GNU Parted 3.6
Using /dev/sdd
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mktable gpt    

(parted) mkpart LVM 0% 100%
(parted) set 1 lvm on                                                     
(parted) p
Model: LSI MR9260-4i (scsi)
Disk /dev/sdd: 6001GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name  Flags
 1      1049kB  6001GB  6001GB               LVM   lvm

(parted) q

With LVM, inform of a new physical volume: pvcreate /dev/sdd1

  Physical volume "/dev/sdd1" successfully created.
  Not creating system devices file due to existing VGs.

Extend the LVM volume group to the new physical volume: vgextend My_Precious_vg0 /dev/sdd1

Finally, inform LVM to vacate all data from degraded RAID-mirror. As VG has two PVs in it, this effectively copies all the data. On-the-fly. With no downtime. System running all the time. Command is: pvmove /dev/sdb1 /dev/sdd1

Such moving isn't fast. With time, the measured wallclock-time for command execution was 360 minutes. That's 6 hours! Doing more math with lvs -o +seg_pe_ranges,vg_extent_size, indicates PV extent size to be 32 MiB. On the PV, 108480 extents were allocated to VGs. That's 3471360 MiB in total. For 6 hour transfer, that's 161 MiB/s on average. To set that value into real World, my NVMe SSD benchmarks 5X faster on write. To repeat the good side: my system was running all the time, services were on-line without noticeable performance issues.

Before tearing down the hardware, final thing with LVM is to vacate broken array from VG: vgreduce My_Precious_vg0 /dev/sdb1 followed by pvremove /dev/sdb1.

Now that LVM was in The Desired State®, final command to run was to remove degraded volume from LSI: MegaCli64 -CfgLdDel -L0 -a0

To conclude this entire shit-show, it was time to shutdown system, remove BarraCudas and put case back together. After booting the system, annoying beep was gone.

Dust having settled, it was time to take a quick looksie on the old drives. Popping BarraCudas to a USB3.0 S-ATA -bridge revealed both drives being functional. Drives weren't that old, 2+ years of 24/7 runtime on them. Still today, I don't know exactly what happened. I only know LSI stopped loving one of the drives for good.

by Jari Turkia in Hardware, Linux at 19:55 | Comments (0) | Share in LinkedIn

As scheduled to end of October 2024: Announcing Fedora 41 from Red Hat.

Such distro has mechanism to do in-place upgrade: Upgrading Fedora Linux Using DNF System Plugin. It is based on DNF System Upgrade.

Priciple is to run dnf system-upgrade download --releasever=41, tap Yes couple times and then dnf system-upgrade reboot. It works (mostly) and I have used such in-place upgrade many times on a VM running in Hetzner.

If you haven't read between the lines yet, let's state the obvious: I'm posting this in a scenario where everyhing didn't go as planned.

Imagine a virtual machine running in a data center far far away. There is interaction via SSH or if needed, a browser-based console can be used for dire needs. A failed update was indeed such.

Simultaneously with dnf system-upgrade reboot, I start sending ICMP echo requests to my VM to see the point in time when it begins pinging me back. This is a clear indication of upgrade being finished. I waited roughly 20 minutes without a response. Such a long time is an obvious indicator of a problem. Subsequently I logged in Hetzner's portal to pop open a console. Console showed me an upgraded system in the middle of a reboot-cycle. Stuck doing nothing.

That being unexpected, I chose to go for a Ctrl-Alt-Del. My wish came trough, a request to reboot nicely recycled the stuck system and a login-prompt greeted me on the console. Still, ping didn't. On the console, the only single keyboard layout made available is hard-coded ANSI US. On hardware, all my keyboards have layout ISO Finnish. That makes those elusive often used characters like dash (-), slash (/), pipe (|) or colon (:) be in very very different places slowing the entire process.

Poking around the system on console indicated an upgraded VM. Everything else on the system was nice & peachy besides networking. There was no IP-addresses assigned. Actually entire NetworkManager was missing from the system. It did not exist. At all! Also every single bit of configuration at /etc/NetworkManager/ was absent.

Transferrring the muich-needed RPM-package NetworkManager-1.50.0-1.fc41 by eyeballing a rather dumb virtual console is fruitless. A quick analysis of the thing ended with a harsh one: it doesn't support any sensible means of transmitting files. Receiving any sent data with copy/paste or any other low-level means was absent. Something else was needed.

I opted to fix the network by hand. ip-command was installed in the system and it worked perfectly. That's all I needed. Or, almost all.

In my infinite wisdom, I didn't have any of the IP-details at hand. I reasoned to myself the system upgrade having worked flawlessly multiple times before this. I didn't NEED to save IPv4 or IPv6 -addresses, their routing setup or DNS-resolvers. I knew from tinkering with these boxes that on x86-64 architecture Hetzner VMs all those details are static, manually set to Hetzner-assigned values. Their modern setup on Arm v8 does utilize DHCP for IPv4. My box was on a traditional rack and I couldn't count on automation to assist on this one.

Scraping all the bits and pieces of information was surprisingly easy. My own DNS-records helped. After the fact, I realized a shortcoming, if I would have looked at the bottom of the web-console, those IP-addresses would have been shown there. At the time I didn't. Routing defaults can be found from documentation such as Static IP configuration.

Now I knew what to set for the values.

Now the "fun" begun. I need to setup IPv4 address complete with routing to restore functionality of dnf-command. This would make it possible to install NetworkManager to get nmcli-command back.

Sequence is as follows:

ip addr add 192.0.2.1/32 dev eth0
ip route add 172.31.1.1 dev eth0 src 192.0.2.1
ip route add default via 173.31.1.1 src 192.0.2.1

Btw. see RFC5737 for IPv4-addresses and RFC3849 for IPv6-addresses reserved for documentation. I'm not going to post my box's coordinates here.

Fedora DNS-setup is via systemd-resolved, checking file /etc/systemd/resolved.conf. File had survived the update intact. It still had the content of:

DNS=185.12.64.1 185.12.64.2 2a01:4ff:ff00::add:1

A perfect & valid solution.

Ta-daaa! Ping worked. dnf worked! Everything worked! The joy!

At this point running dnf install NetworkManager wasn't much. Trying to figure out what was wrong proved to be a headache.

On initial glance nmcli conn show:

NAME  UUID                                  TYPE      DEVICE
eth0  12345678-1234-1234-1234-123456789abc  ethernet  --

What!? Why isn't my eth0-connection associated with a device! No amount of attempts, tinkering, cursing nor yelling helped. I could not associate a device with the connection. My only option was to drop the hammer on the thing: nmcli conn del eth0

Now my eth0 didn't work as it didn't exist. A delete made sure of it. Next, getting it back:

nmcli conn add type ethernet ifname eth0 con-name eth0 ipv4.method manual ipv4.addr 192.0.2.1
nmcli conn modify eth0 ipv4.gateway 172.31.1.1
nmcli conn modify eth0 ipv6.addr 2001:db8::1/64
nmcli conn modify eth0 ipv6.gateway fe80::1

Final twist was to make those changes effective: nmcli device reapply eth0

IPv6 begun operating, IPv4 was unchanged. Everything was where it should have been after the upgrade.
That was it for NetworkManager, moving on.

The only true measure of a possible success is a system reboot. If my tinkering survived a normal cycle, then all was good. Nothing special to report on that. Everything stuck and survived a rinse-cycle. File eth0.nmconnection stored NetworkManager configs as expected.

Why this thing exploded remains unknown. Missing any critical pieces of a system is always a disaster. Surviving this one with very little damage was lucky. I may need to increase my odds and re-image the VM. My guess is, this box survives only so-and-so many upgrades. I may have used all of the lives it has.

by Jari Turkia in Linux at 23:02 | Comments (0) | Share in LinkedIn

My previous post was about 10+ year old laptops. At that time Broadcom ruled the chipset Wifi chipset market.

Since those days balance has shifted. Today, most common chipset for Wifi is Realtek. I also have couple Mediatek Wi-Fi chips working perfectly with Linux. To repeat what I said previously: These guys have their Linux support via open-source drivers. Broadcom doesn't. Hm. I dunno, maybe that's what made them decline and the other guy thrive? Most certainly, I wish it was their open-source support. 😁

So, my old laptop lost wireless connectivity and I needed SOMETHING to get the thing going to The Internet. I happened to have a brand new USB-stick on a test drive. As there aren't many Linux-supported chipsets, most USB-sticks won't work with 802.11ax / Wi-Fi 6 and you have to settle for slower speeds combined with less security. This product is supposed to beat competition on that.

Spoiler: I doesn't! (yet)

The product is BrosTrend AX1. The speed rating is AX1800 and it's supposed to be WiFi 6 Linux Compatible WiFi Adapter:

lsusb information:

Bus 001 Device 013: ID 0bda:b832 Realtek Semiconductor Corp. 802.11ac WLAN Adapter
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               2.00
  bDeviceClass            0 [unknown]
  bDeviceSubClass         0 [unknown]
  bDeviceProtocol         0 
  bMaxPacketSize0        64
  idVendor           0x0bda Realtek Semiconductor Corp.
  idProduct          0xb832 802.11ac WLAN Adapter
  bcdDevice            0.00
  iManufacturer           1 Realtek
  iProduct                2 802.11ac WLAN Adapter

Btw. See how USB-identification is for 802.11ac. Confusing, that.

At this point, there is no pre-baked Linux kernel module. You can build one in a jiffy. Source code is at https://github.com/morrownr/rtl8852bu. Getting that C-code to work in your favor, after git clone, you need to run ./install-driver.sh. This script does a lot for you. At end, the script even stealthily copies the kernel module into proper directory to make loading the driver very easy. This is confusing and not all module builds do the install without asking.

When I modprobe 8852bu on Fedora 40, module does little bit of whining, but comes up. For the crash I sent some information to author: https://github.com/morrownr/rtl8852bu/issues/38

On my laptop, connection to 802.11ac / Wi-Fi 5 works ok. I suspect, there is something off with WPA3 as connections to 802.11ax / Wi-Fi 6 simply keep asking for network password in a forever loop. But hey! It worked. I got to The Net and was able to fix things. The thought of 802.11ax support is left unanswered. If USB-information doesn't state 802.11ax either, does the stick really support it or not? I dunno.

I'm hoping, WPA3-issue would be fixed one day, so that I'd be able to join any network of my choosing, not the one the device is capable of going.

Update - 20th January 2025: There is an updated driver with full 802.11ax -support in Linux.

by Jari Turkia in Hardware, Linux at 20:48 | Comments (4) | Share in LinkedIn

Is that combo really impossible?

I've been running Linux in multiple old/oldish laptops. Broadcom is such a popular chip vendor, it is the most typical choice regardless of the laptop manufacturer. As Broadcom is full of idiots, their device drivers are proprietary. In their infinte wisdom, they pre-build Linux binaries and hand them out. This, obviously, is much better than not having the closed-source driver package at all. However, they really don't care. Every once in a while something happens in Linux kernel and their driver becomes less-than-operational.

Also, by lack of one official source, there are number of packages made out of the binary distribution. Key naming convention will include letters W and L in them, so you as an end user have to know wl stands for Broadcom BCM4311, BCM4312, BCM4313, BCM4321, BCM4322, BCM43224, BCM43225, BCM43227, BCM43228, BCM43142, BCM4331, BCM4352, BCM4360 devices. Simple! (No, not simple)
As an example: Fedora doesn't support Broadcom at all (as default), Arch Linux has packages brcm80211, b43 and broadcom-wl, Debian has broadcom-wl as non-free package.

Recently my Fedora 40 refused to find a working Wifi. It all started on from 6.9.10 up to 6.10.5. Oh, I forgot to mention, altough Fedora doesn't have the support, there exist number of alternative RPM-repos for Fedora carrying Broadcom. An example, RPM Fusion., non-free package of broadcom-wl. To make this mess even messier, RPM Fusion also has kmod-wl and akmod-wl.

Ok, many packages, back to the problem: my laptop lost Wifi and I really struggled to figure out why, which kernel module caused the problem and for what reason. At the time both broadcom-wl and kmod-wl were installed.

This is VERY confusing! You have to be super-smart to understand much of my less-than-coherent story above. Unfortunately, that is the reality. Everything gets convoluted, confusing and chaos.

In RPM Fusion Bugzilla, there is a bug #6991 Kernel hangs due to broadcom wifi driver. This bug report is specifically for akmod-wl, which was not installed in my laptop. Using an USB-dongle, uninstalling both broadcom-wl and kmod-wl, followed by installing akmod-wl did not solve the problem either. Unlike with original packages, with akmod-wl there was no kernel crash on modprobe. With this package NetworkManager didn't work either. Weird.

When I wrote comments to bug report and Mr. Vieville, author of akmod-wl, replied with a suggestion. There existed an unreleased version of 6.30.223.271-53. Little bit of dnf installing and testing ... it works! Now my laptop had native Wifi-connectivity and I could un-plug the USB-dongle.

This incident left me really confused and happy.

by Jari Turkia in Linux at 20:01 | Comments (3) | Share in LinkedIn

I''ve been way too busy with my dayjob to do any blogging nor system maintenance.

Ever since S9y 2.4.0 update, my blog has been in a disarray. This has been a tough one for me as I'd love to run my system with much better quality.

Ultimately I had to find the required time and do tons of maintenance to get out of the sad state.

Mending activities completed:

Changed hosting provider to Hetzner
Rented a much beefier VM for the system
Changed host CPU:
- manufacturer from Intel to ?
- architecture from AMD64 into ARMv8
- generation from Pentium 4 into something very new
Upgraded OS into Oracle Linux 9
Upgraded DB into PostgreSQL 16
Allocated more RAM for web server
Tightened up security even more
Made Google Search happier
... fixeds a ton of bugs in Brownpaper skin

Qualys SSL Report still ranks this site as A+ having HTTP/2. Netcraft Site report still ranks this site into better half of top-1-Million in The World.

Now everything should be so much better. Now this AI-generated image portrays a fixed computer system:

by Jari Turkia in Blog, Linux at 22:45 | Comments (5) | Share in LinkedIn

Twenty years with Fedora Linux. Release 39 is out!

I've ran Red Hat since 90s. Version 3.0.3 was released in 1996 and ever since I've had something from them running. As they scrapped Red Hat Linux in 2003 and went for Fedora Core / RHEL, I've had something from those two running. For the record: I hate those semi-working everything-done-only-half-way Debian/Ubuntu crap. When I want to toy around with something that doesn't quite fit, I choose Arch Linux.

To get your Fedora in-place-upgraded, process is rather simple. Docs are in article Performing system upgrade.

First you make sure existing system is upgraded to all latest stuff with a dnf --refresh upgrade. Then make sure upgrade tooling is installed: dnf install dnf-plugin-system-upgrade. Now you're good to go for download all the new packages with a: dnf system-upgrade download --releasever=39

Note: This workflow has existed for a long time. It is likely to work also in future. Next time all you have to do is replace the release vesion with the one you want to upgrade into.

Now all prep is done and you're good to go for the actual upgrade. To state the obvious: this is the dangerous part. Everything before this has been a warm-up run.

dnf system-upgrade reboot

If you have a console, you'll see the progress.

Installing the packages and rebooting into the new release shouldn't take too many minutes.

When back in, verify: cat /etc/fedora-release ; cat /proc/version
This resulted in my case:

Fedora release 39 (Thirty Nine) Linux version 6.5.11-300.fc39.x86_64 (mockbuild@d23353abed4340e492bce6e111e27898) (gcc (GCC) 13.2.1 20231011 (Red Hat 13.2.1-4), GNU ld version 2.40-13.fc39) #1 SMP PREEMPT_DYNAMIC Wed Nov 8 22:37:57 UTC 2023

Finally, you can do optional clean-up: dnf system-upgrade clean && dnf clean packages

That's it! You're done for at least for next 6 months.

by Jari Turkia in Linux at 21:11 | Comments (0) | Share in LinkedIn

LM-Sensors is set of libraries and tools for accessing your Linux server's motheboard sensors. See more @ https://github.com/lm-sensors/lm-sensors.

If you're ever wondered why in Windows it is tricky to get readings from your CPU-fan rotation speed or core temperatures from you fancy GPU without manufacturer utilities. Obviously vendors do provide all the possible readings in their utilities, but people who would want to read, record and store the data for their own purposes, things get hairy. Nothing generic exists and for unknown reason, such API isn't even planned.

In Linux, The One toolkit to use is LM-Sensors. On kernel side, there exists The Linux Hardware Monitoring kernel API. For this stack to work, you also need a kernel module specific to your motherboard providing the requested sensor information via this beautiful API. It's also worth noting, your PC's hardware will have multiple sensors data providers. An incomplete list would include: motherboard, CPU, GPU, SSD, PSU, etc.

Now that sensors-detect found all your sensors, confirm sensors will output what you'd expect it to. In my case there was a major malfunction. On a boot, following thing happened when system started sensord (in case you didn't know, kernel-stuff can be read via dmesg):

systemd[1]: Starting lm_sensors.service - Hardware Monitoring Sensors... kernel: nct6775: Enabling hardware monitor logical device mappings. kernel: nct6775: Found NCT6793D or compatible chip at 0x2e:0x290 kernel: ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (_GPE.HWM) (20221020/utaddress-204) kernel: ACPI: OSL: Resource conflict; ACPI support missing from driver? systemd[1]: Finished lm_sensors.service - Hardware Monitoring Sensors.

This conflict resulted in no available mobo readings! NVMe, GPU and CPU-cores were ok, the part I was mostly looking for was fan RPMs and mobo temps just to verify my system health. No such joy. Uff.

It seems, this particular Linux kernel module has issues. Or another way to state it: mobo manufacturers have trouble implementing Nuvoton chip into their mobos. On Gentoo forums, there is a helpful thread: [solved] nct6775 monitoring driver conflicts with ACPI
Disclaimer: For ROG Maximus X Code -mobo adding acpi_enforce_resources=no into kernel parameters is the correct solution. Results will vary depending on what mobo you have.

Such ACPI-setting can be permanently enforced by first querying about the Linux kernel version being used (I run a Fedora): grubby --info=$(grubby --default-index). The resulting kernel version can be updated by: grubby --args="acpi_enforce_resources=no" --update-kernel DEFAULT. A reboot shows fix in effect, ACPI Warning is gone and mobo sensor data can be seen.

As a next step you'll need userland tooling to interpret the raw data into human-readable information with semantics. A new years back, I wrote about Improving Nuvoton NCT6776 lm_sensors output. It's mainly about bridging the flow of zeros and ones into something having meaning to humans. This is my LM-Sensors configuration for ROG Maximus X Code:

chip "nct6793-isa-0290" # 1. voltages ignore in0 ignore in1 ignore in2 ignore in3 ignore in4 ignore in5 ignore in6 ignore in7 ignore in8 ignore in9 ignore in10 ignore in11 label in12 "Cold Bug Killer" set in12_min 0.936 set in12_max 2.613 set in12_beep 1 label in13 "DMI" set in13_min 0.550 set in13_max 2.016 set in13_beep 1 ignore in14 # 2. fans label fan1 "Chassis fan1" label fan2 "CPU fan" ignore fan3 ignore fan4 label fan5 "Ext fan?" # 3. temperatures label temp1 "MoBo" label temp2 "CPU" set temp2_max 90 set temp2_beep 1 ignore temp3 ignore temp5 ignore temp6 ignore temp9 ignore temp10 ignore temp13 ignore temp14 ignore temp15 ignore temp16 ignore temp17 ignore temp18 # 4. other set beep_enable 1 ignore intrusion0 ignore intrusion1

I'd like to credit Mr. Peter Sulyok on his work about ASRock Z390 Taichi. This mobo happens to use the same Nuvoton NCT6793D -chip for LPC/eSPI SI/O (I have no idea what those acronyms are for, I just copy/pasted them from the chip data sheet). The configuration is in GitHub for everybody to see: https://github.com/petersulyok/asrock_z390_taichi

Also, I''d like to state my ignorance. After reading less than 500 pages of the NCT6793D data sheet, I have no idea what is:

Cold Bug Killer voltage
DMI voltage
AUXTIN1 is or exactly what temperature measurement it serves
PECI Agent 0 temperature
PECI Agent 0 Calibration temperature

Remember, I did mention semantics. From sensors-command output I can read a reading, what it translates into, no idea! Luckily there are some of the readings seen are easy to understand and interpret. As an example, fan RPMs are really easy to measure by removing the fan from its connector. Here is an excerpt from my mobo manual to explain fan-connectors:

As data quality is taken care of and output is meaningful, next step is to start recording data. In LM-Sensors, there is sensord for that. It is a system service taking a snapshot (you can define the frequency) and storing it for later use. I'll enrich the stored data points with system load averages, this enables me to estimate a relation with high temperatures and/or fan RPMs with how much my system is working.

Finally, all data gathered into a RRDtool database can be easily visualized with rrdcgi into HTML + set of PNG-images to present a web page like this:

Nice!

by Jari Turkia in Hardware, Linux at 16:57 | Comments (0) | Share in LinkedIn

For all those years I've been running SUSE-Linux, I've never bumped into this one while running a trivial zypper update:

Warning: File 'repomd.xml' from repository 'Update repository with updates from SUSE Linux Enterprise 15' is unsigned.

    Note: Signing data enables the recipient to verify that no modifications occurred
    after the data were signed. Accepting data with no, wrong or unknown signature can
    lead to a corrupted system and in extreme cases even to a system compromise.
    Note: File 'repomd.xml' is the repositories master index file. It ensures the
    integrity of the whole repo.

Warning: We can't verify that no one meddled with this file, so it might not be
trustworthy anymore! You should not continue unless you know it's safe.

continue? [yes/no] (no):

This error slash warning being weird and potentially dangerous, my obvious reaction was to hit ctrl-c and go investigate. First, my package verification mechanism should be intact and be able to verify if updates downloaded are unaltered or not. Second, there should have not been any breaking changes into my system, at least I didn't make any. As my system didn't seem to breached, I assumed a system malfunction and went investigating.

Quite soon, I learned this is less than rare event. It has happeded multiple times for other people. According to article Signature verification failed for file ‘repomd.xml’ from repository ‘openSUSE-Leap-42.2-Update’ there exists a simple fix.

By running two command zypper clean --all and zypper ref, the problem should dissolve.

Yes, that is the case. After a simple wash/clean/rinse -cycle zypper update worked again.
It was just weird to bump into that for the first time I'd assume this would have occurred some time earlier.

by Jari Turkia in Linux at 21:46 | Comments (0) | Share in LinkedIn

This is a deep dive into systems programing using Python. For those unfamiliar with programming, systems programming sits on top of hardware / electronics design, firmware programming and operating system programming. However, it is not applications programming which targets mostly end users. Systems programming targets the running system. Mr. Yadav of Dark Bears has an article Systems Programming is Hard to Do – But Somebody’s Got to Do it, where he describes the limitations and requirements of doing so.

Typically systems programming is done with C, C++, Perl or Bash. As Python is gaining popularity, I definitely want to take a swing at systems programming with Python. In fact, there aren''t many resources about the topic in the entire Internet.

This is the list of basic requirements I have for a Python-based system daemon:

Run as service: Must run as a Linux daemon, https://man7.org/linux/man-pages/man7/daemon.7.html
- Start runing on system boot and stop running on system shutdown
Modern: systemd-compatible, https://systemd.io/
- Not interested in ancient SysV init support anymore, https://danielmiessler.com/study/the-difference-between-system-v-and-systemd/
Modern: D-bus -connected
- Service provided will have an interface on system D-bus, https://www.freedesktop.org/wiki/Software/dbus/
- All Linux systems are built on top of D-bus, I absolutely want to be compatible
Monitoring: Must support systemd watchdog, https://0pointer.de/blog/projects/watchdog.html
- Surprisingly many out-of-box Linux daemons won't support this. This is most likely because they're still SysV-init based and haven't modernized their operation.
- I most definitely want to have this feature!
Security: Must use Linux capabilities to run only with necessary permissions, https://man7.org/linux/man-pages/man7/capabilities.7.html
Security: Must support SElinux to run only with required permissions, https://github.com/SELinuxProject/selinux-notebook/blob/main/src/selinux_overview.md
Isolation: Must be independent from system Python
- venv
- virtualenv
- Any possible changes to system Python won't affect daemon or its dependencies at all.
Modern: Asynchronous Python
- Event-based is the key to success.
- D-bus and systemd watchdog pretty much nail this. Absolutely must be asynchronous.
Packaging: Installation from RPM-package
- This is the only one I'll support for any forseeable future.
- The package will contain all necessary parts, libraries and dependencies to run a self-contained daemon.

That's a tall order. Selecting only two or three of those are enough to add tons of complexity to my project. Also, I initially expected somebody else in The Net to be doing this same or something similar. Looks like I was wrong. Most systems programmers love sticking to their old habits and being at SysV-init state with their synchronous C / Perl -daemons.

I've previously blogged about running an own email server and fighting spam. Let's automate lot of those tasks and while automating, create a Maildir monitor of junk mail -folder.

This is the project I wrote for that purpose: Spammer Blocker

Toolkit will query for AS-number of spam-sending SMTP-server. Typically I'll copy/paste the IP-address from SpamCop's report and produce a CIDR-table for Postfix. The table will add headers to email to-be-stored so that possible Procmail / Maildrop can act on it, if so needed. As junk mail -folder is constantly monitored, any manually moved mail will be processed also.

Having these features bring your own Linux box spam handling capabilities pretty close to any of those free-but-spy-all -services commonly used by everybody.

Let's take a peek on what I did to meet above requirements.

This is nearly trivial. See service definition in spammer-reporter.service.

What's in the file is your run-of-the-mill systemd service with appropriate unit, service and install -definitions. That triplet makes a Linux run a systemd service as a daemon.

For any Python-developer this is somewhat trivial. You create the environment box/jail, install requirements via setup.py and that's it. You're done. This same isolation mechanism will be used later for packaging and deploying the ready-made daemon into a system.

What's missing or to-do is start using a pyproject.toml. That is something I'm yet to learn. Obviously there is always something. Nobody, nowhere is "ready". Ever.

Talking to systemd watchdog and providing a service endpoint on system D-bus requires little bit effort. Read: lots of it.

To get a D-bus service properly running, I'll first become asynchronous. For that I'll initiate an event-loop dbus.mainloop.glib. As there are multiple options for event loop, that is the only one actually working. Majority of Python-code won't work with Glib, they need asyncio. For that I'll use asyncio_glib to pair the GLib's loop with asyncio. It took me while to learn and understand how to actually achieve that. When successfully done, everything needed will run in a single asynchronous event loop. Great success!

With solid foundation, As the main task I'll create an asynchronous task for monitoring filesystem changes and run it in a forever-loop. See inotify(7) for non-Python command. See asyncinotify-library for more details of the Pythonic version. What I'll be monitoring is users' Maildirs configured to receive junk/spam. When there is a change, a check is made to see if change is about new spam received.

For side tasks, there is a D-bus service provider. If daemon is running under systemd also the required watchdog -handler is attached to event loop as a periodic task. Out-of-box my service-definition will state max. 20 seconds between watchdog notifications (See service Type=notify and WatchdogSec=20s). In daemon configuration file spammer-reporter.toml, I'll use 15 seconds as the interval. That 5 seconds should be plenty of headroom.

Documentation of systemd.service for WatchdogSec states following:

If the time between two such calls is larger than the configured time, then the service is placed in a failed state and it will be terminated

For any failed service, there is the obvious Restart=on-failure.

If for ANY possible reason, the process is stuck, systemd scaffolding will take control of the failure and act instantly. That's resiliency and self-healing in my books!

My obvious choice would be to not run as root. As my main task is to provide a D-bus service while reading all users' mailboxes, there is absolutely no way of avoiding root permissions. Trust me! I tried everything I could find to grant nobody's process with enough permissions to do all that. No avail. See the documenatiion about UID / GID rationale.

As standard root will have waaaaaay too much power (especially if mis-applied) I'll take some of those away. There is a yet rarely used mechanism of capabilities(7) in a Linux. My documentation of what CapabilityBoundingSet=CAP_AUDIT_WRITE CAP_DAC_READ_SEARCH CAP_IPC_LOCK CAP_SYS_NICE means in system service definition is also in the source code. That set grants the process for permission to monitor, read and write any users' files.

There are couple of other super-user permissions left. Most not needed powers a regular root would have are stripped, tough. If there is a security leak and my daemon is used to do something funny, quite lot of the potential impact is already mitigated as the process isn't allowed to do much.

For even more improved security, I do define a policy in spammer-block_policy.te. All my systems are hardened and run SElinux in enforcing -mode. If something leaks, there is a limiting box in-place already. In 2014, I wrote a post about a security flaw with no impact on a SElinux-hardened system.

The policy will allow my daemon to:

read and write FIFO-files
create new unix-sockets
use STDIN, STDOUT and STDERR streams
read files in /etc/, note: read! not write
read I18n (or internationalization) files on the system
use capabilities
use TCP and UDP -sockets
acess D-bus -sockets in /run/
access D-bus watchdog UDP-socket
access user passwd information on the system via SSSd
read and search user's home directories as mail is stored in them, note: not write
send email via SMTPd
create, write, read and delete temporary files in /tmp/

Above list is a comprehensive requirement of accesses in a system to meet the given task of monitoring received emails and act on determined junk/spam. As the policy is very carefully crafted not to allow any destruction, writing, deletion or manging outside /tmp/, in my thinking having such a hardening will make the daemon very secure.

Yes, in /tmp/, there is stuff that can be altered with potential security implications. First you have to access the process. While hacking the daemon, make sure to keep the event-loop running or systemd will zap the process within next 20 seconds or less. I really did consider quite a few scenarios if, and only if, something/somebody pops the cork on my daemon.

To wrap all of this into a nice packages, I'm using rpmenv. This toolkit will automatically wrap anything needed by the daemon into a nice virtualenv and deploy that to /usr/libexec/spammer-block/. See rpm.json for details.

SElinux-policy has an own spammer-block_policy_selinux.spec. Having these two in separate packages is mandatory as the mechanisms to build the boxes are completely different. Also, this is the typical approach on other pieces of software. Not everybody has strict requirements to harden their systems.

Where to place an entire virtualenv in a Linux? That one is a ball-buster. RPM Packaging Guide really doesn't say how to handle your Python-based system daemons. Remember? Up there ↑, I tried explaining how all of this is rather novel and there isn't much information in The Net regarding this. However, I found somebody asking What is the purpose of /usr/libexec? on Stackexchange and decided that libexec/ is fine for this purpose. I do install shell wrappers into /bin/ to make everybody's life easier. Having entire Python environment there wouldn't be sensible.

Only time will tell if I made the right design choices. I totally see Python and Rust -based deamons gaining popularity in the future. The obvious difference is Rust is a compiled language like Go, C and C++. Python isn't.

by Jari Turkia in Linux, Programming, Security at 22:12 | Comments (0) | Share in LinkedIn

This topic won't go away. It just keeps bugging me. Back in -19 I wrote about GPE06 and couple months ago I wrote about sleep wake. As there is no real solution in existence and I've been using my Mac with Linux, I've come to a conclusion they are in fact the same problem.

When I boot my Mac, log into Linux and observe what's going on. Following CPU-hog can be observed in top:

RES SHR S %CPU %MEM TIME+ COMMAND 0 0 I 41.5 0.0 2:01.50 [kworker/0:1-kacpi_notify]

ACPI-notify will chomp quite a lot of CPU. As previously stated, all of this will go to zero if /sys/firmware/acpi/interrupts/gpe06 would be disabled. Note how GPE06 and ACPI are intertwined. They do have a cause and effect.

Also, doing what I suggested earlier to apply acpi=strict noapic kernel arguments:

grubby --args="acpi=strict noapic" --update-kernel=$(ls -t1 /boot/vmlinuz-*.x86_64 | head -1)

... will in fact reduce GPE06 interrupt storm quite a lot:

RES SHR S %CPU %MEM TIME+ COMMAND 0 0 I 10.0 0.0 0:22.92 [kworker/0:1-kacpi_notify]

Storm won't be removed, but drastically reduced. Also, the aluminium case of MBP will be a lot cooler.

However, by running grubby, the changes won't stick. Fedora User Docs, System Administrator’s Guide, Kernel, Module and Driver Configuration, Working with the GRUB 2 Boot Loader tells following:

To reflect the latest system boot options, the boot menu is rebuilt automatically when the kernel is updated or a new kernel is added.

Translation: When you'll install a new kernel. Whatever changes you did with grubby won't stick to the new one. To make things really stick, edit file /etc/default/grub and have line GRUB_CMDLINE_LINUX contain these ACPI-changes as before: acpi=strict noapic

Many people are suffering from this same issue. Example: Bug 98501 - [i915][HSW] ACPI GPE06 storm

Even this change won't fix the problem. Lot of CPU-resources are still wasted. When you close the lid for the first time and open it again, this GPE06-storm miraculously disappears. Also what will happen, your next lid open wake will take couple of minutes. It seems the entire Mac is stuck, but it necessarily isn't (sometimes it really is). It just takes a while for the hardware to wake up. Without noapic, it never will. Also, reading the Freedesktop bug-report, there is a hint of problem source being Intel i915 GPU.

Hopefully somebody would direct some development resources to this. Linux on a Mac hardware runs well, besides these sleep/sleep-wake issues.

by Jari Turkia in Apple, Linux at 18:42 | Comments (0) | Share in LinkedIn

Few years back I wrote about running Linux on a MacBook Pro. An year ago, the OpenSuse failed to boot on the Mac. Little bit of debugging, I realized the problem isn't in the hardware. That particular kernel update simply didn't work on that particular hardware anymore. Totally fair, who would be stupid enough to attempt using 8 years old laptop. Well, I do.

There aren't that many distros I use and I always wanted to see Fedora Workstation. It booted from USB and also, unlike OpenSuse Leap, it also booted installed. So, ever since I've been running a Fedora Workstation on encrypted root drive.

One glitch, though. It didn't always sleep wake. Quite easily, I found stories of a MBP not sleeping. Here's one: Macbook Pro doesn't suspend properly. Unlike that 2015 model, this 2013 puppy slept ok, but had such deep state, it had major trouble regaining consciousness. Pressing the power for 10 seconds obviously recycled power, but it always felt too much of a cannon for simple task.

Checking what ACPI has at /proc/acpi/wakeup:

Device S-state Status Sysfs node P0P2 S3 *enabled pci:0000:00:01.0 PEG1 S3 *disabled EC S4 *disabled platform:PNP0C09:00 GMUX S3 *disabled pnp:00:03 HDEF S3 *disabled pci:0000:00:1b.0 RP03 S3 *enabled pci:0000:00:1c.2 ARPT S4 *disabled pci:0000:02:00.0 RP04 S3 *enabled pci:0000:00:1c.3 RP05 S3 *enabled pci:0000:00:1c.4 XHC1 S3 *enabled pci:0000:00:14.0 ADP1 S4 *disabled platform:ACPI0003:00 LID0 S4 *enabled platform:PNP0C0D:00

For those had-to-sleep -cases, disabling XHC1 and LID0 did help, but made wakeup troublesome. While troubleshooting my issue, I did try if disabling XHC1 and/or LID0 would a difference. It didn't.

Also, I found it very difficult to find any detailed information on what those registered ACPI wakeup -sources translate into. Lid is kinda obvious, but rest remain relatively unknown.

While reading System Sleep States from Intel, a thought occurred to me. Let's make this one hibernate to see if that would work. Sleep semi-worked, but I wanted to see if hibernate was equally unreliable.

Going for systemctl hibernate didn't quite go as well as I expected. It simply resulted in an error of: Failed to hibernate system via logind: Not enough swap space for hibernation

With free, the point was made obvious:

total used free shared buff/cache available Mem: 8038896 1632760 2424492 1149792 3981644 4994500 Swap: 8038396 0 8038396

For those not aware: Modern Linux systems don't have swap anymore. They have zram instead. If you're really interested, go study zram: Compressed RAM-based block devices.

To verify the previous, running zramctl displayed prettyy much the above information in form of:

NAME ALGORITHM DISKSIZE DATA COMPR TOTAL STREAMS MOUNTPOINT /dev/zram0 lzo-rle 7.7G 4K 80B 12K 8 [SWAP]

I finally gave up on that by bumping into article Supporting hibernation in Workstation ed., draft 3. It states following:

The Fedora Workstation working group recognizes hibernation can be useful, but due to impediments it's currently not practical to support it.

Ok ok ok. Got the point, no hibernate.

Looking into sleep wake issue more, I bumped into this thread Ubuntu Processor Power State Management. There a merited user Toz suggested following:

It may be a bit of a stretch, but can you give the following kernel parameters a try:

acpi=strict

noapic

I had attempted lots of options, that didn't sound that radical. Finding the active kernel file /boot/vmlinuz-5.18.18-200.fc36.x86_64, then adding mentioned kernel arguments to GRUB2 with: grubby --args=acpi=strict --args=noapic --update-kernel=vmlinuz-5.18.18-200.fc36.x86_64

... aaand a reboot!

To my surprise, it improved the situation. Closing the lid and opening it now works robust. However, that does not solve the problem where battery is nearly running out and I plug the Magsafe. Any power input to the system taints sleep and its back to deep freeze. I'm happy about the improvement, tough.

This is definitely a far fetch, but still: If you have an idea how to fix Linux sleep wake on an ancient Apple-hardware, drop me a comment. I'll be sure to test it out.

by Jari Turkia in Apple, Linux at 21:00 | Comments (2) | Share in LinkedIn

September '25

In ArchLinux, this is what happens too often when you're running simple upgrade with pacman -Syu:

error: libcap: signature from "-an-author-" is marginal trust :: File /var/cache/pacman/pkg/-a-package-.pkg.tar.zst is corrupted (invalid or corrupted package (PGP signature)). Do you want to delete it? [Y/n] y error: failed to commit transaction (invalid or corrupted package) Errors occurred, no packages were upgraded.

This error has occurred multiple times since ever and by googling, it has a simple solution. Looks like the solution went sour at some point. Deleting obscure directories and running pacman-key --init and pacman-key --populate archlinux won't do the trick. I tried that fix, multiple times. Exactly same error will be emitted.

Correct way of fixing the problem is running following sequence (as root):

paccache -ruk0
pacman -Syy archlinux-keyring
pacman-key --populate archlinux

Now you're good to go for pacman -Syu and enjoy upgraded packages.

Disclaimer:
I'll give you really good odds for above solution to go eventually rot. It does work at the time of writing with archlinux-keyring 20220713-2.

by Jari Turkia in Linux at 20:56 | Comments (0) | Share in LinkedIn