Monday, July 3. 2023
LM-Sensors is set of libraries and tools for accessing your Linux server's motheboard sensors. See more @ https://github.com/lm-sensors/lm-sensors.
If you're ever wondered why in Windows it is tricky to get readings from your CPU-fan rotation speed or core temperatures from you fancy GPU without manufacturer utilities. Obviously vendors do provide all the possible readings in their utilities, but people who would want to read, record and store the data for their own purposes, things get hairy. Nothing generic exists and for unknown reason, such API isn't even planned.
In Linux, The One toolkit to use is LM-Sensors. On kernel side, there exists The Linux Hardware Monitoring kernel API. For this stack to work, you also need a kernel module specific to your motherboard providing the requested sensor information via this beautiful API. It's also worth noting, your PC's hardware will have multiple sensors data providers. An incomplete list would include: motherboard, CPU, GPU, SSD, PSU, etc.
sensors-detect found all your sensors, confirm
sensors will output what you'd expect it to. In my case there was a major malfunction. On a boot, following thing happened when system started
sensord (in case you didn't know, kernel-stuff can be read via
systemd: Starting lm_sensors.service - Hardware Monitoring Sensors...
kernel: nct6775: Enabling hardware monitor logical device mappings.
kernel: nct6775: Found NCT6793D or compatible chip at 0x2e:0x290
kernel: ACPI Warning: SystemIO range 0x0000000000000295-0x0000000000000296 conflicts with OpRegion 0x0000000000000290-0x0000000000000299 (_GPE.HWM) (20221020/utaddress-204)
kernel: ACPI: OSL: Resource conflict; ACPI support missing from driver?
systemd: Finished lm_sensors.service - Hardware Monitoring Sensors.
This conflict resulted in no available mobo readings! NVMe, GPU and CPU-cores were ok, the part I was mostly looking for was fan RPMs and mobo temps just to verify my system health. No such joy. Uff.
It seems, this particular Linux kernel module has issues. Or another way to state it: mobo manufacturers have trouble implementing Nuvoton chip into their mobos. On Gentoo forums, there is a helpful thread: [solved] nct6775 monitoring driver conflicts with ACPI
Disclaimer: For ROG Maximus X Code -mobo adding
acpi_enforce_resources=no into kernel parameters is the correct solution. Results will vary depending on what mobo you have.
Such ACPI-setting can be permanently enforced by first querying about the Linux kernel version being used (I run a Fedora):
grubby --info=$(grubby --default-index). The resulting kernel version can be updated by:
grubby --args="acpi_enforce_resources=no" --update-kernel DEFAULT. A reboot shows fix in effect, ACPI Warning is gone and mobo sensor data can be seen.
As a next step you'll need userland tooling to interpret the raw data into human-readable information with semantics. A new years back, I wrote about Improving Nuvoton NCT6776 lm_sensors output. It's mainly about bridging the flow of zeros and ones into something having meaning to humans. This is my LM-Sensors configuration for ROG Maximus X Code:
# 1. voltages
label in12 "Cold Bug Killer"
set in12_min 0.936
set in12_max 2.613
set in12_beep 1
label in13 "DMI"
set in13_min 0.550
set in13_max 2.016
set in13_beep 1
# 2. fans
label fan1 "Chassis fan1"
label fan2 "CPU fan"
label fan5 "Ext fan?"
# 3. temperatures
label temp1 "MoBo"
label temp2 "CPU"
set temp2_max 90
set temp2_beep 1
# 4. other
set beep_enable 1
I'd like to credit Mr. Peter Sulyok on his work about ASRock Z390 Taichi. This mobo happens to use the same Nuvoton NCT6793D -chip for LPC/eSPI SI/O (I have no idea what those acronyms are for, I just copy/pasted them from the chip data sheet). The configuration is in GitHub for everybody to see: https://github.com/petersulyok/asrock_z390_taichi
Also, I''d like to state my ignorance. After reading less than 500 pages of the NCT6793D data sheet, I have no idea what is:
- Cold Bug Killer voltage
- DMI voltage
- AUXTIN1 is or exactly what temperature measurement it serves
- PECI Agent 0 temperature
- PECI Agent 0 Calibration temperature
Remember, I did mention semantics. From
sensors-command output I can read a reading, what it translates into, no idea! Luckily there are some of the readings seen are easy to understand and interpret. As an example, fan RPMs are really easy to measure by removing the fan from its connector. Here is an excerpt from my mobo manual to explain fan-connectors:
As data quality is taken care of and output is meaningful, next step is to start recording data. In LM-Sensors, there is
sensord for that. It is a system service taking a snapshot (you can define the frequency) and storing it for later use. I'll enrich the stored data points with system load averages, this enables me to estimate a relation with high temperatures and/or fan RPMs with how much my system is working.
Finally, all data gathered into a RRDtool database can be easily visualized with
rrdcgi into HTML + set of PNG-images to present a web page like this: