Improving Nuvoton NCT6776 lm_sensors output
Monday, November 16. 2015
Problem
My home Linux-box was outputting more-or-less useless lm_sensor output. Example:
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +36.0°C (high = +80.0°C, crit = +98.0°C)
Core 0: +34.0°C (high = +80.0°C, crit = +98.0°C)
Core 1: +31.0°C (high = +80.0°C, crit = +98.0°C)
Core 2: +36.0°C (high = +80.0°C, crit = +98.0°C)
Core 3: +33.0°C (high = +80.0°C, crit = +98.0°C)
nct6776-isa-0290
Adapter: ISA adapter
Vcore: +0.97 V (min = +0.00 V, max = +1.74 V)
in1: +1.02 V (min = +0.00 V, max = +0.00 V) ALARM
AVCC: +3.33 V (min = +2.98 V, max = +3.63 V)
+3.3V: +3.31 V (min = +2.98 V, max = +3.63 V)
in4: +1.01 V (min = +0.00 V, max = +0.00 V) ALARM
in5: +2.04 V (min = +0.00 V, max = +0.00 V) ALARM
in6: +0.84 V (min = +0.00 V, max = +0.00 V) ALARM
3VSB: +3.42 V (min = +2.98 V, max = +3.63 V)
Vbat: +3.36 V (min = +2.70 V, max = +3.63 V)
fan1: 0 RPM (min = 0 RPM)
fan2: 703 RPM (min = 0 RPM)
fan3: 0 RPM (min = 0 RPM)
fan4: 819 RPM (min = 0 RPM)
fan5: 0 RPM (min = 0 RPM)
SYSTIN: +36.0°C (high = +0.0°C, hyst = +0.0°C) ALARM sensor = thermistor
CPUTIN: -60.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermal diode
AUXTIN: +35.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
PECI Agent 0: +26.0°C (high = +80.0°C, hyst = +75.0°C)
(crit = +88.0°C)
PCH_CHIP_TEMP: +0.0°C
PCH_CPU_TEMP: +0.0°C
PCH_MCH_TEMP: +0.0°C
That's all great and all, but what the heck are in 1, 4-6 and fan 1-5? Are the in 1, 4-6 readings really reliable? Why are there sensors with 0 RPM readings? CPUTIN indicating -60 degrees, really? PCH-temps are all 0, why?
Investigation
In order to get to bottom of all this, let's start from the chip in question. lm_sensors -setup identified it as NCT6776. For some reason Nuvoton doesn't have the data sheet anymore, but by little bit of googling, a PDF with title NCT6776F / NCT6776D Nuvoton LPC I/O popped up.
Analog inputs:
Following information can be found:
It contains following analog inputs:
- AVCC
- VBAT
- 3VSB
- 3VCC
- CPUVCORE
- VIN0
- VIN1
- VIN2
- VIN3
The good thing is, that first 5 of them are clearly labeled, but inputs 0 through 3 are not. They can be pretty much anything.
Revolution Pulse counters:
When it comes to RPM-readings, following information is available:
That lists following inputs:
- SYSFANIN
- CPUFANIN
- AUXFANIN0
- AUXFANIN1
- AUXFANIN2
Looks like all of those have connectors on my motherboard.
Temperature Sources:
For the temperature measurements, the chip has:
The analog temperature inputs are:
- SMIOVT1
- SMIOVT2
- SMIOVT3
- SMIOVT4
- SMIOVT5
- SMIOVT6
According to the above table, they're mapped into AUXTIN, CPUTIN and SYSTIN.
Also on top of those, there is PECI (Platform Environment Control Interface). A definition says "PECI is a new digital interface to read the CPU temperature of Intel® CPUs". So, there aren't any analog pins for that, but there are readings available, when questioned.
Configuration
A peek in to /etc/sensors3.conf
at the definition of the chip shows:
chip "w83627ehf-*" "w83627dhg-*" "w83667hg-*" "nct6775-*" "nct6776-*"
label in0 "Vcore"
label in2 "AVCC"
label in3 "+3.3V"
label in7 "3VSB"
label in8 "Vbat"
set in2_min 3.3 * 0.90
set in2_max 3.3 * 1.10
set in3_min 3.3 * 0.90
set in3_max 3.3 * 1.10
set in7_min 3.3 * 0.90
set in7_max 3.3 * 1.10
set in8_min 3.0 * 0.90
set in8_max 3.3 * 1.10
And that's all. I guess that would be ok for the generic case, but in my particular box that list of settings doesn't cover half of the inputs.
Solution
Configuration changes
I added following settings for temperature into "chip "w83627ehf-*" "w83627dhg-*" "w83667hg-*" "nct6775-*" "nct6776-*"
"-section:
label in0 "Vcore"
set in0_min 1.1 * 0.9
set in0_max 1.1 * 1.15
label in1 "+12V"
compute in1 @ * 12, @ / 12
set in1_min 12 * 0.95
set in1_max 12 * 1.1
label in2 "AVCC"
set in2_min 3.3 * 0.95
set in2_max 3.3 * 1.1
label in3 "+3.3V"
set in3_min 3.3 * 0.95
set in3_max 3.3 * 1.1
label in4 "+5V"
compute in4 @ * 5, @ / 5
set in4_min 5 * 0.95
set in4_max 5 * 1.1
ignore in5
ignore in6
label in7 "3VSB"
set in7_min 3.3 * 0.95
set in7_max 3.3 * 1.1
label in8 "Vbat"
set in8_min 3.3 * 0.95
set in8_max 3.3 * 1.1
The obvious problem still stands: what are the undocumented in 1, 4, 5 and 6? Mr. Ian Dobson at Ubuntuforums.org discussion about NCT6776 claims, that in1 is for +12 VDC power and in4 is for +5VDC power. I cannot deny nor confirm that for my board. The Novoton-chip only provides the inputs, but there is absolutely no way of telling how the manufacturer chooses to connect them to various parts of the MoBo. I took the same assumption, so all that was necessary, was to multiply the input data by 12 and 5 to get a proper reading. I don't know what in5 and in6 are for, that's why I remove them from the display. All the other ones are min and max boundaries for the known readings.
The fan settings are machine specific, in my case:
label fan2 "CPU fan"
set fan2_min 200
label fan4 "HDD fan"
set fan4_min 200
ignore fan1
ignore fan3
ignore fan5
As I only have fans connected to 2 out of 5, I'll ignore the not connected ones. For the connected, I set a lower limit of 200 RPM.
Temperatures are motherboard-specific. In my case, I did following additions:
label temp1 "MB"
set temp1_max 38
set temp1_max_hyst 35
label temp3 "CPU"
label temp7 "CPU?"
ignore temp2
ignore temp8
ignore temp9
ignore temp10
The easy part is to remove the values not displaying anything. The hard part is to try to figure out what the measurements indicate. Based on the other readings, temp3 is CPU combined somehow. The other sensor is displaying rougly same values for each core I have there. However, the temp7 is for PECI, but it doesn't behave anything like CPU-temps. It should, but it doesn't. That's why I left a question mark after it.
Resulting output
After the additions, following output is available:
coretemp-isa-0000
Adapter: ISA adapter
Physical id 0: +48.0°C (high = +80.0°C, crit = +98.0°C)
Core 0: +48.0°C (high = +80.0°C, crit = +98.0°C)
Core 1: +40.0°C (high = +80.0°C, crit = +98.0°C)
Core 2: +43.0°C (high = +80.0°C, crit = +98.0°C)
Core 3: +39.0°C (high = +80.0°C, crit = +98.0°C)
nct6776-isa-0290
Adapter: ISA adapter
Vcore: +1.22 V (min = +0.99 V, max = +1.26 V)
+12V: +12.29 V (min = +11.42 V, max = +13.25 V)
AVCC: +3.33 V (min = +3.14 V, max = +3.63 V)
+3.3V: +3.31 V (min = +3.14 V, max = +3.63 V)
+5V: +5.04 V (min = +4.76 V, max = +5.52 V)
3VSB: +3.42 V (min = +3.14 V, max = +3.63 V)
Vbat: +3.38 V (min = +3.14 V, max = +3.63 V)
CPU fan: 912 RPM (min = 200 RPM)
HDD fan: 897 RPM (min = 200 RPM)
MB: +35.0°C (high = +38.0°C, hyst = +35.0°C) sensor = thermistor
CPU: +37.0°C (high = +80.0°C, hyst = +75.0°C) sensor = thermistor
CPU?: +37.0°C (high = +80.0°C, hyst = +75.0°C)
(crit = +88.0°C)
Before taking the readings, I ran sensors -s
to set the min/max values.
Now my output starts making sense and I can actually monitor any changes.
PS.
At the time of writing this article, website http://www.lm-sensors.org/ was down for multiple days in a row. I can only hope, that project personnel solves the issue with the web site and it is up at the time you're seeing this.
Replacing physical drive for LVM - pvcreate Can't open /dev exclusively
Sunday, November 8. 2015
This is part 2 of my hard drive upgrade. My previous part was about failure to partition a replaced hard drive with GNU Parted: It was just emitting an error of: "The resulting partition is not properly aligned for best performance"
When I had the drive partitioned properly, I failed to proceed with my setup in a yet another mysterious error. My drives are always using LVM, so that I get more control over the filesystem sizes. To get the new partition into LVM, it needs to be associated with a Volume Group (VG). First step is to inform LVM about new physical drive:
# pvcreate /dev/sda1
Can't open /dev/sda1 exclusively. Mounted filesystem?
Oh really? It's definitely not mounted, but ... somebody is stealing my resource. The root of this problem is obviously on the fact, that there used to be a PV on that partition, but I replaced the drive and partitioned it. It is entirely possible, that LVM likes to fiddle with my new partition somehow.
The device mapper knows about the partition:
# dmsetup ls
Box_vg1-LogVol_wrk2 (253:9)
That's kind of bad. I guess it likes to hold on into it. Further check of:
# pvdisplay
... indicates, that LVM doesn't know about the partition (yet), but Linux kernel does.
An attempt to fix:
# dmsetup remove Box_vg1-LogVol_wrk2
And new attempt:
# pvcreate /dev/sda1
Can't open /dev/sda1 exclusively. Mounted filesystem?
No change. Perhaps a strace will provide helpful details of the problem:
# strace pvcreate /dev/sda1
...
stat("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
stat("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0
open("/dev/sda1", O_RDWR|O_EXCL|O_DIRECT|O_NOATIME) = -1 EBUSY (Device or resource busy)
...
Reading a fragment of OPEN(2) man page:
OPEN(2)
open, openat, creat - open and possibly create a file
O_EXCL Ensure that this call creates the file: if this flag is
specified in conjunction with O_CREAT, and pathname already
exists, then open() will fail.
In general, the behavior of O_EXCL is undefined if it is used
without O_CREAT. There is one exception: on Linux 2.6 and
later, O_EXCL can be used without O_CREAT if pathname refers
to a block device. If the block device is in use by the
system (e.g., mounted), open() fails with the error EBUSY.
... confirms the suspicion, that somebody is holding a handle to the block device. Running lsof(8) or fuser(1) yield nothing. It's not a file-handle, when kernel has your block device as hostage.
My only idea at this point was to do a wimpy Windows-style reboot. The thing is: Linux-men don't reboot on anything, but this time I was out of ideas. I'm sure somewhere there is an IOCTL-call to release the handle, but I couldn't find it easily. So, a reboot was in order.
After the reboot: yes results:
# pvcreate /dev/sda1
Physical volume "/dev/sda1" successfully created
Then I could proceed with my build sequence. Next, associate a Volume Group with the new Pysical Volume. The options would be to to add the drive into an existing VG, or create a new one. I chose the latter:
# vgcreate Box_vg1 /dev/sda1
Volume group "Box_vg1" successfully created
Then create a logical partition, or Logical Volume in LVM-lingo on the newly created VG:
# lvcreate -L 800G -n LogVol_wrk2 Box_vg1
Logical volume "LogVol_wrk2" created
As a physical partition also a LV needs to have a filesystem on it, to be usable for the operating system:
# mkfs.ext4 /dev/Box_vg1/LogVol_wrk2
mke2fs 1.42.12 (29-Aug-2014)
Creating filesystem with 209715200 4k blocks and 52428800 inodes
Filesystem UUID: 93be6c97-3ade-4a62-9403-789f64ef73d0
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
Now the drive was ready to be mounted and I had plenty of completely empty space waiting for data to be stored on it.
I plugged in a SATA-USB -dock and started looking for my old data. I intentionally had created a VG with precisely the same name as the old drive had, so there was an obvious collision. My syslog had entries about the pvscan:
Nov 8 16:03:21 pvscan: device-mapper: create ioctl on Box_vg1-LogVol_wrk2 failed: Device or resource busy
Nov 8 16:03:21 pvscan: 0 logical volume(s) in volume group "Box_vg1" now active
Nov 8 16:03:21 pvscan: Box_vg1: autoactivation failed.
Yes, that one I had coming. No autoactivation, as VG names collided. A check:
# vgdisplay
--- Volume group ---
VG Name Box_vg0
...
--- Volume group ---
VG Name Box_vg1
...
--- Volume group ---
VG Name Box_vg1
...
VG UUID trx8sq-2Mtf-2tfa-2m1P-YPGq-cVzA-6fWflU
No surprises there, there were two Volume Groups with exactly same name. To address them, there are unique identifiers or UUIDs. With UUID, it is possible to rename the VG. Like this:
# vgrename trx8sq-2Mtf-2tfa-2m1P-YPGq-cVzA-6fWflU Box_vgold
Volume group "Box_vg1" successfully renamed to "Box_vgold"
Now it would be possible to activate and it would appear on udev:
# vgchange -ay Box_vgold
1 logical volume(s) in volume group "Box_vgold" now active
Now the old data was available at /dev/Box_vgold/LogVol_wrk2
and ready to be mounted and files copied out of it.
Done and mission accomplished! Now I had much more space on a fast drive.
GNU Parted: Solving the dreaded "The resulting partition is not properly aligned for best performance"
Saturday, November 7. 2015
On the other day I was cleaning out junk from my shelfs and found a perfectly good WD Caviar Black hard drive. Obviously in the current SSD-era where your only computer is a laptop and most of your data is stashed into a cloud somewhere, no regular Joe User is using spinning platters.
Hey! I'm not a regular, nor joe. I have a Linux-server running with plenty of capacity in it for my various computing needs. So, the natrural thing to do is to pop out one of the old drives and hook this 1,5 TiB high performing storage monster to replace it. The actual hardware installation on an ATX-case isn't anything worth documenting, but what happens afterwards goes pretty much this sequence: 1) partition the drive, 2) copy all/some of the old data back to it and 3) continue living successfully ever after.
The typical scenario is that something always at least hiccups, if not fails. And as expected, I choked on the 1).
Here goes:
Preparation
The drive had been used previously, and I just wasted the beginning of the drive by writing 10k sectors of nothingness. This will remove all traces of possible partition tables, boot sectors and all the critical metadata of the drive you normally value highly:
# dd if=/dev/zero of=/dev/sda bs=512 count=10000
Pay attetion to the details. It would be advisable to target a correct drive. In my case a regular JBOD-drive really appears as /dev/sda
on the Linux-side. On your case, I'm pretty sure your operating system runs on /dev/sda
, so please don't wipe that.
Then with GNU Parted, create a GUID partition table (or GPT):
# parted /dev/sda
GNU Parted 3.1
Using /dev/sda
Welcome to GNU Parted! Type 'help' to view a list of commands.
(parted) mktable gpt
That's it for the preparation part.
Attempt 1: The stupid way
Regardless what's on the drive already (in my case, its completely empty), Parted syntax allows an approach, where you create a partition using the maximum allowed capacity from start 0, to end -1. Like this:
(parted) mkpart LVM ext4 0 -1
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? c
That obviously will emit an error about non-optimal partition alignment. But hey, that's what I asked for. I obviously cancelled that attempt.
Attempt 2: The smart way
A smart approach would be to see about the boundaries:
(parted) print free
Model: ATA WDC WD1502FAEX-0 (scsi)
Disk /dev/sda: 1500GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt Disk Flags:
Number Start End Size File system Name Flags
17.4kB 1500GB 1500GB Free Space
Now we have a range of 17.4 KiB to 1500 GiB which can be used for a new partition. Let's try that:
(parted) mkpart LVM ext4
Start? 17.4kB
End? 1500GB
Warning: You requested a partition from 16.9kB to 1500GB (sectors 33..2929687500).
The closest location we can manage is 17.4kB to 1500GB (sectors 34..2930277134).
Is this still acceptable to you? Yes/No? y
Warning: The resulting partition is not properly aligned for best performance.
Ignore/Cancel? c
I have bumped into this number of times earlier. Why in the f**k cannot the Parted tell me what values it wants to see there!! Come on!
This is the part where it hits me like a hammer: enough bullshit, let's solve this once and for all!
Attempt 3: Solution
This is the script I wrote: parted_mkpart_calc.sh.
It is based on the information found from following sources:
- How to align partitions for best performance using parted, somebody else is having the same fight than I do
- I/O Limits: block sizes, alignment and I/O hints, information about the Parted alignment calculation
- https://www.kernel.org/doc/Documentation/ABI/testing/sysfs-block, Linux kernel block-device ABI information
It is a Bash-script to do the math for you. Example usage:
$ ./parted_mkpart_calc.sh sda
Using default 1 MiB default alignment in calc
Calculated alignment for /dev/sda (gpt) is: 2048s
If you would be root, you could create partition with:
# parted /dev/sda mkpart [name] [type] 2048s 2930276351s
Verify partition alignment with:
# parted /dev/sda align-check optimal 1 Should return: 1 aligned
I just enter one argument to the script: sda
. From that, the script deduces the alignment, that should be used when partitioning that block-device. In this case it is 2048 sector boundaries (what it doesn't say is, that a sector contains 512 bytes). But it outputs 2 commands which can be copy/pasted (as root):
parted /dev/sda mkpart [name] [type] 2048s 2930276351s
If you would replace [name]
with a partition name and [type]
with a partition type, it would create a correctly aligned partition to fill up most of the drive. It won't fill up exactly all of the drive, because of the alignment issues.
To help that issue, I added a feature to do the following:
$ ./parted_mkpart_calc.sh sda LVM ext4
Optionally, you can provide the partition name and type on the command line to get:
parted /dev/sda mkpart LVM ext4 2048s 2930276351s
as output. That's ready-to-go copy/paste material.
Finally, you can verify the correct alignment:
# parted /dev/sda align-check optimal 1
1 aligned
That's the proof, that calc worked ok.
Attempt 4: The simple way
It didn't take long, before I got my first comment on this article. It was simply: "Why didn't you use percentages?". What? What percentages.
Example:
(parted) unit s
(parted) print
Model: ATA WDC WD1502FAEX-0 (scsi)
Disk /dev/sda: 2930277168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:
Number Start End Size File system Name Flags
1 2048s 2930276351s 2930274305s LVM
(parted) rm 1
(parted) mkpart LVM ext4 0% 100%
(parted) print
Model: ATA WDC WD1502FAEX-0 (scsi)
Disk /dev/sda: 2930277168s
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Number Start End Size File system Name Flags
1 2048s 2930276351s 2930274305s LVM
Using range 0% 100% will produce exactly the same results. Amazing!
So, parted knows the alignment and can use it, but not if you don't first do a rain dance and knock three times on a surface sprinkled with holy water.
Final Words
Why does Parted complain about mis-alignment, but offers no help at all? That's just plain stupid!
Of course, I should add the feature to the source code and offer the patch to FSF, but on the other hand. Naah. I don't want to waste any more energy on this madness.