Arch Linux failing to start network interface
Monday, June 16. 2014
One of my boxes is running Arch Linux. Out of the box it is a really slim one. The install runs in a blink, and as a result the operating system won't have anything that is not absolutely necessary to boot the thing for the first time. Compared to the other distros, which require gigabytes and gigabytes of storage for crap you won't ever need, this is a refreshing change. Every Arch Linux user needs to "build their own excitement" (originally about Gentoo, from the now obsolete http://www.usenix.org.uk/pictures/despair-linux/gentoo.jpg).
Recently the maintainers have been fiddling around too much with network interface naming conventions. When I installed, it was eth0, then it changed to ens3, and guess what happened when I last updated! Yuupp! Back to eth0, but with a twist. Now eth0 won't come up on boot. Crap!
The Arch Linux discussion forum's section Networking, Server, and Protection has a discussion with topic [SOLVED] Update broke netctl (I guess?). It discusses the problem with a sys-subsystem-net-devices-ens3.device. However, in my box none of the repair instructions were accurate.
Later I realized that my dmesg has the following lines in it:
systemd[1]: Expecting device sys-subsystem-net-devices-eth0.device...
systemd[1]: Expecting device sys-subsystem-net-devices-ens3.device...
Ok. On bootup it waits for two, as in not one, network interfaces to become alive. The problem is that my box only has one. A check for the ghost-interface:
systemctl status sys-subsystem-net-devices-ens3.device
* sys-subsystem-net-devices-ens3.device
Loaded: loaded
Active: inactive (dead)
That yields pretty much what I already know: it is inactive and dead. A manual fix would be to start the DHCP-client by hand with a:
systemctl start dhcpcd@eth0.service
... after which the network starts functioning again, but which does not fix the problem. On bootup the interface won't work!
What I did to fix this was to disable dhcpcd for both interfaces:
systemctl disable dhcpcd@ens3.service
systemctl disable dhcpcd@eth0.service
And enabled it for the proper one:
systemctl enable dhcpcd@eth0.service
This does seem to help, but on bootup it still complains "Dependency failed for dhcpcd on ens3". I don't know exactly where the old interface keeps popping up.
In the end, this does work, but it simply takes a bit longer to boot than it used to. Any suggestions to improve booting are welcome.
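One place worth checking for the leftover ens3 reference is the set of enablement symlinks and unit files under /etc/systemd/system. The sketch below is only a guess at where to look, not a confirmed fix. The helper name find_iface_units is made up for this example, and the default directory is an assumption about where a stale entry might live.

```shell
#!/bin/sh
# List unit files or enablement symlinks whose file name mentions an interface.
# Usage: find_iface_units <interface> [directory]
# (hypothetical helper; the default directory is an assumption)
find_iface_units() {
    iface=$1
    dir=${2:-/etc/systemd/system}
    # -name matches regular files and symlinks alike; sort for stable output
    find "$dir" -name "*${iface}*" 2>/dev/null | sort
}
```

Running find_iface_units ens3 prints every matching path. Anything it lists is a candidate for removal with systemctl disable.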
Wrangling permissions on an enforcing SElinux setup
Saturday, March 22. 2014
Most people don't much care about their Linux-boxes' security. You install it, you run it, you use it and occasionally run some system updates into it. Not me. When I have a box running against the wild wild Net, I absolutely positively plan to make the life of anybody cracking into one of my boxes as difficult as possible (with some usability left for myself). See Mr. Tan's article about Security-Functionality-Usability Trade-Off.
So, my choice is at the Functionality - Security -axis with less on the Ease-of-use. The rationale is that, a web application needs to run as safely as possible and can have the ease-of-use in it. The system administrator is a trained professional, he doesn't need the easy-part so much. However, there is a point, when things are set up too tight:
Image courtesy of Dilbert by Scott Adams
So, I voluntarily run software designed and implemented by NSA, SElinux. I even run it in the Enforcing-mode, which any even remotely normal system administrator considers totally insane! Any small or even tiny slip-up from the set security policy will render things completely useless. Mordac steps in and stuff simply does not work anymore.
On my Fedora-box there was a bug in BIND, the name server and an update was released to fix that. After running the update, the DNS was gone. As in, it didn't function, it didn't respond to any requests and the service didn't start. All it said was:
# systemctl status named-chroot.service --full
named-chroot.service - Berkeley Internet Name Domain (DNS)
Loaded: loaded (/usr/lib/systemd/system/named-chroot.service; enabled)
Active: failed (Result: timeout)
Any attempt to start the service resulted in a 60 second wait and a failure. Neither the dmesg-log nor BIND's own log had anything about the issue. So I started suspecting a SElinux-permission issue. My standard SElinux debugging always starts with a:
cat /var/log/audit/audit.log | audit2allow -m local
... to see if SElinux's audit logger is logging any permission-related audit faults. Indeed it did:
require {
type named_conf_t;
type named_t;
class dir write;
}
#============= named_t ==============
allow named_t named_conf_t:dir write;
That reads:
A process running in named_t security context is trying to access a directory with named_conf_t security context to gain a write access, but is denied while doing so.
It is obvious that the process in question must be the BIND name server. No other process has the named_t security context in it. When starting up, BIND name server was about to write into its own configuration directory, which is a big no no! When you write, you write only to designated directories, nowhere else (remember: running in enforcing-mode is insanity).
That is definitely a reason for a daemon not to start or to timeout while starting. Further investigation showed that also Fedora's SElinux policy had been updated a week ago: selinux-policy-3.12.1-74.19.fc19.
At this point I had all the pieces for the puzzle, it was simply a matter of putting it all together. The recently released SElinux policy has a bug in it, and nobody else was there to fix it for me.
The exact audit-log line is:
type=AVC msg=audit(1395481575.712:15239): avc:
denied { write } for
pid=4046 comm="named" name="named" dev="tmpfs" ino=14899
scontext=system_u:system_r:named_t:s0
tcontext=system_u:object_r:named_conf_t:s0 tclass=dir
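The interesting fields of such a record can also be pulled out mechanically. The following is a minimal sketch, assuming each AVC record sits on a single line as it does in /var/log/audit/audit.log (the line above is wrapped for readability). The function name avc_summary is invented for this example.

```shell
#!/bin/sh
# Summarize AVC denials read from stdin, one record per line.
# Prints: <command> (<source type>) denied "<permissions>" on <class> labelled <target type>
avc_summary() {
    sed -n 's/.*denied *[{] *\([^}]*[^ }]\) *[}].*comm="\([^"]*\)".*scontext=[^:]*:[^:]*:\([^:]*\):.*tcontext=[^:]*:[^:]*:\([^:]*\):.*tclass=\([a-z_0-9]*\).*/\2 (\3) denied "\1" on \5 labelled \4/p'
}
```

Something like grep 'avc:' /var/log/audit/audit.log | avc_summary then gives a one-line-per-denial overview. Lines that do not look like a denial are silently skipped.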
So, my chrooted BIND-daemon was trying to write into a tmpfs. There aren't that many of those in a system. I've even touched the tmpfs-subject earlier when I wrote a systemd-configuration for my own daemon. To find the tmpfs-usage, I ran:
# mount | fgrep tmpfs
tmpfs on /var/named/chroot/run/named type tmpfs
BIND's chroot-environment has one. That is very likely the culprit. That can be confirmed:
# ls -Z /var/named/chroot/run/
drwxrwx---. named named system_u:object_r:named_conf_t:s0 named
Yep! That's it. The directory has an incorrect security context. To compare with the system's non-chrooted one:
# ls -Zd /run/
drwxr-xr-x. root root system_u:object_r:var_run_t:s0 /run/
There is a difference between named_conf_t and var_run_t. You can write temporary files into the latter, but not into the former. The fix is very simple (assuming that you speak fluent SElinux):
semanage fcontext -a -t var_run_t "/var/named/chroot/run(/.*)?"
restorecon -R -v named/
The two commands first re-declare a better security-context for the directory in question and then take the new definition into use. Now my BIND started and was fully operational! Nice.
My investigation ran further. I needed to report this to Fedora-people. I looked into the policy-file of /etc/selinux/targeted/contexts/files/file_contexts and found the faulty line in it:
/var/named/chroot/var/run/named.* system_u:object_r:named_var_run_t:s0
That line almost works. The directory in question has only two files in it. One of them even has a matching name. The problem, obviously, is that the other one does not:
# ls -l /var/named/chroot/run/named/
total 8
-rw-r--r--. 1 named named 5 Mar 22 12:02 named.pid
-rw-------. 1 named named 102 Mar 22 12:02 session.key
See Bug 1079636 at Red Hat Bugzilla for further developments with this issue.
Installing OpenSuse 13.1 into a MacBook
Monday, February 10. 2014
OpenSuse 13.1 was released November 2013. During Christmas holidays I started a project of upgrading my previous installation.
Since I'm running on a MacBook 1,1 it was obvious that I was looking for trouble. Previously I had rEFIt running just to get a GRUB 2 -prompt. This time I decided to start from a clean slate. Literally. I ran
dd if=/dev/zero of=/dev/sda bs=1M count=10
for the 10 first MiB of the drive to make sure, that it definitely has no trace of any of my previous settings. Since rEFIt has been abandoned years ago, I went for the replacement project rEFInd. I approached the author Mr. Roderick W. Smith and he was very helpful, but no matter what I did, I could not get rEFInd running on my very old 32-bit Mac. So, I had two options left: to go back to abandonware or see what would happen without a Boot Manager.
I failed on the installer settings-dialog, by trying to out-smart OpenSuse logic. My completed installation didn't boot. On 2nd try I simply went with the flow. As Mr. Smith instructed me, I didn't touch the most critical thing: MBR is not the way to go on a Mac! Here are my settings:
And guess what, it worked! OpenSuse 13.1 installer has enough logic to create a bootable Linux-installation to a completely blank hard drive. Nice!
The installer was pretty smart. The Wi-Fi network was configured properly and worked out-of-the-box. Apple-keys work: screen brightness, volume, etc. work as is. So do the typical trouble-makers: sleep (to RAM) / hibernate (to disk), battery info, sound, and what not. There were only two minor issues: iSight does not work without the Apple proprietary firmware and the keyboard Apple-keys don't do anything usable.
To get the iSight camera working, see ift-extract -tool at Apple Built-in iSight Firmware Tools for Linux. It can dig the guts out of Mac OS X iSight-driver and equip your Linux with a functioning camera. The keyboard is a trivial one. Like previously, I just keyfuzz'ed the keys into something useful. See the OpenSuse 12.3 installation blog entry for details.
There is one thing you may want to check if you enable SSHd, like I always do on all servers. By default /etc/sysconfig/SuSEfirewall2.d/services/sshd defines TCP/22 to be open. That is the general idea, but apparently there is so much SSHd bombing going on that I always tar pit my installations. For some strange reason Suse engineers chose not to allow that in a specific service definition file. Instead it has to be in the classic /etc/sysconfig/SuSEfirewall2 file, in the setting FW_SERVICES_ACCEPT_EXT="0/0,tcp,22,,hitcount=3,blockseconds=60,recentname=ssh"
I urge every one of you to rename the services/sshd into something else and add the above line. This makes bombing your SSH-port so much more difficult. And it does not affect your own login performance, unless you choose to bomb it yourself.
You may want to check OpenSuse's hardware compatibility list for details about Apple Laptops. The HCL has info about what works and what doesn't.
In general OpenSuse folks did a very good job with this one. There was a real improvement in ease of installation. Thanks to Roderick W. Smith for his help during my installation and thanks to Novell for giving this great distro for free!
Tar: resolve failed weirdness
Tuesday, February 4. 2014
The ancient tar is the de facto packing utility in all *nixes. Originally it was used for tape backups, but since tape backups are pretty much in the past, it is used solely for file transfers. Pretty much everything distributed for a *nix in the net is a single compressed tar-archive. However, there is a hidden side-effect in it. Put a colon-character (:) in the filename and tar starts misbehaving.
Example:
tar tf 2014-02-04_12\:09-59.tar
tar: Cannot connect to 2014-02-04_12: resolve failed
What resolve! The filename is there! Why is there a need to resolve anything?
Browsing the tar manual at chapter 6.1 Choosing and Naming Archive Files reveals the following info: "If the archive file name includes a colon (‘:’), then it is assumed to be a file on another machine" and also "If you need to use a file whose name includes a colon, then the remote tape drive behavior can be inhibited by using the ‘--force-local’ option".
Right. Good to know. The man-page reads:
Device selection and switching:
--force-local
archive file is local even if it has a colon
Let's try again:
tar --force-local tf 2014-02-04_12\:09-59.tar
tar: You must specify one of the `-Acdtrux' or `--test-label' options
Hm.. something wrong there. Another version of that would be:
tar -t --force-local f 2014-02-04_12\:09-59.tar
Well, that hung until I hit Ctrl-d. Next try:
tar tf 2014-02-04_12\:09-59.tar --force-local
Whooo! Finally some results.
I know that nobody is going to change tar-command to behave reasonably. But who really would use it over another machine (without a SSH-pipe)? That legacy feature makes things overly complex and confusing. You'll get my +1 for dropping the feature or changing the default.
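GNU tar accepts the old-style bundled letters (tf) only as the very first argument, which is presumably why the attempts with --force-local in front failed. With the dash-form options the order no longer matters. A small sketch, assuming GNU tar; the wrapper name tar_list_local is made up here:

```shell
#!/bin/sh
# List a tar archive that is known to be a local file,
# even when its name contains a colon (GNU tar assumed).
tar_list_local() {
    tar --force-local -tf "$1"
}
```

With that in place, tar_list_local 2014-02-04_12:09-59.tar lists the archive without any attempt to resolve a host name.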
Installing own CA root certificate into openSUSE
Monday, February 3. 2014
This puzzled me for a while. It is almost impossible to install the root certificate of your own CA into openSUSE Linux and make it stick. Initially I tried the classic /etc/ssl/certs/-directory which works for every OpenSSL-installation. But in this case it looks like some sort of script cleans out all weird certificates from it, so effectively my own changes won't last beyond a couple of weeks.
This issue is really poorly documented. Also searching the Net yields no usable results. I found something usable in Nabble from a discussion thread titled "unify ca-certificates installations". There they pretty much confirm the fact that there is a script doing the updating. Luckily they give a hint about the script.
To solve this, the root certificate needs to be in /etc/pki/trust/anchors/. When the certificate files (in PEM-format) are placed there, do the update with update-ca-certificates -command. Example run:
# /usr/sbin/update-ca-certificates
2 added, 0 removed.
The script, however, does not process revocation lists properly. I didn't find anything concrete about them, except manually creating symlinks to /var/lib/ca-certificates/openssl/ -directory.
Example of verification failing:
# openssl verify -crl_check_all test.certificate.cer
test.certificate.cer: CN = test.site.com
error 3 at 0 depth lookup:unable to get certificate CRL
To get this working, we'll need a hash of the revocation list. The hash value is actually the same as the certificate hash value, but this is how you'll get it:
openssl crl -noout -hash -in /etc/pki/trust/anchors/revoke.crl
Then create the symlink:
ln -s /etc/pki/trust/anchors/revoke.crl \
/var/lib/ca-certificates/openssl/-the-hash-.r0
Now test the verify again:
# openssl verify -crl_check_all test.certificate.cer
test.certificate.cer: OK
Yesh! It works!
Funny how openSUSE chose a completely different way of handling this... and then chose not to document it enough.
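The hash lookup and the symlink creation can be combined into one step. This is a rough sketch of the manual procedure above, nothing more: the function name link_crl is invented, and the default target directory is simply the one used in this article.

```shell
#!/bin/sh
# Symlink a CRL into an OpenSSL hash directory as <hash>.r0
# Usage: link_crl <crl file> [target directory]
# (hypothetical helper; default directory taken from the steps above)
link_crl() {
    crl=$1
    dir=${2:-/var/lib/ca-certificates/openssl}
    # Ask OpenSSL for the issuer hash; give up if the file is not a CRL
    hash=$(openssl crl -noout -hash -in "$crl") || return 1
    ln -s "$crl" "$dir/$hash.r0"
}
```

Calling link_crl /etc/pki/trust/anchors/revoke.crl as root should leave the same symlink as the two manual commands. It does not handle the case where a .r0 link for that hash already exists.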
Linux Integration Services for Hyper-V 3.5: Network status still degraded
Friday, January 24. 2014
Microsoft announced version 3.5 of Linux Integration Services for Hyper-V. An ISO-image is available for download at Microsoft's site.
In one of my earlier articles I was wondering if it really matters when Hyper-V indicates the Linux guest status as degraded and tells that an upgrade is required. This version does not change that. Looks like they just added some (weird) new features and an improved set of virtualization features for Windows Server 2012 R2, but didn't touch the network code. However, there is a promise of TRIM-support for 2012 R2.
So, the bottom line is: not worth upgrading.
Speedtest.net from Linux CLI
Monday, January 20. 2014
Speedtest.net has pretty much gained The-place-to-test-your-connection-speed -status. It's like Google for doing web searches. There simply is no real competition available.
Mr. Matt Martz (while throwing hot coals) did study their JavaScript-code enough to write their client-API with Python.
The installation into proper directory (recommended: /usr/local/bin/) with proper permissions is this simple:
wget -O speedtest-cli \
  https://raw.github.com/sivel/speedtest-cli/master/speedtest_cli.py
chmod +x speedtest-cli
The built-in automatic detection of the nearest server does not work very well for me. Their recommended nearest server is not in the country I live in (Finland), but on the Russian side. The network connections over the border aren't that good and it simply does not yield reliable measurements. Not to worry, the CLI-version can do the following:
speedtest-cli --list | fgrep Finland
864) Nebula Oy (Helsinki, Finland) [204.35 km]
Now that we know the server ID of a proper point, we can do:
speedtest-cli --server 864
It will yield:
Retrieving speedtest.net configuration...
Retrieving speedtest.net server list...
Testing from Finland Oyj (80.80.80.80)...
Hosted by Nebula Oy (Helsinki) [204.35 km]: 14.782 ms
Testing download speed........................................
Download: 92.69 Mbit/s
Testing upload speed..................................................
Upload: 4.32 Mbit/s
Nice!
Again, thanks Matt for sharing your work with all of us.
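Picking the server ID by hand gets old quickly. The sketch below extracts the ID of the first list entry that mentions a given country. It assumes the list format shown above ("ID) name (city, country) [distance]"); the function name first_server_id is made up for this example.

```shell
#!/bin/sh
# Read a server list on stdin and print the ID of the first entry
# that mentions the given country.
first_server_id() {
    country=$1
    # keep matching lines, take the first, drop everything from ")" onwards
    grep -F "$country" | head -n 1 | sed 's/).*//' | tr -d ' '
}
```

It could then be used as speedtest-cli --server "$(speedtest-cli --list | first_server_id Finland)". Which entry comes first depends entirely on the order the list is printed in.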
Making USB-bootable PLD RescueCD from your Linux
Tuesday, December 31. 2013
PLD RescueCD is my new favorite Linux rescue CD. It has a ton of stuff in it, even the ipmitool from OpenIPMI-project. One of these days, it so happened that I lost my IPMI network access due to my own misconfiguration. I just goofed up the conf and oops, there was no way of reaching the management interface anymore. If the operating system on the box had been ok, it might have been possible to do some fixing via that, but I chose not to. Instead I got a copy of PLD and started working.
The issue is that PLD RescueCD comes as an ISO-image only. Well, erhm... nobody really boots CDs or DVDs anymore. Getting the thing to boot from a USB-stick turned out to be a rather simple task.
Prerequisites
- A working Linux with enough root-access to do some work with USB-stick and ISO-image
- syslinux-utility installed, all distros have this, but not all of them install it automatically. Confirm that you have this or you won't get any results.
- GNU Parted -utility installed, all distros have this. If yours doesn't you'll have to adapt with the partitioning weapon of your choice.
- An USB-stick with capacity of 256 MiB or more, the rescue CD isn't very big for a Linux distro
- WARNING! During this process you will lose everything on that stick. Forever.
- Not all old USB-sticks can be used to boot all systems. Any reasonable modern ones do. If you are failing, please try again with a new stick.
- PLD RescueCD downloaded ISO-file, I had RCDx86_13_03_10.iso
- You'll need to know the exact location (as in directory) for the file
- The system you're about to rescue has a means of booting via USB. Any reasonable modern system does. With old ones that's debatable.
Assumptions used here:
- Linux sees the USB-stick as /dev/sde
- ISO-image is at /tmp/
- Mount location for the USB-stick is /mnt/usb/
- Mount location for the ISO-image is /mnt/iso/
- syslinux-package installs its extra files into /usr/share/syslinux/
- You will be using the 32-bit version of PLD Rescue
On your system those will most likely be different or you can adjust those according to your own preferences.
Information about how to use syslinux can be found from SYSLINUX HowTos.
Steps to do it
- Insert the USB-stick into your Linux-machine
- Partition the USB-stick
- NOTE: Feel free to skip this if you already have a FAT32-partition on the stick
- Steps:
- Start GNU Parted:
  parted /dev/sde
- Create a MS-DOS partition table to the USB-stick:
  mktable msdos
- Create a new 256 MiB FAT32 partition to the USB-stick:
  mkpart pri fat32 1 256M
- Set the newly created partition as bootable:
  set 1 boot on
- End partitioning:
  quit
- Format the newly created partition:
  mkfs.vfat -F 32 /dev/sde1
- Copy a syslinux-compatible MBR into the stick:
  dd if=/usr/share/syslinux/mbr.bin of=/dev/sde conv=notrunc bs=440 count=1
- Install syslinux:
  syslinux /dev/sde1
- Mount the USB-stick to be written into:
  mount /dev/sde1 /mnt/usb/
- Mount the ISO-image to be read:
  mount /tmp/RCDx86_13_03_10.iso /mnt/iso/ -o loop,ro
- Copy the ISO-image contents to the USB-stick:
  cp -r /mnt/iso/* /mnt/usb/
- Convert the CD-boot menu to work as USB-boot menu:
  mv /mnt/usb/boot/isolinux /mnt/usb/syslinux
- Take the 32-bit versions into use:
  cp /mnt/usb/syslinux/isolinux.cfg.x86 /mnt/usb/syslinux/syslinux.cfg
- Umount the USB-stick:
  umount /mnt/usb
- Umount the ISO-image:
  umount /mnt/iso
- Un-plug the USB-stick and test!
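For repeated use, the steps above can be collected into a script. The following is a sketch under several assumptions, not a tested tool: it uses parted's script mode (-s) instead of the interactive prompt, it assumes the first partition is named by appending "1" to the device, it copies with "cp -r dir/." instead of a glob, and it uses the same paths as listed in the assumptions. The names run and make_rescue_stick are invented. By default it only prints the commands; it touches the disk only when DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch: replay the USB-stick steps for a given device and ISO-image.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: make_rescue_stick <device> <iso file>
make_rescue_stick() {
    dev=$1                     # whole device, e.g. /dev/sde
    iso=$2                     # path to the ISO-image
    part="${dev}1"             # assumption: first partition is <device>1
    usbdir=/mnt/usb            # mount point for the stick
    isodir=/mnt/iso            # mount point for the ISO-image
    sysdir=/usr/share/syslinux # where syslinux keeps mbr.bin

    run parted -s "$dev" mktable msdos &&
    run parted -s "$dev" mkpart primary fat32 1 256M &&
    run parted -s "$dev" set 1 boot on &&
    run mkfs.vfat -F 32 "$part" &&
    run dd if="$sysdir/mbr.bin" of="$dev" conv=notrunc bs=440 count=1 &&
    run syslinux "$part" &&
    run mount "$part" "$usbdir" &&
    run mount -o loop,ro "$iso" "$isodir" &&
    run cp -r "$isodir/." "$usbdir/" &&
    run mv "$usbdir/boot/isolinux" "$usbdir/syslinux" &&
    run cp "$usbdir/syslinux/isolinux.cfg.x86" "$usbdir/syslinux/syslinux.cfg" &&
    run umount "$usbdir" &&
    run umount "$isodir"
}
```

Running make_rescue_stick /dev/sde /tmp/RCDx86_13_03_10.iso prints the thirteen commands for review. Double-check the device name before ever setting DRY_RUN=0, since everything on that device is lost.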
Result
Here is what a working boot menu will look like:
Like always, any comments or improvements are welcome. Thanks Arkadiusz for your efforts and for the great product you're willing to share with rest of us. Sharing is caring, after all!
Bug in Linux 3.11: Netfilter MASQUERADE-target does not work anymore
Wednesday, October 9. 2013
This is something I've been trying to crack ever since I installed Fedora 19 alpha into my router. My HTTP-streams do not work. At all. Depending on the application and its retry-policy implementation some things would work, some won't. Examples:
- Playstation 3 updates: Updates load up to 30% and then nothing, this one I mistakenly thought was due to PS3 firmware update
- YLE Areena: No functionality after first 10-40 seconds
- Netflix: Poor picture quality, HD pretty much never kicks in, super-HD... dream on.
- Spotify: Works ok
- Ruutu.fi: Endless loop of commercials, the real program never starts
- Regular FTP-stream: Hang after first bytes
My Fedora 19 Linux is a router connecting to Internet and distributing the connection to my home LAN via NAT. The IPtables rule is:
iptables -t nat -A POSTROUTING -o em1 -j MASQUERADE
I found an article with title IPTables: DNAT, SNAT and Masquerading from LinuxQuestions.org. It says:
"SNAT would be better for you than MASQUERADE, but they both work on outbound (leaving the server) packets. They replace the source IP address in the packets for their own external network device, when the packet returns, the NAT function knows who sent the packet and forwards it back to the originating workstation inside the network."
So, I had to try that. I changed my NAT-rule to:
iptables -t nat -D POSTROUTING -o em1 -j MASQUERADE
iptables -t nat -A POSTROUTING -o em1 -j SNAT --to-source 80.my.source.IP
... and everything starts to work ok! I've been using the same masquerade-rule for at least 10 years without problems. Something must have changed in Linux-kernel.
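Unlike MASQUERADE, the SNAT rule needs the external address spelled out, so it has to be looked up first. Below is a minimal sketch of that lookup, assuming the one-line-per-address output format of ip -o -4 addr show. The function names first_ipv4 and snat_rule are invented, and the rule is only printed, not applied.

```shell
#!/bin/sh
# Read `ip -o -4 addr show dev <iface>` output on stdin and
# print the first IPv4 address without its prefix length.
first_ipv4() {
    awk '$3 == "inet" { sub(/\/.*/, "", $4); print $4; exit }'
}

# Print (not run) the SNAT rule for an interface and source address.
snat_rule() {
    echo "iptables -t nat -A POSTROUTING -o $1 -j SNAT --to-source $2"
}
```

For example: snat_rule em1 "$(ip -o -4 addr show dev em1 | first_ipv4)". With a dynamic address the rule would have to be rebuilt every time the address changes, which is exactly the chore MASQUERADE is supposed to take care of.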
I did further studies with this problem. On a remote server I did following on a publicly accessible directory:
# dd if=/dev/urandom of=random.bin bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 1.76654 s, 5.9 MB/s
It creates a random file of 10 MiB. For testing purposes, I can load the file with wget-utility:
# wget http://81.the.other.IP/random.bin
Connecting to 81.the.other.IP:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: `random.bin'
100%[=================>] 10,485,760 7.84M/s in 1.3s
2013-10-08 17:06:02 (7.84 MB/s) - `random.bin' saved [10485760/10485760]
No problems. The file loads ok. The speed is good, nothing fancy there. I change the rule back to MASQUERADE and do the same thing again:
# wget http://81.the.other.IP/random.bin
Connecting to 81.the.other.IP:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: `random.bin'
10% [=======> ] 1,090,200 --.-K/s eta 85m 59s
After waiting for 10 minutes, there was no change in the download. wget simply hung there and would not proceed without manual intervention. It's official: masquerade is busted.
Me finding a bug in the Linux kernel is almost impossible. I'm not a kernel developer or anything, but whatever I search for, I find nothing on the net. So I had to double check to rule out the following:
- Hardware:
- Transferring similar file from router-box to client works fully. I tested a 100 MiB file. No issues with my LAN or the client computer.
- Transferring similar file from outside-server to router-box works fully. I tested a 100 MiB file. No issues with my Internet connection.
- When not NATing, everything works ok. Based on this I don't suspect any hardware issues.
- There is no difference in my home if using WLAN or Ethernet. The problem is related to my POSTROUTING-setting.
- IPv4:
- I have a SixXS IPv6-tunnel at my disposal. Transferring a 100 MiB file from outside-server via IPv6 to the same IPv4 NATed client works fully. No issues.
- My original claim is that MASQUERADE is broken, SNAT works. Functioning IPv6 connection supports that claim.
To further see if it would be a Fedora-thing, or affecting entire Linux, I took official Linux 3.11.4 source code and Fedora kernel-3.11.3-201.fc19.src.rpm and ran a diff:
# diff -aur /tmp/linux.orig/linux-3.11.4/net/ipv4/netfilter \
/tmp/linux.fc19/linux-3.11/net/ipv4/netfilter
Nothing. No differences encountered. Looks like I have to file a bug report to Fedora and possibly Netfilter-project. Looking at the change log of /net/ipv4/netfilter/ipt_MASQUERADE.c reveals absolutely nothing, the change must be somewhere else.
NTPd vs. Chrony
Monday, August 19. 2013
In my Fedora 19 I've been wondering why my NTPd does not start on boot. It used to do so couple of Fedora installations ago. This is not a big deal, so I've been mostly ignoring it. Today I dug up some energy to investigate.
The reason was much simpler than I thought. On my very short checklist were:
- Confirm that systemd has ntpd.service enabled, it was.
- Confirm that ntpd.service has a dependency to start the service after network interfaces are up, it was chained to do a single ntpdate update and start the daemon after it.
- Needed interfaces have not been blocked and/or needed interfaces have been enabled in config, everything was out-of-the-box: all network interfaces allowed.
The daemon even had the panic-threshold disabled in the config, so it wouldn't choke on startup if time was badly off for some reason. I found no reason for the daemon not to start.
However, doing a search for ntpd in /usr/lib/systemd/system revealed what was going on. chronyd.service has Conflicts=ntpd.service in the service description. WTF?! What the hell is chronyd?
According to http://chrony.tuxfamily.org/ it is "a pair of programs which are used to maintain the accuracy of the system clock on a computer". Sounds like a NTPd to me. Running netstat confirmed the fact:
# netstat -nap | fgrep :123
udp 0 0 0.0.0.0:123 0.0.0.0:* 666/chronyd
udp6 0 0 :::123 :::* 666/chronyd
The daemon does bind to NTP-ports. To get chronyd running properly, all I had to do was add a proper time source and allow updates from my LAN with allow-directives.
That's it!
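The search that uncovered the conflict can be repeated for any service. A small sketch, assuming unit files live in /usr/lib/systemd/system as they do on my Fedora; the function name conflicting_units is made up:

```shell
#!/bin/sh
# Print the service files that declare a Conflicts= against the given unit.
# Usage: conflicting_units <unit name> [unit directory]
conflicting_units() {
    unit=$1
    dir=${2:-/usr/lib/systemd/system}
    # -l prints only the names of the files containing a match
    grep -l "^Conflicts=.*$unit" "$dir"/*.service 2>/dev/null
}
```

On the box in question, conflicting_units ntpd.service would have pointed straight at chronyd.service.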
Linux failing to mount iSCSI on boot
Thursday, August 15. 2013
My Fedora 19 failed to boot if I had an entry for an iSCSI-mount in /etc/fstab. During boot the system just fell to emergency mode. To get the box to boot, I simply did a "stupid man's solution" and commented the line out. This is what happens if I have the standard line in fstab:
My fstab line is:
/dev/qnap /mnt/qnap ext4 defaults 1 0
It took me a while to get back to the issue and investigate, it was that bad. This is the clue I found on Fedora project's documentation about iSCSI. They said, that any iSCSI-volumes should be mounted with a special flag _netdev. I changed to that, and hey presto! During bootup, it first does something and then mounts the iSCSI-drive. I merged those two occurrences into a single photo:
It works! I'm so happy about this. For clarity, the fstab-line is:
/dev/qnap /mnt/qnap ext4 _netdev 0 0
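Whether a given mount point already carries the flag can be checked before the next reboot. This is a minimal sketch that only looks at the options column of an fstab-style file; it cannot tell which mounts actually need the flag. The function name has_netdev is invented for the example.

```shell
#!/bin/sh
# Succeed if the entry for <mount point> in <fstab file> has the _netdev option.
# Usage: has_netdev <fstab file> <mount point>
has_netdev() {
    awk -v mp="$2" '
        $1 !~ /^#/ && $2 == mp {
            # the 4th column is a comma-separated option list
            n = split($4, opts, ",")
            for (i = 1; i <= n; i++) if (opts[i] == "_netdev") found = 1
        }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

For instance: has_netdev /etc/fstab /mnt/qnap || echo "/mnt/qnap lacks _netdev".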
Own RPM package: Make symlink survive update/freshen
Friday, August 9. 2013
During my ventures in the Linux-land, I constantly package and re-package RPMs. Sometimes to introduce new functionality to an existing package or to simply get a newer version than the distro vendor is prepared to offer. A number of times I've created packaging for software that is not in the distro at all.
Another thing I love using is symlinks. I can have a newer and an older package of a software and can simply switch by updating the symlink to the correct version. When I combined those two, it bit me in the ass.
I had quite simple script-blocks to handle the symlink:
%post
cd /my/package/directory/
%{__rm} -f my.cool.symlink-name
ln -s package/library/my.cool my.cool.symlink-name
%preun
%{__rm} -f /my/package/directory/my.cool.symlink-name
On install, that worked, but on update/freshen there was no symlink left. I was puzzled, why is that? Little bit of googling revealed two pieces of information: RPM spec-file documentation about scripts, especially the Install/Erase-time Scripts -part and 2nd the Fedora Project's packaging information, especially the scriptlet ordering. I'll abbreviate the Fedora's ordering here omitting the non-interesting parts:
- %pre of new package
  This is the part where my script confirms that the symlink exists.
- (package install)
- %post of new package
- %preun of old package
  During update/freshen, this is the part where my script removes the symlink created in 1.)
  Crap!
- (removal of old package)
- %postun of old package
Further reading of RPM spec-docs said "the argument passed to version 1.0's scripts is 1". Ok, nice to know, but now what? How can I utilize the information? What is the exact syntax for the script? The only usable information I found was in the Fedora packaging instructions, there was an example:
%preun
if [ $1 = 0 ] ; then
/sbin/install-info --delete %{_infodir}/%{name}.info %{_infodir}/dir || :
fi
So this was the thing I had to try. My solution is to change the %preun-block:
%preun
if [ $1 -lt 1 ] ; then
# This is really an un-install, not deleting previous version on update
%{__rm} -f /my/package/directory/my.cool.symlink-name
fi
I did that and upgraded the package. Poooof! The symlink was gone like there was no change at all. WHY? I upgraded the revision number of the package and upgraded again. NOW it worked! Nice.
There is a simple explanation what happened. It says in the Fedora project's order-list that "%preun of old package". OLD package! It works starting from the next update, but not on the first one.
Anyway I was delighted to get that one sorted.
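The argument logic can be tried outside of RPM. In %preun the first argument is the number of instances of the package that will remain once the operation completes: 0 on a real uninstall, 1 on an update. The sketch below only mimics the test used in my fixed scriptlet; the function name preun_action is made up and nothing is removed.

```shell
#!/bin/sh
# Mimic the decision made in the fixed %preun scriptlet.
# $1 = number of package instances left after the operation.
preun_action() {
    if [ "$1" -lt 1 ]; then
        echo "uninstall: remove symlink"
    else
        echo "update: keep symlink"
    fi
}
```

preun_action 0 takes the removal branch, preun_action 1 leaves the symlink alone.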
Handling /var/run with systemd
Tuesday, August 6. 2013
Previously I've studied the init.d replacement systemd.
Update 4th Jun 2017: See the new version
To my surprise, my contraption from the previous article didn't survive a reboot. WTF?! It turned out that in Fedora 19 the /var/run/ is a symlink into /run/ which has been turned into tmpfs. Goddamnit! It literally means, that it is pointless to create /var/run/<the daemon name here>/ with correct permissions in RPM spec-file. Everything will be wiped clean on next reboot anyway.
So, I just had to study the systemd some more.
This is my version 2 (the Unit and Install -parts are unchanged):
[Service]
Type=forking
PrivateTmp=yes
User=nobody
Group=nobody
# Run ExecStartPre with root-permissions
PermissionsStartOnly=true
ExecStartPre=-/usr/bin/mkdir /var/run/dhis
ExecStartPre=/usr/bin/chown -R nobody:nobody /var/run/dhis/
# Run ExecStart with User=nobody / Group=nobody
ExecStart=/usr/sbin/dhid -P /var/run/dhis/dhid.pid
PIDFile=/var/run/dhis/dhid.pid
The solution is two-fold. First an ExecStartPre -directive is required. It allows running stuff before actually executing the daemon. My first thing to do is create a directory, the minus sign before the command says to ignore any possible errors during creation. The target is mainly to ignore any errors from the fact that creation would fail due to the directory already existing. Anyway, all errors are ignored regardless of the reason.
The second command to run is to make sure that permissions are set correctly for my daemon to create a PID-file into the directory created earlier. That must succeed or there will be no attempt to start the daemon. chowning the directory will fail if the directory does not exist, or for any other possible reason.
Sounds nice, huh? Initially I couldn't get that working. It was simply because the entire Service -part is run as the user pointed to by the User=nobody and Group=nobody -directives. That user was intentionally chosen, because it has very limited permissions anywhere. Now it cannot create any new directories into /var/run/. Darn!
This is where the solution's 2nd part comes in. Somebody designing systemd thought about this. Using the PermissionsStartOnly directive does the security context switch at the last moment before starting the daemon. This effectively changes the default behavior to allow running the Service part as root, except for the daemon. Exactly what I was looking for! Now my daemon starts each and every time. Even during boot.
Another thing I noticed is that when I edit a systemd service-file, the changes don't take effect until I do a systemctl --system daemon-reload. It was a big surprise to me; after all, in traditional init.d everything was effective immediately.
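Since the reload is easy to forget, the two steps can be kept together in a tiny wrapper. This is a sketch only: the function name reload_and_restart and the SYSTEMCTL override variable are made up for illustration, and the override exists just so the wrapper can be dry-run without touching a real systemd.

```shell
# Command to call; overridable for dry runs (hypothetical convention).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical wrapper: re-read unit files, then restart the given unit.
reload_and_restart() {
    # Make systemd pick up the edited service-file first...
    "$SYSTEMCTL" --system daemon-reload &&
    # ...and only then restart the unit so it runs with the new settings.
    "$SYSTEMCTL" restart "$1"
}
```

As root, it would be called as reload_and_restart dhid.service after editing the file.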
PS.
Why does cronie not create a PID-file? I had an issue in CentOS where I had not one, but two cron-daemons running at the same time. This is yet another reason to go systemd; it simply works better with ill-behaving daemons like cronie.
Converting classic init.d startup script into new systemd
Wednesday, July 3. 2013
I have a couple of my own daemons running on my Linux-box. Now that all the distros are going systemd, my scripts are becoming obsolete. Sure, systemd can piggy-back on old init.d-scripts, but ... I'd rather have them converted to the new way.
Lennart Poettering's blog has a helpful article, which got me started on my project. Also the manual pages for systemd (systemd.service and systemd.exec) proved a very valuable reference.
My daemon is pretty much from the trivial end of daemons. It runs as the nobody user to prevent it from accessing a number of places in case something/somebody breaks it. It does the classic fork on start and the parent process simply exits. Fortunately, the systemd programmers anticipated that and there is perfect support for such a startup sequence.
Here is my example. I simply placed a file named dhid.service into the directory /usr/lib/systemd/system/. Then I could interface with it using the systemctl command. Example:
# systemctl status dhid.service
dhid.service - DHIS client for keeping track of changing dynamic IP addresses in DNS
Loaded: loaded (/usr/lib/systemd/system/dhid.service; disabled)
Active: active (running) since Wed 2013-07-03 15:26:03 EEST; 928ms ago
Process: 32355 ExecStart=/usr/sbin/dhid -P /var/run/dhis/dhid.pid (code=exited, status=0/SUCCESS)
Main PID: 32356 (dhid)
CGroup: name=systemd:/system/dhid.service
└─32356 /usr/sbin/dhid -P /var/run/dhis/dhid.pid
Jul 03 15:26:03 samba dhid[32356]: daemon started
My entire file is here:
[Unit]
Description=DHIS client for keeping track of changing dynamic IP addresses in DNS
After=syslog.target network.target
[Service]
Type=forking
PrivateTmp=yes
User=nobody
Group=nobody
ExecStart=/usr/sbin/dhid -P /var/run/dhis/dhid.pid
PIDFile=/var/run/dhis/dhid.pid
[Install]
WantedBy=multi-user.target
It is really that simple! To make the daemon start on bootup, just use the systemctl enable dhid.service command.
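Enabling and checking the result can be sketched as one small function. The name enable_at_boot and the SYSTEMCTL override variable are made up for illustration; enable and is-enabled are regular systemctl subcommands.

```shell
# Command to call; overridable for dry runs (hypothetical convention).
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: enable a unit for boot and report its enablement state.
enable_at_boot() {
    # Hook the unit into the target named in its [Install] section...
    "$SYSTEMCTL" enable "$1" &&
    # ...then ask systemd whether the unit is now enabled.
    "$SYSTEMCTL" is-enabled "$1"
}
```

Run as root, enable_at_boot dhid.service should end by reporting the unit as enabled.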
udev wrangling
Tuesday, June 25. 2013
Most Linux distros have udev. It has been around a while and is pretty much the way of handling physical devices in your box.
In The Old Age™ setting up a device was very simple. /dev was on a regular filesystem and could have permissions/symlinks/whatever set by admins. In the modern era creating a symlink or setting permissions is a bit more complex. The steps are:
- Identify the device
- Figure out the identifying attributes from udev
- Choose an operation / operations to be executed when the device is found
- This can be during boot or plug'n'play / USB
- Bring it all together in a configuration file readable by udev
An example:
An external USB drive or stick can have pretty much any drive letter assigned to it by the SCSI subsystem when it is plugged in. It can be /dev/sde today and /dev/sdf tomorrow. Trying to figure out the drive letter each time it is plugged in is both tedious and unnecessary. With (simple?) udev-wrangling you can have a /dev/myownusb to access it every time the drive is plugged in. Steps:
- Identify
- lsusb is your friend, from the output it is possible to determine that:
Bus 001 Device 007: ID 1941:8021 My C00l USB-drive
- Today USB-bus 001 device 007 is the drive. What if you plug it into a different USB-port next time? We need to find identifying attribute/attributes to make configuring possible.
- If we assume that the drive is /dev/sdf this time, all the udev-attributes can be displayed with a:
udevadm info --query=all --name=/dev/sdf --attribute-walk
- It will reveal a drive serial number in a format similar to:
ATTRS{serial}=="0000002CE09310500C1B"
- The operation we'd like to be done when such a USB-device with a matching serial number is plugged into the computer is a symlink.
- The final step to get this configured would be to create a file into /etc/udev/rules.d/ with a suitable name.
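- I chose my configuration to be /etc/udev/rules.d/99-mylocalrules. Note that udev only picks up files whose names end in .rules, so the actual file name needs that suffix.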
- The file will contain a single line with identifying information and the operation. Example:
SUBSYSTEMS=="usb", ATTRS{serial}=="0000002CE09310500C1B", KERNEL=="sd?1", SYMLINK+="myownusb"
- That literally reads: Whenever a new device is introduced into USB-subsystem with suitable serial number and having a partition, the 1st partition will be symlinked into udev with name "myownusb"
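If there are several drives to handle, the rule line from the steps above can be generated instead of typed. This is a sketch only: the function name make_symlink_rule is made up for illustration, and it simply fills the serial number and symlink name into the same rule shown above.

```shell
# Hypothetical generator for the symlink rule shown above.
# $1 = drive serial number, $2 = symlink name to create under /dev
make_symlink_rule() {
    serial="$1"
    name="$2"
    # Same match keys as the hand-written rule: USB subsystem, serial, first partition.
    printf 'SUBSYSTEMS=="usb", ATTRS{serial}=="%s", KERNEL=="sd?1", SYMLINK+="%s"\n' \
        "$serial" "$name"
}
```

As root, the output of make_symlink_rule 0000002CE09310500C1B myownusb can be redirected into the rules file under /etc/udev/rules.d/.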
To get the rule into effect you need to run:
udevadm trigger
It is not necessary to unplug an already working drive. Just confirm that it worked:
ls -l /dev/myownusb
... or similar. Then just mount:
mount /dev/myownusb /mnt/myownusb
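Because the symlink only exists while the drive is plugged in, a script doing the mount may want to check for it first. A minimal sketch, with a made-up function name mount_if_present; the -b test follows the symlink to the real partition device:

```shell
# Hypothetical guard: mount only when the device node is really there.
# $1 = device (or udev symlink to it), $2 = mount point
mount_if_present() {
    dev="$1"
    target="$2"
    # -b follows symlinks, so /dev/myownusb passes when it points at a partition.
    if [ ! -b "$dev" ]; then
        echo "$dev is not a block device (drive not plugged in?)" >&2
        return 1
    fi
    # Create the mount point if needed, then mount (needs root).
    mkdir -p "$target" && mount "$dev" "$target"
}
```

As root, it would be called as mount_if_present /dev/myownusb /mnt/myownusb.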
Another example:
I have a weather station connected to my Linux box via a USB-cable. There is no point in accessing it as root, but out-of-the-box that's the only way to go. I need to chgrp the device after every boot for regular users to gain access to it.
With the above process, my identifying factor is the USB ID of the device and the operation is to chgrp the device with a suitable group to allow access for users belonging to that group. The rule is:
SUBSYSTEMS=="usb", ATTR{idVendor}=="1941", ATTR{idProduct}=="8021", GROUP="110"
Yet again the udev-rule reads: Whenever a new device is introduced into USB-subsystem with vendor ID of 0x1941 and product ID of 0x8021 the newly created udev-device will have a group with id 110. I prepared a group with groupadd and confirmed that it exists:
# getent group 110
WH-1080usb:*:110:itsme
After a udevadm trigger the result can be confirmed:
# ls -l /dev/bus/usb/001/007
crw-rw-r--. 1 root WH-1080usb 189, 6 Jun 19 10:07 /dev/bus/usb/001/007
The long(ish) path into the device comes from the lsusb output, it reads:
Bus 001 Device 007: ID 1941:8021 Dream Link WH1080 Weather Station
... and can be also translated as /dev/bus/usb/001/007. Simple, huh?
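That translation is mechanical enough to script. The sketch below uses a made-up function name, lsusb_line_to_path, and assumes the lsusb line has the "Bus NNN Device NNN:" shape shown above.

```shell
# Hypothetical helper: turn one line of lsusb output into its /dev/bus/usb path.
lsusb_line_to_path() {
    # Pick the bus and device numbers out of "Bus 001 Device 007: ID ...".
    printf '%s\n' "$1" |
        sed -n 's|^Bus \([0-9]*\) Device \([0-9]*\):.*|/dev/bus/usb/\1/\2|p'
}
```

Combined with lsusb -d 1941:8021, which limits the output to the weather station's vendor and product ID, the result can be fed straight into ls -l.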