Bug in Linux 3.11: Netfilter MASQUERADE-target does not work anymore
Wednesday, October 9. 2013
This is something I've been trying to crack ever since I installed Fedora 19 alpha on my router. My HTTP streams do not work. At all. Depending on the application and its retry-policy implementation, some things work and some don't. Examples:
- PlayStation 3 updates: Updates load up to 30% and then nothing happens. At first I mistakenly blamed this on a PS3 firmware update.
- YLE Areena: No functionality after first 10-40 seconds
- Netflix: Poor picture quality, HD pretty much never kicks in, super-HD... dream on.
- Spotify: Works ok
- Ruutu.fi: Endless loop of commercials, the real program never starts
- Regular FTP stream: Hangs after the first bytes
My Fedora 19 Linux box is a router connecting to the Internet and distributing the connection to my home LAN via NAT. The iptables rule is:
iptables -t nat -A POSTROUTING -o em1 -j MASQUERADE
I found an article titled "IPTables: DNAT, SNAT and Masquerading" on LinuxQuestions.org. It says:
"SNAT would be better for you than MASQUERADE, but they both work on outbound (leaving the server) packets. They replace the source IP address in the packets for their own external network device, when the packet returns, the NAT function knows who sent the packet and forwards it back to the originating workstation inside the network."
So, I had to try that. I changed my NAT-rule to:
iptables -t nat -D POSTROUTING -o em1 -j MASQUERADE
iptables -t nat -A POSTROUTING -o em1 -j SNAT --to-source 80.my.source.IP
... and everything started to work ok! I've been using the same masquerade rule for at least 10 years without problems. Something must have changed in the Linux kernel.
I studied the problem further. On a remote server, I ran the following in a publicly accessible directory:
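The swap above can be sketched as a small helper that only prints the two commands, so they can be reviewed before anything touches the live ruleset. The function name snat_swap_cmds and the address 192.0.2.10 (a documentation-only address) are assumptions for illustration. Note that SNAT needs a fixed external address, whereas MASQUERADE uses whatever address the outgoing interface has at the time.

```shell
#!/bin/sh
# snat_swap_cmds IFACE ADDR (hypothetical helper, not part of iptables):
# print, but do not run, the commands that replace MASQUERADE with SNAT.
snat_swap_cmds() {
    iface=$1
    addr=$2
    # Remove the existing MASQUERADE rule on the outgoing interface.
    printf 'iptables -t nat -D POSTROUTING -o %s -j MASQUERADE\n' "$iface"
    # Add a SNAT rule that rewrites the source to the fixed external address.
    printf 'iptables -t nat -A POSTROUTING -o %s -j SNAT --to-source %s\n' "$iface" "$addr"
}

# Placeholder values: substitute the real interface and external address.
snat_swap_cmds em1 192.0.2.10
```

Once the printed commands look right, they can be piped to sh as root to apply them.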
# dd if=/dev/urandom of=random.bin bs=1024 count=10240
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 1.76654 s, 5.9 MB/s
That creates a 10 MiB file of random data. For testing purposes, I can download the file with the wget utility:
# wget http://81.the.other.IP/random.bin
Connecting to 81.the.other.IP:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: `random.bin'
100%[=================>] 10,485,760 7.84M/s in 1.3s
2013-10-08 17:06:02 (7.84 MB/s) - `random.bin' saved [10485760/10485760]
No problems. The file loads ok. The speed is good, nothing fancy there. I changed the rule back to MASQUERADE and did the same thing again:
# wget http://81.the.other.IP/random.bin
Connecting to 81.the.other.IP:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: `random.bin'
10% [=======> ] 1,090,200 --.-K/s eta 85m 59s
After waiting for 10 minutes, there was no change in the download. wget simply hung there and would not proceed without manual intervention. It's official: masquerade is busted.
Me finding a bug in the Linux kernel is highly unlikely. I'm not a kernel developer or anything, but none of my searches turned up anything on the net. So I had to double-check and rule out the following:
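To repeat the test without watching wget for ten minutes, something along these lines could put a deadline on the download and then compare the byte count with the expected size. This is only a sketch: the helper names and the 60-second deadline are my assumptions, and timeout is the GNU coreutils utility.

```shell
#!/bin/sh
# fetch_with_deadline URL OUTFILE SECONDS (hypothetical helper):
# download quietly, but give up after SECONDS. GNU timeout exits
# with status 124 when the deadline is hit.
fetch_with_deadline() {
    timeout "$3" wget -q -O "$2" "$1"
}

# check_download FILE EXPECTED_BYTES (hypothetical helper):
# succeed only if FILE exists and is exactly EXPECTED_BYTES long.
check_download() {
    [ -f "$1" ] || return 1
    actual=$(wc -c < "$1" | tr -d ' ')
    [ "$actual" -eq "$2" ]
}

# Example use, with the placeholder address from the post:
#   fetch_with_deadline http://81.the.other.IP/random.bin random.bin 60
#   check_download random.bin 10485760 && echo complete || echo stalled
```

With the MASQUERADE rule active, the stalled transfer shown above would leave a file of about 1 MB, so the size check would fail.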
- Hardware:
  - Transferring a similar file from the router box to a client works fully. I tested a 100 MiB file. No issues with my LAN or the client computer.
  - Transferring a similar file from the outside server to the router box works fully. I tested a 100 MiB file. No issues with my Internet connection.
  - When not NATing, everything works ok. Based on this I don't suspect any hardware issues.
  - It makes no difference whether the client is on WLAN or Ethernet. The problem is related to my POSTROUTING setting.
- IPv4:
  - I have a SixXS IPv6 tunnel at my disposal. Transferring a 100 MiB file from the outside server via IPv6 to the same client that is NATed on IPv4 works fully. No issues.
  - My original claim is that MASQUERADE is broken and SNAT works. The functioning IPv6 connection supports that claim.
To see whether this is a Fedora thing or affects all of Linux, I took the official Linux 3.11.4 source code and Fedora's kernel-3.11.3-201.fc19.src.rpm and ran a diff:
# diff -aur /tmp/linux.orig/linux-3.11.4/net/ipv4/netfilter \
/tmp/linux.fc19/linux-3.11/net/ipv4/netfilter
Nothing. No differences encountered. Looks like I have to file a bug report with Fedora and possibly the Netfilter project. Looking at the change log of /net/ipv4/netfilter/ipt_MASQUERADE.c reveals absolutely nothing, so the change must be somewhere else.
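Since the change must be somewhere else, the same comparison may be worth repeating over other directories of the two source trees. A minimal sketch, assuming a hypothetical helper named trees_identical, reduces each comparison to a yes/no answer by relying on diff's exit status:

```shell
#!/bin/sh
# trees_identical DIR_A DIR_B (hypothetical helper):
# succeed when both trees contain the same files with the same contents.
# diff exits 0 when nothing differs, 1 when something differs, 2 on trouble.
trees_identical() {
    diff -qr "$1" "$2" >/dev/null 2>&1
}

# Example use, with the paths from the post:
#   trees_identical /tmp/linux.orig/linux-3.11.4/net/ipv4/netfilter \
#                   /tmp/linux.fc19/linux-3.11/net/ipv4/netfilter \
#       && echo "no differences" || echo "trees differ"
```

Dropping the redirection shows which files differ, which is the useful part once a comparison reports a difference.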