>=dhcpcd-7.0.0 makes interface hang on high traffic
Remy Blank
Sat Sep 08 22:04:57 2018

Hello,

I would like to report an issue that I have been experiencing since dhcpcd-7.0.0, where one interface of a machine acting as a router hangs on high traffic. While this may not seem related to dhcpcd, it is fully reproducible, and I have bisected the issue to a specific dhcpcd commit.

Here are the details of my setup:

- ThinkPad P70, running Gentoo, with 2 network interfaces.
- The machine's internal network interface, called "int", is an "Intel Corporation Ethernet Connection (2) I219-LM (rev 31)", driven by the e1000e driver. It is connected to the internal network.
- An additional network interface, called "ext", is connected as an ExpressCard. It's a "Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 03)", driven by the r8169 driver. It is connected to the internet.
- The "int" network has a static configuration, and is IPv4+IPv6.
- The "ext" network runs dhcpcd to get its configuration from the ISP, and is IPv4 only. An IPv6 SIT tunnel runs over it, though.
- The machine routes internet traffic to and from machines on the internal network.
- The internet connection is 500 Mb/s in, 50 Mb/s out.

Now the symptoms. When I run >=dhcpcd-7.0.0 on the "ext" interface, any sustained high-bandwidth traffic between "int" and "ext" makes the "int" interface hang. I can reproduce the issue very easily by running an internet speed test (e.g. <http://www.speedtest.net/>) on a machine connected to "int".
When this happens, the kernel logs the following:

    e1000e 0000:00:1f.6 int: Detected Hardware Unit Hang:
      TDH                  <25>
      TDT                  <60>
      next_to_use          <60>
      next_to_clean        <24>
    buffer_info[next_to_clean]:
      time_stamp           <1002ebe6c>
      next_to_watch        <26>
      jiffies              <1002ec3c0>
      next_to_watch.status <0>
    MAC Status             <80083>
    PHY Status             <796d>
    PHY 1000BASE-T Status  <3800>
    PHY Extended Status    <3000>
    PCI Status             <10>
    e1000e 0000:00:1f.6 int: Reset adapter unexpectedly
    e1000e: int NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

The issue is reproducible across a wide range of kernel versions (tested with 4.4.117, 4.9.90 and 4.14.65). On 4.4 and 4.9 kernels, the interface would hang indefinitely until I restarted it. On 4.14, it resets and recovers by itself, but all open connections break, and traffic is halted until the interface resets.

Now, the strange thing is that "int" is *not* the interface on which dhcpcd runs, it's the other one. The issue happens even if I kill dhcpcd after it has set up the network interface, so it must be something in the way it configures the interface, and not the process itself.

dhcpcd-6.11.5 does not exhibit these symptoms, and even prolonged high-bandwidth traffic doesn't cause any issues. I have also tried dhcpcd-7.0.8, and the issue is still present there.

I have three other identical machines that have only a single network interface, and those run dhcpcd-7.0.1 without any issues. So this must be related to the routing between interfaces.

I have bisected the range from 6.11.5 to 7.0.0, and git gave me the following culprit:

    Rename if_*raw functions to bpf_* so it's more descriptive and move
    https://roy.marples.name/git/dhcpcd.git/commit/?id=88047988bb0055cbce02e4cece210c8289f4ffa6

This is a large commit making fairly major changes to the BPF code. This might explain why dhcpcd affects another interface that it isn't controlling directly. Maybe a bug in the BPF code causes an infinite loop under certain conditions, making the interface hang.
Or the new BPF code triggers a kernel bug.

Unfortunately, this is the limit of my investigation skills, as I have zero knowledge of BPF "assembly", so I'm unable to pinpoint the cause more precisely. But I'm happy to perform more tests, or try out patches that could fix the issue or provide more information about it.

Thanks for a great piece of software!

-- Remy
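
As a reading aid for the hang report quoted above, here is a minimal Python sketch that extracts the counters and derives two numbers from them. It assumes that every bracketed value is hexadecimal (the time_stamp and jiffies values clearly are; treating TDH and TDT the same way is an assumption), and it ignores ring wrap-around because the ring size is not given. The function name `parse_hang_report` is made up for illustration and is not part of any tool.

```python
import re

# Hang report copied verbatim from the kernel log quoted in this message.
REPORT = """\
e1000e 0000:00:1f.6 int: Detected Hardware Unit Hang:
  TDH                  <25>
  TDT                  <60>
  next_to_use          <60>
  next_to_clean        <24>
buffer_info[next_to_clean]:
  time_stamp           <1002ebe6c>
  next_to_watch        <26>
  jiffies              <1002ec3c0>
  next_to_watch.status <0>
MAC Status             <80083>
PHY Status             <796d>
PHY 1000BASE-T Status  <3800>
PHY Extended Status    <3000>
PCI Status             <10>
"""

# One "name <hex>" pair per line; header lines without <...> are skipped.
FIELD_RE = re.compile(r"^\s*([A-Za-z_][\w\[\]. -]*?)\s*<([0-9a-fA-F]+)>\s*$")


def parse_hang_report(text):
    """Return {field name: int}, assuming all bracketed values are hex."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            fields[match.group(1)] = int(match.group(2), 16)
    return fields


fields = parse_hang_report(REPORT)

# Descriptors between the hardware head (TDH) and the driver tail (TDT).
# Assumption: no ring wrap-around, since the ring size is unknown here.
pending = fields["TDT"] - fields["TDH"]

# Age, in jiffies, of the oldest buffer that has not been cleaned yet.
# Converting this to seconds needs the kernel's HZ, which is not stated.
stalled_jiffies = fields["jiffies"] - fields["time_stamp"]

print(f"pending TX descriptors: {pending}")
print(f"oldest buffer age:      {stalled_jiffies} jiffies")
```

Under those assumptions the report shows 59 transmit descriptors handed to the hardware but not completed, with the oldest one waiting for 1364 jiffies.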