Weird interface selection bug introduced shortly after 9.0.2
Thore Bödecker
Thu Jul 09 23:44:27 2020
Good evening,
I've spent the last couple of hours wondering what the heck is going
on with my dhcpv6-pd client since it wasn't receiving any replies and
just timing out after an upgrade.
I upgraded from 9.0.2 to 9.1.2 and that's when my problems began to
show.
This means the bug was introduced somewhere between the git tags 9.0.2
and 9.1.2.
So I set up a git bisect workflow to quickly build, deploy and test each
commit and identify the first bad commit.
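For reference, the per-commit verdict in such a workflow can be scripted for
`git bisect run` roughly as follows. This is a minimal sketch, not the exact
script used: the `bisect_verdict` helper, its two status arguments and the
script name in the comments are hypothetical, and the build, deploy and test
steps themselves are left out.

```shell
#!/bin/sh
# Hypothetical helper for use with `git bisect run`.
# git bisect interprets the script's exit status as:
#   0   -> commit is good
#   125 -> commit cannot be tested, skip it
#   1   -> commit is bad (any status 1..127 except 125 counts as bad)

bisect_verdict() {
    # $1: exit status of the build step
    # $2: exit status of the functional check (0 = dhcpv6-pd got a reply)
    if [ "$1" -ne 0 ]; then
        echo 125    # build failed: skip instead of blaming this commit
    elif [ "$2" -eq 0 ]; then
        echo 0      # replies received: good
    else
        echo 1      # solicits timed out: bad
    fi
}

# Assumed usage (tag names as written in this mail, they may differ in the repo):
#   git bisect start 9.1.2 9.0.2      # <bad> first, then <good>
#   git bisect run ./bisect-step.sh   # script ends with: exit "$(bisect_verdict ...)"
```

Skipping unbuildable commits with 125 keeps a broken intermediate commit from
being reported as the culprit.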
Result:
----
c1e483219a2e67cc7c1d55205494c1b6c0b9376e is the first bad commit
commit c1e483219a2e67cc7c1d55205494c1b6c0b9376e
Author: Roy Marples <roy@xxxxxxxxxxxx>
Date: Thu Apr 23 14:33:48 2020 +0100
Rename ifp->family -> ifp->hwtype so it's less confusing
src/arp.c | 6 ++---
src/bpf.c | 16 ++++++------
src/dhcp.c | 8 +++---
src/dhcpcd.h | 2 +-
src/duid.c | 2 +-
src/if-linux.c | 2 +-
src/if.c | 81 ++++++++++++++++++----------------------------------------
src/ipv6.c | 4 +--
8 files changed, 45 insertions(+), 76 deletions(-)
----
Now to the problem description.
On my router, I have multiple physical and VLAN interfaces.
My provider requires the pppd packets to be sent with a VLAN tag, so
that is handled by that VLAN device.
The pppd client uses this VLAN device to establish a PPPoE session,
which also sets up an IPv6 transfer network between my router and the
peer of my ISP.
Once that is done, there will be a "ppp0" interface present.
Afterwards, through ppp hook and systemd dependency magic, the
dhcpcd binary is started, configured to do only dhcpv6-pd.
In order to not disturb any other interfaces and make sure it uses the
correct interface under any circumstance, I have used all available
command line arguments and configuration options, as you can see
below.
dhcpcd command line arguments:
----
/usr/bin/dhcpcd -6 -B -f /etc/dhcpcd/nc-ipv6.conf ppp0
----
along with the relevant excerpt from /etc/dhcpcd/nc-ipv6.conf:
----
denyinterfaces <long-list-of-interfaces-that-should-not-be-touched>
allowinterfaces ppp0
nogateway
nohook lookup-hostname
nohook resolv.conf
noipv4ll
noipv4
noipv6rs
ipv6only
waitip 6
interface ppp0
persistent
dhcp6
ipv6
xidhwaddr
ia_pd 1/::/48 <local-if1>/0/64/1 <local-if2>/1/64/1 <local-if3>/2/64/1
----
As you would expect, with dhcpcd 9.0.2 the dhcp6 solicit packets are
being sent directly on the ppp0 interface and everything works just
fine.
However, starting with the commit identified above during git bisect,
this behavior changes into a totally unexpected and wrong state:
Instead of sending the dhcp6 solicit packets over ppp0, it randomly
selects one of my VLAN interfaces (changes between reboots) and sends
the dhcp6 solicit packets on that interface, using the link-local
address on that particular interface along with its mac address.
This in turn (at least in my setup) means these dhcp6 solicit packets
get encapsulated with an 802.1Q header (VLAN tag) and pushed out on
the corresponding physical interface.
Obviously they will never reach the peer of my ISP, rendering
dhcpv6-pd functionally completely broken.
And the most fun part: in the dhcpcd output it states that it will
be doing the dhcp6 solicitation on "ppp0", which was *very*
confusing and actually the reason why it took me so long to figure it
out.
Only after running some tcpdumps did I notice that something was wrong.
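For anyone wanting to check their own setup, captures along these lines
should show which interface the solicits actually leave on. This is a sketch
rather than the exact commands used; the `dhcp6_capture_cmd` helper is
hypothetical and `vlan-if` is a placeholder for one of the VLAN interfaces.

```shell
#!/bin/sh
# DHCPv6 uses UDP port 546 (client) and UDP port 547 (server).
DHCP6_FILTER='udp and (port 546 or port 547)'

# Print the tcpdump command for one interface.
#   -n  do not resolve addresses to names
#   -e  also print the link-level header (e.g. the source MAC address)
dhcp6_capture_cmd() {
    printf "tcpdump -n -e -i %s '%s'\n" "$1" "$DHCP6_FILTER"
}

dhcp6_capture_cmd ppp0       # where the solicits are expected
dhcp6_capture_cmd vlan-if    # placeholder: where they might show up instead
```

Run the printed commands as root in two terminals while dhcpcd is soliciting;
with the bug present, the solicits appear only in the second capture.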
I have not looked into the source code since there must be something
*very* funky going on in there to exhibit this kind of behaviour, but
it's reproducibly broken for me. It puzzles me how that can even
happen despite all my denyinterfaces/allowinterfaces and explicit
interface specification on the dhcpcd commandline.
For now I have reverted to the last known-good version on my router,
namely 9.0.2, and blocked this package from further upgrades.
It would be much appreciated if someone could look into this bug.
I'm available for testing in case you can't replicate it. Feel free to
send patches or point me to a different branch; I should be able to
build and test from source without much hassle.
Let me know if you need further details or have some questions.
Cheers,
Thore
--
Thore Bödecker
GPG ID: 0xD622431AF8DB80F3
GPG FP: 0F96 559D 3556 24FC 2226 A864 D622 431A F8DB 80F3