problem with "nobackground" mode on NetBSD current?
Rob Newberry
Thu Sep 10 23:37:51 2020
Hi.
I'm building my own "embedded" NetBSD system images (running on a Raspberry Pi), and running dhcpcd under the control of my own network management process.
That network management process does the following:
- runs (and allows to exit) "ifconfig" to enable bwfm0 (I haven't debugged why wpa_supplicant doesn't do this on its own, but if I don't do this, then I can't even scan)
/sbin/ifconfig bwfm0 up
- writes a simple /etc/wpa_supplicant.conf file
- runs wpa_supplicant (which doesn't fork by default, and I capture it's output into a file)
/sbin/wpa_supplicant -i bwfm0 -c /etc/wpa_supplicant.conf -d -d
- runs dhcpcd in "nobackground" mode (I also capture it's output to a file)
/sbin/dhcpcd -d -B bwfm0
When it runs, dhcpcd dies with a few error messages:
Sep 10 04:26:08 dhcpcd[824]: main: pidfile_lock -1: Invalid argument
Sep 10 04:26:08 dhcpcd[824]: ps_sendpsmmsg: Bad file descriptor
Sep 10 04:26:08 dhcpcd[824]: main: control_stop: Bad file descriptor
Sep 10 04:26:08 dhcpcd[824]: ps_sendpsmmsg: Bad file descriptor
Now, first off, this didn't happen "before" -- and by "before" I mean that I was working on this a few months ago, not long after NetBSD 9.0 shipped. Certainly before the new dhcpcd was integrated -- but I'm sure there are plenty of other changes in NetBSD-current right now.
I've debugged a little bit, checking for all the obvious stuff -- "/var/run/dhcpcd" is present:
# ls -l /var/run/dhcpcd
drwxrwxrwx 3 root wheel 512 Sep 10 04:27 /var/run/dhcpcd
Prior to starting up, the directory is empty. But at the time of death, the "pid" file that it's complaining about is there:
# ls -l /var/run/dhcpcd/
total 1
-rw-r--r-- 1 root wheel 4 Sep 10 04:26 bwfm0.pid
# cat /var/run/dhcpcd/bwfm0.pid
821
# ps ax | grep dhcpcd
821 tty00 Z 0:00.00 (dhcpcd)
And the "db" directory is there as well (but empty):
# ls -ld /var/db/dhcpcd
drwxrwxrwx 2 root wheel 512 Sep 10 04:25 /var/db/dhcpcd
What's REALLY interesting to me, though, is that if, in this current state, I run dhcpcd AGAIN, directly from the command line, with the same arguments, it works just fine:
# /sbin/dhcpcd -d -B bwfm0
dhcpcd-9.2.0 starting
spawned privileged actioneer on PID 938
spawned network proxy on PID 1133
spawned controller proxy on PID 934
chrooting to `/var/chroot/dhcpcd' as _dhcpcd
bwfm0: executing `/libexec/dhcpcd-run-hooks' PREINIT
spawned listener fe80::ba27:ebff:fedb:539 on PID 986
bwfm0: connected to Access Point `dudenet'
bwfm0: executing `/libexec/dhcpcd-run-hooks' CARRIER
DUID 00:01:00:01:26:ec:69:6f:b8:27:eb:db:05:39
bwfm0: IAID eb:db:05:39
bwfm0: delaying IPv6 router solicitation for 0.3 seconds
bwfm0: delaying IPv4 for 0.9 seconds
bwfm0: soliciting an IPv6 router
bwfm0: sending Router Solicitation
bwfm0: reading lease `/var/db/dhcpcd/bwfm0-dudenet.lease'
bwfm0: soliciting a DHCP lease
bwfm0: sending DISCOVER (xid 0xd86ffd41), next in 4.4 seconds
bwfm0: offered 10.0.1.141 from 10.0.1.1
bwfm0: sending REQUEST (xid 0xd86ffd41), next in 3.7 seconds
bwfm0: acknowledged 10.0.1.141 from 10.0.1.1
bwfm0: adding IP address 10.0.1.141/24 broadcast 10.0.1.255
bwfm0: sending Router Solicitation
bwfm0: DAD completed for 10.0.1.141
bwfm0: leased 10.0.1.141 for 86400 seconds
bwfm0: renew in 43200 seconds, rebind in 75600 seconds
bwfm0: writing lease `/var/db/dhcpcd/bwfm0-dudenet.lease'
bwfm0: IP address 10.0.1.141/24 already exists
bwfm0: adding route to 10.0.1.0/24
bwfm0: adding default route via 10.0.1.1
bwfm0: ARP announcing 10.0.1.141 (1 of 2), next in 2.0 seconds
bwfm0: executing `/libexec/dhcpcd-run-hooks' BOUND
spawned listener 10.0.1.141 on PID 954
bwfm0: sending Router Solicitation
bwfm0: ARP announcing 10.0.1.141 (2 of 2)
bwfm0: sending Router Solicitation
bwfm0: no IPv6 Routers available
^Z[2] + Suspended /sbin/dhcpcd -d -B bwfm0
# bg
[2] /sbin/dhcpcd -d -B bwfm0
# ps ax | grep dhcpc
934 ? I 0:00.01 dhcpcd: [control proxy] bwfm0 [ip4] [ip6]
936 ? I 0:00.03 dhcpcd: bwfm0 [ip4] [ip6]
938 ? I 0:00.02 dhcpcd: [privileged actioneer] bwfm0 [ip4] [ip6]
954 ? I 0:00.01 dhcpcd: [network proxy] 10.0.1.141
986 ? I 0:00.01 dhcpcd: [network proxy] fe80::ba27:ebff:fedb:539
1133 ? I 0:00.00 dhcpcd: [network proxy] bwfm0 [ip4] [ip6]
821 tty00 Z 0:00.00 (dhcpcd)
933 tty00 S 0:00.02 dhcpcd: [launcher]
I'm currently trying to debug what's going on, basically adding some logging around pid_file lock (and I might even need to add some debugging inside libutil and pid_file lock).
But before I dive in, I'm wondering if anyone here has any easier things to try. I'm not all that familiar with this code -- don't mind getting my hands dirty, but if there's any "did you check this?" or "it's probably this" ideas, I'm glad have them.
Thanks a ton for any guidance.
Rob
Archive administrator: postmaster@marples.name