dhcpcd-discuss

Re: problem with "nobackground" mode on NetBSD current?

Roy Marples

Fri Sep 11 12:32:26 2020

Hi Rob

On 10/09/2020 23:37, Rob Newberry wrote:
I'm building my own "embedded" NetBSD system images (running on a Raspberry Pi), and running dhcpcd under the control of my own network management process.

That network management process does the following:

	- runs (and allows to exit) "ifconfig" to enable bwfm0 (I haven't debugged why wpa_supplicant doesn't do this on its own, but if I don't do this, then I can't even scan)

		/sbin/ifconfig bwfm0 up


There are reasons why wpa_supplicant does not do this, but that's fine. It should still work because by default dhcpcd *will* do this and wpa_supplicant will then start scanning.


	- writes a simple /etc/wpa_supplicant.conf file

	- runs wpa_supplicant (which doesn't fork by default, and I capture it's output into a file)

		/sbin/wpa_supplicant -i bwfm0 -c /etc/wpa_supplicant.conf -d -d

	- runs dhcpcd in "nobackground" mode (I also capture it's output to a file)

		/sbin/dhcpcd -d -B bwfm0


Why? Both wpa_supplicant and dhcpcd have options to redirect output to files and should work fine when backgrounded.
I can see that you're using dhcpcd-9.2.0 so you already have the stdio fixes.
Anyway....

When it runs, dhcpcd dies with a few error messages:

	Sep 10 04:26:08  dhcpcd[824]: main: pidfile_lock -1: Invalid argument
	Sep 10 04:26:08  dhcpcd[824]: ps_sendpsmmsg: Bad file descriptor
	Sep 10 04:26:08  dhcpcd[824]: main: control_stop: Bad file descriptor
	Sep 10 04:26:08  dhcpcd[824]: ps_sendpsmmsg: Bad file descriptor

Now, first off, this didn't happen "before" -- and by "before" I mean that I was working on this a few months ago, not long after NetBSD 9.0 shipped.  Certainly before the new dhcpcd was integrated -- but I'm sure there are plenty of other changes in NetBSD-current right now.

dhcpcd with priviledge separation will be new for NetBSD-10.
I've comitted a fix here which tidies the errors up a litte, so you should just get the pidfile_lock error now.
https://roy.marples.name/cgit/dhcpcd.git/commit/?id=f4b5b1b9497015888c9dc69dc6428fe71f983ab1


I've debugged a little bit, checking for all the obvious stuff -- "/var/run/dhcpcd" is present:

	# ls -l /var/run/dhcpcd
	drwxrwxrwx  3 root  wheel  512 Sep 10 04:27 /var/run/dhcpcd

Prior to starting up, the directory is empty.  But at the time of death, the "pid" file that it's complaining about is there:

	# ls -l /var/run/dhcpcd/
	total 1
	-rw-r--r--  1 root  wheel  4 Sep 10 04:26 bwfm0.pid

	# cat /var/run/dhcpcd/bwfm0.pid
	821

	# ps ax | grep dhcpcd
	821 tty00 Z    0:00.00 (dhcpcd)

And the "db" directory is there as well (but empty):

	# ls -ld /var/db/dhcpcd
	drwxrwxrwx  2 root  wheel  512 Sep 10 04:25 /var/db/dhcpcd

This looks wrong:
$ ls -ld /var/db/dhcpcd
drwxr-xr-x  2 root  wheel  4 Aug 30 08:54 /var/db/dhcpcd


What's REALLY interesting to me, though, is that if, in this current state, I run dhcpcd AGAIN, directly from the command line, with the same arguments, it works just fine:

	# /sbin/dhcpcd -d -B bwfm0
	dhcpcd-9.2.0 starting
	spawned privileged actioneer on PID 938
	spawned network proxy on PID 1133
	spawned controller proxy on PID 934
	chrooting to `/var/chroot/dhcpcd' as _dhcpcd
	bwfm0: executing `/libexec/dhcpcd-run-hooks' PREINIT
	spawned listener fe80::ba27:ebff:fedb:539 on PID 986
	bwfm0: connected to Access Point `dudenet'
	bwfm0: executing `/libexec/dhcpcd-run-hooks' CARRIER
	DUID 00:01:00:01:26:ec:69:6f:b8:27:eb:db:05:39
	bwfm0: IAID eb:db:05:39
	bwfm0: delaying IPv6 router solicitation for 0.3 seconds
	bwfm0: delaying IPv4 for 0.9 seconds
	bwfm0: soliciting an IPv6 router
	bwfm0: sending Router Solicitation
	bwfm0: reading lease `/var/db/dhcpcd/bwfm0-dudenet.lease'
	bwfm0: soliciting a DHCP lease
	bwfm0: sending DISCOVER (xid 0xd86ffd41), next in 4.4 seconds
	bwfm0: offered 10.0.1.141 from 10.0.1.1
	bwfm0: sending REQUEST (xid 0xd86ffd41), next in 3.7 seconds
	bwfm0: acknowledged 10.0.1.141 from 10.0.1.1
	bwfm0: adding IP address 10.0.1.141/24 broadcast 10.0.1.255
	bwfm0: sending Router Solicitation
	bwfm0: DAD completed for 10.0.1.141
	bwfm0: leased 10.0.1.141 for 86400 seconds
	bwfm0: renew in 43200 seconds, rebind in 75600 seconds
	bwfm0: writing lease `/var/db/dhcpcd/bwfm0-dudenet.lease'
	bwfm0: IP address 10.0.1.141/24 already exists
	bwfm0: adding route to 10.0.1.0/24
	bwfm0: adding default route via 10.0.1.1
	bwfm0: ARP announcing 10.0.1.141 (1 of 2), next in 2.0 seconds
	bwfm0: executing `/libexec/dhcpcd-run-hooks' BOUND
	spawned listener 10.0.1.141 on PID 954
	bwfm0: sending Router Solicitation
	bwfm0: ARP announcing 10.0.1.141 (2 of 2)
	bwfm0: sending Router Solicitation
	bwfm0: no IPv6 Routers available
	^Z[2] + Suspended               /sbin/dhcpcd -d -B bwfm0
	# bg
	[2] /sbin/dhcpcd -d -B bwfm0


	# ps ax | grep dhcpc
	 934 ?     I    0:00.01 dhcpcd: [control proxy] bwfm0 [ip4] [ip6]
	 936 ?     I    0:00.03 dhcpcd: bwfm0 [ip4] [ip6]
	 938 ?     I    0:00.02 dhcpcd: [privileged actioneer] bwfm0 [ip4] [ip6]
	 954 ?     I    0:00.01 dhcpcd: [network proxy] 10.0.1.141
	 986 ?     I    0:00.01 dhcpcd: [network proxy] fe80::ba27:ebff:fedb:539
	1133 ?     I    0:00.00 dhcpcd: [network proxy] bwfm0 [ip4] [ip6]
	 821 tty00 Z    0:00.00 (dhcpcd)
	 933 tty00 S    0:00.02 dhcpcd: [launcher]


I'm currently trying to debug what's going on, basically adding some logging around pid_file lock (and I might even need to add some debugging inside libutil and pid_file lock).

But before I dive in, I'm wondering if anyone here has any easier things to try.  I'm not all that familiar with this code -- don't mind getting my hands dirty, but if there's any "did you check this?" or "it's probably this" ideas, I'm glad have them.

Thanks a ton for any guidance.

See process 821? Looks like it's in the zombie state, ie crashed.
Is there a coredump for it? Maybe the outputn files you captures or syslog entries can give a further clue.

Roy

Follow-Ups:
Re: problem with "nobackground" mode on NetBSD current?Rob Newberry
References:
problem with "nobackground" mode on NetBSD current?Rob Newberry
Archive administrator: postmaster@marples.name