Re: problem with "nobackground" mode on NetBSD current?
Roy Marples
Fri Sep 11 15:20:12 2020
On 11/09/2020 15:04, Rob Newberry wrote:
I'll answer a couple questions here, but since you also responded to my second message (after I'd done some debugging), I'll save more for there.
On 10/09/2020 23:37, Rob Newberry wrote:
I'm building my own "embedded" NetBSD system images (running on a Raspberry Pi), and running dhcpcd under the control of my own network management process.
That network management process does the following:
- runs (and allows to exit) "ifconfig" to enable bwfm0 (I haven't debugged why wpa_supplicant doesn't do this on its own, but if I don't do this, then I can't even scan)
/sbin/ifconfig bwfm0 up
There are reasons why wpa_supplicant does not do this, but that's fine. It should still work because by default dhcpcd *will* do this and wpa_supplicant will then start scanning.
Yes, I remember now. I had this in my code before I got dhcpcd working last year, and then took it out -- but when I ran into this new issue, dhcpcd died before getting very far, so I had to put it back in. I'll remove it once I get dhcpcd working.
(Would appreciate more insight on why wpa_supplicant does not do this, but it's fine offline.)
It's not wpa_supplicant's job to up the interface.
The interface being up is an administrative decision.
In the strictest sense, dhcpcd should not do this by default either and there is
an option to turn this off.
But I get away with it because dhcpcd has evolved into a network manager of sorts.
- writes a simple /etc/wpa_supplicant.conf file
- runs wpa_supplicant (which doesn't fork by default, and I capture its output into a file)
/sbin/wpa_supplicant -i bwfm0 -c /etc/wpa_supplicant.conf -d -d
- runs dhcpcd in "nobackground" mode (I also capture its output to a file)
/sbin/dhcpcd -d -B bwfm0
Why? Both wpa_supplicant and dhcpcd have options to redirect output to files and should work fine when backgrounded.
I can see that you're using dhcpcd-9.2.0 so you already have the stdio fixes.
Anyway....
Because I'm running my own "custom" embedded system, and I want my management process to know when either one of them dies.
The dhcpcd-ui project knows when each of them died.
https://roy.marples.name/cgit/dhcpcd-ui.git/tree/src/libdhcpcd/wpa.c#n172
https://roy.marples.name/cgit/dhcpcd-ui.git/tree/src/dhcpcd-qt/dhcpcd-wi.cpp#n291
https://roy.marples.name/cgit/dhcpcd-ui.git/tree/src/dhcpcd-qt/dhcpcd-qt.cpp#n62
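For comparison, the foreground approach described above (start each daemon without letting it fork, then watch for it to exit) can be sketched in a few lines of Python. This is only an illustration and not how dhcpcd-ui or the management process in this thread is implemented: the function name supervise, the polling interval, and the stand-in commands in the demo are all assumptions made for the example.

```python
import subprocess
import sys
import time


def supervise(commands, poll_interval=0.1):
    """Start every command as a direct child and return (name, returncode)
    for the first one that exits. `commands` maps a label to an argv list."""
    children = {name: subprocess.Popen(argv) for name, argv in commands.items()}
    try:
        while True:
            for name, proc in children.items():
                # poll() does not block; it also reaps the child once it
                # has exited, so no zombie is left behind.
                code = proc.poll()
                if code is not None:
                    return name, code
            time.sleep(poll_interval)
    finally:
        # Stop and reap whatever is still running.
        for proc in children.values():
            if proc.poll() is None:
                proc.terminate()
                proc.wait()


if __name__ == "__main__":
    # Stand-in commands (assumptions for the demo). On the real system the
    # argv lists would be the two command lines quoted in this message, e.g.
    # ["/sbin/dhcpcd", "-d", "-B", "bwfm0"].
    demo = {
        "long-running": [sys.executable, "-c", "import time; time.sleep(30)"],
        "fails-early": [sys.executable, "-c", "import sys; sys.exit(1)"],
    }
    print(supervise(demo))
```

A negative returncode from subprocess means the child was terminated by that signal number, which separates a crash from a deliberate exit with an error code.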
When it runs, dhcpcd dies with a few error messages:
Sep 10 04:26:08 dhcpcd[824]: main: pidfile_lock -1: Invalid argument
Sep 10 04:26:08 dhcpcd[824]: ps_sendpsmmsg: Bad file descriptor
Sep 10 04:26:08 dhcpcd[824]: main: control_stop: Bad file descriptor
Sep 10 04:26:08 dhcpcd[824]: ps_sendpsmmsg: Bad file descriptor
Now, first off, this didn't happen "before" -- and by "before" I mean that I was working on this a few months ago, not long after NetBSD 9.0 shipped. Certainly before the new dhcpcd was integrated -- but I'm sure there are plenty of other changes in NetBSD-current right now.
dhcpcd with privilege separation will be new for NetBSD-10.
I've committed a fix here which tidies the errors up a little, so you should just get the pidfile_lock error now.
https://roy.marples.name/cgit/dhcpcd.git/commit/?id=f4b5b1b9497015888c9dc69dc6428fe71f983ab1
I've debugged a little bit, checking for all the obvious stuff -- "/var/run/dhcpcd" is present:
# ls -l /var/run/dhcpcd
drwxrwxrwx 3 root wheel 512 Sep 10 04:27 /var/run/dhcpcd
Prior to starting up, the directory is empty. But at the time of death, the "pid" file that it's complaining about is there:
# ls -l /var/run/dhcpcd/
total 1
-rw-r--r-- 1 root wheel 4 Sep 10 04:26 bwfm0.pid
# cat /var/run/dhcpcd/bwfm0.pid
821
# ps ax | grep dhcpcd
821 tty00 Z 0:00.00 (dhcpcd)
And the "db" directory is there as well (but empty):
# ls -ld /var/db/dhcpcd
drwxrwxrwx 2 root wheel 512 Sep 10 04:25 /var/db/dhcpcd
This looks wrong:
$ ls -ld /var/db/dhcpcd
drwxr-xr-x 2 root wheel 4 Aug 30 08:54 /var/db/dhcpcd
I can change. The reason I did this was this note in "dist/README.md":
dhcpcd-9 requires this directory and contents to be writeable by the
unprivileged user (default _dhcpcd, _dhcp or dhcpcd)
The unprivileged user isn't root, and I don't think it's a member of "wheel" either -- so is this comment inaccurate?
I've removed that statement. It no longer applies.
See process 821? Looks like it's in the zombie state, i.e. crashed.
Is there a coredump for it? Maybe the output files you captured or syslog entries can give a further clue.
That's the original one I ran. It died because of the pidfile_lock failure and then exited. It's stuck because the parent process (my custom "network" [really, embedded system] management process) hasn't called waitpid to clean it up yet. There's no core file because it didn't die from any kind of fault.
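The behaviour described here (a child that has exited stays in state Z until its parent waits for it, and a plain exit leaves no core file) can be reproduced with a small sketch. It assumes a POSIX system; the helper names describe_exit and run_and_reap are made up for this example and are not part of dhcpcd or of the management process in this thread.

```python
import os
import time


def describe_exit(status):
    """Turn a raw wait status into a short description."""
    if os.WIFEXITED(status):
        # The process called exit itself: no signal, hence no core file.
        return f"exited normally with code {os.WEXITSTATUS(status)}"
    if os.WIFSIGNALED(status):
        core = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"killed by signal {os.WTERMSIG(status)}{core}"
    return f"unrecognised wait status {status}"


def run_and_reap(exit_code, delay=0.2):
    """Fork a child that exits at once, wait `delay` seconds, then reap it."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)  # child: leave immediately, no fault involved
    # Parent: for the length of this sleep the child has exited but has not
    # been waited for, so ps would list it in state Z.
    time.sleep(delay)
    _, status = os.waitpid(pid, 0)  # collecting the status removes the zombie
    return describe_exit(status)


if __name__ == "__main__":
    print(run_and_reap(1))
```

Lengthening delay and running ps in another terminal during the sleep should show the child in the same Z state as process 821 above.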
Interesting.
Let's pick this up in your other email as I think it is done as far as dhcpcd
is concerned for now.
Roy
Archive administrator: postmaster@marples.name