Re: problem with "nobackground" mode on NetBSD current?
Roy Marples
Fri Sep 11 13:07:18 2020
On 11/09/2020 01:20, Rob Newberry wrote:
> I've debugged this down to something unexpected happening in pidfile_lock, which gets called around line 2345 of dhcpcd.c.
> But the unexpected part is happening inside the pidfile_lock code in NetBSD's libutil.
> It LOOKS to me like pidfile_lock is being called early on (around line 2250).
> But by the time we call it a second time, we've forked and (maybe?) switched users.
No.
We double fork and lock the pidfile right away; at that point we are still the root process.
The master process switches user at line 2406.
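The startup order described here (double fork, lock the pidfile immediately while still root, and only later switch user in the master process) can be sketched in C. This is a minimal illustration that assumes flock(2)-style locking and uses a hypothetical double_fork_and_lock() helper. It is not how dhcpcd or NetBSD's pidfile_lock(3) actually do it, and it leaves the user switch out entirely.

```c
#define _DEFAULT_SOURCE /* glibc: expose flock(2) under strict -std modes */
#include <fcntl.h>
#include <stdio.h>
#include <sys/file.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper, not dhcpcd's code: double fork, then have the
 * grandchild open and exclusively lock `path` straight away, while it
 * still runs with the original (root) credentials. The grandchild
 * reports its pid back over a pipe. Returns that pid, or -1 on error. */
pid_t double_fork_and_lock(const char *path)
{
    int pfd[2];
    if (pipe(pfd) == -1)
        return -1;

    pid_t child = fork();
    if (child == -1) {
        close(pfd[0]);
        close(pfd[1]);
        return -1;
    }
    if (child == 0) {
        /* First child: new session, then fork again so the daemon is
         * not a session leader and cannot regain a controlling tty. */
        close(pfd[0]);
        setsid();
        pid_t grandchild = fork();
        if (grandchild != 0)
            _exit(grandchild == -1 ? 1 : 0);

        /* Grandchild: lock and write the pidfile before anything else. */
        pid_t me = -1;
        int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
        if (fd != -1 && flock(fd, LOCK_EX | LOCK_NB) == 0) {
            char buf[32];
            int len = snprintf(buf, sizeof(buf), "%ld\n", (long)getpid());
            if (ftruncate(fd, 0) == 0 && write(fd, buf, (size_t)len) == len)
                me = getpid();
        }
        ssize_t n = write(pfd[1], &me, sizeof(me));
        (void)n;
        /* A real daemon would keep running here, holding fd (and so the
         * lock) open; this sketch simply exits. */
        _exit(me == -1 ? 1 : 0);
    }

    /* Original process: reap the first child and read the report. */
    close(pfd[1]);
    int status;
    (void)waitpid(child, &status, 0);
    pid_t gc = -1;
    if (read(pfd[0], &gc, sizeof(gc)) != (ssize_t)sizeof(gc))
        gc = -1;
    close(pfd[0]);
    return gc;
}
```

Because the descriptor is opened before any user switch, it carries the access mode it was opened with. Changing credentials afterwards does not alter the access mode of an already-open descriptor, so a later privilege drop on its own would not be expected to remove write access.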
> And this time, when we call pidfile_lock, it goes through a different code path. Looking at libutil's pidfile.c sources, it finds that pidfile_fd is valid (i.e., not -1) and then tries to truncate it.
> But this time (as I found by debugging in the kernel), the pidfile_fd file descriptor has lost its "write" permission, so the call to "ftruncate" fails the kernel check for "fp->f_flag & FWRITE", and that error bubbles up to cause the failure.
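The failure traced above can be reproduced in isolation. ftruncate(2) checks how the descriptor was opened, not the file's permission bits, so it fails on a descriptor without write access even when the caller is root. POSIX allows either EBADF or EINVAL in that case. A minimal sketch using a hypothetical ftruncate_errno() helper:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: open `path` with the given open(2) flags and try
 * to truncate it. Returns 0 if ftruncate(2) succeeds, otherwise the
 * errno value from the failing call. */
int ftruncate_errno(const char *path, int flags)
{
    int fd = open(path, flags);
    if (fd == -1)
        return errno;
    /* Save errno before close(2) can overwrite it. */
    int err = ftruncate(fd, 0) == 0 ? 0 : errno;
    close(fd);
    return err;
}
```

Calling it with O_RDONLY fails, while O_RDWR or O_WRONLY succeeds on the same file. This mirrors a pidfile descriptor that has somehow lost FWRITE.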
So we need to track *why* permission was lost.
Certainly this privsep version of dhcpcd has been live in NetBSD since April 2nd
this year, and this is the first time this issue has been reported.
> To "fix" the issue, I added a call to "pidfile_clean" shortly after the error handling in the first call to pidfile_lock. With that change, the second call to pidfile_lock no longer goes through the "it's already open" path; since the pidfile isn't open, it opens and locks it, and everything works fine.
> But I don't know if that's the RIGHT fix. There's now a window of time between the original process (which checked and locked the pidfile) cleaning it up and the child (actually, grandchild, I believe) process (re-)locking it. I don't know if that's bad or not.
Certainly this is not the right fix. The pidfile lock is there to ensure another dhcpcd
instance does not run at the same time.
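The exclusion the pidfile lock provides, and the window asked about above, can be illustrated with flock(2). While one open file description holds the exclusive lock, a second non-blocking attempt fails. Once the first is closed (as cleaning up the pidfile would do), the next attempt succeeds, so another instance could slip in at that point. This is a minimal sketch that assumes flock(2) semantics and uses a hypothetical try_lock_pidfile() helper. NetBSD's pidfile_lock(3) may acquire its lock differently.

```c
#define _DEFAULT_SOURCE /* glibc: expose flock(2) under strict -std modes */
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <unistd.h>

/* Hypothetical helper: open `path` and try to take a non-blocking
 * exclusive flock(2) lock. Returns the locked descriptor, or -1 with
 * errno set (EWOULDBLOCK if another open file description holds it). */
int try_lock_pidfile(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT | O_CLOEXEC, 0644);
    if (fd == -1)
        return -1;
    if (flock(fd, LOCK_EX | LOCK_NB) == -1) {
        int saved = errno; /* close(2) must not clobber the lock error */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}
```

flock(2) locks belong to the open file description, so two separate open(2) calls conflict even within one process. That lets the "second instance" case be shown without forking.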
What is the underlying filesystem for /var/run?
Roy