dhcpcd-discuss

Re: dhcpcd runs into timeout while probing offered ip via arp

Roy Marples

Wed Nov 26 10:40:03 2008

On Wed, 2008-11-26 at 09:54 +0100, Jean Wolter wrote:
> It does help - dhcpcd now actually uses the montonic clock provided by
> the underlying kernel. But it does not solve the problem we are
> experiencing.

http://www.uclibc.org/lists/busybox/2007-July/028051.html
uclibc-0.9.28 supports it but uclibc-0.9.27 doesn't.

So I guess I'd rather that you upgraded to uclibc-0.9.28

> I instrumented dhcpcd a little bit and found the problem. There are
> three timers, "timeout", "exit", and "stop". The arp-probing code uses
> the "timeout"-timer. The following trace shows function names, states
> and used timers (the line numbers are a little bit off due to the
> additional code). The following happens:
> 
> - dhcpcd gets an offer
> Tue Nov 25 20:16:57 2008: eth2: offered 192.168.1.33 from 192.168.1.1
> Tue Nov 25 20:16:57 2008: eth2: handle_timeout:1162 - state STATE_PROBING
> 
> - dhcpcd starts probing the ip address via arp, it uses the timer
>   "timeout" for it
> Tue Nov 25 20:16:58 2008: eth2: checking 192.168.1.33 is available on attached networks
> Tue Nov 25 20:16:58 2008: eth2: sending ARP probe (1 of 3), next in 1.30 seconds
> Tue Nov 25 20:16:58 2008: eth2: wait_for_fd:792 - state STATE_PROBING
> Tue Nov 25 20:16:58 2008: eth2: get_lowest_timer:777 - state STATE_PROBING
> Tue Nov 25 20:16:58 2008: eth2: get_lowest_timer: lowest timer: timeout (1.30 seconds)
> Tue Nov 25 20:16:58 2008: eth2: wait_for_fd:798 - state STATE_PROBING
> Tue Nov 25 20:16:58 2008: eth2: polling for 1299 milli seconds
> 
> - while waiting for an arp reply or a timeout the netlink fd
>   becomes ready, handle_link() ist invoked
> Tue Nov 25 20:16:58 2008: eth2: handle_link:1768 - state STATE_PROBING
> 
> - handle_link() resets the timer "timeout" unconditionally
> Tue Nov 25 20:16:58 2008: eth2: resetting timeout
> 
> handle_link(struct if_state *state)
> {
>         int retval;
> 
>         retval = link_changed(state->interface);
>         if (retval == -1) {
>                 logger(LOG_ERR, "link_changed: %s", strerror(errno));
>                 return -1;
>         }
>         if (retval == 0)
>                 return 0;
> 
>         logger(LOG_WARNING, "resetting timeout");
>         timerclear(&state->timeout);
>         ...
> 
> - handle_link() does not detect anything interesting, it enters the
>   default branch of the switch statement, state->carrier is 1
>   (LINK_UP), so it simply leaves the function
> 
> Tue Nov 25 20:16:58 2008: eth2: carrier_status default, state->carrier==1
> 
> - now the next timer is the "exit"-timer and dhcpcd times out after
>   40s without leaving the PROBING state and therefore terminates
> 
> Tue Nov 25 20:16:58 2008: eth2: wait_for_fd:792 - state STATE_PROBING
> Tue Nov 25 20:16:58 2008: eth2: get_lowest_timer:777 - state STATE_PROBING
> Tue Nov 25 20:16:58 2008: eth2: get_lowest_timer: lowest timer: exit (40.00 seconds)
> Tue Nov 25 20:16:58 2008: eth2: wait_for_fd:798 - state STATE_PROBING
> Tue Nov 25 20:16:58 2008: eth2: polling for 39996 milli seconds
> 
> So the bug seems to be the unconditional timeout reset in
> handle_link().

Could be.
I've attached a patch that may help you, but I need to regression test
it a little before applying.

> Some minor remarks: 
> - It might help to have a compile time warning about the missing
>   clock_monotonic support. It took me a while to actually realize, that
>   the code is not included at all. So something along these lines
>   maybe?
> 
> get_monotonic(struct timeval *tp)
> {
>         static int posix_clock_set = 0;
> #if defined(_POSIX_MONOTONIC_CLOCK) && defined(CLOCK_MONOTONIC)
> ...
> #else
> #warning "host does not support a monotonic clock - timing can skew"
> #endif

Could be done, but #warning is a GCC extensions iirc and I know for sure
that PCC compiles dhcpcd just fine. I can investigate a little.

> - some plattforms supply clock_gettime(), but it returns an error if
>   invoked for CLOCK_MONOTONIC. The code does not catch this:
> 
>         if (posix_clock_set == 0) {
>                 if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
>                         posix_clock = CLOCK_MONOTONIC;
>                         clock_monotonic = 1;
>                 }
>                 posix_clock_set = 1;
>         }
>         ...
>         /* Something above failed, so fall back to
>            gettimeofday */
>         if (!posix_clock_set) {
>                 logger(LOG_WARNING, NO_MONOTONIC);
>                 posix_clock_set = 1;
>         }
>         return gettimeofday(tp, NULL);
>   So even if clock_gettime() fails, the warning does not show up.

Another attached patch should address this also.

Let me know how they work for you!

Thanks

Roy
Index: client.c
===================================================================
--- client.c	(revision 1097)
+++ client.c	(working copy)
@@ -1736,7 +1736,6 @@
 	if (retval == 0)
 		return 0;
 
-	timerclear(&state->timeout);
 	switch (carrier_status(state->interface->name)) {
 	case -1:
 		logger(LOG_ERR, "carrier_status: %s", strerror(errno));
@@ -1746,6 +1745,7 @@
 			logger(LOG_INFO, "carrier lost");
 			state->carrier = LINK_DOWN;
 			do_socket(state, SOCKET_CLOSED);
+			timerclear(&state->timeout);
 			if (state->state != STATE_BOUND)
 				timerclear(&state->stop);
 		}
@@ -1755,6 +1755,7 @@
 			logger(LOG_INFO, "carrier acquired");
 			state->state = STATE_RENEW_REQUESTED;
 			state->carrier = LINK_UP;
+			timerclear(&state->timeout);
 			timerclear(&state->stop);
 			return 1;
 		}
Index: common.c
===================================================================
--- common.c	(revision 1097)
+++ common.c	(working copy)
@@ -201,9 +201,8 @@
 	if (posix_clock_set == 0) {
 		if (clock_gettime(CLOCK_MONOTONIC, &ts) == 0) {
 			posix_clock = CLOCK_MONOTONIC;
-			clock_monotonic = 1;
+			clock_monotonic = posix_clock_set = 1;
 		}
-		posix_clock_set = 1;
 	}
 
 	if (clock_monotonic) {
@@ -225,9 +224,8 @@
 	if (posix_clock_set == 0) {
 		if (mach_timebase_info(&info) == KERN_SUCCESS) {
 			factor = (double)info.numer / (double)info.denom;
-			clock_monotonic = 1;	
+			clock_monotonic = posix_clock_set = 1;
 		}
-		posix_clock_set = 1;
 	}
 	if (clock_monotonic) {
 		nano = mach_absolute_time();
Index: client.c
===================================================================
--- client.c	(revision 1097)
+++ client.c	(working copy)
@@ -1736,7 +1736,6 @@
 	if (retval == 0)
 		return 0;
 
-	timerclear(&state->timeout);
 	switch (carrier_status(state->interface->name)) {
 	case -1:
 		logger(LOG_ERR, "carrier_status: %s", strerror(errno));
@@ -1746,6 +1745,7 @@
 			logger(LOG_INFO, "carrier lost");
 			state->carrier = LINK_DOWN;
 			do_socket(state, SOCKET_CLOSED);
+			timerclear(&state->timeout);
 			if (state->state != STATE_BOUND)
 				timerclear(&state->stop);
 		}
@@ -1755,6 +1755,7 @@
 			logger(LOG_INFO, "carrier acquired");
 			state->state = STATE_RENEW_REQUESTED;
 			state->carrier = LINK_UP;
+			timerclear(&state->timeout);
 			timerclear(&state->stop);
 			return 1;
 		}

Attachment: signature.asc
Description: This is a digitally signed message part


Follow-Ups:
Re: dhcpcd runs into timeout while probing offered ip via arpJean Wolter
References:
dhcpcd runs into timeout while probing offered ip via arpJean Wolter
Re: dhcpcd runs into timeout while probing offered ip via arpRoy Marples
Re: dhcpcd runs into timeout while probing offered ip via arpJean Wolter
Archive administrator: postmaster@marples.name