dhcpcd-discuss

Re: [RFC]Routes are stored as a linked list

Roy Marples

Fri Mar 08 13:18:37 2019

OK, I've now done a little bit more testing on this and committed a few more changes. These mainly affect a single dhcpcd running on all interfaces rather than just one, which may not be your use case, but it is the preferred way of running dhcpcd as it uses fewer resources.

I still have some more test cases to run, but it's looking quite promising now for being merged into master.

But let's talk some more about the receive socket size.
The Linux man page says this:
       SO_RCVBUF
              Sets or gets the maximum socket receive buffer in bytes.  The
              kernel doubles this value (to allow space for bookkeeping
              overhead) when it is set using setsockopt(2), and this doubled
              value is returned by getsockopt(2).  The default value is set
              by the /proc/sys/net/core/rmem_default file, and the maximum
              allowed value is set by the /proc/sys/net/core/rmem_max file.
              The minimum (doubled) value for this option is 256.

So you've allocated a 16MB buffer, but the kernel doubles it to 32MB.
Ouch.
But let's not use doubled values for this discussion.

Have you tested with a smaller buffer at all?
I believe the default on Linux is about 100k. Does it still overflow if we increase it to, say, 512k or 1MB?
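
For anyone wanting to try smaller sizes, the following is a minimal sketch of how to request a size and read back what the kernel actually granted. The helper name set_rcvbuf is made up for illustration and is not part of dhcpcd; it uses only the standard setsockopt(2) and getsockopt(2) calls. With plain SO_RCVBUF the request is capped by rmem_max, so the value read back can be smaller than twice the request.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not dhcpcd code.
 * Ask for a receive buffer of `requested` bytes on `fd` and return the
 * size the kernel reports afterwards, or -1 on error.
 * On Linux the reported value is normally double the request.
 */
int
set_rcvbuf(int fd, int requested)
{
	int effective;
	socklen_t len = sizeof(effective);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
	    &requested, sizeof(requested)) == -1)
		return -1;
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &effective, &len) == -1)
		return -1;
	return effective;
}
```

Calling this with 512 * 1024 and then 1024 * 1024 on the netlink socket, and logging the return value, would show which size was really in effect for each test run.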

Roy

On 05/03/2019 20:02, Donald Sharp wrote:
Mar 05 15:35:52 janelle dhcpcd[20445]: free route list used 4000231 times
Mar 05 15:35:52 janelle dhcpcd[20445]: new routes from free list 3000224
Mar 05 15:35:52 janelle dhcpcd[20445]: dhcpcd exited

So a note on the buffer overruns: while dhcpcd does indeed have
logic to handle this situation, we have noticed, in general, that
recovery mechanisms are always more expensive than not having to
recover at all, especially as a system becomes more loaded.  We have
even seen negative feedback loops where a system never recovers
because multiple daemons are all recovering from netlink socket buffer
overflow.  In other words we do our best to avoid it.  Hence the
suggestion to allow the receive buffer to be expanded via the cli so
people sensitive to this situation can tune the behavior of their
system.

donald

On Tue, Mar 5, 2019 at 10:24 AM Roy Marples <roy@xxxxxxxxxxxx> wrote:



On 05/03/2019 15:03, Donald Sharp wrote:
Roy -

I could not figure out how to make the debug messages work so I just changed the
logdebugx to logerrx:

Use the -d flag or add the word debug to /etc/dhcpcd.conf


diff --git a/src/route.c b/src/route.c
index dbe2b5a5..6963837d 100644
--- a/src/route.c
+++ b/src/route.c
@@ -258,8 +258,8 @@ rt_dispose(struct dhcpcd_ctx *ctx)
    rt_headfree(&ctx->routes);
   #ifdef RT_FREE_ROUTE_TABLE
    rt_headfree(&ctx->froutes);
- logdebugx("free route list used %zu times", froutes);
- logdebugx("new routes from free list %zu", nroutes);
+ logerrx("free route list used %zu times", froutes);
+ logerrx("new routes from free list %zu", nroutes);
   #endif
   }


And only see this:

Mar 05 14:41:19 janelle dhcpcd[20151]: free route list used 7 times
Mar 05 14:41:19 janelle dhcpcd[20168]: dummy0: soliciting an IPv6 router
Mar 05 14:41:19 janelle dhcpcd[20151]: new routes from free list 0

On startup.  After startup when I install/remove 1 million routes I
never see the message again.

Cause dhcpcd to exit and then check syslog.
You'll see it then.


Another issue that has popped up during testing: now that I am
looking at the syslog, I am seeing frequent messages from dhcpcd
that say this:

Mar 05 14:34:40 janelle dhcpcd[19931]: route socket overflowed - learning interface state
Mar 05 14:34:41 janelle dhcpcd[19931]: route socket overflowed - learning interface state

during route install/removal.

I added this bit of code:

diff --git a/src/if-linux.c b/src/if-linux.c
index b912c171..4b5dda54 100644
--- a/src/if-linux.c
+++ b/src/if-linux.c
@@ -323,12 +323,15 @@ if_opensockets_os(struct dhcpcd_ctx *ctx)
        &on, sizeof(on));
   #endif

+ int rcvbufsize = 16 * 1024 * 1024;
+ setsockopt(ctx->link_fd, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbufsize,
sizeof(rcvbufsize));
    if ((priv = calloc(1, sizeof(*priv))) == NULL)
    return -1;

    ctx->priv = priv;
    memset(&snl, 0, sizeof(snl));
    priv->route_fd = _open_link_socket(&snl, NETLINK_ROUTE);
+ setsockopt(priv->route_fd, SOL_SOCKET, SO_RCVBUFFORCE, &rcvbufsize,
sizeof(rcvbufsize));
    if (priv->route_fd == -1)
    return -1;
    len = sizeof(snl);

and the error message has gone away.  Perhaps some cli option is
needed for Linux when it is planned to be used in a large scale
routing environment?
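
A note on the patch above: it calls setsockopt on priv->route_fd before checking it for -1, and it ignores the return value. The following is a rough sketch of a more defensive variant, not a proposed patch. The function name grow_rcvbuf is hypothetical. SO_RCVBUFFORCE is Linux specific and needs CAP_NET_ADMIN, so the sketch falls back to plain SO_RCVBUF, which the kernel caps at rmem_max.

```c
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical helper, not dhcpcd code.
 * Try to raise the receive buffer of `fd` to `size` bytes.
 * Returns 0 on success and -1 on failure.
 */
int
grow_rcvbuf(int fd, int size)
{
	/* Never touch a socket that failed to open. */
	if (fd == -1)
		return -1;

#ifdef SO_RCVBUFFORCE
	/* Privileged path: may exceed rmem_max. */
	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUFFORCE,
	    &size, sizeof(size)) == 0)
		return 0;
#endif

	/* Unprivileged path: silently capped at rmem_max. */
	return setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, sizeof(size));
}
```

A cli or config option, if one were added, could pass its value straight into a helper like this once each socket has been opened and checked.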

While the error itself is indeed scary, dhcpcd does have the logic to
work around it.
Each time it happens it will re-sync interfaces and addresses with a
single call to getifaddrs(3).
It doesn't resync routes, but that's not overly important.
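
To illustrate the general shape of that recovery (this is a simplified sketch, not the actual dhcpcd code): on Linux a netlink receive buffer overflow is reported as a read failing with ENOBUFS, at which point the state can be relearned with getifaddrs(3). The names on_recv_error and resync_addresses are invented for this example, and the resync here only counts entries where a real daemon would update its own tables.

```c
#include <sys/types.h>
#include <errno.h>
#include <ifaddrs.h>
#include <stddef.h>

/*
 * Hypothetical helper: walk the list from a single getifaddrs(3) call.
 * Returns the number of entries that carry an address, or -1 on error.
 */
int
resync_addresses(void)
{
	struct ifaddrs *ifaddrs, *ifa;
	int n = 0;

	if (getifaddrs(&ifaddrs) == -1)
		return -1;
	for (ifa = ifaddrs; ifa != NULL; ifa = ifa->ifa_next) {
		if (ifa->ifa_addr == NULL)
			continue;
		n++;	/* A real daemon would update its state here. */
	}
	freeifaddrs(ifaddrs);
	return n;
}

/*
 * Hypothetical helper: given the errno from a failed read on the
 * netlink socket, resync on overflow and report any other error as -1.
 */
int
on_recv_error(int err)
{
	if (err == ENOBUFS)
		return resync_addresses();
	return -1;
}
```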

Do you only see the message on system boot, or when dhcpcd is actually
running as well?
I'm curious as that's quite a large kernel buffer you're reserving, and
that will be for each instance of dhcpcd.

Roy

