dhcpcd-discuss

Re: malloc() error; coredump

Neal P. Murphy

Thu May 26 20:12:11 2016

On Thu, 26 May 2016 16:28:37 +0100
Roy Marples <roy@xxxxxxxxxxxx> wrote:

> On 25/05/2016 23:32, Neal P. Murphy wrote:
> > I'm playing with getting Smoothwall Express to use IPv6. And not
> > getting anywhere very quickly. But that's partly due to my newness to
> > IPv6.
> > 
> > Something is weird.
> > 
> > With 6.10.3 and 6.11.0, any of these lines in the .conf: ---- ia_pd
> > 0b:17:f2:7d/::/60 eth0/1 eth2/3 ia_pd 0b:17:f2:7d/::0/60 eth0/1
> > eth2/3 ia_pd 0b:17:f2:7d//60 eth0/1 eth2/3 ----
> > 
> > causes ---- ... DUID 00:01:00:01:1b:ff:1d:50:00:90:0b:17:f2:7d eth0:
> > IAID 0b:17:f2:7a eth2: IAID 0b:17:f2:7c eth3: IAID 0b:17:f2:7d eth3:
> > delaying IPv6 router solicitation for 0.8 seconds eth3: reading lease
> > `/var/db/dhcpcd-eth3.lease6' eth3: soliciting a DHCPv6 lease eth3:
> > delaying SOLICIT6 (xid 0x32dd0d), next in 0.9 seconds eth3:
> > soliciting an IPv6 router *** Error in `dhcpcd': malloc(): memory
> > corruption: 0x09c8fb50 *** ----
> > 
> > And dhcpcd must be killed via 'kill -9 PID'.
> 
> Can you test a trunk snapshot please as I think this is already fixed?
> http://roy.marples.name/projects/dhcpcd/tarball/dhcpcd-trunk.tar.gz
> Login as anonymous and give the password shown on screen.
> 
> Roy

(Had to figure out how to make the build system use '-g'....) This is dhcpcd.trunk (even though it looks like 6.11.0).

OK. Built with -g. Have gdb on the target. Run with and still get the memory corruption; even gdb is hung up. So run it without gdb, then attach gdb to dhcpcd once it's hung, and use 'gcore' to generate a coredump. Looks like something is thoroughly stomping on the stack. This is all I get:

----
[root@firewall dhcpcd-6.11.0]# gdb -c core.6025 ./dhcpcd
GNU gdb (GDB) 7.5.1
Copyright (C) 2012 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "i586-pc-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /root/dhcpcd-6.11.0/dhcpcd...done.
[New LWP 6025]

warning: Could not load shared library symbols for linux-gate.so.1.
Do you need "set solib-search-path" or "set sysroot"?

warning: File "/lib/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".
To enable execution of this file add
        add-auto-load-safe-path /lib/libthread_db-1.0.so
line to your configuration file "/root/.gdbinit".
To completely disable this security protection add
        set auto-load safe-path /
line to your configuration file "/root/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
        info "(gdb)Auto-loading safe path"

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.

warning: File "/lib/libthread_db-1.0.so" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load".

warning: Unable to find libthread_db matching inferior's thread library, thread debugging will not be available.
Core was generated by `dhcpcd'.
#0  0xb765349c in ?? () from /lib/libc.so.6
(gdb) where
#0  0xb765349c in ?? () from /lib/libc.so.6
#1  0xb75db91b in ?? () from /lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
----




So *that* won't help much. ... OK, I'll 'next' through the program. It dies in eloop_start, so SIGKILL, run, break at eloop_start, and 'next' to the next failure. Here's where it dies:

----
eth3: broadcasting SOLICIT6 (xid 0xf140c4), next in 0.9 seconds
874                                     TAILQ_INSERT_TAIL(&eloop->free_timeouts, t, next);
(gdb) 
875                                     continue;
(gdb) 
976             }
(gdb) 
859                     if (eloop->exitnow)
(gdb) 
863                     if (eloop->timeout0) {
(gdb) 
869                     if ((t = TAILQ_FIRST(&eloop->timeouts))) {
(gdb) 
870                             clock_gettime(CLOCK_MONOTONIC, &now);
(gdb) 
871                             if (timespeccmp(&now, &t->when, >)) {
(gdb) 
872                                     TAILQ_REMOVE(&eloop->timeouts, t, next);
(gdb) 
873                                     t->callback(t->arg);
(gdb) 
eth3: soliciting an IPv6 router
*** Error in `/root/dhcpcd-6.11.0/dhcpcd': malloc(): memory corruption: 0x080b1b38 ***
----




Breaking at line 872 and 'c'ontinuing a few times, I get:
----
Breakpoint 3, eloop_start (eloop=0x80aabd8, signals=0xbffffa64) at eloop.c:873
873                                     t->callback(t->arg);
(gdb) c
Continuing.
eth3: IAID 0b:17:f2:7d
eth3: delaying IPv6 router solicitation for 0.6 seconds
eth3: reading lease `/var/db/dhcpcd-eth3.lease6'
eth3: soliciting a DHCPv6 lease
eth3: delaying SOLICIT6 (xid 0x844521), next in 0.1 seconds

Breakpoint 3, eloop_start (eloop=0x80aabd8, signals=0xbffffa64) at eloop.c:873
873                                     t->callback(t->arg);
(gdb) c
Continuing.
timed out
forking to background
forked to background, child pid 7360
*** Error in `/root/dhcpcd-6.11.0/dhcpcd': free(): corrupted unsorted chunks: 0x080b05f0 ***
*** Error in `/root/dhcpcd-6.11.0/dhcpcd': malloc(): memory corruption: 0x080b1db8 ***
----



Drilling deeper:

----
eth3: IAID 0b:17:f2:7d
eth3: delaying IPv6 router solicitation for 0.1 seconds
eth3: reading lease `/var/db/dhcpcd-eth3.lease6'
eth3: soliciting a DHCPv6 lease
eth3: delaying SOLICIT6 (xid 0xe577d9), next in 0.5 seconds
961     }
(gdb) 
eloop_start (eloop=0x80aabd8, signals=0xbffffa64) at eloop.c:874
874                                     TAILQ_INSERT_TAIL(&eloop->free_timeouts, t, next);
(gdb) 
875                                     continue;
(gdb) 
976             }
(gdb) 
859                     if (eloop->exitnow)
(gdb) 
863                     if (eloop->timeout0) {
(gdb) 
869                     if ((t = TAILQ_FIRST(&eloop->timeouts))) {
(gdb) 
870                             clock_gettime(CLOCK_MONOTONIC, &now);
(gdb) 
871                             if (timespeccmp(&now, &t->when, >)) {
(gdb) 
872                                     TAILQ_REMOVE(&eloop->timeouts, t, next);
(gdb) 

Breakpoint 3, eloop_start (eloop=0x80aabd8, signals=0xbffffa64) at eloop.c:873
873                                     t->callback(t->arg);
(gdb) s
ipv6nd_startrs1 (arg=0x80afce0) at ipv6nd.c:1591
1591            struct interface *ifp = arg;
(gdb) n
1594            logger(ifp->ctx, LOG_INFO, "%s: soliciting an IPv6 router", ifp->name);
(gdb) 
eth3: soliciting an IPv6 router
1595            if (ipv6nd_open(ifp->ctx) == -1) {
(gdb) 
*** Error in `/root/dhcpcd-6.11.0/dhcpcd': malloc(): memory corruption: 0x080b2220 ***
----




Getting close to fracking:

----
eth3: delaying SOLICIT6 (xid 0x0ae9d3), next in 0.4 seconds
eth3: soliciting an IPv6 router

Breakpoint 2, ipv6nd_startrs1 (arg=0x80afce0) at ipv6nd.c:1595
1595            if (ipv6nd_open(ifp->ctx) == -1) {
(gdb) s
ipv6nd_open (dctx=0xbffff9dc) at ipv6nd.c:189
189             ctx = dctx->ipv6;
(gdb) n
190             if (ctx->nd_fd != -1)
(gdb) 
193             ctx->nd_fd = xsocket(PF_INET6, SOCK_RAW | SOCK_FLAGS, IPPROTO_ICMPV6);
(gdb) 
195             if (ctx->nd_fd == -1)
(gdb) 
199             on = 255;
(gdb) 
200             if (setsockopt(ctx->nd_fd, IPPROTO_IPV6, IPV6_MULTICAST_HOPS,
(gdb) 
204             on = 1;
(gdb) 
205             if (setsockopt(ctx->nd_fd, IPPROTO_IPV6, IPV6_RECVPKTINFO,
(gdb) 
209             on = 1;
(gdb) 
210             if (setsockopt(ctx->nd_fd, IPPROTO_IPV6, IPV6_RECVHOPLIMIT,
(gdb) 
214             ICMP6_FILTER_SETBLOCKALL(&filt);
(gdb) 
215             ICMP6_FILTER_SETPASS(ND_NEIGHBOR_ADVERT, &filt);
(gdb) 
216             ICMP6_FILTER_SETPASS(ND_ROUTER_ADVERT, &filt);
(gdb) 
217             if (setsockopt(ctx->nd_fd, IPPROTO_ICMPV6, ICMP6_FILTER,
(gdb) 
221             eloop_event_add(dctx->eloop, ctx->nd_fd,  ipv6nd_handledata, dctx);
(gdb) 
*** Error in `/root/dhcpcd-6.11.0/dhcpcd': malloc(): memory corruption: 0x080b1908 ***
----




At this point it's starting to become a little erratic. Here, I set breakpoints on every malloc():
----
Breakpoint 9, ipv6_handlert (ctx=0xbffff9dc, cmd=1, rt=0xbffff518) at ipv6.c:2033
2033                            if ((f = malloc(sizeof(*f))) == NULL)
(gdb) 
Continuing.

Breakpoint 9, ipv6_handlert (ctx=0xbffff9dc, cmd=1, rt=0xbffff518) at ipv6.c:2033
2033                            if ((f = malloc(sizeof(*f))) == NULL)
(gdb) 
Continuing.
eth3: IAID 0b:17:f2:7d
eth3: delaying IPv6 router solicitation for 0.7 seconds
eth3: reading lease `/var/db/dhcpcd-eth3.lease6'
eth3: soliciting a DHCPv6 lease
eth3: delaying SOLICIT6 (xid 0x80905a), next in 0.1 seconds
*** Error in `/root/dhcpcd-6.11.0/dhcpcd': malloc(): memory corruption: 0x080b1868 ***
----




If dhcpcd uses threads, I'm beginning to suspect another thread is going on a walkabout. Perhaps I need to tell gdb to follow the child when forking (and there should be a way to do that when threading). If this spelunking hasn't helped you, I'll do that next. And when the F/W isn't 'needed' here, I'll see what happens when I put a 64-bit system in place (I have one that's almost ready to be installed at my church); too bad Comcrash doesn't seem to allow me to connect two systems, one for IPv4 and one for IPv6.




Here's a different problem. Here the firewall is 'closed' (IPv6 is not allowed), so sendmsg fails.
When run with gdb, here's the end of the gdb output including 'bt full'.

----
eth3: delaying IPv6 router solicitation for 0.6 seconds
eth3: reading lease `/var/db/dhcpcd-eth3.lease6'
eth3: soliciting a DHCPv6 lease
eth3: delaying SOLICIT6 (xid 0x669f31), next in 0.1 seconds
eth3: broadcasting SOLICIT6 (xid 0x669f31), next in 1.0 seconds
eth3: dhcp6_sendmessage: sendmsg: Operation not permitted

Program received signal SIGSEGV, Segmentation fault.
0xb7ebf0df in ?? () from /lib/libc.so.6
(gdb) bt full
#0  0xb7ebf0df in ?? () from /lib/libc.so.6
No symbol table info available.
#1  0x08071f49 in ipv6_freeaddr (ap=0x80b1588) at ipv6.c:861
        ia = 0xb7ee8ce4 <strftime_l+52>
#2  0x0807216b in ipv6_freedrop_addrs (addrs=0x80b1300, drop=1, ifd=0x0) at ipv6.c:909
        ap = 0x80b1588
        apn = 0x0
        apf = 0x8
        now = {tv_sec = 0, tv_nsec = 0}
#3  0x0807a717 in dhcp6_freedrop_addrs (ifp=0x80afce0, drop=1, ifd=0x0) at dhcp6.c:921
        state = 0x80b1290
#4  0x0808095d in dhcp6_freedrop (ifp=0x80afce0, drop=1, reason=0x808cd2b "EXPIRE6") at dhcp6.c:3375
        state = 0x80b1290
        ctx = 0x4
        options = 310045818639374377
        dropdele = 0
#5  0x08080b11 in dhcp6_drop (ifp=0x80afce0, reason=0x808cd2b "EXPIRE6") at dhcp6.c:3417
No locals.
#6  0x0807b1ee in dhcp6_sendmessage (ifp=0x80afce0, callback=0x807b308 <dhcp6_senddiscover>) at dhcp6.c:1127
        state = 0x80b1290
        ctx = 0x80b0910
        dst = {sin6_family = 10, sin6_port = 8962, sin6_flowinfo = 0, sin6_addr = {__in6_u = {
              __u6_addr8 = "\377\002", '\000' <repeats 11 times>, "\001\000\002", __u6_addr16 = {767, 0, 0, 0, 
                0, 0, 256, 512}, __u6_addr32 = {767, 0, 0, 33554688}}}, sin6_scope_id = 5}
        cm = 0x80b0958
        pi = {ipi6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 
                0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, ipi6_ifindex = 5}
        RTprev = {tv_sec = 0, tv_nsec = 29000000}
        rnd = 0.029999999999999999
        ms = 29
        neg = 0 '\000'
        broad_uni = 0x808cc84 "broadcasting"
        alldhcp = {__in6_u = {__u6_addr8 = "\377\002", '\000' <repeats 11 times>, "\001\000\002", 
            __u6_addr16 = {767, 0, 0, 0, 0, 0, 256, 512}, __u6_addr32 = {767, 0, 0, 33554688}}}
        __func__ = "dhcp6_sendmessage"
#7  0x0807b321 in dhcp6_senddiscover (arg=0x80afce0) at dhcp6.c:1158
No locals.
#8  0x08052971 in eloop_start (eloop=0x80aabd8, signals=0xbffffa64) at eloop.c:873
        n = 0
        e = 0x80b81f0
        t = 0x80aae40
        now = {tv_sec = 2150, tv_nsec = 660526885}
        ts = {tv_sec = 0, tv_nsec = 134469643}
        tsp = 0xbffff868
        t0 = 0x0
        epe = {events = 134554200, data = {ptr = 0x80aabd8, fd = 134917080, u32 = 134917080, u64 = 4429884376}}
        timeout = 135
#9  0x08051491 in main (argc=4, argv=0xbffffc54) at dhcpcd.c:1926
        ctx = {pidfile = "/var/run/dhcpcd.pid", '\000' <repeats 23 times>, 
          cffile = 0xbffffd95 "/root/dhcpcd6.conf", options = 308637754413701161, 
          logfile = 0x80aaef0 "/var/log/dhcpcd6.log", log_fd = 8, argc = 4, argv = 0xbffffc54, ifac = 3, 
          ifav = 0x80aae80, ifdc = 3, ifdv = 0x80aaec0, ifc = 0, ifv = 0xbffffc64, ifcc = 1, ifcv = 0x80aaf10, 
          duid = 0x80b1508 "", duid_len = 14, ifaces = 0x80b81f0, pf_inet_fd = 11, priv = 0x80aac30, 
          link_fd = 9, seq = 21, sseq = 21, sigset = {__val = {0 <repeats 32 times>}}, eloop = 0x80aabd8, 
          control_fd = 13, control_unpriv_fd = 14, control_fds = {tqh_first = 0x0, tqh_last = 0xbffffaf0}, 
          control_sock = "/var/run/dhcpcd.sock", '\000' <repeats 20 times>, control_group = 0, vivso = 0x0, 
          vivso_len = 0, randomstate = 0x0, ppid = 0, pseq = 0, dhcp_opts = 0x80a95a0, dhcp_opts_len = 123, 
          ipv4_routes = 0x0, ipv4_kroutes = 0x0, udp_fd = -1, opt_buffer = 0x0, opt_buffer_len = 0, 
          secret = 0x0, secret_len = 0, nd_opts = 0x80aaac8, nd_opts_len = 6, dhcp6_opts = 0x80aea68, 
          dhcp6_opts_len = 70, ipv6 = 0x80b0910, dev_load = 0x0, dev_fd = 17, dev = 0x80ab7a8, 
          dev_handle = 0x80ab400}
        ifo = 0x0
        ifp = 0x0
        family = 10
        opt = 1
        oi = 0
        i = 0
        t = 30
        len = 134837276
        pid = 0
        sig = 0
        siga = 0x0
        __func__ = "main"
----



I hope *some* of this has been helpful.

Neal

Follow-Ups:
Re: malloc() error; coredumpRoy Marples
References:
malloc() error; coredumpNeal P. Murphy
Re: malloc() error; coredumpRoy Marples
Archive administrator: postmaster@marples.name