Re: [Security][Patch] SEGV in dhcpcd-8.0.4
Roy Marples
Thu Oct 10 15:07:30 2019
On 10/10/2019 08:11, Kris Karas (Bug reporting) wrote:
Hi Roy -
Got it!
In order to figure out what was really going on, I created some
instrumentation to count what percentage of dhcpcd runs would encounter
invalid packets. And I tested that with 8.0.6, git master, git master +
your most recent patch, -4, -d, -B, spammy network, etc. Some of it I
guessed right; and some was not expected. I'll attach the tally of
runs. Brief summary: (1) flooding the network to capture more invalid
packets actually captures fewer, and (2) your most recent test patch
(moves filter after the bind) avoided most of the race condition,
showing the best results overall.
But the above is somewhat moot, as I've figured out and fixed the bug,
hacky patch attached.
Short(er) answer: in Linux, applying a BPF does not filter packets
already queued. Anything packets received between socket() and filter()
will get through unfiltered. It is up to userspace to synchronize this,
typically by discarding any packets that were downstream of the filter
when the filter was first applied.
Here's the TL+DR version:
Many moons ago, I had a chat with Alan Cox about a change in philosophy
about how the Linux kernel handles packet queues. In the early 2.0
days, if you had an ethernet interface that was bridged to a PPP
interface, and the serial interface on the other side of ppp sent an
XOFF, then Linux (probably like BSD) would reach in and throttle the
ethernet interface, causing its TCP window to shrink to 0. Everything
stayed in sync, nothing got lost. And then, the philosophy changed. To
make the queues faster, that internal logic was discarded. It was now
up to userspace to keep everything in sync - where needed. And I think
the same philosophical change has affected BPF.
In *BSD, I think (based on my limited observations of its behaviour)
that when you apply a packet filter to a socket, the kernel arranges for
any packets already internal to the kernel's memory to have the filter
applied before accept()/read() and friends get to them.
BSD flushes the kernel buffer when the filter is set.
There is also a seperate ioctl to do this.
In Linux, this
is not the case. Any packets that are upstream of the BPF mechanism
will receive filtering, while any packets that are already "in flight"
somewhere within the kernel will not be so filtered. It gets more
"interesting" in that the actual filter can be parceled out to a
separate kernel thread (if the kernel is so-configured), which must be
instantiated before it switches itself into the packet flow; they show
up in top(8) albeit somewhat cryptically.
Here's the race in utterly simplistic code:
L1 socket = socket_connect_bind_accept_etc();
L2 apply_bpf_filter(socket);
L3 bytes = read(socket, buffer, sizeof(buffer));
Line #2 applies a filter, but there is no telling where that filter
mechanism is located relative to the various queues and buffers that
hold packets. In BSD, line #2 magically moves packets back before line
There is nothing magical about BSD here.
The action is very clearly documented in bpf(4) - see BIOCSETF ioctl.
Here's the oldest manual page from 1994:
https://netbsd.gw.com/cgi-bin/man-cgi?bpf+4+NetBSD-1.3.3
#1 and filters everything; in Linux no packets are moved, any packets
already waiting on line #3 will never be filtered by line #2 - by
design. And "by design" here implies any bug I could file against the
Linux kernel would be closed as "wontfix" because it is passing control
of synchronizing duties to userspace as intended. Userspace is expected
to do this:
L1 socket = socket_connect_bind_accept_etc();
L2 apply_bpf_filter(socket);
L3 while(accept(readfd:[socket], timeout:0) > 0)
L4 read(socket, NULL, 0);
L5 bytes = read(socket, buffer, sizeof(buffer));
Lines 3/4 flush whatever might have been in the intermediate queues
before the "real" socket read on line 5 takes place. Not surprisingly,
this is exactly what my patch, attached, does: after establishing the
packet filter, it then goes and reads every packet that has already been
in-queue and discards it. Then, bpf_write() broadcasts the DHCP request
to any listening servers; the packets that get subsequently returned are
all guaranteed to be filtered by the now-instantiated filter.
Sorry, but this is just wrong.
We have no way of knowing if a packet in the queue passed the filter or
not. We could flush the buffer as you do, but equally we could be
flushing packets which were meant for us and passed the filter down the
toilet.
As a developer, I want to know when packets I receive from the applied
BPF filter have passed the filter so I can process them without any loss.
After applying my patch, dhcpcd now reports 0 invalid UDP packets over
several hundred test runs. And the 1/4 second flush delay is "in the
noise" versus the roughly 5-second processing time for one address
acquisition.
That's 1/4 second for each BPF socket being opened.
dhcpcd opens two BPF sockets on Linux - one for DHCP and one for ARP.
So that's half a second of idle time just for one interface. Ouch!!!
Please try the attached patch which moves the BPF sanity checks back
into the application and doesn't log failures - ie has the same effect.
I also added a big note to myself that because of this, BPF cannot be
trusted outside of BSD.
Let me know how it works for you!
Roy
diff --git a/src/arp.c b/src/arp.c
index 594a5854..041902b6 100644
--- a/src/arp.c
+++ b/src/arp.c
@@ -194,25 +194,22 @@ arp_packet(struct interface *ifp, uint8_t *data, size_t len)
return;
memcpy(&ar, data, sizeof(ar));
- /* These checks are enforced in the BPF filter. */
-#if 0
/* Families must match */
if (ar.ar_hrd != htons(ifp->family))
return;
/* Protocol must be IP. */
if (ar.ar_pro != htons(ETHERTYPE_IP))
- continue;
+ return;
/* lladdr length matches */
if (ar.ar_hln != ifp->hwlen)
- continue;
+ return;
/* Protocol length must match in_addr_t */
if (ar.ar_pln != sizeof(arm.sip.s_addr))
return;
/* Only these types are recognised */
if (ar.ar_op != htons(ARPOP_REPLY) &&
ar.ar_op != htons(ARPOP_REQUEST))
- continue;
-#endif
+ return;
/* Get pointers to the hardware addresses */
hw_s = data + sizeof(ar);
diff --git a/src/bpf.c b/src/bpf.c
index 51094b4b..c0cfd389 100644
--- a/src/bpf.c
+++ b/src/bpf.c
@@ -57,6 +57,20 @@
#include "if.h"
#include "logerr.h"
+/*
+ * NOTE TO SELF.
+ *
+ * Even though we program the BPF filter should we trust it?
+ * On Linux at least there is a window between opening the socket,
+ * binding the interface and setting the filter where we receive data.
+ * This data is NOT checked OR flushed and IS returned when reading.
+ * Even if Linux does fix this sorry state, who is to say other kernels
+ * could have bugs causing a similar effect?
+ *
+ * As such, let's strive to keep the filters just for pattern matching
+ * to avoid waking dhcpcd up.
+ */
+
#define ARP_ADDRS_MAX 3
/* BPF helper macros */
@@ -410,13 +424,7 @@ bpf_cmp_hwaddr(struct bpf_insn *bpf, size_t bpf_len, size_t off,
#endif
#ifdef ARP
-
static const struct bpf_insn bpf_arp_ether [] = {
- /* Ensure packet is at least correct size. */
- BPF_STMT(BPF_LD + BPF_W + BPF_LEN, 0),
- BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, sizeof(struct ether_arp), 1, 0),
- BPF_STMT(BPF_RET + BPF_K, 0),
-
/* Check this is an ARP packet. */
BPF_STMT(BPF_LD + BPF_H + BPF_ABS,
offsetof(struct ether_header, ether_type)),
@@ -552,17 +560,8 @@ bpf_arp(struct interface *ifp, int fd)
}
#endif
-#define BPF_M_FHLEN 0
-#define BPF_M_IPHLEN 1
-#define BPF_M_IPLEN 2
-#define BPF_M_UDP 3
-#define BPF_M_UDPLEN 4
-
#ifdef ARPHRD_NONE
static const struct bpf_insn bpf_bootp_none[] = {
- /* Set the frame header length to zero. */
- BPF_STMT(BPF_LD + BPF_IMM, 0),
- BPF_STMT(BPF_ST, BPF_M_FHLEN),
};
#define BPF_BOOTP_NONE_LEN __arraycount(bpf_bootp_none)
#endif
@@ -574,10 +573,8 @@ static const struct bpf_insn bpf_bootp_ether[] = {
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETHERTYPE_IP, 1, 0),
BPF_STMT(BPF_RET + BPF_K, 0),
- /* Load frame header length into X. */
- BPF_STMT(BPF_LDX + BPF_W + BPF_IMM, sizeof(struct ether_header)),
- /* Copy frame header length to memory */
- BPF_STMT(BPF_STX, BPF_M_FHLEN),
+ /* Advance to the IP header. */
+ BPF_STMT(BPF_LDX + BPF_K, sizeof(struct ether_header)),
};
#define BPF_BOOTP_ETHER_LEN __arraycount(bpf_bootp_ether)
@@ -591,15 +588,6 @@ static const struct bpf_insn bpf_bootp_filter[] = {
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0x40, 1, 0),
BPF_STMT(BPF_RET + BPF_K, 0),
- /* Ensure IP header length is big enough and
- * store the IP header length in memory. */
- BPF_STMT(BPF_LD + BPF_B + BPF_IND, 0),
- BPF_STMT(BPF_ALU + BPF_AND + BPF_K, 0x0f),
- BPF_STMT(BPF_ALU + BPF_MUL + BPF_K, 4),
- BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, sizeof(struct ip), 1, 0),
- BPF_STMT(BPF_RET + BPF_K, 0),
- BPF_STMT(BPF_ST, BPF_M_IPHLEN),
-
/* Make sure it's a UDP packet. */
BPF_STMT(BPF_LD + BPF_B + BPF_IND, offsetof(struct ip, ip_p)),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 1, 0),
@@ -610,52 +598,17 @@ static const struct bpf_insn bpf_bootp_filter[] = {
BPF_JUMP(BPF_JMP + BPF_JSET + BPF_K, 0x1fff, 0, 1),
BPF_STMT(BPF_RET + BPF_K, 0),
- /* Ensure IP length is big enough to hold the UDP + BOOTP payload and
- * store IP length in memory. */
- BPF_STMT(BPF_LD + BPF_H + BPF_IND, offsetof(struct ip, ip_len)),
- BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, BOOTP_MIN_SIZE, 1, 0),
- BPF_STMT(BPF_RET + BPF_K, 0),
- BPF_STMT(BPF_ST, BPF_M_IPLEN),
-
/* Advance to the UDP header. */
- BPF_STMT(BPF_LD + BPF_MEM, BPF_M_IPHLEN),
+ BPF_STMT(BPF_LD + BPF_B + BPF_IND, 0),
+ BPF_STMT(BPF_ALU + BPF_AND + BPF_K, 0x0f),
+ BPF_STMT(BPF_ALU + BPF_MUL + BPF_K, 4),
BPF_STMT(BPF_ALU + BPF_ADD + BPF_X, 0),
BPF_STMT(BPF_MISC + BPF_TAX, 0),
- /* Store UDP location */
- BPF_STMT(BPF_STX, BPF_M_UDP),
-
/* Make sure it's from and to the right port. */
BPF_STMT(BPF_LD + BPF_W + BPF_IND, 0),
BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, (BOOTPS << 16) + BOOTPC, 1, 0),
BPF_STMT(BPF_RET + BPF_K, 0),
-
- /* Store UDP length. */
- BPF_STMT(BPF_LD + BPF_H + BPF_IND, offsetof(struct udphdr, uh_ulen)),
- BPF_STMT(BPF_ST, BPF_M_UDPLEN),
-
- /* Ensure that UDP length + IP header length == IP length */
- /* Copy IP header length to X. */
- BPF_STMT(BPF_LDX + BPF_MEM, BPF_M_IPHLEN),
- /* Add UDP length (A) to IP header length (X). */
- BPF_STMT(BPF_ALU + BPF_ADD + BPF_X, 0),
- /* Store result in X. */
- BPF_STMT(BPF_MISC + BPF_TAX, 0),
- /* Copy IP length to A. */
- BPF_STMT(BPF_LD + BPF_MEM, BPF_M_IPLEN),
- /* Ensure X == A. */
- BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_X, 0, 1, 0),
- BPF_STMT(BPF_RET + BPF_K, 0),
-
- /* Advance to the BOOTP packet. */
- BPF_STMT(BPF_LD + BPF_MEM, BPF_M_UDP),
- BPF_STMT(BPF_ALU + BPF_ADD + BPF_K, sizeof(struct udphdr)),
- BPF_STMT(BPF_MISC + BPF_TAX, 0),
-
- /* Make sure it's BOOTREPLY. */
- BPF_STMT(BPF_LD + BPF_B + BPF_IND, offsetof(struct bootp, op)),
- BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, BOOTREPLY, 1, 0),
- BPF_STMT(BPF_RET + BPF_K, 0),
};
#define BPF_BOOTP_FILTER_LEN __arraycount(bpf_bootp_filter)
@@ -735,14 +688,8 @@ bpf_bootp(struct interface *ifp, int fd)
}
#endif
- /* All passed, return the packet - frame length + ip length */
- BPF_SET_STMT(bp, BPF_LD + BPF_MEM, BPF_M_FHLEN);
- bp++;
- BPF_SET_STMT(bp, BPF_LDX + BPF_MEM, BPF_M_IPLEN);
- bp++;
- BPF_SET_STMT(bp, BPF_ALU + BPF_ADD + BPF_X, 0);
- bp++;
- BPF_SET_STMT(bp, BPF_RET + BPF_A, 0);
+ /* All passed, return the packet. */
+ BPF_SET_STMT(bp, BPF_RET + BPF_K, BPF_WHOLEPACKET);
bp++;
return bpf_attach(fd, bpf, (unsigned int)(bp - bpf));
diff --git a/src/dhcp.c b/src/dhcp.c
index 76ead6dc..74c9c2c0 100644
--- a/src/dhcp.c
+++ b/src/dhcp.c
@@ -3270,9 +3270,40 @@ get_udp_data(void *packet, size_t *len)
return p;
}
-static int
-valid_udp_packet(void *packet, size_t plen, struct in_addr *from,
- unsigned int flags)
+static bool
+is_packet_udp_bootp(void *packet, size_t plen)
+{
+ struct ip *ip = packet;
+ size_t ip_hlen;
+ struct udphdr *udp;
+
+ if (sizeof(*ip) > plen)
+ return false;
+
+ if (ip->ip_v != IPVERSION || ip->ip_p != IPPROTO_UDP)
+ return false;
+
+ /* Sanity. */
+ if (ntohs(ip->ip_len) != plen)
+ return false;
+
+ ip_hlen = (size_t)ip->ip_hl * 4;
+ /* Check we have a UDP header and BOOTP. */
+ if (ip_hlen + sizeof(*udp) + offsetof(struct bootp, vend) > plen)
+ return false;
+
+ /* Check it's to and from the right ports. */
+ udp = (struct udphdr *)(void *)((char *)ip + ip_hlen);
+ if (udp->uh_dport != htons(BOOTPC) || udp->uh_sport != htons(BOOTPS))
+ return false;
+
+ return true;
+}
+
+/* Lengths have already been checked. */
+static bool
+checksums_valid(void *packet,
+ struct in_addr *from, unsigned int flags)
{
struct ip *ip = packet;
struct ip pseudo_ip = {
@@ -3281,69 +3312,34 @@ valid_udp_packet(void *packet, size_t plen, struct in_addr *from,
.ip_dst = ip->ip_dst
};
size_t ip_hlen;
- uint16_t ip_len, udp_len, uh_sum;
+ uint16_t udp_len, uh_sum;
struct udphdr *udp;
uint32_t csum;
- if (plen < sizeof(*ip)) {
- if (from != NULL)
- from->s_addr = INADDR_ANY;
- errno = ERANGE;
- return -1;
- }
-
if (from != NULL)
from->s_addr = ip->ip_src.s_addr;
- /* Check we have the IP header */
ip_hlen = (size_t)ip->ip_hl * 4;
- if (ip_hlen > plen) {
- errno = ENOBUFS;
- return -1;
- }
+ if (in_cksum(ip, ip_hlen, NULL) != 0)
+ return false;
- if (in_cksum(ip, ip_hlen, NULL) != 0) {
- errno = EINVAL;
- return -1;
- }
-
- /* Check we have a payload */
- ip_len = ntohs(ip->ip_len);
- if (ip_len <= ip_hlen + sizeof(*udp)) {
- errno = ERANGE;
- return -1;
- }
- /* Check IP doesn't go beyond the payload */
- if (ip_len > plen) {
- errno = ENOBUFS;
- return -1;
- }
+ if (flags & BPF_PARTIALCSUM)
+ return 0;
- /* Check UDP doesn't go beyond the payload */
udp = (struct udphdr *)(void *)((char *)ip + ip_hlen);
- udp_len = ntohs(udp->uh_ulen);
- if (udp_len > plen - ip_hlen) {
- errno = ENOBUFS;
- return -1;
- }
-
- if (udp->uh_sum == 0 || flags & BPF_PARTIALCSUM)
+ if (udp->uh_sum == 0)
return 0;
/* UDP checksum is based on a pseudo IP header alongside
* the UDP header and payload. */
+ udp_len = ntohs(udp->uh_ulen);
uh_sum = udp->uh_sum;
udp->uh_sum = 0;
pseudo_ip.ip_len = udp->uh_ulen;
csum = 0;
in_cksum(&pseudo_ip, sizeof(pseudo_ip), &csum);
csum = in_cksum(udp, udp_len, &csum);
- if (csum != uh_sum) {
- errno = EINVAL;
- return -1;
- }
-
- return 0;
+ return csum == uh_sum;
}
static void
@@ -3352,13 +3348,12 @@ dhcp_handlebootp(struct interface *ifp, struct bootp *bootp, size_t len,
{
size_t v;
- /* udp_len must be correct because the values are checked in
- * valid_udp_packet(). */
if (len < offsetof(struct bootp, vend)) {
logerrx("%s: truncated packet (%zu) from %s",
ifp->name, len, inet_ntoa(*from));
return;
}
+
/* To make our IS_DHCP macro easy, ensure the vendor
* area has at least 4 octets. */
v = len - offsetof(struct bootp, vend);
@@ -3378,14 +3373,13 @@ dhcp_handlebpf(struct interface *ifp, uint8_t *data, size_t len)
size_t udp_len;
const struct dhcp_state *state = D_CSTATE(ifp);
- if (valid_udp_packet(data, len, &from, state->bpf_flags) == -1) {
- const char *errstr;
+ /* Validate filter. */
+ if (!is_packet_udp_bootp(data, len))
+ return;
- if (errno == EINVAL)
- errstr = "checksum failure";
- else
- errstr = "invalid UDP packet";
- logerrx("%s: %s from %s", ifp->name, errstr, inet_ntoa(from));
+ if (!checksums_valid(data, &from, state->bpf_flags)) {
+ logerrx("%s: checksum failure from %s",
+ ifp->name, inet_ntoa(from));
return;
}
diff --git a/src/if-linux.c b/src/if-linux.c
index 18044bbf..1b1357fd 100644
--- a/src/if-linux.c
+++ b/src/if-linux.c
@@ -1419,9 +1419,6 @@ bpf_open(struct interface *ifp, int (*filter)(struct interface *, int))
state->buffer_len = state->buffer_pos = 0;
}
- if (filter(ifp, s) != 0)
- goto eexit;
-
#ifdef PACKET_AUXDATA
n = 1;
if (setsockopt(s, SOL_PACKET, PACKET_AUXDATA, &n, sizeof(n)) != 0) {
@@ -1436,6 +1433,19 @@ bpf_open(struct interface *ifp, int (*filter)(struct interface *, int))
su.sll.sll_ifindex = (int)ifp->index;
if (bind(s, &su.sa, sizeof(su.sll)) == -1)
goto eexit;
+
+ if (filter(ifp, s) != 0)
+ goto eexit;
+
+ /*
+ * At this point we could have received packets for the wrong
+ * interface or which don't pass the filter.
+ * Linux should flush upon setting the filter like every other OS.
+ * There is no way of flushing them from userland.
+ * As such, consumers need to inspect each packet to ensure it's valid.
+ * Or to put it another way, don't trust the Linux BPF filter.
+ */
+
return s;
eexit:
Archive administrator: postmaster@marples.name