dhcpcd-discuss

Re: [Security][Patch] SEGV in dhcpcd-8.0.4

Roy Marples

Thu Oct 10 15:07:30 2019

On 10/10/2019 08:11, Kris Karas (Bug reporting) wrote:
Hi Roy -

Got it!

In order to figure out what was really going on, I created some instrumentation to count what percentage of dhcpcd runs would encounter invalid packets.  And I tested that with 8.0.6, git master, git master + your most recent patch, -4, -d, -B, spammy network, etc.  Some of it I guessed right; and some was not expected.  I'll attach the tally of runs.  Brief summary: (1) flooding the network to capture more invalid packets actually captures fewer, and (2) your most recent test patch (moves filter after the bind) avoided most of the race condition, showing the best results overall.

But the above is somewhat moot, as I've figured out and fixed the bug, hacky patch attached.

Short(er) answer:  in Linux, applying a BPF does not filter packets already queued.  Anything packets received between socket() and filter() will get through unfiltered.  It is up to userspace to synchronize this, typically by discarding any packets that were downstream of the filter when the filter was first applied.

Here's the TL+DR version:

Many moons ago, I had a chat with Alan Cox about a change in philosophy about how the Linux kernel handles packet queues.  In the early 2.0 days, if you had an ethernet interface that was bridged to a PPP interface, and the serial interface on the other side of ppp sent an XOFF, then Linux (probably like BSD) would reach in and throttle the ethernet interface, causing its TCP window to shrink to 0.  Everything stayed in sync, nothing got lost.  And then, the philosophy changed.  To make the queues faster, that internal logic was discarded.  It was now up to userspace to keep everything in sync - where needed.  And I think the same philosophical change has affected BPF.

In *BSD, I think (based on my limited observations of its behaviour) that when you apply a packet filter to a socket, the kernel arranges for any packets already internal to the kernel's memory to have the filter applied before accept()/read() and friends get to them.

BSD flushes the kernel buffer when the filter is set.
There is also a seperate ioctl to do this.

In Linux, this is not the case.  Any packets that are upstream of the BPF mechanism will receive filtering, while any packets that are already "in flight" somewhere within the kernel will not be so filtered.  It gets more "interesting" in that the actual filter can be parceled out to a separate kernel thread (if the kernel is so-configured), which must be instantiated before it switches itself into the packet flow; they show up in top(8) albeit somewhat cryptically.

Here's the race in utterly simplistic code:

   L1	socket = socket_connect_bind_accept_etc();
   L2	apply_bpf_filter(socket);
   L3	bytes = read(socket, buffer, sizeof(buffer));

Line #2 applies a filter, but there is no telling where that filter mechanism is located relative to the various queues and buffers that hold packets.  In BSD, line #2 magically moves packets back before line

There is nothing magical about BSD here.
The action is very clearly documented in bpf(4) - see BIOCSETF ioctl.
Here's the oldest manual page from 1994:
https://netbsd.gw.com/cgi-bin/man-cgi?bpf+4+NetBSD-1.3.3

#1 and filters everything; in Linux no packets are moved, any packets already waiting on line #3 will never be filtered by line #2 - by design.  And "by design" here implies any bug I could file against the Linux kernel would be closed as "wontfix" because it is passing control of synchronizing duties to userspace as intended.  Userspace is expected to do this:

   L1	socket = socket_connect_bind_accept_etc();
   L2	apply_bpf_filter(socket);
   L3	while(accept(readfd:[socket], timeout:0) > 0)
   L4	    read(socket, NULL, 0);
   L5	bytes = read(socket, buffer, sizeof(buffer));

Lines 3/4 flush whatever might have been in the intermediate queues before the "real" socket read on line 5 takes place.  Not surprisingly, this is exactly what my patch, attached, does: after establishing the packet filter, it then goes and reads every packet that has already been in-queue and discards it.  Then, bpf_write() broadcasts the DHCP request to any listening servers; the packets that get subsequently returned are all guaranteed to be filtered by the now-instantiated filter.

Sorry, but this is just wrong.
We have no way of knowing if a packet in the queue passed the filter or not. We could flush the buffer as you do, but equally we could be flushing packets which were meant for us and passed the filter down the toilet.

As a developer, I want to know when packets I receive from the applied BPF filter have passed the filter so I can process them without any loss.

After applying my patch, dhcpcd now reports 0 invalid UDP packets over several hundred test runs.  And the 1/4 second flush delay is "in the noise" versus the roughly 5-second processing time for one address acquisition.

That's 1/4 second for each BPF socket being opened.
dhcpcd opens two BPF sockets on Linux - one for DHCP and one for ARP.
So that's half a second of idle time just for one interface. Ouch!!!

Please try the attached patch which moves the BPF sanity checks back into the application and doesn't log failures - ie has the same effect.

I also added a big note to myself that because of this, BPF cannot be trusted outside of BSD.

Let me know how it works for you!

Roy
diff --git a/src/arp.c b/src/arp.c
index 594a5854..041902b6 100644
--- a/src/arp.c
+++ b/src/arp.c
@@ -194,25 +194,22 @@ arp_packet(struct interface *ifp, uint8_t *data, size_t len)
 		return;
 	memcpy(&ar, data, sizeof(ar));
 
-	/* These checks are enforced in the BPF filter. */
-#if 0
 	/* Families must match */
 	if (ar.ar_hrd != htons(ifp->family))
 		return;
 	/* Protocol must be IP. */
 	if (ar.ar_pro != htons(ETHERTYPE_IP))
-		continue;
+		return;
 	/* lladdr length matches */
 	if (ar.ar_hln != ifp->hwlen)
-		continue;
+		return;
 	/* Protocol length must match in_addr_t */
 	if (ar.ar_pln != sizeof(arm.sip.s_addr))
 		return;
 	/* Only these types are recognised */
 	if (ar.ar_op != htons(ARPOP_REPLY) &&
 	    ar.ar_op != htons(ARPOP_REQUEST))
-		continue;
-#endif
+		return;
 
 	/* Get pointers to the hardware addresses */
 	hw_s = data + sizeof(ar);
diff --git a/src/bpf.c b/src/bpf.c
index 51094b4b..c0cfd389 100644
--- a/src/bpf.c
+++ b/src/bpf.c
@@ -57,6 +57,20 @@
 #include "if.h"
 #include "logerr.h"
 
+/*
+ * NOTE TO SELF.
+ *
+ * Even though we program the BPF filter should we trust it?
+ * On Linux at least there is a window between opening the socket,
+ * binding the interface and setting the filter where we receive data.
+ * This data is NOT checked OR flushed and IS returned when reading.
+ * Even if Linux does fix this sorry state, who is to say other kernels
+ * could have bugs causing a similar effect?
+ *
+ * As such, let's strive to keep the filters just for pattern matching
+ * to avoid waking dhcpcd up.
+ */
+
 #define	ARP_ADDRS_MAX	3
 
 /* BPF helper macros */
@@ -410,13 +424,7 @@ bpf_cmp_hwaddr(struct bpf_insn *bpf, size_t bpf_len, size_t off,
 #endif
 
 #ifdef ARP
-
 static const struct bpf_insn bpf_arp_ether [] = {
-	/* Ensure packet is at least correct size. */
-	BPF_STMT(BPF_LD + BPF_W + BPF_LEN, 0),
-	BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, sizeof(struct ether_arp), 1, 0),
-	BPF_STMT(BPF_RET + BPF_K, 0),
-
 	/* Check this is an ARP packet. */
 	BPF_STMT(BPF_LD + BPF_H + BPF_ABS,
 	         offsetof(struct ether_header, ether_type)),
@@ -552,17 +560,8 @@ bpf_arp(struct interface *ifp, int fd)
 }
 #endif
 
-#define	BPF_M_FHLEN	0
-#define	BPF_M_IPHLEN	1
-#define	BPF_M_IPLEN	2
-#define	BPF_M_UDP	3
-#define	BPF_M_UDPLEN	4
-
 #ifdef ARPHRD_NONE
 static const struct bpf_insn bpf_bootp_none[] = {
-	/* Set the frame header length to zero. */
-	BPF_STMT(BPF_LD + BPF_IMM, 0),
-	BPF_STMT(BPF_ST, BPF_M_FHLEN),
 };
 #define BPF_BOOTP_NONE_LEN	__arraycount(bpf_bootp_none)
 #endif
@@ -574,10 +573,8 @@ static const struct bpf_insn bpf_bootp_ether[] = {
 	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, ETHERTYPE_IP, 1, 0),
 	BPF_STMT(BPF_RET + BPF_K, 0),
 
-	/* Load frame header length into X. */
-	BPF_STMT(BPF_LDX + BPF_W + BPF_IMM, sizeof(struct ether_header)),
-	/* Copy frame header length to memory */
-	BPF_STMT(BPF_STX, BPF_M_FHLEN),
+	/* Advance to the IP header. */
+	BPF_STMT(BPF_LDX + BPF_K, sizeof(struct ether_header)),
 };
 #define BPF_BOOTP_ETHER_LEN	__arraycount(bpf_bootp_ether)
 
@@ -591,15 +588,6 @@ static const struct bpf_insn bpf_bootp_filter[] = {
 	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, 0x40, 1, 0),
 	BPF_STMT(BPF_RET + BPF_K, 0),
 
-	/* Ensure IP header length is big enough and
-	 * store the IP header length in memory. */
-	BPF_STMT(BPF_LD + BPF_B + BPF_IND, 0),
-	BPF_STMT(BPF_ALU + BPF_AND + BPF_K, 0x0f),
-	BPF_STMT(BPF_ALU + BPF_MUL + BPF_K, 4),
-	BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, sizeof(struct ip), 1, 0),
-	BPF_STMT(BPF_RET + BPF_K, 0),
-	BPF_STMT(BPF_ST, BPF_M_IPHLEN),
-
 	/* Make sure it's a UDP packet. */
 	BPF_STMT(BPF_LD + BPF_B + BPF_IND, offsetof(struct ip, ip_p)),
 	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, IPPROTO_UDP, 1, 0),
@@ -610,52 +598,17 @@ static const struct bpf_insn bpf_bootp_filter[] = {
 	BPF_JUMP(BPF_JMP + BPF_JSET + BPF_K, 0x1fff, 0, 1),
 	BPF_STMT(BPF_RET + BPF_K, 0),
 
-	/* Ensure IP length is big enough to hold the UDP + BOOTP payload and
-	 * store IP length in memory. */
-	BPF_STMT(BPF_LD + BPF_H + BPF_IND, offsetof(struct ip, ip_len)),
-	BPF_JUMP(BPF_JMP + BPF_JGE + BPF_K, BOOTP_MIN_SIZE, 1, 0),
-	BPF_STMT(BPF_RET + BPF_K, 0),
-	BPF_STMT(BPF_ST, BPF_M_IPLEN),
-
 	/* Advance to the UDP header. */
-	BPF_STMT(BPF_LD + BPF_MEM, BPF_M_IPHLEN),
+	BPF_STMT(BPF_LD + BPF_B + BPF_IND, 0),
+	BPF_STMT(BPF_ALU + BPF_AND + BPF_K, 0x0f),
+	BPF_STMT(BPF_ALU + BPF_MUL + BPF_K, 4),
 	BPF_STMT(BPF_ALU + BPF_ADD + BPF_X, 0),
 	BPF_STMT(BPF_MISC + BPF_TAX, 0),
 
-	/* Store UDP location */
-	BPF_STMT(BPF_STX, BPF_M_UDP),
-
 	/* Make sure it's from and to the right port. */
 	BPF_STMT(BPF_LD + BPF_W + BPF_IND, 0),
 	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, (BOOTPS << 16) + BOOTPC, 1, 0),
 	BPF_STMT(BPF_RET + BPF_K, 0),
-
-	/* Store UDP length. */
-	BPF_STMT(BPF_LD + BPF_H + BPF_IND, offsetof(struct udphdr, uh_ulen)),
-	BPF_STMT(BPF_ST, BPF_M_UDPLEN),
-
-	/* Ensure that UDP length + IP header length == IP length */
-	/* Copy IP header length to X. */
-	BPF_STMT(BPF_LDX + BPF_MEM, BPF_M_IPHLEN),
-	/* Add UDP length (A) to IP header length (X). */
-	BPF_STMT(BPF_ALU + BPF_ADD + BPF_X, 0),
-	/* Store result in X. */
-	BPF_STMT(BPF_MISC + BPF_TAX, 0),
-	/* Copy IP length to A. */
-	BPF_STMT(BPF_LD + BPF_MEM, BPF_M_IPLEN),
-	/* Ensure X == A. */
-	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_X, 0, 1, 0),
-	BPF_STMT(BPF_RET + BPF_K, 0),
-
-	/* Advance to the BOOTP packet. */
-	BPF_STMT(BPF_LD + BPF_MEM, BPF_M_UDP),
-	BPF_STMT(BPF_ALU + BPF_ADD + BPF_K, sizeof(struct udphdr)),
-	BPF_STMT(BPF_MISC + BPF_TAX, 0),
-
-	/* Make sure it's BOOTREPLY. */
-	BPF_STMT(BPF_LD + BPF_B + BPF_IND, offsetof(struct bootp, op)),
-	BPF_JUMP(BPF_JMP + BPF_JEQ + BPF_K, BOOTREPLY, 1, 0),
-	BPF_STMT(BPF_RET + BPF_K, 0),
 };
 
 #define BPF_BOOTP_FILTER_LEN	__arraycount(bpf_bootp_filter)
@@ -735,14 +688,8 @@ bpf_bootp(struct interface *ifp, int fd)
 	}
 #endif
 
-	/* All passed, return the packet - frame length + ip length */
-	BPF_SET_STMT(bp, BPF_LD + BPF_MEM, BPF_M_FHLEN);
-	bp++;
-	BPF_SET_STMT(bp, BPF_LDX + BPF_MEM, BPF_M_IPLEN);
-	bp++;
-	BPF_SET_STMT(bp, BPF_ALU + BPF_ADD + BPF_X, 0);
-	bp++;
-	BPF_SET_STMT(bp, BPF_RET + BPF_A, 0);
+	/* All passed, return the packet. */
+	BPF_SET_STMT(bp, BPF_RET + BPF_K, BPF_WHOLEPACKET);
 	bp++;
 
 	return bpf_attach(fd, bpf, (unsigned int)(bp - bpf));
diff --git a/src/dhcp.c b/src/dhcp.c
index 76ead6dc..74c9c2c0 100644
--- a/src/dhcp.c
+++ b/src/dhcp.c
@@ -3270,9 +3270,40 @@ get_udp_data(void *packet, size_t *len)
 	return p;
 }
 
-static int
-valid_udp_packet(void *packet, size_t plen, struct in_addr *from,
-	unsigned int flags)
+static bool
+is_packet_udp_bootp(void *packet, size_t plen)
+{
+	struct ip *ip = packet;
+	size_t ip_hlen;
+	struct udphdr *udp;
+
+	if (sizeof(*ip) > plen)
+		return false;
+
+	if (ip->ip_v != IPVERSION || ip->ip_p != IPPROTO_UDP)
+		return false;
+
+	/* Sanity. */
+	if (ntohs(ip->ip_len) != plen)
+		return false;
+
+	ip_hlen = (size_t)ip->ip_hl * 4;
+	/* Check we have a UDP header and BOOTP. */
+	if (ip_hlen + sizeof(*udp) + offsetof(struct bootp, vend) > plen)
+		return false;
+
+	/* Check it's to and from the right ports. */
+	udp = (struct udphdr *)(void *)((char *)ip + ip_hlen);
+	if (udp->uh_dport != htons(BOOTPC) || udp->uh_sport != htons(BOOTPS))
+		return false;
+
+	return true;
+}
+
+/* Lengths have already been checked. */
+static bool
+checksums_valid(void *packet,
+    struct in_addr *from, unsigned int flags)
 {
 	struct ip *ip = packet;
 	struct ip pseudo_ip = {
@@ -3281,69 +3312,34 @@ valid_udp_packet(void *packet, size_t plen, struct in_addr *from,
 		.ip_dst = ip->ip_dst
 	};
 	size_t ip_hlen;
-	uint16_t ip_len, udp_len, uh_sum;
+	uint16_t udp_len, uh_sum;
 	struct udphdr *udp;
 	uint32_t csum;
 
-	if (plen < sizeof(*ip)) {
-		if (from != NULL)
-			from->s_addr = INADDR_ANY;
-		errno = ERANGE;
-		return -1;
-	}
-
 	if (from != NULL)
 		from->s_addr = ip->ip_src.s_addr;
 
-	/* Check we have the IP header */
 	ip_hlen = (size_t)ip->ip_hl * 4;
-	if (ip_hlen > plen) {
-		errno = ENOBUFS;
-		return -1;
-	}
+	if (in_cksum(ip, ip_hlen, NULL) != 0)
+		return false;
 
-	if (in_cksum(ip, ip_hlen, NULL) != 0) {
-		errno = EINVAL;
-		return -1;
-	}
-
-	/* Check we have a payload */
-	ip_len = ntohs(ip->ip_len);
-	if (ip_len <= ip_hlen + sizeof(*udp)) {
-		errno = ERANGE;
-		return -1;
-	}
-	/* Check IP doesn't go beyond the payload */
-	if (ip_len > plen) {
-		errno = ENOBUFS;
-		return -1;
-	}
+	if (flags & BPF_PARTIALCSUM)
+		return 0;
 
-	/* Check UDP doesn't go beyond the payload */
 	udp = (struct udphdr *)(void *)((char *)ip + ip_hlen);
-	udp_len = ntohs(udp->uh_ulen);
-	if (udp_len > plen - ip_hlen) {
-		errno =  ENOBUFS;
-		return -1;
-	}
-
-	if (udp->uh_sum == 0 || flags & BPF_PARTIALCSUM)
+	if (udp->uh_sum == 0)
 		return 0;
 
 	/* UDP checksum is based on a pseudo IP header alongside
 	 * the UDP header and payload. */
+	udp_len = ntohs(udp->uh_ulen);
 	uh_sum = udp->uh_sum;
 	udp->uh_sum = 0;
 	pseudo_ip.ip_len = udp->uh_ulen;
 	csum = 0;
 	in_cksum(&pseudo_ip, sizeof(pseudo_ip), &csum);
 	csum = in_cksum(udp, udp_len, &csum);
-	if (csum != uh_sum) {
-		errno = EINVAL;
-		return -1;
-	}
-
-	return 0;
+	return csum == uh_sum;
 }
 
 static void
@@ -3352,13 +3348,12 @@ dhcp_handlebootp(struct interface *ifp, struct bootp *bootp, size_t len,
 {
 	size_t v;
 
-	/* udp_len must be correct because the values are checked in
-	 * valid_udp_packet(). */
 	if (len < offsetof(struct bootp, vend)) {
 		logerrx("%s: truncated packet (%zu) from %s",
 		    ifp->name, len, inet_ntoa(*from));
 		return;
 	}
+
 	/* To make our IS_DHCP macro easy, ensure the vendor
 	 * area has at least 4 octets. */
 	v = len - offsetof(struct bootp, vend);
@@ -3378,14 +3373,13 @@ dhcp_handlebpf(struct interface *ifp, uint8_t *data, size_t len)
 	size_t udp_len;
 	const struct dhcp_state *state = D_CSTATE(ifp);
 
-	if (valid_udp_packet(data, len, &from, state->bpf_flags) == -1) {
-		const char *errstr;
+	/* Validate filter. */
+	if (!is_packet_udp_bootp(data, len))
+		return;
 
-		if (errno == EINVAL)
-			errstr = "checksum failure";
-		else
-			errstr = "invalid UDP packet";
-		logerrx("%s: %s from %s", ifp->name, errstr, inet_ntoa(from));
+	if (!checksums_valid(data, &from, state->bpf_flags)) {
+		logerrx("%s: checksum failure from %s",
+		    ifp->name, inet_ntoa(from));
 		return;
 	}
 
diff --git a/src/if-linux.c b/src/if-linux.c
index 18044bbf..1b1357fd 100644
--- a/src/if-linux.c
+++ b/src/if-linux.c
@@ -1419,9 +1419,6 @@ bpf_open(struct interface *ifp, int (*filter)(struct interface *, int))
 		state->buffer_len = state->buffer_pos = 0;
 	}
 
-	if (filter(ifp, s) != 0)
-		goto eexit;
-
 #ifdef PACKET_AUXDATA
 	n = 1;
 	if (setsockopt(s, SOL_PACKET, PACKET_AUXDATA, &n, sizeof(n)) != 0) {
@@ -1436,6 +1433,19 @@ bpf_open(struct interface *ifp, int (*filter)(struct interface *, int))
 	su.sll.sll_ifindex = (int)ifp->index;
 	if (bind(s, &su.sa, sizeof(su.sll)) == -1)
 		goto eexit;
+
+	if (filter(ifp, s) != 0)
+		goto eexit;
+
+	/*
+	 * At this point we could have received packets for the wrong
+	 * interface or which don't pass the filter.
+	 * Linux should flush upon setting the filter like every other OS.
+	 * There is no way of flushing them from userland.
+	 * As such, consumers need to inspect each packet to ensure it's valid.
+	 * Or to put it another way, don't trust the Linux BPF filter.
+	 */
+
 	return s;
 
 eexit:

Follow-Ups:
Re: [Security][Patch] SEGV in dhcpcd-8.0.4Roy Marples
Archive administrator: postmaster@marples.name