Roy's Blog

A Hacker's musings on Code | Tech | Life

open_memstream is one of the more important functions added to POSIX libc of late. It's so important because it makes the generation of strings really easy - you no longer need to care about allocating the right amount of memory as the library will do it for you. Now, there's many functions that already help with this, such as asprintf but that's not standard and if you want to create many strings in one area you still need to care about the size of the area. You want to create an area if you have many strings, because it's more efficient for malloc and if you keep the area around and re-use it then it avoids memory fragmentation.

Now, to be clear, you have been able to do this since forever using fopen, writing to the file and then allocating your area based on the final file size. Still, it requires some memory management still but more importantly it writes to a file. Writing to a file is slow and reduces the life span of the disk you're writing to. It's only been fairly recently that tmpfs was a thing, but even today not all OS's have /tmp mounted as tmpfs. Requiring this isn't exactly ideal for a generic program to do - the setup and install should be easy. Because of all these reasons, most programs worked the string length needed and either allocated an area or string and then finally write the string. However, while saving the disk, it's also a lot more error prone because you need to work out the length of everything and that's not always trivial, especially for things like a DHCP client which is always building strings based on the information given by the DHCP server.

Here's an example of open_memstream in action:

 * Example program which manages an area of environment strings
 * to send to child programs.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static const char *foo = "foo";
static const char *bar = "bar";

int main(void)
        char *argv[] = { "/usr/bin/env", NULL };
        char *buf, *ep, *p, **env, **envp;
        size_t buflen, nenv, i;
        FILE *fp = open_memstream(&buf, &buflen);

        fprintf(fp, "FOO=%s", foo);
        fputc('\0', fp);
        fprintf(fp, "BAR=%s", bar);
        fputc('\0', fp);

        /* We could keep fp around as our area and just rewind it. */

        /* execve relies on a trailing NULL */
        nenv = 1;
        for (p = buf, ep = p + buflen; p < ep; p++) {
                if (*p == '\0')

        /* reallocarray(3) should be standard really */
        envp = env = malloc(nenv * sizeof(char *));
        *envp++ = buf;
        for (p = buf, ep--; p < ep; p++) {
                if (*p == '\0')
                        *envp++ = p + 1;
        *envp = NULL;

        execve(argv[0], argv, env);

As you can see, we only manage the environment array handed to to execve - open_memstream is managing our string area and fprintf is working out the length each string needs to be for us. This vastly reduces the complexity and increases the security and reliabilty of creating large environment strings, which most DHCP clients do. We could also write a helper function to write the string AND the trailing NULL terminator for the string to be more efficient. You'll get to see this in dhcpcd-8 which should be released later this year.

Continue reading...

Minor update with the following changes:

  • More strict POSIX shell support
  • Interfaces have an implicit metric of 0 unless specified
  • Inline comments are stripped from nameserver and domain entries

The last upate was in 2016, so this is proving to be very stable :)

Continue reading...

Pretty much every piece of software I've seen uses a list of objects. When you need to easily grow and shrink this list easily you then need something like a Linked List. dhcpcd has used very popular and widely available BSD based queue(3), specifically a tailq. The main advantages of this type of list are:

  • Very low extra memory needed for it's implementation
  • Fast at insertion and removal operation - both are O(1)

However, it's just a list. And like any list, to find something we need to pick a starting point and then search in a single direction and match each item until we find it. This is described as O(N) and it's performance roughly corrolates to the Number of objects in the list. If you need to insert at an arbitary point (for example to order the list) you need to search it, so this is O(N) also.

But is this slow?

Typically for dhcpcd, the answer is no because on a desktop, tablet, phone or SOHO router you only have a handful of objects in all the lists. However, someone ran dhcpcd on a switch with millions of routes and dhcpcd was taking minutes of CPU time for routing table modifications. So in this corner case ... yes! dhcpcd is slow! A suggestion to improve matters was given in that report - use a Red-Black Tree instead of a Linked List. A RB tree guarantees O(log N) performance for searching. A patch was even submitted based on OpenBSD's tree implementation! This reduced the dhcpcd runtime from minutes to seconds in his situation.

Now you can't just swap in a RB Tree for a Linked List. Each object within an RB Tree has to have a unique key. Also, you need to have comparison function to define an order to the objects within the tree. Only then can you start to implement it. To find an object within the tree, you just need to know it's key and off you go. I settled on using NetBSD's rbtree(3) implementation instead of the initial suggestion to use the generic BSD tree(3). This mainly to reduce the binary size of dhcpcd because it's in libc on NetBSD and on other systems it will creates a smaller binary due to the complexity of the tree(3) macros. Testing also showed it was slightly faster and also easier to actually develop with.

So should you replace all LinkedLists with RB Trees? No! That would be silly :) After all, sometimes you don't actually need to search or order a list - a lot of the time, lists are used for push/pop operations and thus gain nothing from RB Tree.

You'll get to see this in dhcpcd-8 which should be released later this year.

Continue reading...

In my continuing efforts to entirely self host, fighting spam is hard. I originally configured SpamAssassin on my mail server quite a few years ago, and to be fair it has done it's job. But recently, more spam has been creeping through and my ever growing stack of addons (such as policyd-spf, OpenDKIM, OpenDMARC and others) to SA was eating quite a lot of memory on my poor server.

So I shopped around and found Rspamd. For my needs it sounded wonderful - no more need for MySQL (it's a hard dependency of OpenDMARC) as I much prefer PostgreSQL. SPF, DKIM and DMARC all integrated. Written in C and LUA which is a massive improvement over Perl and Python. Also sports a shiny Web UI to monitor the server and do basic config. Speaking of config, it's still not entirely easy, but it's much easier than configuring the stack I used to have! I did have to patch the build so that it works with OpenSSL-1.1 which is now in pkgsrc. All in all, I anticpated a nice memory reduction once I had it all configured. So far it's using about 200Mb less memory, but it's early days. How much better or worse than SA it is at actual spam filtering remains to be seem, but I have high hopes.

While here, I also replaced procmail with PigeonHole. I didn't really need to do this, but I thought "As I'm here.....". Actually the end result is much nicer as I now only have one Spam folder instead of another two Spam folders for training ham and spam. I just need to hook this final part into how I manage spam on my mlmmj email lists.

Continue reading...

OK, it's not really in the news, but today I got a message of apprecation for what openresolv does.

Thanks for openresolv. If only the whole world used it... 

Rome wasn't built in a day. But it's getting there - openresolv can be found in NetBSD and FreeBSD base systems. It's available in most other OS's package respositories to at least depend upon.


Continue reading...