Nix and BSD: NetBSD

As announced on the Nix discourse:

We can also cross compile OpenBSD binaries now!

Nixpkgs branch: https://github.com/obsidiansystems/nixpkgs/tree/openbsd-rollup (will be making PRs from here and force pushing it, be warned)

Command:

nix-build --arg crossSystem '{ system = "x86_64-openbsd"; isStatic = true; }' -A hello

Chat room: #nix-openbsd:tapenet.org which already existed, thanks to @qbit who previously took a stab at this, and more recently answered many questions.
Let's revive it!

With this milestone, we can now cross compile to all three BSDs! To celebrate this occasion, we're writing a series of blog posts about the history and current status of Nix and BSD projects — there's a lot going on!


Cross, Linux, and Bare Metal

In 2017, I was able to realize my long-sought goal of revitalizing Nixpkgs's cross compilation, as part of Obsidian's project of creating end-to-end Nixified builds of iOS and Android mobile apps written in Haskell — what would become our Obelisk web-and-mobile-together application framework.

Nixpkgs cross compilation support has since exploded far more than I could ever dream.
Atop the core infrastructure we put in place, numerous volunteers have fixed hundreds — if not thousands — of individual packages.
Entire system images can be cross compiled.
Support for many new instruction set architectures, WebAssembly, and more was added.
Freestanding/embedded toolchains (and the few libraries that could compile to them) were added.
More languages beyond C, C++, and Haskell gained support.

Still, there was a noticeable gap.
We could build from scratch for Linux a zillion different ways, we could build freestanding, and we could build with opaque vendor-provided binaries for iOS and Android, but we lacked support for other full-fledged free software operating systems.
In other words, we ran the risk of being overfit to Linux: binary blob bootstraps for other systems just don't exercise the configuration space the same way fully bootstrapped open source ones do.

Enter NetBSD

This began to change when my then-coworker (Matthew Bauer) first started packaging NetBSD packages, and then added cross compilation support.
Later Alyssa Ross and sternenseemann also helped out a lot.

The BSDs in general were hardly designed for such a thing.
Contrary to what Wikipedia has to say about the origin of monorepos, traditional Unix had been following the practice for decades longer -- chucking the kernel, libc and libraries, and many utilities all in one repo.
Like with all monorepos, this is great for being able to make cross-cutting edits that modify multiple components simultaneously, avoiding the synchronization overhead of modifying many repos that may not have coordinated testing and release cycles.
Less good: since everything is always being built together, and make isn't exactly known for its hygiene, it's possible for components to bleed together.

Thankfully, NetBSD, with its emphasis on portability, is the most forgiving of the three in this regard.
Cross compilation from other systems is an intentional feature, and while we didn't use their exact supported method (for reasons that will be discussed), their method nevertheless provided us with a few crucial components necessary to pull off the bootstrapping.

Interlude: cross compilation 101

Before we get to what those crucial components are, and the NetBSD bootstrap in general, let's go over some cross compilation basics to put that information in context.
What does it take to cross compile to a platform?
Same as native compilation viewed from scratch, you need:

  1. The tools themselves: compilers and linkers that know how to "target" (produce machine code for) the platform in question.
    This typically involves much more work for the instruction set architecture than for the OS.

  2. Library code that runs on the platform in question.
    This is a bit more subtle.
    Most code is portable — pure code, code using interfaces that are agreed-upon standards with many implementations, etc.
    But the syscalls by which a userspace process communicates with the operating system kernel are typically not portable: not between kernels, and potentially not even from one version of the kernel to the next.
    A program that actually accomplishes something (other than warming up the computer) needs to use syscalls, which means that if one follows the rabbit hole implementing even portable interfaces, we'll eventually get down to non-portable system calls.

For entirely historical reasons, libraries in most languages end up depending on C libraries, and C libraries end up depending on C's standard library, known as "libc", and that is what ends up wrapping the system calls in regularly-called C functions.
BSDs are nothing if not grounded in history, but because they typically have what's known as an "unstable ABI" — a syscall interface which changes in potentially-breaking ways between kernel versions, unlike Linux — it's all the more economical to go through libc and not implement the syscalls some other way.
NetBSD, however, has a stable ABI, so we don't need to worry about this bit so much at this point.
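
As a small illustration of a higher-level language going through libc rather than issuing syscalls itself, the sketch below asks libc for the process ID from Python. It assumes a POSIX host where `ctypes.CDLL(None)` exposes the symbols of the already-loaded C library; the helper name `pid_via_libc` is made up for this example, and nothing here is specific to Nixpkgs or NetBSD.

```python
import ctypes
import os

# Assumption: on a POSIX host, CDLL(None) gives access to symbols already
# loaded into the process, which includes the platform's libc.
libc = ctypes.CDLL(None)
libc.getpid.restype = ctypes.c_int  # assumes pid_t fits in a C int


def pid_via_libc() -> int:
    """Call libc's getpid() wrapper, which performs the actual syscall for us."""
    return libc.getpid()


if __name__ == "__main__":
    # Both values should agree: the second is Python's own wrapper.
    print(pid_via_libc(), os.getpid())
```

The point of the sketch is only that the non-portable part lives in one place: the Python code never needs to know the syscall number or calling convention for the kernel it runs on.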

Finally, just as some portable interfaces need to be implemented non-portably atop syscalls, some operations need to be implemented non-portably atop special instructions.
Well, all (compiled) operations end up as machine code which is non-portable, and the compiler is responsible for this.
But for some weirder operations, and weirder instructions, the compiler doesn't want to know about them.
Instead of keeping that logic, which would be a bit "one-off", in the compiler backend, compilers choose to have regular functions with "polyfilled" inline assembly to expose an abstraction over these weirder instructions.
These functions go in a "builtins" library.
Compilers don't have to use this method, but GCC and Clang — the main ones we support — do.
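
To give a feel for the kind of routine that ends up in a builtins library, here is a sketch of a software population count, an operation some instruction sets provide directly and others do not. It is written in Python purely for illustration, using a well-known bit-twiddling technique; it is not taken from libgcc or compiler-rt, and the name `popcount64` is an assumption of this example.

```python
MASK64 = (1 << 64) - 1


def popcount64(x: int) -> int:
    """Count set bits in a 64-bit value using only shifts, masks, adds and a multiply."""
    x &= MASK64
    # Sum adjacent bits into 2-bit fields.
    x = x - ((x >> 1) & 0x5555555555555555)
    # Sum adjacent 2-bit fields into 4-bit fields.
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    # Sum adjacent 4-bit fields into bytes.
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0F
    # Add all bytes together; the total lands in the top byte.
    return ((x * 0x0101010101010101) & MASK64) >> 56


if __name__ == "__main__":
    for value in (0, 1, 0xFF, MASK64):
        print(hex(value), popcount64(value))
```

A compiler targeting a machine with a native instruction can emit it directly; on other targets it can call a library function that does the same job in software, which is why the builtins library has to exist before much else can be linked.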

Putting everything together, the complete C toolchains we are going for will have these parts:

  • Libraries:

    1. builtins library (libgcc for GCC, compiler-rt for Clang)
    2. C standard library (e.g. GNU libc, Musl, and each BSD's own libc)
    3. (Bonus, not discussed above) C++ standard library (e.g. libstdc++ from GCC land, or libc++ from LLVM land)
  • Tools:

    1. Linker (e.g. GNU ld from GCC land, GNU gold from GCC land, LLD from LLVM land)
    2. Compiler (e.g. GCC, Clang)
    3. (Bonus, not discussed above) other "binary tools" (e.g. GNU binutils, or LLVM's replacements).

Keep this in mind when we go over the bootstrap: ultimately, we are trying to build packages to fill these roles to produce our toolchain.

The Bootstrap

With the above background, we hopefully have enough information to make sense of this work: the full-fledged NetBSD cross toolchain, the packages needed to build it, and their dependencies.
It looks approximately like this:

flowchart TD
%% Nodes
    Make>"NetBSD Make"]
    Compat("NetBSD Compat")
    LOrder>"LOrder"]
    TSort>"TSort"]

    Headers("NetBSD Headers")
    CSU("C Start-Up")
    LibcMinimal("Libc proper")
    Libm("Libm")
    Libpthread("Libpthread")
    bundle("Augmented libc ")

    Binutils>"Binutils"]
    GCC_1{{"GCC, libgcc"}}
    GCC_2{{"GCC, libgcc, libstdc++, other libs"}}

%% Edge connections between nodes
    subgraph Tools [NetBSD Tools]
    Make --> Compat & LOrder & TSort
    Compat -.-> LOrder & TSort
    end

    Binutils --> GCC_1
    GCC_1 & Binutils & Tools --> Libc

    subgraph Libc [NetBSD libc]
    Headers & CSU -.-> LibcMinimal
    LibcMinimal -.-> Libm & Libpthread
    LibcMinimal & Libm & Libpthread -.-> bundle
    end

    Libc -.-> GCC_2


classDef Portable fill:#AA00FF, stroke:#AA00FF
class Binutils,GCC_1,GCC_2 Portable

classDef NetBSD stroke:#2962FF, fill:#2962FF
class Make,Compat,LOrder,TSort NetBSD
class Headers,CSU,LibcMinimal,Libm,Libpthread,bundle NetBSD

Legend:

Node Colors:

  • Blue: NetBSD package
  • Purple: General Nixpkgs package not associated with any particular OS

Node Shapes:

  • Round: Library
  • Flag shape: Executable
  • (Irregular) hexagon: Libraries and Executables mix

Edges:

  • Solid arrow: "from" is an executable used to build "to"
  • Dotted arrow: "from" is a library linked by "to"

The purple part is our general GCC-based cross compilation bootstrap, unmodified for NetBSD, which is good! Simply put, it is:

  1. Build tools we need (Binutils, GCC)
  2. Build libc for the platform in question — the platform specific step
  3. Build GCC again for better libraries.

The first build of GCC gives us a GCC, and a libgcc.
The second build gives us GCC again, a potentially more featureful libgcc, libstdc++, and perhaps a few other libraries.
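
The diagram's edges can also be read as data. The sketch below encodes them and groups the packages into "waves" that could be built one after another, using Python's standard `graphlib`. The node names are informal labels for the boxes in the diagram, not Nixpkgs attribute names, and the edge from the tools and first-stage compiler to the whole libc group is expanded to every member of that group as a simplifying assumption.

```python
from graphlib import TopologicalSorter

# Informal labels for the diagram's boxes (not Nixpkgs attribute names).
TOOLS = {"netbsd-make", "compat", "lorder", "tsort"}
STAGE1 = {"binutils", "gcc-stage1"}

# Each entry maps a package to the packages that must be built before it.
DEPS = {
    "netbsd-make": set(),
    "compat": {"netbsd-make"},
    "lorder": {"netbsd-make", "compat"},
    "tsort": {"netbsd-make", "compat"},
    "binutils": set(),
    "gcc-stage1": {"binutils"},
    "headers": TOOLS | STAGE1,
    "csu": TOOLS | STAGE1,
    "libc-minimal": TOOLS | STAGE1 | {"headers", "csu"},
    "libm": TOOLS | STAGE1 | {"libc-minimal"},
    "libpthread": TOOLS | STAGE1 | {"libc-minimal"},
    "libc-bundle": {"libc-minimal", "libm", "libpthread"},
    "gcc-stage2": {"libc-bundle"},
}


def build_waves(deps=DEPS):
    """Group packages into successive waves whose dependencies are already built."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(build_waves(), start=1):
        print(number, ", ".join(wave))
```

Read this way, the three-step summary above is just a coarse view of the same ordering: tools first, the platform's libc in the middle, and the second GCC build last.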

Aside: That we build GCC twice is a bit of a bummer. We only need more libraries: libraries that rely on libc and thus cannot be built in the first GCC build.
The second copy of the compiler itself is completely redundant.
But resolving this is a separate pre-existing issue to be discussed some other time, not a BSD-specific problem.

The NetBSD Packages

Step 2, the blue step, is where the NetBSD-specific packages are.
If we could just build libc and be done, that would be easy, but the BSDs give us a few extra challenges.
Thankfully, this is where NetBSD's extra support for cross compilation comes in to help.

NetBSD tools

The first challenge is that NetBSD needs custom tools at build-time too in order to build any NetBSD package.
For example, NetBSD's Make and GNU Make have diverged from the original Unix make in enough ways that we cannot use one of them when the other is expected — for NetBSD packages, we need NetBSD make.
NetBSD packages all depend on NetBSD's "standard library" of make rules, and those in turn use a few miscellaneous utilities such as lorder and tsort.
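
For readers who have not met these two utilities: lorder reports which object files reference symbols defined in which other object files, and tsort turns such pairs into a linear order, so that the objects in a static library can be resolved in a single pass. The following Python sketch imitates that pipeline on made-up data; the symbol tables and the function names `lorder_pairs` and `tsort` are assumptions for illustration, not the real tools' implementation.

```python
from graphlib import TopologicalSorter

# Hypothetical symbol tables: object file -> (symbols defined, symbols referenced).
OBJECTS = {
    "util.o": ({"util"}, set()),
    "main.o": ({"main"}, {"helper"}),
    "helper.o": ({"helper"}, {"util"}),
}


def lorder_pairs(objects):
    """Yield (user, provider) pairs: `user` references a symbol that `provider` defines."""
    definer = {sym: obj for obj, (defs, _) in objects.items() for sym in defs}
    for obj, (_, refs) in sorted(objects.items()):
        for sym in sorted(refs):
            if sym in definer and definer[sym] != obj:
                yield obj, definer[sym]


def tsort(pairs, nodes):
    """Order nodes so that the first element of every pair precedes the second."""
    sorter = TopologicalSorter()
    for node in nodes:
        sorter.add(node)
    for before, after in pairs:
        sorter.add(after, before)  # `after` must come later than `before`
    return list(sorter.static_order())


if __name__ == "__main__":
    pairs = list(lorder_pairs(OBJECTS))
    print(pairs)
    print(tsort(pairs, OBJECTS))
```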

NetBSD's Make itself is also normally built with NetBSD's Make.
This cycle would leave us stuck right on the first package we need.
Thankfully, NetBSD Make also has a simple shell-script-only bootstrap build system, consisting of a GNU autotools configure script and a simple build script.
The code itself is also very portable, and so those scripts are enough: we can do a simple build of NetBSD Make with no other dependencies, linking regular Linux (or wherever we are cross compiling from) libraries, for a NetBSD Make we can run on Linux.
One package down!

The next step is building the other commands, like lorder and tsort (and others we've elided from the graph).
These don't get quite the red carpet treatment that Make does in terms of portability.
For one, they do in fact require NetBSD Make, rather than having a fallback bootstrap build system.
But this is fine — we have a NetBSD Make we can run on our build platform now.
But more than that, they also rely on various idiosyncrasies of NetBSD's libc and other libraries.
We can't just build them against GNU libc or Musl libc because miscellaneous differences in the API will cause build failures.

The solution is NetBSD's compat layer, which wraps some headers and provides the missing pieces to make other libc implementations look more like NetBSD's.
This package does use BSD Make, but it has a configure script to detect what the host libc is missing.
(Regular BSD packages on the other hand don't have configure scripts, because they know they are running on BSD!)
And it doesn't depend on the other NetBSD tools like lorder or tsort either, so we don't have any other NetBSD dependencies to worry about.
We can build compat next, and then use it to build those tools.
After that, we have all the NetBSD programs we'll need at build time, and we're ready to tackle libc.
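
The "detect what is missing, then supply it" pattern can be sketched in a few lines. The example below probes the host libc for symbols and offers a pure-Python stand-in with `strlcpy` semantics. It is an analogy only: `strlcpy` is used as a familiar BSD function that some other libcs have lacked, which functions compat really supplies is not claimed here, and `host_has`, `missing_pieces` and `strlcpy_fallback` are hypothetical names. It also assumes a POSIX host where `ctypes.CDLL(None)` exposes libc's symbols.

```python
import ctypes

# Assumption: a POSIX host where CDLL(None) exposes the host libc's symbols.
_host_libc = ctypes.CDLL(None)


def host_has(symbol: str) -> bool:
    """Configure-style probe: does the host libc export this symbol?"""
    return hasattr(_host_libc, symbol)


def missing_pieces(wanted):
    """Return the symbols a compat layer would have to supply on this host."""
    return [symbol for symbol in wanted if not host_has(symbol)]


def strlcpy_fallback(dst: bytearray, src: bytes, size: int) -> int:
    """Stand-in with strlcpy semantics: copy at most size-1 bytes into dst,
    NUL-terminate when size > 0, and return the length of src."""
    if size > 0:
        n = min(len(src), size - 1)
        dst[:n] = src[:n]
        dst[n] = 0
    return len(src)


if __name__ == "__main__":
    print(missing_pieces(["getpid", "strlcpy"]))
    buf = bytearray(4)
    print(strlcpy_fallback(buf, b"netbsd", len(buf)), bytes(buf))
```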

NetBSD libc

The tools we just built run on Linux/macOS/whatever our build platform is.
With all those built, we're ready to go on to the host platform, i.e. NetBSD itself.
The libraries we'll build will be the first things that are actually cross compiled.
They won't need compat, because we're no longer trying to get NetBSD code compiled for another platform; we're building NetBSD code for NetBSD.
They'll use tsort and lorder as part of the building process, but our library build products won't retain references to those executables, because we don't need non-NetBSD binaries on NetBSD.

We've been using "package" so far by analogy with Linux distributions and in keeping with standard Nixpkgs practice.
But BSDs don't really consider all these base-system components in the src repo (everything we're using from NetBSD in our toolchain) to be separate packages per se.
One example of this is that the libc headers (and some other libraries' too) are not part of each library's components, but instead all together in a separate include component.
BSDs assume one must install all of these libraries, so it doesn't matter so much which header file goes with which library.
The libc headers are a very easy package to "build": we're doing little more than copying header files into their installation location.
So building this next serves as a good test that our basic NetBSD Make and building-for-NetBSD infrastructure is in place without needing any working cross compilers or other heftier dependencies.

Also needed are the "C start-up" objects.
These are crt0.o, crtbegin.o, crtend.o, among others.
This is code that is directly linked into the executable and does tasks like initializing the stack, running constructors for C++ globals, and so on.
Fundamentally, this initialization code is interfacing between C (and C++) standards, libc implementation details, and the OS syscall interface — it is morally part of libc since many of these interfaces are internal and unstable, but is always statically linked (not just part of libc.so or similar) because this initialization needs to happen before dynamic linking itself happens.
When compiling executables and libraries, including libc, the C compiler will expect these objects to already exist, so we must build them early too.

With the headers and C start-up objects both built, we can then move on to libc itself.
The actual libc builds fine, but has fewer bells and whistles than GNU libc.
(E.g. POSIX threads are a separate library, so -pthread actually does something.)
Because of Nixpkgs's origins as just Linux packages, many packages have come to expect this more featureful libc.
The easiest thing is to just throw in more functionality to meet this expectation.
We're not alone in this: Musl has added extra features too for this same reason.
In the case of NetBSD (and the other BSDs) we have the libraries, and it is OK if they are separate.
We just need them all to appear in the same package.
So what we do is first build a "libc minimal" (libc according to BSD), then build some other basic libraries that are part of libc according to GNU libc and Musl, and finally combine all these together into one package.
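
One common way to make several separately built libraries "appear in the same package" is to merge their install prefixes with symlinks. The sketch below does that for three toy prefixes. It is an assumption about the general technique, not a description of how the Nixpkgs expressions actually do it, and the package and file names are placeholders.

```python
import os
import tempfile
from pathlib import Path


def combine_prefixes(prefixes, out: Path) -> None:
    """Symlink every file from each prefix into `out`, keeping relative paths."""
    for prefix in prefixes:
        for path in sorted(Path(prefix).rglob("*")):
            if path.is_dir():
                continue
            target = out / path.relative_to(prefix)
            target.parent.mkdir(parents=True, exist_ok=True)
            if target.exists() or target.is_symlink():
                raise FileExistsError(f"{target} provided by more than one prefix")
            os.symlink(path, target)


def demo() -> list[str]:
    """Build three toy prefixes and list the files visible in the combined one."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        prefixes = []
        # Placeholder package names; each gets a single stub archive.
        for name in ("libc-minimal", "libm", "libpthread"):
            lib = root / name / "lib"
            lib.mkdir(parents=True)
            (lib / f"{name.replace('-minimal', '')}.a").write_text("stub")
            prefixes.append(root / name)
        out = root / "libc"
        combine_prefixes(prefixes, out)
        return sorted(str(p.relative_to(out)) for p in out.rglob("*") if p.is_file())


if __name__ == "__main__":
    print(demo())
```

Downstream packages that expect a single, more featureful libc then only need to be pointed at the combined prefix, while the individual libraries stay separately built underneath.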

Aside: This feature creep is a bit of a Postel's law slippery slope: does any new platform with a new libc feature force the other libcs to add it?
The opposite approach would be to try to break up GNU libc into multiple packages.
(This can be done without actually breaking up libc.so, making it more feasible than it at first might sound.)
I wrote a Nixpkgs issue for this: #122416.
Intersecting rather than unioning libc feature sets is a bit more elegant, per the principle of "only pay for what you use", but it will probably not be worth the effort until BSD (and other "exotic" platform) usage in Nixpkgs is much more widespread.

With the (combined) libc package built, we can resume the generic NetBSD-agnostic bootstrap, per our diagram, building GCC a second time.
After that, and building a few small Nixpkgs wrapper scripts (not BSD-specific) to put all the pieces together, we have our complete toolchain!

Discussions

So we separate NetBSD's build steps into packages that fit into Nixpkgs's general cross bootstrapping idioms.
The end result, I'm happy to report, is quite nice, only departing from the Linux bootstrap for the packages that are, in fact, NetBSD-specific, just as GNU libc (mostly) and Musl are Linux-specific.

What does that toolchain dependency graph look like?
Something like this:

The Nix way of doing things completely overcomes the traditional trade-offs between one monorepo and many package repos:

  • By having deployment- or project-based root repos set in stone all the exact source code used for all packages, one has the exact control of the monorepo approach

  • By not having to physically vendor all the source code, and supporting alternatives like repo references and patch files, one also has the modularity discipline of separate repos, and easier tracking of upstream projects.

But to reap these benefits when integrating BSD source code into our general Nixpkgs toolchain infra, it is necessary to reseparate those blurred-together components, and get things once again building in rigid isolation.
NetBSD does have comprehensive instructions for cross compiling it from scratch, but if we were to follow them exactly with Nix, we'd have to build the whole BSD world in one single mammoth sandboxed step.
There are two major problems with that, however:

  • It would be a terrible debug cycle.

    Like with most build systems, individual Nix build steps are atomic: either they are cached, or they have to be entirely rebuilt.
    Building an entire NetBSD toolchain in one step would naturally make for a very slow build step.
    If any part of it went wrong, we'd have to repeat all the parts that went correctly just to get back to that point.
    This would be very unproductive.

  • It would be too easy for pointless differences to creep in between the Linux and BSD way of doing things.

The Linux way doesn't build everything in one giant step, and if it did, it would be a wholly different very large step.
When we have two different ways of building toolchains, not only do we end up with two toolchains that might differ in incidental, unnecessary ways, but also with cascading further incidental, unnecessary differences in downstream packages.
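
The debug-cycle argument can be made concrete with a toy cost model. In the sketch below, each build step is atomic and cached once it succeeds; after fixing a failing step, a split build reruns only that step and what follows, while a single mammoth step reruns everything. The step names and durations are invented for illustration and are not measurements.

```python
# Hypothetical step names and durations in minutes; not measurements.
STEPS = [
    ("netbsd-make", 1),
    ("compat", 2),
    ("tools", 2),
    ("gcc-stage1", 30),
    ("libc", 15),
    ("gcc-stage2", 40),
]


def retry_cost(steps, failed, monolithic):
    """Minutes spent rebuilding after fixing the step named `failed`."""
    names = [name for name, _ in steps]
    index = names.index(failed)
    if monolithic:
        # One atomic step: nothing before the failure was cached.
        return sum(minutes for _, minutes in steps)
    # Separate steps: everything before the failure is already cached.
    return sum(minutes for _, minutes in steps[index:])


if __name__ == "__main__":
    print("monolithic:", retry_cost(STEPS, "libc", monolithic=True))
    print("split:", retry_cost(STEPS, "libc", monolithic=False))
```

The gap widens with every retry, since the monolithic build pays for the already-working early steps each time.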

Both of these are frictions that are really important to avoid for the sustainability of our efforts, both in the long and short terms.
Not surprisingly, to just get something working at all in a timely manner, we want a good debug cycle.
And a good debug cycle is, if anything, even more important, after the

When we package a new platform, like a BSD, it's integral that
