Managing Rust Dependencies with Nix, Part II

Summary

We previously explored the problem of building Rust packages in a contained environment. Now, we’ll spend some time looking at Cargo’s dependency handling.

Last Time

In our previous post, we explored the problem of building Rust packages in a contained environment and made some first steps toward a solution. In this post we’ll take those ideas a little further and spend some time looking at Cargo’s dependency handling.

Rather than the slightly insipid example we used in the previous post, this time we’re going to struggle with the openssl crate. This crate is significantly more complicated, which lets us test the robustness of our approach. It also lets us show off one of the primary advantages of integrating Cargo with Nix in the first place: we can entrust the provisioning of native libraries to Nix and have our Rust programs run on any system without worrying about which native dependencies are or aren’t pre-installed.

Removing Dependencies

If we take our previous attempt from the last post and apply it to the openssl crate, we’ll immediately run into some trouble:

openssl = mkRustCrate rec {
  name = "openssl";
  version = "0.10.15";
  src = fetchFromCratesIo {
    inherit name version;
    sha256 = "0fj5g66ibkyb6vfdfjgaypfn45vpj2cdv7d7qpq653sv57glcqri";
  };
};

error: the lock file /tmp/nix-build-openssl.drv-0/openssl-0.10.15.crate/Cargo.lock needs to be updated but --frozen was passed to prevent this

To avoid Cargo trying to update its lockfile, we’ll modify its inputs to remove any mention of dependencies.

The first part of that is just to replace Cargo.lock with a more innocuous version of our own, which is simple enough: let’s create a new file called Cargo.lock relative to our Nix expression:

[root]
name = "@crateName@"
version = "@crateVersion@"

and then we can just use substituteAll to substitute in our crate-specific values at a later date.

lockFile = substituteAll {
  src = ./Cargo.lock;
  crateName = name;
  crateVersion = version;
};
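
As a rough illustration of what that substitution amounts to, the following shell sketch performs the same @name@ replacement with sed. It is not how substituteAll is implemented, and render_lock is a name made up for this example.

```shell
# Sketch only: imitate the @var@ replacement with sed.
# render_lock is a hypothetical helper, not part of Nix.
render_lock() {
    local name=$1 version=$2
    sed -e "s|@crateName@|$name|g" \
        -e "s|@crateVersion@|$version|g"
}

template='[root]
name = "@crateName@"
version = "@crateVersion@"'

rendered=$(printf '%s\n' "$template" | render_lock openssl 0.10.15)
printf '%s\n' "$rendered"
```

The rendered text is what ends up as Cargo.lock in the build directory: a lock file that names only the crate itself.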

The next thing that makes Cargo go and try to fetch dependencies is Cargo.toml, and removing the dependencies from that is a little more delicate, since we need the rest of the file in order for Cargo to do its job properly. Unfortunately there aren’t many tools for manipulating TOML, but there does exist remarshal, which can convert TOML to JSON quite happily, and jq, which can do just about anything you might need to do to JSON. So let’s add a little conversion step to our build.sh from last time:

cp $lockFile Cargo.lock
$remarshal -if toml -of json -o Cargo.json Cargo.toml
$jq -f $cargoFilter < Cargo.json \
    | $remarshal -if json -of toml -o Cargo.toml

While we’re here we’ll also extract the value of the crate links property from the Cargo.toml, if one exists.

export CARGO_LINKS=$($jq -r .package.links < Cargo.json)

It’s a testament to the power of jq that our filter is very simple:

def all_dependencies:
  .["dependencies", "dev-dependencies", "build-dependencies"];

def optional_dependencies:
  [ all_dependencies | select(.) ]
  | add // {}
  | map_values(select(type == "object" and .optional));

def augment_features:
  .features = (.features + optional_dependencies);

def remove_dependencies:
  del(all_dependencies)
  | .target |= ((.//{}) | map_values(del(all_dependencies)));

def remove_feature_dependencies:
  .features |= map_values([]);

augment_features
  | remove_feature_dependencies
  | remove_dependencies

This filter does only two things: it removes any dependencies from dependencies or target.*.dependencies sections, and it takes any optional dependencies and converts them into features. An optional dependency in Cargo is essentially a feature flag for which Cargo promises to provide the appropriate dependency; if we simply remove the offending dependency then Cargo will complain that the package’s caller has specified an invalid feature, and the crate itself will be unable to tell whether it should use that dependency or not. Instead, by replacing it with a no-op feature, we allow the crate to continue to test for the feature. The optional dependency itself, of course, will be provided by us.
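
To make the effect concrete, here is a small sketch that runs a condensed copy of the filter over a made-up manifest. It assumes jq is on the PATH and a jq version in which updating with an empty result deletes the key (1.6 or later, as far as I can tell). The "demo" crate and its dependencies are hypothetical.

```shell
# Sketch only: a condensed copy of the filter applied to a made-up manifest.
# Assumes jq is installed; the "demo" crate and its dependencies are hypothetical.
filter='
def all_dependencies:
  .["dependencies", "dev-dependencies", "build-dependencies"];
def optional_dependencies:
  [ all_dependencies | select(.) ] | add // {}
  | map_values(select(type == "object" and .optional));
.features = (.features + optional_dependencies)
| .features |= map_values([])
| del(all_dependencies)
| .target |= ((. // {}) | map_values(del(all_dependencies)))
'

manifest='{
  "package": { "name": "demo", "version": "0.1.0" },
  "dependencies": {
    "libc": "0.2",
    "serde": { "version": "1.0", "optional": true }
  },
  "features": { "default": ["serde"] }
}'

# Skip quietly when jq is unavailable.
if command -v jq >/dev/null 2>&1
then
    stripped=$(printf '%s' "$manifest" | jq -c "$filter")
    printf '%s\n' "$stripped"
fi
```

In the result the dependencies table is gone, while serde survives as an empty feature alongside default, so a caller can still request it and the crate can still test for it.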

Dependencies in Cargo

First, let’s take a break for a quick recap of how Cargo handles dependencies. Note that all details in this section are informative about how Cargo handles dependencies at the time of writing on Linux x86_64, and shouldn’t be taken as any kind of prescriptive specification of what Cargo will always do. For the party line, read the Rust and Cargo books, e.g. the chapter on linkage.

Linkage

Primarily, Cargo library crates are provided as rlib files. rlib files are essentially just static libraries, with additional metadata about types, traits, and generic functions — they fulfil the rôle of header plus static library from a C program. Just like static C libraries, it is not conventional to include code from upstream dependencies into an rlib. The final linkage and symbol resolution happens only when the time comes to link an rlib into a binary, and until that time we must aggregate a list of all the dependencies we’ve seen so far.

rustc supports (as far as I can tell) two flags that tell it where to find a dependency in crate form. rustc -L /path/to/library (or, more specifically, rustc -L dependency=/path/to/library) will cause rustc to, if it needs a crate named foo-bar, search for /path/to/library/libfoo_bar.rlib or /path/to/library/libfoo_bar-*.rlib. This form is used to provide indirect dependencies. Meanwhile, for direct dependencies of the current compilation (accessed by extern crate foo_bar;), rustc supports the rustc --extern foo_bar=/path/to/libfoo_bar.rlib form, which not only links against /path/to/libfoo_bar.rlib, but also uses the metadata contained within the rlib to decide what symbols are made available from the extern crate.
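
The two forms can be sketched as a command line assembled in shell. This only builds and prints the flags, it does not invoke rustc. extern_flag, the paths, and src/lib.rs are hypothetical. The -L dependency= spelling is the one build.sh uses later in this post.

```shell
# Sketch only: build the two kinds of rustc flag for a crate.
# extern_flag and the paths below are hypothetical, for illustration.
extern_flag() {
    local crate=$1 dir=$2
    local lib_name
    # Crate names may contain dashes, but library names use underscores.
    lib_name=$(printf '%s' "$crate" | tr - _)
    printf '%s %s=%s/lib%s.rlib' '--extern' "$lib_name" "$dir" "$lib_name"
}

direct=$(extern_flag foo-bar /path/to/library)
indirect="-L dependency=/path/to/library"
printf '%s\n' "rustc $indirect $direct src/lib.rs"
```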

It’s interesting (and later important) to note that part of the metadata contained in the rlib is a string that is used to mangle the symbols in that library. When invoked from Cargo, this string is a short hash calculated from the crate metadata. This enables Cargo to link in different versions of crates willy-nilly with no concern for symbol clashes. When compiling directly against a crate, the rlib metadata is used to reconstruct the specific symbols against which the crate is to be built.

Dependency Types

Cargo supports three classes of dependency, plus target-specific dependencies: a dependency may be a runtime dependency (or simply ‘dependency’), a build dependency, or a development dependency. Runtime dependencies are available for use in the program and propagated to any dependants of the crate until a binary crate is reached and the whole dependency tree is linked in. Build dependencies are available only in build scripts and are not available in the source itself or propagated to dependants. Development dependencies are also not propagated, but are available in the source itself, essentially feature-gated by the test configuration option. They, as well as the option, are provided when running cargo test, cargo bench, or cargo run --example. In this post we will be looking only at dependencies and build dependencies, which is sufficient to build openssl.

Build Scripts

In addition to receiving special build dependencies, build scripts can output a variety of commands to standard output that instruct Cargo to modify its compilation or linking behaviour somehow. Most of these we can ignore, but a couple need to propagate through the dependency tree. These we will translate into a bash script that can be loaded by our build script in preparation for building. Remember earlier when we extracted the value of the package.links key from Cargo.toml? This is why: that key determines the name of the environment variable Cargo uses to pass info to dependants, a behaviour we replicate here.

function parse_depinfo {
    # Carry forward any link flags gathered so far.
    printf 'NIX_RUST_LINK_FLAGS=%q\n' "${NIX_RUST_LINK_FLAGS-}"
    cat "$@" | while read -r line
    do
        # Only lines of the form cargo:key=value are of interest.
        [[ "x$line" =~ xcargo:([^=]+)=(.*) ]] || continue
        local key="${BASH_REMATCH[1]}"
        local val="${BASH_REMATCH[2]}"

        case $key in
            # These only affect the crate that emitted them, so skip them.
            rustc-link-lib) ;&
            rustc-flags) ;&
            rustc-cfg) ;&
            rustc-env) ;&
            rerun-if-changed) ;&
            rerun-if-env-changed) ;&
            warning)
            ;;
            rustc-link-search)
                printf 'NIX_RUST_LINK_FLAGS+=" "-L%q\n' "$val"
                ;;
            *)
                # Anything else becomes metadata for dependants.
                printf 'export DEP_%s_%s=%q\n' \
                       "$(upper $CARGO_LINKS)" \
                       "$(upper $key)" \
                       "$val"
                ;;
        esac
    done
}
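
As a worked example of the last branch, here is a sketch of how a single metadata line from a build script turns into an exported variable. The upper function is my guess at the helper sourced by the real script, the include path is made up, and the %q quoting is left out to keep the sketch portable.

```shell
# Sketch only: translate one `cargo:key=value` line the way described above.
# `upper` is a guess at the helper the real script sources from its utilities.
upper() {
    # Upper-case and turn dashes into underscores.
    printf '%s' "$1" | tr 'a-z-' 'A-Z_'
}

CARGO_LINKS=openssl
line='cargo:include=/opt/openssl/include'   # hypothetical build script output

key=${line#cargo:}; key=${key%%=*}
val=${line#*=}
depinfo=$(printf 'export DEP_%s_%s=%s' "$(upper "$CARGO_LINKS")" "$(upper "$key")" "$val")
printf '%s\n' "$depinfo"
```

Sourcing a line like this before building a dependant makes the value visible to that crate's build script under the same variable name Cargo would have used.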

Providing Dependencies

Dependency provision is done in three parts. In build.sh, we go through all the Nix store paths provided in dependencies and buildDependencies, which are available to us as environment variables, and collect any Rust libraries we might find into two directories called deps and build_deps, respectively, reading their dependency info as we go into RUSTFLAGS, BUILD_RUSTFLAGS, depFlags, and buildDepFlags:

function add_deps {
    local dep_type=$1; shift
    local dep_dir=$1; shift
    local dep_flags=$1; shift
    for dep in ${!dep_type}
    do
        printf -v$dep_flags "%s --extern %s=%s" \
               "${!dep_flags}" \
               "$(crate_name $dep)" \
               "$dep/lib/lib$(crate_name $dep).rlib"
        stat $dep/lib/deps/* &>/dev/null \
            && cp -dn $dep/lib/deps/* deps \
            || :
        for depinfo in $dep/lib/*.depinfo
        do
            source $depinfo
        done
        for lib in $dep/lib/*.rlib
        do
            local dest=$(basename $lib .rlib)-$(crate_hash $dep).rlib
            copy_or_link "$lib" "$dep_dir/$dest"
        done
    done
}

mkdir deps
add_deps dependencies deps depFlags
RUSTFLAGS="-Ldependency=deps $NIX_RUST_LINK_FLAGS"

link_flags="$NIX_RUST_LINK_FLAGS"
NIX_RUST_LINK_FLAGS=
mkdir build_deps
add_deps buildDependencies build_deps buildDepFlags
BUILD_RUSTFLAGS="-Ldependency=build_deps $NIX_RUST_LINK_FLAGS"
NIX_RUST_LINK_FLAGS="$link_flags"
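
The renaming step in add_deps is worth a closer look: each rlib is collected under a name suffixed with a hash of its store path, which matches the libfoo_bar-*.rlib pattern rustc searches for and keeps different versions of a crate apart. A minimal sketch, assuming crate_hash simply extracts the hash component of the store path (the real helper is not shown in this post) and using a made-up store path in a temporary directory:

```shell
# Sketch only: collect one dependency's rlib under a hash-suffixed name.
# crate_hash is a guess at the real helper; the store path is made up.
crate_hash() {
    # .../store/<hash>-<name>  ->  <hash>
    local base
    base=${1##*/}
    printf '%s' "${base%%-*}"
}

work=$(mktemp -d)
dep="$work/store/abc123-libc-0.2.43"
mkdir -p "$dep/lib" "$work/deps"
: > "$dep/lib/liblibc.rlib"   # empty stand-in for a compiled library

for lib in "$dep"/lib/*.rlib
do
    dest="$(basename "$lib" .rlib)-$(crate_hash "$dep").rlib"
    ln -s "$lib" "$work/deps/$dest"
done

collected=$(ls "$work/deps")
printf '%s\n' "$collected"
rm -rf "$work"
```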

If there were a way to separate the flags passed to rustc for program vs build script builds, we would be essentially done. However, given that there isn’t and both are invoked from the same cargo build call giving us no time to change the variables around, our trick is to wrap RUSTC and RUSTDOC in a small script. This script does two things. Firstly, it scans the arguments passed in for mentions of the mangling nonce and replaces it with the Nix hash of the derivation — since we have removed all the dependency info from the crate, the Cargo hash is significantly less useful for preventing name clashes than it used to be. Secondly, it checks whether the file being compiled is a build script, and if so adds in the build dependencies.

#!@bash@/bin/bash

source $utils

isBuildScript=
args=("$@")

for i in ${!args[@]}
do
    if [ "x${args[$i]::9}" = "xmetadata=" ]
    then
        args[$i]=metadata=$(crate_hash $out)
    elif [ "x${args[$i]}" = "x--crate-name" ] \
             && [ "x${args[$i+1]::13}" = "xbuild_script_" ]
    then
        isBuildScript=1
    fi
done

if [ "$isBuildScript" ]
then
    depFlags+=" $buildDepFlags $BUILD_RUSTFLAGS"
fi

>&2 echo @cmd@ $depFlags "${args[@]}"
env @cmd@ $depFlags "${args[@]}"
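
The build script check can be tried in isolation. The sketch below reimplements just that test over positional arguments. is_build_script is a hypothetical name and the argument lists are made up, though Cargo does compile build.rs as a crate named build_script_build.

```shell
# Sketch only: decide whether a rustc invocation compiles a build script.
# is_build_script is a hypothetical name; the argument lists are made up.
is_build_script() {
    local prev=
    for arg in "$@"
    do
        # The crate name is the argument that follows --crate-name.
        if [ "$prev" = "--crate-name" ]
        then
            case $arg in
                build_script_*) return 0 ;;
            esac
        fi
        prev=$arg
    done
    return 1
}

if is_build_script --crate-name build_script_build build.rs
then
    build_kind=build-script
else
    build_kind=library
fi

if is_build_script --crate-name openssl_sys src/lib.rs
then
    lib_kind=build-script
else
    lib_kind=library
fi
printf '%s %s\n' "$build_kind" "$lib_kind"
```

Only the first invocation would receive the build dependency flags. The second is compiled with the ordinary dependency flags alone.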

All together, this is sufficient for us to compile the openssl crate with all its dependencies!

openssl = mkRustCrate rec {
  name = "openssl";
  version = "0.10.15";
  src = fetchFromCratesIo {
    inherit name version;
    sha256 = "0fj5g66ibkyb6vfdfjgaypfn45vpj2cdv7d7qpq653sv57glcqri";
  };
  dependencies = [ bitflags cfg-if foreign-types lazy_static libc openssl-sys ];
};

openssl-sys = mkRustCrate rec {
  name = "openssl-sys";
  version = "0.9.39";
  src = fetchFromCratesIo {
    inherit name version;
    sha256 = "1lraqg3xz4jxrc99na17kn6srfhsgnj1yjk29mgsh803w40s2056";
  };
  buildInputs = [ pkgs.openssl pkgs.pkgconfig ];
  dependencies = [ libc ];
  buildDependencies = [ cc pkg-config ];
};

Note particularly that there was no need to specify pkgs.openssl as a buildInput of openssl. Even though it hasn’t been linked into the rlib for openssl-sys, the link flags produced as part of openssl-sys’s dependency info serve to locate the library for the whole dependency subtree. As far as any dependant package is concerned, openssl is a completely pure Rust package, and since the native dependency is handled by Nix, the user should never have to worry about it.

Get Involved

All the code from this blog post series is available on GitHub, and the project will hopefully continue to grow.

Next time: testing, development dependencies, and auto-generating Nix expressions from crates.io metadata.