After reading Bálint’s blog post about Firebuild (a compile cache) [1] I decided to give it a go. It’s non-free; the project web site [2] says that it’s free for non-commercial use or commercial trials.

My first attempt at building a Debian package failed due to man-recode using a seccomp() sandbox, I filed Debian bug #1032619 [3] about this (thanks for the quick response Bálint). The solution for me was to edit /etc/firebuild.conf and add man-recode to the dont_intercept list. The new version that’s just been uploaded to Debian fixes it by disabling seccomp() and will presumably allow slightly better performance.
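The entry ended up looking something like this (a sketch from memory; the exact layout of /etc/firebuild.conf may differ between versions, and only the dont_intercept addition itself is what’s described above):

```
# /etc/firebuild.conf (sketch; exact layout may vary by version)
processes = {
  # commands firebuild should not intercept at all
  dont_intercept = ["man-recode"];
};
```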

Here are the results of building the refpolicy package with Firebuild: a regular build, the first build with Firebuild (about 35% slower), and a rebuild with Firebuild that reduced the time by almost 42%.

regular build:
real    1m32.026s
user    4m20.200s
sys     2m33.324s

firebuild 1st:
real    2m4.111s
user    6m31.769s
sys     3m53.681s

firebuild 2nd:
real    0m53.632s
user    1m41.334s
sys     3m36.227s

Next I did a test of building a Linux 6.1.10 kernel with “make bzImage -j18”; here are the results from a normal build, the first build with firebuild, and a second build. The real time is worse with firebuild for this on my machine. I think that the relative speeds of my CPU (a reasonably fast 18 core one) and storage (two of the slower NVMe devices in a BTRFS RAID-1) are the cause of the first build being relatively so much slower for “make bzImage” than for building refpolicy, as the kernel build process involves a lot more data. For the final build I moved ~/.cache/firebuild to a tmpfs (I have 128G of RAM and not much running on my machine at the time of the tests); even then building with firebuild was slightly slower in real time but took significantly less CPU time (user+sys being about 20 minutes instead of almost 37). I also ran several tests with the kernel source tree on a tmpfs, but for unknown reasons those tests each took about 6 minutes. Does firebuild or the Linux kernel build process dislike tmpfs for some reason?

regular build:
real    2m43.020s
user    31m30.551s
sys     5m15.279s

firebuild 1st:
real    8m49.675s
user    64m11.258s
sys     19m39.016s

firebuild 2nd:
real    3m6.858s
user    7m47.556s
sys     9m22.513s

firebuild 2nd, cache on tmpfs:
real    2m51.910s
user    10m53.870s
sys     9m21.307s

One thing I noticed from the kernel build tests is that the total CPU time taken by the firebuild process (as reported by ps) was more than 2/3 of the run time, and top usually reported it as taking around 75% of a CPU core. It seems to me that the firebuild process itself is a bottleneck on build speed. Building refpolicy without firebuild has an average of 4.5 cores in use while building the kernel has 13.5. Unless they make a multi-threaded version of firebuild it seems that it won’t give the performance one would hope for from a CPU with 18+ cores. I presume that if I had been running with hyper-threading enabled then firebuild would have been even worse for kernel builds, as it would sometimes get the second thread of a core. It looks like firebuild would perform better on AMD CPUs as they tend to have fewer CPU cores with greater average performance per core, so a single CPU core for firebuild will be less of a limit. I presume that the firebuild developers will make it perform better with large numbers of cores in future; the latest Intel laptop CPUs have 16+ cores and servers with 2*40 core CPUs are common.

The performance improvement for refpolicy is significant as a portion of build time, but insignificant in terms of real time. A full build of refpolicy doesn’t take enough time to go and get a Coke, so reducing it doesn’t offer a huge benefit. If Firebuild had been available in past years when refpolicy took 20 minutes to build (when DDR2 was the best RAM available) then it would have been a different story.

There is some potential to optimise the build of refpolicy for the non-firebuild case. Getting it to average more than 4.5 cores in use when there are 18 available should be possible; there are a number of shell for loops in the main Makefile and maybe some of them can be replaced by make constructs to allow running in parallel. If it used 7 cores on average then a regular build would be faster than it currently is with firebuild and a hot cache. Any advice from make experts would be appreciated.
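As a rough illustration of the sort of change that helps (a self-contained sketch, not the actual refpolicy Makefile: it uses gzip instead of bzip2 so it runs anywhere, and throwaway files instead of real .pp modules), a per-file compression loop can be fed through xargs -P, writing through redirection so the tool never copies timestamps onto its output:

```shell
#!/bin/sh
# Sketch: compress files in parallel with xargs -P, writing via
# redirection instead of in-place so no timestamps are rewritten.
# gzip stands in for bzip2 here so the sketch runs anywhere.
set -e
dir=$(mktemp -d)
printf 'alpha\n' > "$dir/a.pp"
printf 'beta\n' > "$dir/b.pp"
find "$dir" -name '*.pp' | \
    xargs -n1 -P4 -I STR sh -c 'gzip -9 < STR > STR.gz && rm STR'
files=$(ls "$dir")
echo "$files"
rm -r "$dir"
```

On an 18 core machine -P would be set from the build’s parallelism setting rather than hard-coded, much as the Makefile snippets in the comments use $(NUMJOBS).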

10 comments to Firebuild

  • Josh

    How does ccache fare on the same builds and same machine?

  • I haven’t tried. I just tested Firebuild because it’s a new thing; I guess I should give ccache a go.

  • Thank you, Russell, for testing Firebuild!

    In the original article there are a few hints for improving the acceleration.

    There are two major reasons for parts of the builds not being accelerated.
    The first is dpkg-builddeb’s fakeroot usage. The fakeroot binary communicates with its descendant processes, and firebuild disables interception and acceleration for fakeroot and its descendants.
    The second is an easy one: bzip2 changes file timestamps, and this also disables acceleration.

    Debugging the build with firebuild -d proc reveals that:

    FIREBUILD: Command “/usr/bin/bzip2” can’t be short-cut due to: Changing file timestamps is not supported,

    I used the following patch to fix those issues:

    diff -Nru refpolicy-2.20210203/debian/control refpolicy-2.20210203/debian/control
    --- refpolicy-2.20210203/debian/control 2021-11-09 09:42:19.000000000 +0100
    +++ refpolicy-2.20210203/debian/control 2023-03-14 14:21:06.000000000 +0100
    @@ -3,6 +3,7 @@
     Priority: optional
     Section: admin
    +Rules-Requires-Root: no
     Maintainer: Debian SELinux maintainers
     Uploaders: Russell Coker
    diff -Nru refpolicy-2.20210203/debian/rules refpolicy-2.20210203/debian/rules
    --- refpolicy-2.20210203/debian/rules 2021-10-21 05:03:47.000000000 +0200
    +++ refpolicy-2.20210203/debian/rules 2023-03-14 14:21:06.000000000 +0100
    @@ -37,7 +37,7 @@
     for flavour in $(FLAVOURS) ; do \
     for f in $(CURDIR)/debian/selinux-policy-$$flavour/usr/share/selinux/$$flavour/*.pp ; do \
    - bzip2 -9f $$f; \
    + bzip2 -9f > $$f.bz2 < $$f; \
     done; \

    @@ -164,7 +164,7 @@
     mv modules.conf modules.conf.dist; \
     fi; \
     ln -sf modules.conf)
    - install -p -o root -g root -m 644 debian/build.conf.default \
    + install -p -m 644 debian/build.conf.default \
     (cd $(CURDIR)/debian/tmp/etc/selinux/default/src/; mv policy selinux-policy-src; \
     rm -rf selinux-policy-src/support/__pycache__/; \

    Another tweak that can be used is allowing firebuild to cache more commands at the expense of extra cache space.

    Using all of the above I used the following command to measure the acceleration on my Lenovo ThinkPad X1 Nano (i7-1160G7) laptop running Ubuntu 22.04:

    time env PATH=${PATH/ccache/ccache-DISABLED/} firebuild -o 'processes.skip_cache = []' dpkg-buildpackage -j8
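    For context, the PATH=${PATH/ccache/ccache-DISABLED/} part is a bash pattern substitution that renames any “ccache” component of PATH so ccache’s compiler symlinks stop resolving; a tiny demo with a made-up PATH value:

```shell
#!/bin/bash
# bash ${var/pattern/replacement}: the first "ccache" match in the
# value is rewritten, so the ccache symlink dir is no longer searched.
demo_path="/usr/lib/ccache:/usr/bin:/bin"
result="${demo_path/ccache/ccache-DISABLED}"
echo "$result"
```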

    The results are the following:

    regular build:
    real 2m18,460s
    user 9m38,228s
    sys 0m54,848s

    firebuild 1st:
    real 2m50,101s
    user 12m16,437s
    sys 1m17,813s

    firebuild 2nd:
    real 0m31,137s
    user 1m29,773s
    sys 0m5,660s

    The ~77% real time (85% user+sys) improvement is in line with my expectations, given that the refpolicy package is not a particularly good target for firebuild due to the many quick scripts it executes.

    I also checked the acceleration on my custom-built desktop with an AMD Ryzen 9 5900X 12-core processor and hyper-threading enabled, and with -j18 to stay closer to the original tests:

    regular build:
    real 0m35.161s
    user 2m19.945s
    sys 0m12.397s

    firebuild 1st:
    real 0m42.339s
    user 3m2.505s
    sys 0m20.322s

    firebuild 2nd:
    real 0m15.643s
    user 0m40.775s
    sys 0m4.592s

    Here the 55% (70% user+sys) improvement is noticeably less, and it indeed partly comes from firebuild’s supervisor process decreasing parallelization. There are plans to improve performance on high core count systems, but our main focus is performing well in CI setups and on laptops with lower core counts.

    Do you see additional improvements with the changes above?

    I’ll look into the Linux builds, too.

  • Regarding the Linux builds, I measured much better acceleration on my desktop system (AMD Ryzen 9 5900X 12-core, Ubuntu 22.04, BTRFS, single fast NVMe).

    Linux v6.1.10

    $ make defconfig
    $ git commit .config
    $ git clean -dxf

    time env PATH=${PATH/ccache/ccache-DISABLED/} firebuild -o 'processes.skip_cache = []' make bzImage -j18

    regular build:
    real 1m11.329s
    user 16m33.903s
    sys 2m14.370s

    firebuild 1st:
    real 1m48.594s
    user 17m43.468s
    sys 3m27.901s

    firebuild 2nd:
    real 0m9.313s
    user 0m19.433s
    sys 0m7.888s

    That’s an 86% (97.5% user+sys) improvement, similar to our previous internal tests on the Linux project.
    Could you please check whether you have ccache installed? When it is installed Firebuild lets it work, but it is recommended to disable it.
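    A quick way to check that (a generic shell check, nothing firebuild-specific):

```shell
#!/bin/sh
# Report whether a ccache binary is anywhere in the current PATH.
if command -v ccache >/dev/null 2>&1; then
    msg="ccache found: $(command -v ccache)"
else
    msg="ccache not installed"
fi
echo "$msg"
```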

  • for flavour in $(FLAVOURS) ; do \
    find $(CURDIR)/debian/selinux-policy-$$flavour/usr/share/selinux/$$flavour/ -name "*.pp" | xargs -n1 -P $(NUMJOBS) bzip2 -9 ; \

    Thanks for your informative comments; future comments should be approved automatically now that I’ve approved some of them. The above is the latest Makefile snippet for bzip2, which I have run via xargs for best performance on multi-core systems without firebuild etc. Do you have any suggestions for how to parallelise bzip2 while avoiding the timestamp issue? The 31s run is very impressive.

    I have never had ccache installed. Slow NVMe might be part of my problem; I ordered new NVMe devices just before the prices dropped dramatically :( and didn’t realise the real-world impact of different NVMe performance levels until after I bought them. My NVMe devices are Crucial P3; what is your “fast NVMe”?

  • for flavour in $(FLAVOURS) ; do \
    find $(CURDIR)/debian/selinux-policy-$$flavour/usr/share/selinux/$$flavour/ -name "*.pp" | xargs -n1 -P $(NUMJOBS) -I STR sh -c "bzip2 -9 < STR > STR.bz2 && rm STR" ; \

    I’ve just done a test with the above in debian/rules and got the following results for non-firebuild, first-firebuild, and cached-firebuild.

    non-firebuild:
    real 1m29.377s
    user 4m15.289s
    sys 2m25.317s

    first-firebuild:
    real 2m4.733s
    user 6m16.949s
    sys 4m2.369s

    cached-firebuild:
    real 0m56.884s
    user 1m28.463s
    sys 3m47.583s

    Running it with “firebuild -d proc” shows that “sh” is in skip_cache. As an experiment I commented out the “sh” entry and got the following result:

    real 0m47.951s
    user 0m50.577s
    sys 2m48.826s

    Not ideal I guess, and I’m probably losing some time by having it try to cache shell commands that can’t be cached. Is there a better way of running xargs?

    Also, the refpolicy package in Testing already has the Rules-Requires-Root change.

  • I’ve just checked and my Sabrent Rocket 4.0 2TB is not that much faster than the Crucial P3, so it can hardly be a major issue.

    I have now checked a few things in a clean bookworm VM started with the following on my 5900X:

    lxc launch images:debian/bookworm --vm -c limits.cpu=18 -c limits.memory=16GB

    I checked the impact of caching or not caching “sh” or everything, of putting the cache on tmpfs, and of enabling SELinux.

    Firebuild’s default configuration gave the best results in wall clock time, with a ~61% improvement.
    I used the bzip2 snippet patch on bookworm’s refpolicy package.


    regular build:
    real 0m26.634s
    user 1m43.924s
    sys 0m15.995s

    with firebuild default configuration:

    real 0m30.154s
    user 2m29.635s
    sys 0m27.870s

    # firebuild -s
    Statistics of stored cache:
    Hits: 20 / 8012 (0.25 %)
    Misses: 7992
    Uncacheable: 13686
    GC runs: 0
    Cache size: 582.58 MB
    Saved CPU time: -7.92 seconds

    firebuild 2nd run:
    real 0m10.381s
    user 0m32.788s
    sys 0m8.458s
    # firebuild -s
    Statistics of stored cache:
    Hits: 4919 / 14405 (34.15 %)
    Misses: 9486
    Uncacheable: 27370
    GC runs: 0
    Cache size: 583.68 MB
    Saved CPU time: 1.90 minutes

    Caching “sh” too, with firebuild -o 'processes.skip_cache -= "sh"':
    real 0m30.145s
    user 2m28.941s
    sys 0m27.501s
    firebuild -s
    Statistics of stored cache:
    Hits: 2073 / 13873 (14.94 %)
    Misses: 11800
    Uncacheable: 7806
    GC runs: 0
    Cache size: 605.44 MB

    firebuild 2nd run:
    real 0m11.277s
    user 0m36.016s
    sys 0m7.785s
    # firebuild -s
    Statistics of stored cache:
    Hits: 9200 / 22612 (40.69 %)
    Misses: 13412
    Uncacheable: 15509
    GC runs: 0
    Cache size: 606.69 MB
    Saved CPU time: 1.82 minutes

    Caching every possible command with firebuild -o 'processes.skip_cache = []':

    real 0m31.002s
    user 2m28.602s
    sys 0m28.399s
    # firebuild -s
    Statistics of stored cache:
    Hits: 2275 / 21644 (10.51 %)
    Misses: 19369
    Uncacheable: 35
    GC runs: 0
    Cache size: 615.32 MB
    Saved CPU time: -8.90 seconds

    firebuild 2nd run:
    real 0m13.831s
    user 0m37.003s
    sys 0m8.886s
    # firebuild -s
    Statistics of stored cache:
    Hits: 14079 / 39668 (35.49 %)
    Misses: 25589
    Uncacheable: 70
    GC runs: 0
    Cache size: 617.97 MB
    Saved CPU time: 1.76 minutes

    default firebuild configuration with cache on tmpfs:

    real 0m28.894s
    user 2m28.857s
    sys 0m27.018s

    real 0m11.821s
    user 0m38.088s
    sys 0m9.011s

    default firebuild configuration with SELinux enabled:

    real 0m30.561s
    user 2m32.238s
    sys 0m34.916s

    real 0m10.920s
    user 0m37.020s
    sys 0m15.039s

    There is something oddly inefficient affecting sys CPU time on the system you tested on. Since I can’t reproduce that configuration, could you please give firebuild a try in a clean VM to narrow down the difference to the HW?

  • Oh, please use firebuild from unstable; it accelerates man and man-recode now. Please also enable accelerating man-recode if the upgrade does not fix /etc/firebuild.conf.

  • Also please check the Linux build in the VM too, since I measured much better acceleration there.