FreeNAS: insufficient space to install update (how to replace the USB boot device with a larger one)

I have been running a FreeNAS system at home over the last six years on a self-built machine. In 2013 I started with FreeNAS 9.1.0. I updated conservatively over the years without running into problems (this is rare — kudos to the engineering team behind this!). Recently I tried to install one of the last 11.2 patch releases and ran into the following error:

insufficient space to install update

My 4 GB USB thumb drive (which I have been using as the boot device during all these years) got too small.

Here is how to replace the FreeNAS boot device with a device of larger capacity; without downtime. 

Step 1

Plug in an additional USB thumb drive (8 GB capacity in my case).

Step 2

In the web interface, under System -> Boot Environments choose attach and select the device representing the newly plugged-in USB stick (da2, in my case). Select use all disk space. The message attaching device pops up and disappears shortly thereafter. Behind the scenes this adds the new storage device to the existing ZFS pool freenas-boot.

Step 3

Wait for the “re-silvering” to complete: under the hood, ZFS mirrors (copies) all data from the original USB thumb drive to the new device (which is now also part of the ZFS pool freenas-boot). Quote from the ZFS documentation:

The process of moving data from one device to another device is known as resilvering and can be monitored by using the zpool status command.

I monitored the progress with a shell, periodically invoking the command

zpool status freenas-boot

The output of that command is unambiguous, it either says that re-silvering is currently in progress, or not. In my case, the re-silvering took more than 1 hour to complete because the thumb drive I added has some quite slow write performance.

Step 4

Only do this after making sure that re-silvering (step 3) completed :-).

Remove (unplug) the old (small capacity) USB boot device. Verify that the pool is healthy after unplugging: zpool status freenas-boot must show pool state: ONLINE with one device being online. In my case: da2p2 ONLINE. Remember the device name; it is needed in step 6.

Step 5

Set the autoexpand property on the boot pool:

zpool set autoexpand=on freenas-boot

In my case, this command took quite a while to return.

Note: after this command returns zfs list still shows the old (small) capacity for freenas-boot (1.69 G available, in my case).

Step 6

Trigger the automatic expansion:

zpool online -e freenas-boot da2p2

What is this doing? Why does this work? Quote from the ZFS documentation:

You can expand the pool size automatically when a smaller disk is replaced by a larger disk by using the zpool online -e command even if the disk is already online.

Step 7

Validate. zfs list should now show the newly available capacity (in my case: 5.39 G available).

Step 8

The ZFS pool from which the system boots now has more capacity. If you came here because your system update failed with “insufficient space to install update” then you can now retry updating FreeNAS.

If you are using a USB thumb drive as slow as mine then the update procedure can easily take 2-3 hours. During that time I used iostat -x da2 1 (executed in a remote shell, on the FreeNAS system) to confirm that data is actually being copied around (as opposed to the update procedure hanging indefinitely as of an error).

Final words

You can also use this technique to simply mirror your boot device; to operate the freenas-boot pool from two USB sticks. For enhanced fault tolerance of your FreeNAS setup. That is advisable and indeed what I did after switching to a larger pool size.

Command line: extract all zip files into individual directories

I have been using Linux desktop environments professionally for the last 10 years. There is a lot to not like about them. For example, the lack of a really good graphical archive extraction helper, integrated with the graphical file manager.

On a fresh Windows system one of the first applications I typically install is 7zip. It adds convenient entries to the context menu of archive files. For example, it allows selecting multiple archive files at once, offering a 7-Zip - Extract To "*\" in the context menu (found an example screenshot here). That will extract each selected archive file into an individual subdirectory (with the subdirectory’s name being the base name of the archive file w/o file extension). That can be very useful. In a number of attempts, I could not find something similar for a modern Gnome desktop environment (let me know if you know of a reliable solution that is well-integrated with one of the popular graphical file managers).

Not such a big deal, of course, the same can be achieved from the terminal. This is the one-liner I have been using for that for a couple of years. I usually look it up from my shell command history:

find -name '*.zip' -exec sh -c 'unzip -d "${1%.*}" "$1"' _ {} \;

This extracts all zip files in the current directory into individual sub-directories.

If you do not want to extract all zip files in the current directory but only a selection thereof then adjust the command (well, see, this is where the GUI-based solution I referred to above is actually quite useful).

Running an eBPF program may require lifting the kernel lockdown

Update Sep 28: discussion on Hacker News
Update Sep 30: kernel lockdown merged into mainline kernel

A couple of days ago I wanted to try out the hot eBPF things using the BPF Compiler Collection (BCC) on my Fedora 30 desktop system, with Linux kernel 5.2.15. I could not load eBPF programs into the kernel: strace revealed that the bpf() system call failed with EPERM:

bpf(BPF_PROG_LOAD,{prog_type=[...]}, 112) = -1 EPERM
(Operation not permitted)

So, a lack of privileges. Why? I tried …

  • running natively as root instead of in a sudo environment.
  • disabling SELinux completely (instead of running in permissive mode).
  • following setuid-related hints.
  • building BCC from source to make it more likely that it correctly integrates with my system.
  • consulting BCC maintainers via GitHub.

No obvious solution, still EPERM.

I jumped on a few more discussions on GitHub and got a hint from GitHub user deg00 (thank you, anonymous person with no GitHub activity and a picture of a snail!). She wrote: “For Fedora 30, the problem is not selinux but kernel-lockdown”.

I did not know what kernel lockdown is, but I wondered how to disable it. I found the following resources useful:

Temporarily disabling kernel lockdown solved the problem

In the resources linked above, we find that there is a so-called sysrq mechanism that can influence kernel behavior. When configured with a 1 in /proc/sys/kernel/sysrq it has the widest set of privileges, including the privilege to lift the kernel lockdown. Sending an x into /proc/sysrq-trigger then actually uses the sysrq mechanism to lift the kernel lockdown.

That indeed worked for me. The following snippet shows the original symptom, despite running as root:

[root@jpx1carb jp]# python3 /usr/share/bcc/examples/hello_world.py 
bpf: Failed to load program: Operation not permitted
 
Traceback (most recent call last):
  File "/usr/share/bcc/examples/hello_world.py", line 12, in 
    BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 344, in __init__
    self._trace_autoload()
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 1090, in _trace_autoload
    fn = self.load_func(func_name, BPF.KPROBE)
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 380, in load_func
    raise Exception("Need super-user privileges to run")
Exception: Need super-user privileges to run

The last error message “Need super-user privileges to run” is misleading. The “Operation not permitted” error further above corresponds to the EPERM shown in the strace output above.

This lifts the kernel lockdown via the sysrq mechanism, as discussed:

[root@jpx1carb jp]# echo 1 > /proc/sys/kernel/sysrq
[root@jpx1carb jp]# echo x > /proc/sysrq-trigger

Now BCC’s hello world example runs fine:

[root@jpx1carb jp]# python3 /usr/share/bcc/examples/hello_world.py 
b'     gnome-shell-3215  [005] .... 58317.922716: 0: Hello, World!'
b'   Socket Thread-26509 [001] .... 58322.093849: 0: Hello, World!'
b'     gnome-shell-3215  [003] .... 58322.923562: 0: Hello, World!'
[...]

Cool, stuff works.

What the heck just happened? I did not understand a thing and correspondingly started to read a bit about these new shiny topics.

What is the “kernel lockdown”?

Most importantly the concept of the “kernel lockdown” seems to still be evolving.

The the mission statement behind the kernel lockdown is hard to put into words without stepping onto anyone’s toes. This is how RedHat worded the goal in 2017:

The kernel lockdown feature is designed to prevent both direct and indirect access to a running kernel image, attempting to protect against unauthorised modification of the kernel image and to prevent access to security and cryptographic data located in kernel memory, whilst still permitting driver modules to be loaded.

However, that goal was and seems to still be subject to a technical as well as a political debate in the Linux ecosystem: In 2018, Zack Brown from LINUX Journal published a well-researched and quite entertaining article summarizing the heated discussion about the initial set of lockdown patches. If you would like to try to understand what kernel lockdown is (or tries to be) then that article is worth reading. A quote from the article’s last few paragraphs:

This type of discussion is unusual for kernel development, but not for this particular type of patch. The attempts to slip code into the kernel that will enable a vendor to lock users out of controlling their own systems always tend to involve the two sides completely talking past each other. Linus and Andy were unable to get Matthew to address the issue the way they wanted, and Matthew was unable to convince Linus and Andy that his technical explanations were genuine and legitimate.

Also, Jonathan Corbet’s LWN article titled Kernel lockdown in 4.17? from April 2018 is worth a read.

And how do I know if my kernel is locked down? dmesg!

Here’s some dmesg output from my system. It is quite revealing, almost prose:

[    0.000000] Kernel is locked down from EFI secure boot; see man kernel_lockdown.7
[...]
[    2.198433] Lockdown: systemd: BPF is restricted; see man kernel_lockdown.7
[...]
[58310.913828] Lifting lockdown

First, as you can see, the kernel told me exactly that it is “locked down” (even providing the reason: because EFI secure boot is enabled on my system).

Secondly, it was kind enough to say that this affects (e)BPF things! Maybe I should read the kernel messages more often :-).

Thirdly, after quite a bit of system uptime, the “Lifting lockdown” was emitted in response to, well, me lifting the lockdown with the above-mentioned sysrq mechanism.

That is, if you wonder if and how this affects your system, try doing a dmesg | grep lockdown !

The kernel acting “differently depending on some really esoteric detail in how it was booted”…?

When I approached the BCC maintainers about the EPERM error on Fedora 30 they first responded with (basically) “it’s working for me”. Someone actually booted a VM with a fresh Fedora 30 install. And they were unable to reproduce. How can that be? The difference was whether secure boot was enabled or not: it was for my actual desktop machine, but not for their VM setup. That is quite a lesson learned, and maybe an important take-home message from this blog post.

This annoying debugging experience was predicted by Linus Torvalds. A quote from one of his initial reviews of the kernel lockdown patches (source, April 2018):

I do not want my kernel to act differently depending on some really esoteric detail in how it was booted. That is fundamentally wrong. […] Is that really so hard to understand? […] Look at it this way: maybe lockdown breaks some application because that app does something odd. I get a report of that happening, and it so happens that the reporter is running the same distro I am, so I try it with his exact kernel configuration, and it works for me. […] It is *entirely* non-obvious that the reporter happened to run a distro kernel that had secure boot enabled, and I obviously do not.

Well, he was right.

Which kernel versions have the lockdown feature built-in?

Lockdown did not yet land in the mainline kernel. My Fedora 30 with kernel 5.2.15 is affected (with a specific variant of the lockdown patches, not necessarily the final thing!) because RedHat has chosen to build the lockdown patches into recent Fedora kernels, to try it out in the wild.

Will it land in the mainline kernel? When? And how will it behave, exactly? Just a couple of days ago Phoronix published an interesting article, titled Kernel Lockdown Feature Will Try To Land For Linux 5.4. Quote:

After going through 40+ rounds of revisions and review, the Linux kernel “LOCKDOWN” feature might finally make it into the Linux 5.4 mainline kernel.

While not yet acted upon by Linus Torvalds with the Linux 5.4 merge window not opening until next week, James Morris has submitted a pull request introducing the kernel lockdown mode for Linux 5.4.

The kernel lockdown support was previously rejected from mainline but since then it’s been separated from the EFI Secure Boot code as well as being implemented as a Linux security module (LSM) to address some of the earlier concerns over the code. There’s also been other improvements to the design of this module.

Various organizations seem to be pushing hard for this feature to land. It is taking long, but convergence around the details seems to take place.

What is the relationship between kernel lockdown and (e)BPF?

I think it is quite fair to ask: does it make sense that all-things-BPF are affected by the kernel lockdown feature? What does lockdown even have to do with eBPF in the first place?

I should say that I am not super qualified to talk about this because I have only researched this topic for about a day now. But I find highly interesting that

  • these questions seemingly have been under active debate since the first lockdown patch proposals
  • these questions seem to still be actively debated!

Andy Lutomirski reviewed in 2018:

“bpf: Restrict kernel image access functions when the kernel is locked
down”: This patch just sucks in general. At the very least, it should
only apply to […] But you should probably just force all eBPF
users through the unprivileged path when locked down instead, since eBPF
is really quite useful even with the stricter verification mode.

This shows that there was some pretty fundamental debate about the relationship between eBPF and kernel lockdown from the start.

I believe that the following quote shows how eBPF can, in general, conflict with the goal(s) of kernel lockdown (commit message of a 2019 version lockdown patch fragment):

From: David Howells <dhowells@redhat.com>

There are some bpf functions can be used to read kernel memory:
bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
private keys in kernel memory (e.g. the hibernation image signing key) to
be read by an eBPF program and kernel memory to be altered without
restriction. Disable them if the kernel has been locked down in
confidentiality mode.

Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Matthew Garrett <mjg59@google.com>
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee <jlee@suse.com>
cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>

This commit message rather convincingly justifies that something needs to be done about eBPF when the kernel is locked down (so that the goals of the lockdown do not get undermined!). However, it is not entirely clear what exactly should be done, how exactly eBPF is supposed to be affected, how its inner workings and aspects are to be confined when the kernel is locked down: what follows is a reply from Andy Lutomirski to the above’s commit message: “:) This is yet another reason to get the new improved bpf_probe_user_read stuff landed!

And indeed, only last month (August 2019) Andy published a work-in-progress patch set titled bpf: A bit of progress toward unprivileged use.

What I learned is that with the current Fedora 30 and its 5.2.x kernel I neither see the “final” lockdown feature nor the “final” relationship between lockdown and eBPF. This is very much work in progress, worse than “cutting edge”: what works today might break tomorrow, with the next kernel update :-)!

By the way, I started to look into eBPF for https://github.com/jgehrcke/goeffel, a tool for measuring the resource utilization of a specific process over time.

Update Sept 30:  lockdown just landed in the mainline kernel, wow! Quote from the commit message, clarifying important topics (such as that lockdown will not be tightly coupled to secure boot):

This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.
 
  From the original description:
 
    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.
 
    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.
 
  There are two major changes since this was last proposed for mainline:
 
   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/
 
   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

LaTeX Briefvorlage: 2019 update

Im Jahr 2009 habe ich einen Blog-Post mit einer LaTeX-Briefvorlage für den deutschsprachigen Raum veröffentlicht: https://gehrcke.de/2009/12/latex-briefvorlage/

Eckdaten:

  • Output: PDF-Dokument im A4-Format.
  • Perfekt nach DIN-Regeln gesetzt, inklusive Faltmarken am linken Rand.
  • Professionelle Typographie.

Über die Jahre habe ich dazu sehr viel positives Feedback bekommen (Danke!) und 2013 ein größeres Update vorgenommen.

Auch in den letzten Jahren (2013 bis 2019) habe ich die Briefvorlage für mich selbst sehr viel benutzt und leicht weiterentwickelt (eine wesentliche technische Änderung im Vergleich zur Version von 2009 und 2013 ist der Umstieg von pdflatex nach lualatex für die Kompilierung in ein PDF-Dokument). In diesem Code-Repository findet Ihr nun die aktuelle Variante: https://github.com/jgehrcke/latex-briefvorlage

Ihr könnt die Vorlage gerne für alles Erdenkliche benutzen. Und bitte gebt mir gerne weiterhin Feedback!

Ergebnis: brief.pdf

pyenv has proven itself over the years

Back around 2010 I was manually managing production-critical CPython installations, compiled in special ways. I learned a lot about how to properly isolate Python installations from each other. I also learned how important it is to never confuse a “system Python” (as many Linux distributions have one) with a Python build that you can be in control of. Since then I have never touched a system Python anymore (it is the system’s Python after all, and none of my business).

For developing, testing, and running my software, I needed to manage custom Python installations. That involved building CPython from source, many times. At some point around 2015 I adopted pyenv for doing exactly that. Back then, I also adopted pyenv-virtualenv for managing so-called “virtual environments” derived from the individual Python builds managed by pyenv.

I come to say: this method has been a joy to use over the last years. It has never let me down; it has always provided me with simplicity, reliability, a predictable outcome based on which serious engineering work could be done. I have adopted it for continuous integration pipelines; I use it for building container images, towards more reproducible test environments. It always works.

I want to share the boring stuff I did today with pyenv/pyenv-virtualenv, as I have done it for years, always with a smile, thinking and appreciating “that just worked as expected”.

Today, for a program I am developing right now, I wanted to set up a CPython 3.6.9 build, with a virtual environment derived from it to install specific dependencies into it. I started with

cd ~/.pyenv && git pull

for updating my local pyenv installation to learn about current Python releases (I have been using the same pyenv installation for years, migrated it between development machines). I issued the following command for pyenv to download a CPython 3.6.9 source archive, and to build it:

pyenv install 3.6.9

I found that the resulting build didn’t provide the lzma module because I didn’t have the corresponding header files available on my Fedora installation. I installed the relevant header files with a

dnf install xz-devel

and simply repeated

pyenv install 3.6.9

This resulted in a CPython build including the lzma module. I then went ahead and derived a virtual environment from that build, for the specific purpose of developing my current pet project:

pyenv virtualenv 3.6.9 venv369-goeffel

I then activated said virtual environment with

pyenv activate venv369-goeffel

and started to use it for specific development work.

Not spectacular, right? That’s it, precisely. The power here comes from the simplicity and reliability. A big thank you goes out to the pyenv maintainers.

I think the following shows that I should start deleting old things, but I think it also shows the use case where pyenv/pyenv-virtualenv really shines:

$ pyenv versions
* system (set by /home/jp/.pyenv/version)
  2.7.15
  2.7.15/envs/gipcdev-2715
  3.5.2
  3.5.2/envs/venv352-analysis-2
  3.5.2/envs/venv352-beets
  3.5.2/envs/venv352-bouncer
  3.5.2/envs/venv352-bouncer19
  3.5.2/envs/venv352-bouncer19-2
  3.5.2/envs/venv352-bouncer-open
  3.5.2/envs/venv352-dcos-commons-simumation
  3.5.2/envs/venv352-jupyter
  3.5.2/envs/venv352-modern-crypto
  3.6.6
  3.6.6/envs/dcos-docker
  3.6.6/envs/dcos-e2e-zk35
  3.6.6/envs/gipc-py-366-gevent14
  3.6.6/envs/messer-hdf5
  3.6.6/envs/pandas-dev-env
  3.6.6/envs/test-gipc-100-release
  3.6.6/envs/test-gipc-101-release
  3.6.6/envs/venv366-bouncer-enterprise
  3.6.6/envs/venv366-bouncer-open
  3.6.6/envs/venv366-dataanalysis
  3.6.9
  bouncer-deptest
  cryptotest
  dataanalysis
  dcos-docker
  dcos-e2e-zk35
  gipcdev-2715
  gipc-py-366-gevent14
  messer-hdf5
  pandas-dev-env
  pypy2.7-5.9.0
  pypy2.7-5.9.0/envs/pypy27-gevent-gipc-dev
  pypy27-gevent-gipc-dev
  pypy3.5-5.9.0
  pypy3.5-5.9.0/envs/pypy35-gevent-gipc-dev
  pypy3.5-6.0.0
  pypy3.5-6.0.0/envs/venvpypy35600-gipc
  pypy35-gevent-gipc-dev
  pypy-5.6.0
  pypy-5.6.0/envs/venvpypy560-gevent120-gipc
  test-gipc-100-release
  test-gipc-101-release
  venv2710-kazoo
  venv2710-pysamltest
  venv2710-samldev
  venv342
  venv342-bouncer
  venv342-bouncer2
  venv342-bouncerdeps
  venv342-bouncerdeps2
  venv342-bouncerold
  venv342-bouncertmep
  venv342-pyoidc
  venv342-samldev
  venv342-test
  venv342-tmp
  venv343-bouncer
  venv343-saml
  venv344-bouncer
  venv344-consensus
  venv344-dev
  venv344-gunicorndev
  venv345-bouncer
  venv345-cryptography-patch
  venv345-kazam
  venv352-analysis-2
  venv352-beets
  venv352-bouncer
  venv352-bouncer19
  venv352-bouncer19-2
  venv352-bouncer-open
  venv352-dcos-commons-simumation
  venv352-jupyter
  venv352-modern-crypto
  venv366-bouncer-enterprise
  venv366-bouncer-open
  venv366-dataanalysis
  venv-dcos-342
  venv-graphtool
  venv-kazam
  venvpy3boto3
  venvpypy35600-gipc
  venvpypy560-gevent120-gipc
  venvtmp