Linux – Jan-Philip Gehrcke, PhD

gnome: minimize all other windows

Jan-Philip Gehrcke — Mon, 26 Feb 2024 09:49:46 +0000

A quick solution for minimizing all windows but the one that is currently in foreground. Tested with GNOME Shell 44.9, Fedora 38.

Install xdotool:

sudo dnf install xdotool

Create executable shell script:

#!/bin/bash
currentwindowid=$(xdotool getactivewindow)
currentdesktopid=$(xdotool get_desktop)

for w in $(xdotool search --all --maxdepth 3 --desktop $currentdesktopid --name ".*"); do
  if [ $w -ne $currentwindowid ] ; then
    xdotool windowminimize "$w"
  fi
done

Create custom shortcut, example:

Kudos to:

rsync command for copying/archiving files with reasonable progress report

Jan-Philip Gehrcke — Sat, 29 Apr 2023 11:26:11 +0000

I’d like to blog more about those powerful commands that we carefully craft in our day-to-day.

For a file transfer I just ran

$ rsync -a --info=progress2 source dest
 17,310,485,259  98%   49.04MB/s    0:05:36 (xfr#87446, to-chk=0/99316)

As you can see in the output, it transferred 99316 files (~17 GB of data) within ~5 minutes.

Re-running the same command immediately again compared the same 99316 files via metadata in a matter of ~2 seconds:

$ time rsync -a --info=progress2 source dest
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/99316)   
0.776u 2.353s 0:02.49 125.3%    422+224k 5+0io 3pf+0w

This was with rsync version 3.2.5 protocol version 31.

Another example with --stats to get some aggregate numbers upon command completion:

$ rsync -a --info=progress2 --stats . /mnt/pool2022/jpghome/backup/from-nas-old-pool/audio/last-state
 48,401,206,545  76%  103.85MB/s    0:07:24 (xfr#4887, to-chk=0/7692)   

Number of files: 7,692 (reg: 7,101, dir: 591)
Number of created files: 5,202 (reg: 4,887, dir: 315)
Number of deleted files: 0
Number of regular files transferred: 4,887
Total file size: 63,275,198,202 bytes
Total transferred file size: 48,401,206,545 bytes
Literal data: 48,401,206,545 bytes
Matched data: 0 bytes
File list size: 218,538
File list generation time: 0.019 seconds
File list transfer time: 0.000 seconds
Total bytes sent: 48,413,582,133
Total bytes received: 97,026

sent 48,413,582,133 bytes  received 97,026 bytes  108,429,292.63 bytes/sec
total size is 63,275,198,202  speedup is 1.31

What are `-a` and `--info=progress2` doing?

There’s quite a bit of specific functionality defined by -a and --info=progress2. -a is the archive mode which really is representing a specific permutation of options:

archive mode is -rlptgoD (no -A,-X,-U,-N,-H)

The full specification for --info isn’t even in the man page, it’s provided by running rsync --info=help:

± rsync --info=help
Use OPT or OPT1 for level 1 output, OPT2 for level 2, etc.; OPT0 silences.

BACKUP     Mention files backed up
COPY       Mention files copied locally on the receiving side
DEL        Mention deletions on the receiving side
FLIST      Mention file-list receiving/sending (levels 1-2)
MISC       Mention miscellaneous information (levels 1-2)
MOUNT      Mention mounts that were found or skipped
NAME       Mention 1) updated file/dir names, 2) unchanged names
NONREG     Mention skipped non-regular files (default 1, 0 disables)
PROGRESS   Mention 1) per-file progress or 2) total transfer progress
REMOVE     Mention files removed on the sending side
SKIP       Mention files skipped due to transfer overrides (levels 1-2)
STATS      Mention statistics at end of run (levels 1-3)
SYMSAFE    Mention symlinks that are unsafe

ALL        Set all --info options (e.g. all4)
NONE       Silence all --info options (same as all0)
HELP       Output this help message

Options added at each level of verbosity:
0) NONREG
1) COPY,DEL,FLIST,MISC,NAME,STATS,SYMSAFE
2) BACKUP,MISC2,MOUNT,NAME2,REMOVE,SKIP

rsync flags are valuable read about and to choose carefully. In 2014, I started building a backup solution for myself involving the following powerful flags — they haven’t let me down until today:

        rsync_cmd = [
            "/usr/local/bin/rsync",
            "--archive",
            "--verbose",
            "--hard-links",
            "--delete",
            "--fuzzy",
            "--stats",
            self._source_dir,
            self._target_dir,
            ]

Note that sometimes you also find --flags=name0 being recommended. This makes sense combination with -v (verbose). Then, --flags=name0 prevents each individual file name from being emitted to the terminal.

How to create a bootable USB thumb drive for Lenovo Thinkpad BIOS updates

Jan-Philip Gehrcke — Fri, 20 May 2022 12:46:13 +0000

This assumes a local file such as n23ur33w.iso downloaded from the Lenovo support website (e.g. from https://support.lenovo.com/fr/en/downloads/DS502281).

Use geteltorito to extract the relevant boot image from the ISO file (following the “El Torito” standard):

$ sudo dnf install geteltorito
...
$ geteltorito -o lenovo-bios.img n23ur33w.iso 
Booting catalog starts at sector: 20 
Manufacturer of CD: NERO BURNING ROM VER 12
Image architecture: x86
Boot media type is: harddisk
El Torito image starts at sector 27 and has 67584 sector(s) of 512 Bytes

Image has been written to file "lenovo-bios.img".

Write the data as-is onto the thumb drive (in my case the thumb drive is /dev/sda, as discovered with fdisk -l):

$ sudo dd if=lenovo-bios.img of=/dev/sda
67584+0 records in
67584+0 records out
34603008 bytes (35 MB, 33 MiB) copied, 3.7256 s, 9.3 MB/s

The result is a bootable thumb drive via which you can perform the desired update operation.

At the time of writing, popular tools such as rufus do not offer a one-click solution for creating a bootable thumb drive straight from the ISO file offered by Lenovo. Context and discussion can be found at https://github.com/pbatard/rufus/issues/63.

Related references:

Fedora: reinstall kernel and update GRUB config

Jan-Philip Gehrcke — Fri, 22 Oct 2021 12:57:29 +0000

This is to show how dnf reinstall can be used to rewrite the GRUB config to boot a specific kernel on Fedora.

My Fedora 34 ended up in a bit of a confused state. From DNF’s point of view (only) kernel 5.14.13 was installed:

$ dnf list kernel
Last metadata expiration check: 0:00:57 ago on Fri 22 Oct 2021 02:41:15 PM CEST.
Installed Packages
kernel.x86_64 5.14.13-200.fc34 @updates

But grubby showed that 5.14.9 was configured for boot:

grubby --default-kernel
/boot/vmlinuz-5.14.9-200.fc34.x86_64

This reinstall command set things straight again:

dnf reinstall kernel-core-5.14.13-200.fc34.x86_64

Outcome:

$ grubby --default-kernel
/boot/vmlinuz-5.14.13-200.fc34.x86_64

How to run minikube on Fedora (30)

Jan-Philip Gehrcke — Thu, 06 Feb 2020 14:53:43 +0000

Today I ran minikube on my notebook running Fedora 30. What follows is a list of steps to do that.

First, install KVM/libvirt and periphery in one go:

sudo dnf install @virtualization
sudo systemctl start libvirtd
sudo systemctl enable libvirtd

Do a bit of verification:

lsmod | grep kvm
kvm_intel             303104  0
kvm                   782336  1 kvm_intel
irqbypass              16384  1 kvm

Set up the most recent minikube release:

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo mkdir -p /usr/local/bin/
sudo install minikube /usr/local/bin/

Start minikube using the kvm2 driver:

$ minikube start --vm-driver=kvm2
  minikube v1.7.1 on Fedora 30
  Using the kvm2 driver based on user configuration
  Downloading driver docker-machine-driver-kvm2:
    > docker-machine-driver-kvm2.sha256: 65 B / 65 B [-------] 100.00% ? p/s 0s
    > docker-machine-driver-kvm2: 13.82 MiB / 13.82 MiB  100.00% 1.19 MiB p/s 1
  Downloading VM boot image ...
    > minikube-v1.7.0.iso.sha256: 65 B / 65 B [--------------] 100.00% ? p/s 0s
    > minikube-v1.7.0.iso: 166.68 MiB / 166.68 MiB [-] 100.00% 9.21 MiB p/s 19s
  Creating kvm2 VM (CPUs=2, Memory=2000MB, Disk=20000MB) ...
  Preparing Kubernetes v1.17.2 on Docker '19.03.5' ...
  Downloading kubeadm v1.17.2
  Downloading kubelet v1.17.2
  Downloading kubectl v1.17.2
  Pulling images ...
  Launching Kubernetes ... 
  Enabling addons: default-storageclass, storage-provisioner
  Waiting for cluster to come online ...
  Done! kubectl is now configured to use "minikube"
  For best results, install kubectl: https://kubernetes.io/docs/tasks/tools/install-kubectl/

Process Google Sheets data in Python with Pandas (example: running distance over time)

Jan-Philip Gehrcke — Sun, 12 Jan 2020 21:41:25 +0000

Every now and then I do a little bit of data mangling for personal use using tools that I got to appreciate, mainly during professional work.

In this blog post I would like to share an example for simple and yet powerful end-to-end data processing using Google Sheets, Python, and Pandas (and I have to notice: my first blog post about data analysis with Pandas is already more than six years old, time flies by).

Raw data in a spreadsheet: a date and a distance for every run

I have a chaotic spreadsheet in Google Sheets where I keep track of my runs. It is easy to edit from the smartphone, and synchronized across devices.

This is a screenshot of a part of that spreadsheet (hiding some irrelevant columns, showing only a small fraction of the rows):

Each row in the date column refers to a specific day using the text format YYYY-MM-DD. Some rows in the km column contain numerical values. Each of these means that I have made a run of the distance given by the value, in kilometers, on the corresponding day. Missing values in the km column mean that on those days I did not do a run (or forgot to keep track of it). In the screenshot, it looks like days in the date column are represented without gaps, but that is not important.

Now you know about the kind of data that I have been entering manually for about half a year, about my runs.

Analysis goal

My goal today was to do a tiny bit of data analysis — for myself — and to then write this blog post, for you :-).

In terms of data analysis, my goal was to look into the evolution of my running performance over time. I wanted to start by looking at “running distance per week”. Specifically, my goal was to perform a rolling window analysis with a window width wider than a week (to focus on changes on longer time scales more than on fast fluctuations), and to plot the outcome.

So I built a small tool using Python and Pandas which

automatically fetches the current dataset from Google Sheets (as a CSV document)
processes and prepares the data (removing dirt, filling gaps, …)
performs statistical analyses
creates a plot (and writes a PNG graphics file)

Results first

Here is some code: https://github.com/jgehrcke/runni/blob/master/runni.py

Here is how I use it, and how you can use it, too:

# Get the code.
$ git clone https://github.com/jgehrcke/runni && cd runni
 
# Enable link sharing to the Google Sheet (so that anyone
# with the link can access the sheet). Get the corresponding
# ID/key from the URL, and set it as an environment variable
# (it's sensitive data).
$ export RUNNI_GSHEET_KEY='[snip]'
 
# Dependencies: Python 3, pandas, matplotlib, requests
 
# Run the analysis program.
$ python runni.py
...
200112-19:57:39.060 INFO: Writing PNG figure to 2020-01-12_running-distance-per-week-over-time.png

The resulting plot:

For each day in the data interval, a small gray data point explicitly shows the distance I ran on that very day (on most of the days this is 0 km). In the majority of the cases, every non-zero gray data point corresponds to a single run (the only run on the corresponding day), but more generally it is the distance sum of all runs on that very day.

The thick black line is the “distance per week”, derived from a rolling window analysis with a window width of 14 days.

in-Pandas data processing in more detail (the non-trivial part)

The following code block (from here) with its code comments shows the core of the in-Pandas data processing and is the main reason for why I write this blog post: I think this is a non-trivial part. In previous projects (goeffel, bouncer-log-analysis, dcos-dev-prod-analysis) I actually put a bit of thought into how to do a meaningful rolling window analysis with Pandas, and here I am simply re-using what I learned before. But some of that it is still non-trivial and deserves an explanation.

The code comments are supposed to provide a hopefully helpful level of explanation. From the comments, it should at least be obvious that several decisions need to be made before the data in the spreadsheet can be analyzed in a meaningful way in a rolling/sliding window analysis.

# Keep only those rows that have a value set in the `km` column. That is
# the criterion for having made a run on the corresponding day.
df = df[df.km.notnull()]
 
# Parse text in `date` column into a pd.Series of `datetime64` type values
# and make this series be the new index of the data frame.
df.index = pd.to_datetime(df["date"])
 
# Sort data frame by index (sort from past to future).
df = df.sort_index()
 
# Turn the data frame into a `pd.Series` object, representing the distance
# ran over time. Every row / data point in this series represents a run:
# the index field is the date (day) of the run and the value is the
# distance of the run.
km_per_run = df["km"]
 
# There may have been more than one run per day. In these cases, sum up the
# distances and have a single row represent all runs of the day.
 
# Example: two runs on 07-11:
# 2019-07-10    3.2
# 2019-07-11    4.5
# 2019-07-11    5.4
# 2019-07-17    4.5
 
# Group events per day and sum up the run distance:
km_per_run = km_per_run.groupby(km_per_run.index).sum()
 
# Outcome for above's example:
# 2019-07-10    3.2
# 2019-07-11    9.9
# 2019-07-17    4.5
 
# The time series index is expected to have gaps: days on which no run was
# recorded. Up-sample the time index to fill these gaps, with 1 day
# resolution. Fill the missing values with zeros. This is not strictly
# necessary for the subsequent analysis but makes the series easier to
# reason about, and makes the rolling window analysis a little simpler: it
# will contain one data point per day, precisely, within the represented
# time interval.
#
# Before:
#   In [28]: len(km_per_run)
#   Out[28]: 75
#
#   In[27]: km_per_run.head()
#   Out[27]:
#   2019-05-27    2.7
#   2019-06-06    2.9
#   2019-06-11    4.6
#   ...
#
# After:
#   In [30]: len(km_per_run)
#   Out[30]: 229
#
#   In [31]: km_per_run.head()
#   Out[31]:
#   2019-05-27    2.7
#   2019-05-28    0.0
#   2019-05-29    0.0
#   2019-05-30    0.0
#   ...
#
km_per_run = km_per_run.asfreq("1D", fill_value=0)
 
# Should be >= 7 to be meaningful.
window_width_days = opts.window_width_days
window = km_per_run.rolling(window="%sD" % window_width_days)
 
# For each window position get the sum of distances. For normalization,
# divide this by the window width (in days) to get values of the unit
# km/day -- and then convert to the new desired unit of km/week with an
# additional factor of 7.
km_per_week = window.sum() / (window_width_days / 7.0)
 
# During the rolling window analysis the value derived from the current
# window position is assigned to the right window boundary (i.e. to the
# newest timestamp in the window). For presentation it is more convenient
# and intuitive to have it assigned to the temporal center of the time
# window. Invoking `rolling(..., center=True)` however yields
# `NotImplementedError: center is not implemented for datetimelike and
# offset based windows`. As a workaround, shift the data by half the window
# size to 'the left': shift the timestamp index by a constant / offset.
offset = pd.DateOffset(days=window_width_days / 2.0)
km_per_week.index = km_per_week.index - offset

Closing remarks

What is shown above is I think a well-confined, simple example for real-world data analysis. I like the architecture of maintaining raw data in Google Sheets to then consume it via HTTP for analysis using proper tooling (the data analysis and plotting options within Google Sheets are very limited, heck). With that example, I hope I can inspire some of you people out there to do similar things. There are endless possibilities: in my spreadsheet, I have other columns such as the run duration, … :-).
I will keep adding data points to my spreadsheet after about every run and will keep re-generating the graph more or less regularly for my entertainment.
The meaning and impact of the window width in the rolling window analysis are critical. I have not explained that above. I think one of the best ways to grasp it is to visually play with it — that’s what the --window-width-days argument can help with.
Again, you can find the code for inspiration here: https://github.com/jgehrcke/runni

Setting und using variables in a Makefile: pitfalls

Jan-Philip Gehrcke — Fri, 06 Dec 2019 12:52:44 +0000

A debug session. I ran into a problem in CI where accessing /var/run/docker.sock from within a container failed with a permission error. I had this specific part working before. So I did a bit of bisecting and added debug output and found the critical difference. In this case I get a permission error (EACCES):

uid=2000(buildkite-agent) gid=0(root) groups=0(root)

In this case not:

uid=2000(buildkite-agent) gid=1001 groups=1001

The difference is in the unix group membership specifics of unix user uid=2000(buildkite-agent) within the specific container. If the user is member of gid=1001 then access is allowed, if the user is member of gid=0 then access is denied.

I found that in the erroneous case I was passing this command line argument to docker run ...:

-u 2000:

Nothing after the colon. This is where the group ID (gid) belongs. docker run ... did not error out. That is no good. This input was treated the same as -u 2000 or -u 2000:0.

But why was there no gid after the colon when I have this in my Makefile?

-u $(shell id -u):${DOCKER_GID_HOST}

Because when this line was run DOCKER_GID_HOST (a Make variable, not an environment variable) was actually not set.

A shell program would catch this case (variable used, but not set) when being used with the -o nounset option, and error out. Make does not have this kind of protection. There is no general protection against using variables that are not set. As far as I know!

Okay, but why was the variable DOCKER_GID_HOST not set when I have

DOCKER_GID_HOST := $(shell getent group docker | awk -F: '{print $$3}')

right before executing docker run? Well, because this is a Makefile. Where you cannot set a variable in one line of a recipe, and use it in the next one (a pattern that we use in almost every other programming environment).

The lines of a recipe are executed in independent shells. The next line’s state is very much decoupled from the previous line’s state, that’s how Makefiles work.

This is probably the most important thing to know about Make, and one of the most common mistakes, and I certainly knew this before, and I certainly made this same mistake before. And I made it again, like probably every single time that I had to do things with Make.

Makefiles are flexible and great, and sometimes they make you waste a great deal of time compared to other development environments, man.

Stack Overflow threads on the matter, with goodies like workarounds, best practices, and generally helpful discussion:

Command line: extract all zip files into individual directories

Jan-Philip Gehrcke — Sun, 13 Oct 2019 17:23:11 +0000

I have been professionally using Linux desktop environments for the last 10 years. They all have a number of deficiencies that get in the way of productivity. The lack of a decent GUI archive extraction helper, integrated with the graphical file manager, is just one tiny aspect.

On a fresh Windows system one of the first applications I typically install is 7zip. It adds convenient entries to the context menu of archive files. For example, it allows for selecting multiple archive files at once, offering a 7-Zip - Extract To "*\" in the context menu (found an example screenshot here). That will extract each selected archive file into an individual sub-directory (with the sub-directory’s name being the base name of the archive file w/o file extension). Quite useful!

I have looked a couple of times over the years, but I never found a similar thing for a modern Gnome desktop environment. Let me know if you know of a reliable solution that is well-integrated with one of the popular graphical file managers such as Thunar.

The same can of course be achieved from the terminal. What follows is the one-liner I have been using for the past couple of years. I usually look it up from my shell command history:

find -name '*.zip' -exec sh -c 'unzip -d "${1%.*}" "$1"' _ {} \;

This extracts all zip files in the current directory into individual sub-directories.

If you do not want to extract all zip files in the current directory but only a selection thereof then adjust the command (well, this is where a GUI-based solution would actually be quite handy, no?).

Running an eBPF program may require lifting the kernel lockdown

Jan-Philip Gehrcke — Thu, 26 Sep 2019 15:34:36 +0000

Update Sep 28: discussion on Hacker News
Update Sep 30: kernel lockdown merged into mainline kernel

A couple of days ago I wanted to try out the hot eBPF things using the BPF Compiler Collection (BCC) on my Fedora 30 desktop system, with Linux kernel 5.2.15. I could not load eBPF programs into the kernel: strace revealed that the bpf() system call failed with EPERM:

bpf(BPF_PROG_LOAD,{prog_type=[...]}, 112) = -1 EPERM
(Operation not permitted)

So, a lack of privileges. Why? I tried …

running natively as root instead of in a sudo environment.
disabling SELinux completely (instead of running in permissive mode).
following setuid-related hints.
building BCC from source to make it more likely that it correctly integrates with my system.
consulting BCC maintainers via GitHub.

No obvious solution, still EPERM.

I jumped on a few more discussions on GitHub and got a hint from GitHub user deg00 (thank you, anonymous person with no GitHub activity and a picture of a snail!). She wrote: “For Fedora 30, the problem is not selinux but kernel-lockdown”.

I did not know what kernel lockdown is, but I wondered how to disable it. I found the following resources useful:

Temporarily disabling kernel lockdown solved the problem

In the resources linked above, we find that there is a so-called sysrq mechanism that can influence kernel behavior. When configured with a 1 in /proc/sys/kernel/sysrq it has the widest set of privileges, including the privilege to lift the kernel lockdown. Sending an x into /proc/sysrq-trigger then actually uses the sysrq mechanism to lift the kernel lockdown.

That indeed worked for me. The following snippet shows the original symptom, despite running as root:

[root@jpx1carb jp]# python3 /usr/share/bcc/examples/hello_world.py 
bpf: Failed to load program: Operation not permitted
 
Traceback (most recent call last):
  File "/usr/share/bcc/examples/hello_world.py", line 12, in 
    BPF(text='int kprobe__sys_clone(void *ctx) { bpf_trace_printk("Hello, World!\\n"); return 0; }').trace_print()
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 344, in __init__
    self._trace_autoload()
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 1090, in _trace_autoload
    fn = self.load_func(func_name, BPF.KPROBE)
  File "/usr/lib/python3.7/site-packages/bcc/__init__.py", line 380, in load_func
    raise Exception("Need super-user privileges to run")
Exception: Need super-user privileges to run

The last error message “Need super-user privileges to run” is misleading. The “Operation not permitted” error further above corresponds to the EPERM shown in the strace output above.

This lifts the kernel lockdown via the sysrq mechanism, as discussed:

[root@jpx1carb jp]# echo 1 > /proc/sys/kernel/sysrq
[root@jpx1carb jp]# echo x > /proc/sysrq-trigger

Now BCC’s hello world example runs fine:

[root@jpx1carb jp]# python3 /usr/share/bcc/examples/hello_world.py 
b'     gnome-shell-3215  [005] .... 58317.922716: 0: Hello, World!'
b'   Socket Thread-26509 [001] .... 58322.093849: 0: Hello, World!'
b'     gnome-shell-3215  [003] .... 58322.923562: 0: Hello, World!'
[...]

Cool, stuff works.

What the heck just happened? I did not understand a thing and correspondingly started to read a bit about these new shiny topics.

What is the “kernel lockdown”?

Most importantly the concept of the “kernel lockdown” seems to still be evolving.

The the mission statement behind the kernel lockdown is hard to put into words without stepping onto anyone’s toes. This is how RedHat worded the goal in 2017:

The kernel lockdown feature is designed to prevent both direct and indirect access to a running kernel image, attempting to protect against unauthorised modification of the kernel image and to prevent access to security and cryptographic data located in kernel memory, whilst still permitting driver modules to be loaded.

However, that goal was and seems to still be subject to a technical as well as a political debate in the Linux ecosystem: In 2018, Zack Brown from LINUX Journal published a well-researched and quite entertaining article summarizing the heated discussion about the initial set of lockdown patches. If you would like to try to understand what kernel lockdown is (or tries to be) then that article is worth reading. A quote from the article’s last few paragraphs:

This type of discussion is unusual for kernel development, but not for this particular type of patch. The attempts to slip code into the kernel that will enable a vendor to lock users out of controlling their own systems always tend to involve the two sides completely talking past each other. Linus and Andy were unable to get Matthew to address the issue the way they wanted, and Matthew was unable to convince Linus and Andy that his technical explanations were genuine and legitimate.

Also, Jonathan Corbet’s LWN article titled Kernel lockdown in 4.17? from April 2018 is worth a read.

And how do I know if my kernel is locked down? dmesg!

Here’s some dmesg output from my system. It is quite revealing, almost prose:

[    0.000000] Kernel is locked down from EFI secure boot; see man kernel_lockdown.7
[...]
[    2.198433] Lockdown: systemd: BPF is restricted; see man kernel_lockdown.7
[...]
[58310.913828] Lifting lockdown

First, as you can see, the kernel told me exactly that it is “locked down” (even providing the reason: because EFI secure boot is enabled on my system).

Secondly, it was kind enough to say that this affects (e)BPF things! Maybe I should read the kernel messages more often :-).

Thirdly, after quite a bit of system uptime, the “Lifting lockdown” was emitted in response to, well, me lifting the lockdown with the above-mentioned sysrq mechanism.

That is, if you wonder if and how this affects your system, try doing a dmesg | grep lockdown !

The kernel acting “differently depending on some really esoteric detail in how it was booted”…?

When I approached the BCC maintainers about the EPERM error on Fedora 30 they first responded with (basically) “it’s working for me”. Someone actually booted a VM with a fresh Fedora 30 install. And they were unable to reproduce. How can that be? The difference was whether secure boot was enabled or not: it was for my actual desktop machine, but not for their VM setup. That is quite a lesson learned, and maybe an important take-home message from this blog post.

This annoying debugging experience was predicted by Linus Torvalds. A quote from one of his initial reviews of the kernel lockdown patches (source, April 2018):

I do not want my kernel to act differently depending on some really esoteric detail in how it was booted. That is fundamentally wrong. […] Is that really so hard to understand? […] Look at it this way: maybe lockdown breaks some application because that app does something odd. I get a report of that happening, and it so happens that the reporter is running the same distro I am, so I try it with his exact kernel configuration, and it works for me. […] It is *entirely* non-obvious that the reporter happened to run a distro kernel that had secure boot enabled, and I obviously do not.

Well, he was right.

Which kernel versions have the lockdown feature built-in?

Lockdown did not yet land in the mainline kernel. My Fedora 30 with kernel 5.2.15 is affected (with a specific variant of the lockdown patches, not necessarily the final thing!) because RedHat has chosen to build the lockdown patches into recent Fedora kernels, to try it out in the wild.

Will it land in the mainline kernel? When? And how will it behave, exactly? Just a couple of days ago Phoronix published an interesting article, titled Kernel Lockdown Feature Will Try To Land For Linux 5.4. Quote:

After going through 40+ rounds of revisions and review, the Linux kernel “LOCKDOWN” feature might finally make it into the Linux 5.4 mainline kernel.

While not yet acted upon by Linus Torvalds with the Linux 5.4 merge window not opening until next week, James Morris has submitted a pull request introducing the kernel lockdown mode for Linux 5.4.

The kernel lockdown support was previously rejected from mainline but since then it’s been separated from the EFI Secure Boot code as well as being implemented as a Linux security module (LSM) to address some of the earlier concerns over the code. There’s also been other improvements to the design of this module.

Various organizations seem to be pushing hard for this feature to land. It is taking long, but convergence around the details seems to take place.

What is the relationship between kernel lockdown and (e)BPF?

I think it is quite fair to ask: does it make sense that all-things-BPF are affected by the kernel lockdown feature? What does lockdown even have to do with eBPF in the first place?

I should say that I am not super qualified to talk about this because I have only researched this topic for about a day now. But I find highly interesting that

these questions seemingly have been under active debate since the first lockdown patch proposals
these questions seem to still be actively debated!

Andy Lutomirski reviewed in 2018:

“bpf: Restrict kernel image access functions when the kernel is locked
down”: This patch just sucks in general. At the very least, it should
only apply to […] But you should probably just force all eBPF
users through the unprivileged path when locked down instead, since eBPF
is really quite useful even with the stricter verification mode.

This shows that there was some pretty fundamental debate about the relationship between eBPF and kernel lockdown from the start.

I believe that the following quote shows how eBPF can, in general, conflict with the goal(s) of kernel lockdown (commit message of a 2019 version lockdown patch fragment):

From: David Howells 

There are some bpf functions can be used to read kernel memory:
bpf_probe_read, bpf_probe_write_user and bpf_trace_printk.  These allow
private keys in kernel memory (e.g. the hibernation image signing key) to
be read by an eBPF program and kernel memory to be altered without
restriction. Disable them if the kernel has been locked down in
confidentiality mode.

Suggested-by: Alexei Starovoitov 
Signed-off-by: David Howells 
Signed-off-by: Matthew Garrett 
cc: netdev@vger.kernel.org
cc: Chun-Yi Lee 
cc: Alexei Starovoitov 
Cc: Daniel Borkmann

This commit message rather convincingly justifies that something needs to be done about eBPF when the kernel is locked down (so that the goals of the lockdown do not get undermined!). However, it is not entirely clear what exactly should be done, how exactly eBPF is supposed to be affected, how its inner workings and aspects are to be confined when the kernel is locked down: what follows is a reply from Andy Lutomirski to the above’s commit message: “:) This is yet another reason to get the new improved bpf_probe_user_read stuff landed!”

And indeed, only last month (August 2019) Andy published a work-in-progress patch set titled bpf: A bit of progress toward unprivileged use.

What I learned is that with the current Fedora 30 and its 5.2.x kernel I neither see the “final” lockdown feature nor the “final” relationship between lockdown and eBPF. This is very much work in progress, worse than “cutting edge”: what works today might break tomorrow, with the next kernel update :-)!

By the way, I started to look into eBPF for https://github.com/jgehrcke/goeffel, a tool for measuring the resource utilization of a specific process over time.

Update Sept 30: lockdown just landed in the mainline kernel, wow! Quote from the commit message, clarifying important topics (such as that lockdown will not be tightly coupled to secure boot):

This is the latest iteration of the kernel lockdown patchset, from
  Matthew Garrett, David Howells and others.
 
  From the original description:
 
    This patchset introduces an optional kernel lockdown feature,
    intended to strengthen the boundary between UID 0 and the kernel.
    When enabled, various pieces of kernel functionality are restricted.
    Applications that rely on low-level access to either hardware or the
    kernel may cease working as a result - therefore this should not be
    enabled without appropriate evaluation beforehand.
 
    The majority of mainstream distributions have been carrying variants
    of this patchset for many years now, so there's value in providing a
    doesn't meet every distribution requirement, but gets us much closer
    to not requiring external patches.
 
  There are two major changes since this was last proposed for mainline:
 
   - Separating lockdown from EFI secure boot. Background discussion is
     covered here: https://lwn.net/Articles/751061/
 
   -  Implementation as an LSM, with a default stackable lockdown LSM
      module. This allows the lockdown feature to be policy-driven,
      rather than encoding an implicit policy within the mechanism.

—

Sept 26, 2019: First published on September 26, 2019
Sept 27: Significantly edited on September 27
Sept 28: added note about Hacker News discussion thread
Sept 30: added paragraph about kernel lockdown merged into mainline kernel

gipc 1.0.0 release

Jan-Philip Gehrcke — Sat, 15 Dec 2018 18:43:40 +0000

More than three years after the 0.6.0 release I have published gipc version 1.0.0 today.

Quick links

Release highlights

This release focuses on reliability and platform compatibility. It brings along a number of changes relevant for running it on Windows and macOS, as well as for running it under PyPy.

Both, gevent 1.2 and 1.3 are now officially supported. On Linux, gipc now officially supports CPython 2.7, 3.4, 3.5, 3.6, PyPy2.7, and PyPy3. On Windows, gipc officially supports gevent 1.3 on CPython 2.7, 3.4, 3.5, 3.6, and 3.7. Support for gevent 1.1 and CPython 3.3 has been dropped.

The API did not change. In view of the stability of the API over the recent years I thought it is time to officially declare it as such, and to follow the semantic versioning spec‘s point 5: “Version 1.0.0 defines the public API” :-).

For this release most of the work went into

fixing a small number of platform-dependent bugs (this one was interesting, and this one was pretty insightful and ugly).
setting up a continuous integration (CI) pipeline for Linux and macOS (on Travis CI) as well as for Windows (via AppVeyor).
Re-writing and re-styling significant parts of the documentation: the new docs are online and can be found at https://gehrcke.de/gipc (for comparison: old docs).
moving the repository from Bitbucket to GitHub (I also migrated issues using this well-engineered helper).
making tests more stable.
running the example programs as part of CI, on all supported platforms (required a number of consolidations).

Acknowledgements

I would like to thank the following people who have helped with this release, be it by submitting bug reports, by asking great questions, with testing, or with a bit of code: Heungsub Lee, James Addison, Akhil Acharya, Oliver Margetts.

Changelog for this release

For the record, the complete changelog for this release copied from CHANGELOG.rst:

New platform support:

Add support for PyPy on Linux. Thanks to Oliver Margetts and to Heungsub Lee for patches.

Fixes:

Fix a bug as of which gipc crashed when passing “raw” pipe handles between processes on Windows (see issue #63).

Fix can't pickle gevent._semaphore.Semaphore error on Windows.

Fix ModuleNotFoundError in test_wsgi_scenario.

Fix signal handling in example infinite_send_to_child.py.

Work around segmentation fault after fork on Mac OS X (affected test_wsgi_scenario and example program wsgimultiprocessing.py).

Test / continuous integration changes:

Fix a rare instability in test_exitcode_previous_to_join.

Make test_time_sync more stable.

Run the example programs as part of CI (run all on Linux and Mac, run most on Windows).

Linux main test matrix (all combinations are covered):

gevent dimension: gevent 1.2.x, gevent 1.3.x.

Python implementation dimension: CPython 2.7, 3.4, 3.5, 3.6, PyPy2.7, PyPy3.

Also test on Linux: CPython 3.7, pyenv-based PyPy3 and PyPy2.7 (all with gevent 1.3.x only).

Mac OS X tests (all with gevent 1.3.x):

pyenv Python builds: CPython 2.7, 3.6, PyPy3

system CPython

On Windows, test with gevent 1.3.x and CPython 2.7, 3.4, 3.5, 3.6, 3.7.

Potentially breaking changes:

gevent 1.1 is not tested anymore.

CPython 3.3 is not tested anymore.

Atomically switch a directory tree served by nginx

Jan-Philip Gehrcke — Thu, 15 Nov 2018 19:32:31 +0000

This post briefly demonstrates how to atomically switch a directory tree of static files served by nginx.

Consider the following minimal nginx config file:

$ cat conf/nginx.conf 
events {
    use epoll;
}
 
http {
    server {
        listen 0.0.0.0:80;
        location / {
            root /static/current ;
        }
    }
}

The goal is to replace the directory /static/current atomically while nginx is running.

This snippet shows the directory layout that I started out with:

$ tree
.
├── conf
│   └── nginx.conf
└── static
    ├── version-1
    │   └── hello.html
    └── version-2
        └── hello.html

conf/nginx.conf is shown above. The static directory contains two sub trees, and the goal is to switch from version-1 to version-2.

For this demonstration I have started a containerized nginx from its official Docker image:

$ docker run -v $(realpath static):/static:ro -v $(realpath conf):/etc/nginx:ro -p 127.0.0.1:8088:80 nginx nginx -g 'daemon off;'

This mounts the ./static directory as well as the nginx configuration file into the container, and exposes nginx listening on port 8088 on the local network interface of the host machine.

Then, in the ./static directory one can choose the directory tree served by nginx by setting a symbolic link, and one can subsequently switch the directory tree atomically, as follows:

1) No symbolic link is set yet — leading to a 404 HTTP response (the path /static/current does not exist in the container from nginx’ point of view):

$ curl http://localhost:8088/hello.html

404 Not Found

404 Not Found
nginx/1.15.6

2) Set the current symlink to serve version-1:

$ cd static
$ ln -s version-1 current && curl http://localhost:8088/hello.html
hello 1

3) Prepare a new symlink for version-2 (but don’t switch yet):

$ ln -s version-2 newcurrent

4) Atomically switch to serving version-2:

$ mv -fT newcurrent current && curl http://localhost:8088/hello.html
hello 2

In step (4) It is essential to use mv -fT which changes the symlink with a rename() system call. ln -sfn would also appear to work, but it uses two system calls under the hood and therefore leaves a brief time window during which opening files can fail because the path is invalid.

Final directory layout including the symlink current (currently pointing to version-2):

$ tree
.
├── conf
│   └── nginx.conf
└── static
    ├── current -> version-2
    ├── version-1
    │   └── hello.html
    └── version-2
        └── hello.html

Kudos to https://rcrowley.org/2010/01/06/things-unix-can-do-atomically.html for being a great reference.

Download article as PDF file from Elsevier’s ScienceDirect via command line (curl)

Jan-Philip Gehrcke — Sat, 12 Sep 2015 14:42:40 +0000

When not in the office, we often times cannot directly access scientific literature, because access control is usually based on IP addresses. However, we usually have SSH access to the university network. Being logged in to a machine in the university network we should — in theory — be able to access a certain article. Most of the times it is the PDF file that we are interested in and not the “web page” corresponding to an article. So, can’t we just $ curl http://whatever.com/article.pdf to get that file? Most of the times, this does not work, because access to journal articles usually happens through rather complex web sites, such as Elsevier’s ScienceDirect:

ScienceDirect is a leading full-text scientific database offering journal articles and book chapters from nearly 2,500 journals and 26,000 books.

Such web sites add a considerable amount of complexity to the technical task of downloading a file. The problem usually starts with obtaining the direct URL to the PDF file. Also, HTTP redirection and cookies are usually involved. Often times, the only solution people see to solve these issues is to set up a VPN and then to use a fully fledged browser through that VPN, and let the browser deal with the complexity.

However, I prefer to get back to the basics and always strive to somehow find a direct URL to the PDF file to then download it via curl or wget.

This is my solution for Elsevier’s ScienceDirect:

Say, for instance, you wish to download the PDF version of this article: http://www.sciencedirect.com/science/article/pii/S0169433215012131

Then all you need is that URL and the following commands executed on a common Linux system:

export SDURL="http://www.sciencedirect.com/science/article/pii/S0169433215012131"
curl -Lc cookiejar "${SDURL}" | grep pdfurl | perl -pe 's|.* pdfurl=\"(.*?)\".*|\1|' > pdfurl
curl -Lc cookiejar "$(cat pdfurl)" > article.pdf

The method first parses the HTML source code of the main page corresponding to the article and extracts a URL to the PDF file. At the same time, it also stores the HTTP cookie(s) set by the web server when accessing named web page. These cookies are then re-used when accessing the PDF file directly. This has reproducibly worked for me.

If it does not work for you, I recommend having a look into the file pdfurl and see if that part of the process has lead to a meaningful result or not. Obviously, the second step can only succeed aver having obtained a proper URL to the PDF file.

This snippet should not be treated as a black box. Please execute it in an empty directory. Also note that this snippet only works subject to the condition that ScienceDirect keeps functioning the way it does right now (which most likely is the case for the next couple of months or years).

Don’t hesitate to get back to me if you have any questions!

Uploaded.to download with wget

Jan-Philip Gehrcke — Mon, 02 Mar 2015 23:14:04 +0000

Downloading from uploaded.to/.net with premium credentials through the command line is possible using standard tools such as wget or curl. However, there is no official API and the exact method required depends on the mechanism implemented by the uploaded.to/.net website. Finding these implementation details requires a little amount reverse engineering.

Here I share a small shell script that should work on all POSIX-compliant platforms (e.g. Mac or Linux). The method is based on current behavior of the uploaded.to website. There are no special tools involved, just wget, grep, sed, mktemp.

(The solutions I found on the web did not work (anymore) and/or were suspiciously wrong.)

Usage

Copy the script content below, define username and password, and save the script as, for instance, download.sh. Then, invoke the script like so:

$ /bin/sh download.sh urls.txt

The file urls.txt should contain one uploaded.to/.net URL per line, such as in this example:

http://uploaded.net/file/98389123/foo.rar
http://uploaded.net/file/bxmsdkfm/bar.rar
http://uploaded.net/file/72asjh98/not.zip

Method

This paragraph is just for the curious ones. The script first POSTs your credentials to http://uploaded.net/io/login and stores the resulting authentication cookie in a file. This authentication cookie is then used for retrieving the website corresponding to an uploaded.to file. That website contains a temporarily valid download URL corresponding to the file. Using grep and sed, the HTML code is filtered for this URL. The payload data transfer is triggered by firing a POST request with empty body against this URL (cookie not needed). Files are downloaded to the current working directory. All intermediate data is stored in a temporary directory. That directory is automatically deleted upon script exit (no data is leaked, unless the script is terminated with SIGKILL).

The script

#!/bin/sh
# Copyright 2015 Jan-Philip Gehrcke, http://gehrcke.de
# See http://gehrcke.de/2015/03/uploaded-to-download-with-wget/
 
 
USERNAME="user"
PASSWORD="password"
 
 
if [ "$#" -ne 1 ]; then
    echo "Missing argument: URLs file (containing one URL per line)." >&2
    exit 1
fi
 
 
URLSFILE="${1}"
if [ ! -r "${URLSFILE}" ]; then
    echo "Cannot read URLs file ${URLSFILE}. Exit." >&2
    exit 1
fi
if [ ! -s "${URLSFILE}" ]; then
    echo "URLs file is empty. Exit." >&2
    exit 1
fi
 
 
TMPDIR="$(mktemp -d)"
# Install trap that removes the temporary directory recursively
# upon exit (except for when this program retrieves SIGKILL).
trap 'rm -rf "$TMPDIR"' EXIT
 
 
LOGINRESPFILE="${TMPDIR}/login.response"
LOGINOUTPUTFILE="${TMPDIR}/login.outerr"
COOKIESFILE="${TMPDIR}/login.cookies"
LOGINURL="http://uploaded.net/io/login"
 
 
echo "Temporary directory: ${TMPDIR}"
echo "Log in via POST request to ${LOGINURL}, save cookies."
wget --save-cookies=${COOKIESFILE} --server-response \
    --output-document ${LOGINRESPFILE} \
    --post-data="id=${USERNAME}&pw=${PASSWORD}" \
    ${LOGINURL} > ${LOGINOUTPUTFILE} 2>&1
 
# Status code is 200 even if login failed.
# Uploaded sends a '{"err":"User and password do not match!"}'-like response
# body in case of error.
 
echo "Verify that login response is empty."
# Response is more than 0 bytes in case of login error.
if [ -s "${LOGINRESPFILE}" ]; then
    echo "Login response larger than 0 bytes. Print response and exit." >&2
    cat "${LOGINRESPFILE}"
    exit 1
fi
 
# Zero response size does not necessarily imply successful login.
# Wget adds three commented lines to the cookies file by default, so
# set cookies should result in more than three lines in this file.
COOKIESFILELINES="$(cat ${COOKIESFILE} | wc -l)"
echo "${COOKIESFILELINES} lines in cookies file found."
if [ "${COOKIESFILELINES}" -lt "4" ]; then
    echo "Expected >3 lines in cookies file. Exit.". >&2
    exit 1
fi
 
echo "Process URLs."
# Assume that login worked. Iterate through URLs.
while read CURRENTURL; do
    if [ "x$CURRENTURL" = "x" ]; then
        # Skip empty lines.
        continue
    fi
    echo -e "\n\n"
    TMPFILE="$(mktemp --tmpdir=${TMPDIR} response.html.XXXX)"
    echo "GET ${CURRENTURL} (use auth cookie), store response."
    wget --no-verbose --load-cookies=${COOKIESFILE} \
        --output-document ${TMPFILE} ${CURRENTURL}
 
    if [ ! -s "${TMPFILE}" ]; then
        echo "No HTML response: ${TMPFILE} is zero size. Skip processing."
        continue
    fi
 
    # Extract (temporarily valid) download URL from HTML.
    LINEOFINTEREST="$(grep post ${TMPFILE} | grep action | grep uploaded)"
    # Match entire line, include space after action="bla" , replace
    # entire line with first group, which is bla.
    DLURL=$(echo $LINEOFINTEREST | sed 's/.*action="\(.\+\)" .*/\1/')
    echo "Extracted download URL: ${DLURL}"
    # This file contains account details, so delete as soon as not required
    # anymore.
    rm -f "${TMPFILE}"
    echo "POST to URL w/o data. Response is file. Get filename from header."
    # --content-disposition should extract the proper filename.
    wget --content-disposition --post-data='' "${DLURL}"
done < "${URLSFILE}"

Systemdatenträger wechseln ohne “Augen zu und durch”-Mentalität

Jan-Philip Gehrcke — Sun, 02 Nov 2014 22:35:45 +0000

Einleitung

In diesem Artikel beschreibe ich wie man Windows in wenigen Schritten von einem Speichermedium zum Anderen überträgt — mit professionellen Werkzeugen. Das ist ein kritischer Prozess. Aber ein simpler Prozess, den man verstehen kann und sollte, bevor man ihn einleitet. Wie immer gibt es zahlreiche schändliche Anleitungen da draußen, bei denen dieser Prozess nicht hinreichend erklärt und irgendwelchen bunten Klick-Tools überlassen wird. Das Problem dabei: der Vorgang wird dem Nutzer übersimplifiziert präsentiert. Wichtige Entscheidungen trifft die angepriesene Software selbst, essentielle Details werden vom Nutzer ferngehalten. Aber im Bereich von Datenträgerstrukturen gibt es zu viele potentielle Komplikationen, deren Behandlung man keiner Pseudointelligenz oder auch der “Holzhammermethode” überlassen darf:

Holzhammermethode, zweckentfremdet von datamartist.com.

In professionellen Umgebungen wird ein solcher Migrationsprozess normalerweise akribisch geplant und jeder Schritt mit dem einen dafür etablierten Werkzeug ausgeführt. Ergebnis: die Migration gelingt schnell und zuverlässig.

In privaten Umgebungen herrscht Unsicherheit, man kennt die technischen Details nicht und folgt gerne einer simplen Abstraktion des Problems. Es herrscht Augen zu und durch-Mentalität. Grafisch hübsch aufgemachte Software-Assistenten suggerieren schließlich eine gewisse Sicherheit. Doch Hoffnung ist keine Strategie. Das etwas andere Ergebnis: mit einiger Wahrscheinlichkeit geht etwas schief und man verbringt Stunden oder Tage mit einer chaotischen Phase der Problemlösung, bei der vielleicht Dritte mit einbezogen werden müssen.

Eine grundsätzliche Ursache dieses Gefälles zwischen professioneller und privater Umgebung ist sicherlich der Mangel an Detailwissen in Letzterer. Damit einher geht aber — und das ist das eigentliche Übel — fehlende Anerkennung, dass ein solcher Prozess akribisch geplant werden muss. Ich schreibe das hier auf Deutsch und mit dieser kleinen Einleitung, um in meinem direkten Umfeld ein besseres Bewusstsein für diese Thematik und Problematik zu schaffen.

Es wird unten stehend demonstriert, wie ein solcher Prozess zuverlässig umgesetzt werden kann. Als Beispiel dient ein Spezialfall, den die meisten Windows-Nutzer zu Hause haben. Die Grundidee — Detailplanung und die verwendeten Ansätze und Werkzeuge — sind jedoch auch auf andere Situationen und Betriebssysteme anwendbar. Ihr werdet merken, wenn Eure Situation von der hier beschriebenen abweicht. Dann könnt Ihr mich gerne fragen. Dementsprechend erhebt die folgende Anleitung keinen Anspruch auf Vollständigkeit.

Migrationsprozess

1) Live-Linux präparieren (5 Minuten)

Ihr braucht einen USB-Stick mit einem modernen Live-Linux-System, welches die notwendigen Werkzeuge bereithält, um zuverlässig Daten auf Datenträgern zu analysieren und zu manipulieren. Ich empfehle die Linux-Distribution SystemRescueCD. Gibt es hier zum Herunterladen. Wie bekommt Ihr das auf einen USB-Stick, von dem der Rechner starten kann? Mit dem quelloffenen Rufus. Dort wählt Ihr “create a bootable disk” und bei “select ISO file” wird das SystemRescueCD-Abbild ausgewählt, was gerade heruntergeladen wurde.

2) Transfersystem vorbereiten (5 Minuten)

Der Quell-Datenträger (mit der bisherigen Windows-Installation) wird aus dem Produktivsystem herausgenommen. Der Ziel-Datenträger wird bereitgelegt.

Es wird ein Rechner gebraucht, in dem diese beiden Datenträger gleichzeitig angeschlossen werden. Dieser Rechner wird vom zuvor präparierten USB-Stick gestartet. Beim Start von SystemRescueCD wird der default Start-Modus gewählt, das Tastaturlayout (z. B. DE) definiert und die Desktop-Umgebung mittels startx gestartet.

In diesem Zustand ist der Zugriff auf alle Datenträger möglich. SystemRescueCD wagt jedoch keine autonomen Zugriffe. Ohne Anweisung wird kein einziges Dateisystem eingehängt. Insbesondere wird ohne konkreten Befehl auf keinen Datenträger schreibend zugegriffen. Das ist ein konzeptioneller Schutz vor ungeplanten Ereignissen.

3) Quelle und Ziel eindeutig identifizieren (2 Minuten)

Jedes Linux-Betriebssystem nummeriert die Datenträger intern durch. Jeder Datenträger und jede Partition ist per /dev/sd* lesend und schreibend erreichbar. Jetzt muss die Bezeichnung des bisherigen Windows-Datenträgers ausfindig gemacht werden. Es hilft eine Auflistung der verfügbaren Datenträger samt Partitionen. Ein praktischer Befehl ist fsarchiver probe simple, welcher in ein Terminal eingegeben wird. Das ist die Ausgabe in meinem Fall:

[======DISK======] [=============NAME==============] [====SIZE====] [MAJ] [MIN]
[sda             ] [KINGSTON SVP200S               ] [   111.79 GB] [  8] [  0]
[sdb             ] [WDC WD10EAVS-00D               ] [   931.51 GB] [  8] [ 16]
[sdc             ] [Samsung SSD 850                ] [   476.94 GB] [  8] [ 32]
[sdd             ] [RALLY2                         ] [     3.74 GB] [  8] [ 48]
[sde             ] [USB DISK                       ] [     7.55 GB] [  8] [ 64]
 
[=====DEVICE=====] [==FILESYS==] [======LABEL======] [====SIZE====] [MAJ] [MIN] 
[loop0           ] [squashfs   ] [        ] [   278.66 MB] [  7] [  0] 
[sda1            ] [ntfs       ] [System Reserved  ] [   100.00 MB] [  8] [  1] 
[sda2            ] [ntfs       ] [        ] [   111.69 GB] [  8] [  2] 
[sdb1            ] [ntfs       ] [Software         ] [   196.16 GB] [  8] [ 17] 
[sdb2            ] [ntfs       ] [Daten            ] [   735.35 GB] [  8] [ 18] 
[sdd1            ] [vfat       ] [SYSRCD-4_4_      ] [     3.74 GB] [  8] [ 49] 
[sde1            ] [vfat       ] [        ] [     7.55 GB] [  8] [ 65]

Beim Lesen der Ausgabe braucht man ein bisschen Kenntnis über die eigene Hardware: in der oberen Liste sollten Gerätebezeichnung und Datenträgerkapazität genügend Aufschluss geben. Der bisherige Windows-Datenträger von Kingston heißt in diesem Fall sda. Er ist per Dateisystempfad /dev/sda erreichbar.

In der unteren Liste sind die automatisch erkannten Partitionen aufgeführt. Die Partition sda1 ist eine 100 MB große Partition mit dem Namen “System Reserved …”. Diese wird für gewöhnlich von Windows 7 & 8 bei der Installation angelegt und übernimmt essentielle Aufgaben (sie wird im Betrieb von Windows nicht als normaler Datenträger angezeigt aber muss bei der Migration mitgenommen werden). Die Partition sda2 ist das, was man aus Windows-Sicht als Systemlaufwerk bezeichnen würde (meist Laufwerk “C:”). Bei beiden Partitionen hat fsarchiver wie erwartet erkannt, dass sie ein NTFS-Dateisystem enthalten. Es ist keine weitere Partition auf /dev/sda vorhanden. Die Bestandsaufnahme der Quelle ist somit abgeschlossen.

/dev/sdc ist der Datenträger, der zukünftig Windows tragen soll. Bisher ist keine Partition auf diesem Datenträger eingerichtet.

4) Die Transferstrategie festlegen

Die Strategie im Groben: Die Inhalte beider Quell-Partitionen (/dev/sda1 und /dev/sda2) sollen auf Block-Ebene auf den neuen Datenträger kopiert werden. Vorher bekommt der neue Datenträger eine frische Partitionstabelle. Der Bootloader wird vom alten Datenträger auf den Neuen kopiert.

Die Strategie im Detail:

Per GNU fdisk wird eine neue Partitionstabelle vom Typ msdos auf /dev/sdc erstellt, sodass die Situation vom alten Datenträger in den wesentlichen Punkten repliziert wird:
- /dev/sdc1 erhält die selbe Startposition und Größe wie /dev/sda1.
- In diesem Fall ist der neue Datenträger größer als der alte: /dev/sdc2 soll den Rest des Datenträgers füllen.
- Beide Partitionen erhalten eine Markierung vom Typ 7 (HPFS/NTFS/exFAT).
- /dev/sdc1 wird als aktiv markiert.
Der Master Boot Record wird per dd nur teilweise vom alten Datenträger auf den neuen kopiert (nur der Bootloader und die Datenträgersignatur, nicht jedoch die Partitionstabelle).
/dev/sda1 wird blockweise per dd nach /dev/sdc1 kopiert.
/dev/sda2 wird blockweise per dd nach /dev/sdc2 kopiert.
Das NTFS-Dateisystem welches auf /dev/sda2 befindet hat nun mehr Platz auf /dev/sdc2. Es wird per ntfsresize entsprechend darauf aufmerksam gemacht, sodass es den verbleibenden Platz innerhalb der Partition nutzen kann. Zur Vereinfachung dieses Schritts wird als Wrapper GParted benutzt. Beide Tools sind quelloffen und Industriestandard.

Alle hier genannten Werkzeuge sind Teil der SystemRescueCD-Distribution.

5) Realisierung (5 Minuten Arbeit, X Minuten Warten)

Details der Partitionstabelle vom bisherigen Windows-Datenträger per fdisk -l auslesen:
Device Boot Start End Blocks Id System /dev/sda1 * 2048 206847 102400 7 HPFS/NTFS/exFAT /dev/sda2 206848 234438655 117115904 7 HPFS/NTFS/exFAT
Beide Partitionen sind von Typ 7. /dev/sda1 ist Boot-markiert (“active”). Start und Ende sind präzise angegeben. Diese Parameter wollen wir exakt replizieren. Die einzige Änderung der neuen Tabelle gegenüber der Alten wird die Endmarkierung der Partition /dev/sda2 sein, da der neue Datenträger größer ist als der Alte.

Neue Partitionstabelle auf neuem Datenträger per fdisk /dev/sdc erstellen. fdisk wird schrittweise per Tastatur bedient:
- o create a new empty DOS partition table
- n add a new partition
- p primary (0 primary, 0 extended, 4 free)
- Partition number 1, first sector: 2048 (default), last sector: 206847
- n add a new partition
- p primary (0 primary, 0 extended, 4 free)
- Partition number 2, first sector: default, last sector: default (bis zum Ende des Datenträgers)
- mit Befehl t change a partition type wird Typ 7 für beide Partitionen gewählt.
- mit Befehl a toggle a bootable flag wird Partition 1 aktiv markiert.
- mit Befehl w write table to disk and exit werden die Änderungen geschrieben.

Bootloader und Datenträgersignatur blockweise kopieren (die ersten 446 Bytes des Datenträgers):

# dd if=/dev/sda of=/dev/sdc bs=446 count=1
1+0 records in
1+0 records out
446 bytes (446 B) copied, 0.00403888 s, 110 kB/s

kleine “System Reserved”-Partition blockweise kopieren:

# dd if=/dev/sda1 of=/dev/sdc1
204800+0 records in
204800+0 records out
104857600 bytes (105 MB) copied, 2.02602 s, 51.8 MB/s

Windows-Partition (“C:”) blockweise kopieren:

# dd if=/dev/sda2 of=/dev/sdc2
234231808+0 records in
234231808+0 records out
119926685696 bytes (120 GB) copied, 2271.83 s, 52.8 MB/s

NTFS-Dateisystem auf /dev/sdc2 vergrößern:
- GParted starten.
- /dev/sdc2 rechtsklicken und Check wählen.
- In der Ausgabe kann man verfolgen, dass ntfsresize --force --force /dev/sdc2 aufgerufen wird. Es wird der Größenunterschied zwischen Dateisystem und Partition aufgeführt: Current volume size: 119926682112 bytes (119927 MB) vs. Current device size: 512004284416 bytes (512005 MB). Am Ende steht ein Successfully resized NTFS on device '/dev/sdc2'. Da keine Daten verschoben werden müssen, dauert dieser Vorgang nur wenige Sekunden.

Bei den obig angegebenen Schritten ist zu beachten, dass ein Vertauschen/Verschreiben bei den Gerätepfaden (/dev/sd*) unverzeihlich ist. Bitte drei mal denken und kontrollieren, bevor Ihr auf Enter drückt. Ansonsten habt Ihr ja sicherlich eine Sicherung Eurer persönlichen Daten.

6) Rückbau des Datenträgers in das Produktivsystem (5 Minuten)

Der neue Datenträger mit der Windows-Installation wird in das Produktivsystem zurückgebaut. Per BIOS-Einstellung wird das Produktivsystem vom neuen Datenträger gestartet. Es sind keine weiteren Schritte erforderlich.

Fazit

Etwa 25 Minuten Arbeit für eine Betriebssystemmigration (+ Zeit für Datentransfer). Das ist schnell. Warum gewährleistet obige Strategie Zuverlässigkeit? Der durchgeführte Prozess stellt sicher, dass sich zwischen der alten und der neuen Situation nur zwei Kleinigkeiten geändert haben: die Bezeichnung des Datenträgers und die Größe des NTFS-Dateisystems in dem Windows (mehrheitlich) installiert ist. Aus Sicht des BIOS und aus Sicht des Betriebssystems ist der Rest des Gesamtsystems äquivalent zum vorherigen Zustand. Mit dieser Betrachtungsweise ist es leicht zu verstehen, dass die Migration mit hoher Wahrscheinlichkeit gelingt. Die präzise Kontrolle der Geschehnisse erleichtert auch eine etwaige Fehlersuche: gibt es nach Schritt 6 ein Problem, so muss es an einem der beiden geänderten Parameter liegen (in der Tat: UEFI Systeme mögen bedingt durch den Wechsel des Geräte-Namens bzw. der Geräte-ID einen weiteren kleinen Schritt erfordern).

Bei Eurem nächsten Wechsel des Systemdatenträgers solltet Ihr die Migration systematisch durchführen. Wenn Ihr den Prozess unter Kontrolle habt, dann gelingt er mit hoher Wahrscheinlichkeit. Traut lieber den quelloffenen Standard-Werkzeugen welche in professionellen Umgebungen eingesetzt werden als den großen Versprechungen von (kommerziell erhältlichen) Software-Assistenten für Privantanwender.

Hello SPF record.

Jan-Philip Gehrcke — Thu, 24 Jul 2014 13:45:24 +0000

Introduction

Say Google’s or Yahoo’s mail transfer agent (MTA) retrieves an email sent from an MTA running on the host with the (fictional) IP address 1.1.1.1. Say the sending MTA identifies itself as sending on behalf of important.com (within the EHLO/HELO handshake, or with a From address with the domain important.com). Then the receiving MTA (Google or Yahoo, or something else) usually queries the important.com domain via DNS for the so-called SPF (Sender Policy Framework) record. Using this SPF record the retrieving MTA can verify whether the owner of the domain important.com intended to send mail from 1.1.1.1 or not.

The SPF record might say that important.com does not send mails. Then the receiving MTA knows for sure that the incoming mail is invalid/fake/spam. Or this record says that important.com actually sends mail, but only from the (fictional) IP address 2.2.2.2. Then the retrieving MTA also knows for sure that something is wrong with the mail. Another possibility is that the record states that important.com actually sends mail from 1.1.1.1, in which case the incoming mail likely is intended by the owner of the domain important.com. This does not mean that this mail is not spam, but at least one can be quite sure that nobody crafted that mail on behalf of important.com.

That was a simplified description for why it is important to have a proper SPF record set for a certain domain in case one intends to send mail from this domain. In the next paragraphs I explain how I have set the SPF record for gehrcke.de and why.

SPF record for gehrcke.de

Domainfactory is the tech-c for gehrcke.de. However, the DNS entries for gehrcke.de are managed by Cloudflare (free, and quite comfortable). Today, I added an SPF record for gehrcke.de via Cloudflare’s web interface. The first important thing to take note of is that the SPF record actually is a TXT record. There once was a distinct SPF record proposed, but in RFC 7208 (which specifies SPF) we clearly read:

The SPF record is expressed as a single string of text found in the RDATA of a single DNS TXT resource record

and

SPF records MUST be published as a DNS TXT (type 16) Resource Record (RR) [RFC1035] only. Use of alternative DNS RR types was supported in SPF’s experimental phase but has been discontinued.

By now it should be clear that I had to add a new TXT record for gehrcke.de, with a special syntax, the SPF syntax. This syntax is specified here. What I need to express in the record is that mail sent on behalf of gehrcke.de is only sent by the machine with the IPv4 address 5.45.109.1 / or from the IPv6 subnet fe80::5054:8dff:fe9b:358d/64. This is the machine that I know should send mail. No other people / machines should send mail on behalf of my domain. The corresponding record is:

v=spf1 ip4:5.45.109.1 ip6:fe80::5054:8dff:fe9b:358d/64 -all

The prefix v=spf1 declares the SPF version, the ip4 and ip6 markers declare that mail from a certain address or subnet are valid. -all expresses that mail from all other hosts is disallowed. The fact that I do not “allow” others to send mail on behalf of gehrcke.de is just my declaration of intent. A retrieving MTA is not required to respect my SPF record. However, larger mail providers should respect it and at least classify a mail that does not pass the SPF test as spam, if not reject it right away.

Whenever I switch machines or IP addresses, I need to remember to update this record. This is a minor disadvantage. To circumvent this, in many cases a short

v=spf1 a -all

would be enough. It expresses that mail sent from the IP address corresponding to the DNS A record of the domain is allowed. In my case, however, this does not work since the A record points to Cloudflare’s cache rather than directly to my machine.

Testing the record

There are many web-based tools for checking whether a certain domain has an SPF record set and whether its syntax is valid. Particularly useful is, for instance, mxtoolbox.com/SuperTool.aspx. However, a real full-system validation obviously requires you to send a mail to an external service that resembles a serious mail provider and debugs the communication for you, meaning that it informs you about all externally visible details of your mail setup, including the SPF record. A great verification tool that does exactly this is provided by Port25. On the command line I used my local MTA (Exim4) to send mail to their service, instructing them to return the corresponding report to may Googlemail address:

$ echo "This is a test." | mail -s Testing check-auth-jgehrcke=googlemail.com@verifier.port25.com

The following is an excerpt of the test result, indicating that the actual mail sender matches the data provided in the SPF record:

HELO hostname:  gurke2.gehrcke.de
Source IP:      5.45.109.1
mail-from:      user@gehrcke.de
 
----------------------------------------------------------
SPF check details:
----------------------------------------------------------
Result:         pass 
ID(s) verified: smtp.mailfrom=user@gehrcke.de
DNS record(s):
    gehrcke.de. SPF (no records)
    gehrcke.de. 300 IN TXT "v=spf1 ip4:5.45.109.1 ip6:fe80::5054:8dff:fe9b:358d/64 -all"

Great.