Monthly Archives: February 2020

NodeJS http.ClientRequest “finished” event: has the request body been flushed out?

The popular got HTTP client from the NodeJS ecosystem uses the http-timer package which itself uses the “finish” event on a http.ClientRequest to determine the point in time when the HTTP request body has been written to the remote end (has been “uploaded”). Code.

This point in time is then used as the reference for measuring the time it takes for the HTTP server on the other end to generate a response: often called “time to first byte” (TTFB), this client-side metric measures the duration between sending out the last byte of the request to receiving the first byte of the response. TTFB is often used as a server performance metric, indicating the time it took the server to process the request.

My suspicion was that the TTFB numbers I saw in my scenario were pretty off.

I have then made a quick verification experiment in which I send an HTTP request with about 10 MB body size to an HTTP server under my control. I have confirmed the TCP upload to take roughly 30 seconds (reproducibly, through my slowish but stable Internet connection), and I have confirmed the HTTP server to immediately send a response once it has consumed the request body. For all means and purposes of this quick sanity check the actual request upload duration therefore is ~30 seconds, and the actual TTFB is practically zero.

What did http-timer measure? In one attempt, the “finish” event on the http.ClientRequest fired after about 17 seconds, resulting in an alleged TTFB of about 13 seconds. Repetitions yielded 20 s / 10 s, 15 s / 15 s, and more samples in the same ball park. That is, the method seems to significantly underestimate the request upload duration (it determines the point in time when the HTTP request body has been written to the remote end as too early), resulting in an overestimated TTFB (many seconds instead of ~zero seconds).

I suspected that the presence of some big buffer(s) architecturally skews the numbers. So I did this verification experiment with small TCP write buffers on my host operating system:

$ echo 'net.ipv4.tcp_wmem = 4096 16384 65536' >> /etc/sysctl.conf
$ echo 'net.core.wmem_max=65536' >> /etc/sysctl.conf
 
$ sysctl -p
 
$ cat /proc/sys/net/core/wmem_max
65536
 
$ cat /proc/sys/net/ipv4/tcp_wmem
4096    16384   65536

From here I thought that most probably there is some kind of request buffering going on within the got/NodeJS system; the internals of which system are largely unknown to me.

At least, I believe that a big buffer between the client code and the host’s TCP stack would make the problem of “determining the point in time when the HTTP request body has been written to the remote end” kind of an ill-posed problem, explaining the discrepancy between the actual timings and the measured timings.

In NodeJS, a http.ClientRequest is a Writable stream. it has the concept of a highWaterMark (which I understand to be a buffer capacity, roughly) which by default seems to be set to 16 kB, and it can be configured upon stream construction.

In the NodeJS standard library I have found that the highWaterMark is not explicitly set in the routine which largely implements http.ClientRequest construction. Is it set somewhere else? It does not seem to be set in the got code base either. grep has revealed that _http_outgoing.js is a place worth looking:

_http_outgoing.js:65:const HIGH_WATER_MARK = getDefaultHighWaterMark();

All I found here is that this simply sets the default (16 kB). I fiddled with the code a bit to be really sure, and found that an http.ClientRequest object in the context of got indeed uses the default of 16 kB (16384 bytes) for writableHighWaterMark.

From here I am pretty clueless. I am reasonably confident that my quick verification experiment has shown that there is a significant deviation between reality and what’s measured, but as of today with limited knowledge about NodejS inner workings I cannot explain this deviation. Can you? Is the TCP stack of my Linux system tricking me? Is there some additional buffering going on in the inner workings of libuv (the event loop underlying to NodeJS)? Let me know!

An interesting thing I noticed is that similar timing measurement work happens in the established HTTP client request. In their timing measurement code they however do not even attempt to determine the point in time when the HTTP request body has been written to the remote end. The timing measurement code was introduced with request/pull/2452, and there is some lovely discussion about the details. This mildly suggests that the problem is indeed ill-posed in NodeJS (and I’d still love to understand why!).

NPM: retry package download upon “429 Too Many Requests”

Sometimes the NPM package registry is a bit loaded and individual GET HTTP requests emitted by NPM would be responded to with 429 (Too Many Requests) HTTP responses.

It seems that by default NPM might actually try once or twice, but after all it errors out quickly with an error message like this:

npm ERR! 429 Too Many Requests - GET https://registry.npmjs.org/argparse/-/argparse-1.0.10.tgz

Especially in CI systems you want NPM to retry more often with an appropriate back-off, to increase the chance for successfully self-healing transient struggles.

To achieve that you can tune four retry-related configuration parameters documented here. For starters, I now simply set fetch-retries upon installing dependencies:

npm install . --fetch-retries 10

How to run minikube on Fedora (30)

Today I ran minikube on my notebook running Fedora 30. What follows is a list of steps to do that.

First, install KVM/libvirt and periphery in one go:

sudo dnf install @virtualization
sudo systemctl start libvirtd
sudo systemctl enable libvirtd

Do a bit of verification:

lsmod | grep kvm
kvm_intel             303104  0
kvm                   782336  1 kvm_intel
irqbypass              16384  1 kvm

Set up the most recent minikube release:

curl -Lo minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo mkdir -p /usr/local/bin/
sudo install minikube /usr/local/bin/

Start minikube using the kvm2 driver:

$ minikube start --vm-driver=kvm2
😄  minikube v1.7.1 on Fedora 30
✨  Using the kvm2 driver based on user configuration
💾  Downloading driver docker-machine-driver-kvm2:
    > docker-machine-driver-kvm2.sha256: 65 B / 65 B [-------] 100.00% ? p/s 0s
    > docker-machine-driver-kvm2: 13.82 MiB / 13.82 MiB  100.00% 1.19 MiB p/s 1
💿  Downloading VM boot image ...
    > minikube-v1.7.0.iso.sha256: 65 B / 65 B [--------------] 100.00% ? p/s 0s
    > minikube-v1.7.0.iso: 166.68 MiB / 166.68 MiB [-] 100.00% 9.21 MiB p/s 19s
🔥  Creating kvm2 VM (CPUs=2, Memory=2000MB, Disk=20000MB) ...
🐳  Preparing Kubernetes v1.17.2 on Docker '19.03.5' ...
💾  Downloading kubeadm v1.17.2
💾  Downloading kubelet v1.17.2
💾  Downloading kubectl v1.17.2
🚜  Pulling images ...
🚀  Launching Kubernetes ... 
🌟  Enabling addons: default-storageclass, storage-provisioner
⌛  Waiting for cluster to come online ...
🏄  Done! kubectl is now configured to use "minikube"
💡  For best results, install kubectl: https://kubernetes.io/docs/tasks/tools/install-kubectl/

Time lapse video of raw photos (ffmpeg)

I sometimes get back home with tens of gigabytes of image data distributed across thousands of photo files. This is when the actual work starts: sorting, labeling, selecting, agonizing over finding the best shots, and developing them.

This time I also wanted to make a “time lapse” video of all undeveloped photos I took. A nice little short film of all scenes I tried to take photos of. An attribution to all those photos that otherwise would end up in the photo graveyard — in the archive, never going to be looked at again.

For converting a series of JPEG image files into a properly encoded video the open-source program of choice is ffmpeg. Building up an appropriate ffmpeg command took a while. Here is the command I used:

ffmpeg.exe -f image2 -framerate 8 -i resized_renamed_2560/resized_%05d.jpg \
  -filter:v "scale='min(2560,iw)':'min(1440,ih)':force_original_aspect_ratio=decrease,pad=2560:1440:(ow-iw)/2:(oh-ih)/2" \
  -c:v libx265 -crf 28 -preset medium video-2560-h265-medium.mp4

The video filter syntax (the part after -filter:v) instructs ffmpeg to output a video with 2560 pixels width and 1440 pixels height, retaining the aspect ratio of the input image files. That last bit is important because in a large set of photos we usually find two different aspect ratios: we shoot most photos in landscape orientation, but others in portrait orientation. Putting both into the same video means that one of both types needs to be treated in a special way. The video is going to have the same aspect ratio as my landscape photos, so they are not treated in a special way. For portrait photos, though, the above command will retain their original aspect ratio by making the longest edge fit, by putting the image at the center of the video, and by filling the empty space at its left and right with black padding. Kudos to this and that.

The -framerate 8 is an arbitrary choice, it means that I want to have 8 of my input images per second of output video.

I chose the H265 settings based on reference examples here.

I ran ffmpeg on Windows (which is the platform I process my photos on, mainly with Lightroom).

Here is how I generated the JPEG files used as input for ffmpeg in the first place:

  • I first used a proprietary Sony program (Imaging Edge) for batch-converting my ~2500 raw (ARW) photos to JPEG files, with full resolution. I would have loved to use an open-source program for that. However, that is not recommendable. The goal here is to get reasonably good-looking JPEG images from the camera’s raw files despite not manually developing them. Automatically converting a camera’s raw image to a JPEG image already implies that the result looks crappy to a certain degree (contrasts, white balance, colors — all of that will just be flat, mediocre, a middle ground). In the spectrum between a really crappy outcome and a not-even-so-bad outcome the proprietary software simply does the best job.
  • I then used IrfanView for batch-processing the JPEG files from the previous step into JPEG files easily digestible by ffmpeg. Specifically, I down-scaled (longest side being 2560 pixels long, towards the desired video dimensions) and renamed (zero-filled, consecutive 5 digit counter in the file name). Corresponding to the resized_%05d.jpg in the ffmpeg command line shown above, the files were named like this:
    resized_00001.jpg
    resized_00002.jpg
    resized_00003.jpg
    resized_00004.jpg
    ...
    resized_02583.jpg