Category Archives: NodeJS

Robust child process management in NodeJS?

Robust child process management in NodeJS? Not so easy :-).

For example, calling child_process.spawn() with a bad path does not throw an error right away, although that error (ENOENT in this case, on Linux) is indeed known to the runtime in a synchronous fashion: NodeJS uses uv_spawn() for invoking the new process, and that reports errors synchronously to its caller.

Even when child_process.spawn() fails to create a process, it returns a process handle, which you can try to kill(). That would not throw an error right away either.
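A minimal sketch of the spawn() behavior, assuming a Linux-like system. The path /no/such/program is made up for illustration and is assumed not to exist:

```javascript
const { spawn } = require('child_process');

let threw = false;
let child;
try {
  // '/no/such/program' is a hypothetical, non-existent path.
  child = spawn('/no/such/program');
} catch (err) {
  threw = true;
}

// spawn() returned a handle instead of throwing, but no process exists.
console.log('spawn() threw synchronously:', threw);
console.log('pid of the returned handle:', child.pid);

// The startup error only arrives later, through the 'error' event.
child.on('error', (err) => {
  console.log('error event:', err.code);
});
```

Without the 'error' listener the failure would surface as an unhandled 'error' event, so the handler has to be attached right after spawn().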

To discuss this topic I submitted a while back (as of which we improved the documentation for kill()) and

Some more aspects and links, in no particular order:

A successful (but I think a little too involved) recipe for handling startup errors might be this:

  1. spawn()
  2. attach error handler, and upon error store the error state in a magic variable.
  3. implement and start a “success criterion watcher” (may even be based on polling) and in that logic consider the magic variable (for example, if it is known to be an error object, throw it).
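The three steps above could be sketched roughly as follows. This is only one possible shape: the helper name spawnAndWait, the isReady callback and the intervalMs/timeoutMs options are hypothetical names introduced for this example, and the polling-based watcher is just the simplest variant.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: resolves with the child once isReady(child) returns
// true, rejects if spawn() reported a startup error or the timeout passes.
function spawnAndWait(command, args, isReady, { intervalMs = 50, timeoutMs = 5000 } = {}) {
  const child = spawn(command, args);            // step 1: spawn()
  let startupError = null;                       // the "magic variable"
  child.on('error', (err) => {                   // step 2: store the error state
    startupError = err;
  });

  return new Promise((resolve, reject) => {
    const deadline = Date.now() + timeoutMs;
    const timer = setInterval(() => {            // step 3: success criterion watcher
      if (startupError !== null) {
        clearInterval(timer);
        reject(startupError);
      } else if (isReady(child)) {
        clearInterval(timer);
        resolve(child);
      } else if (Date.now() > deadline) {
        clearInterval(timer);
        child.kill();
        reject(new Error(`timed out waiting for ${command}`));
      }
    }, intervalMs);
  });
}
```

A caller would pass its own success criterion, for example a function that checks whether the child has written a certain line to stdout or opened a port. With a non-existent command the promise is expected to reject with the stored startup error.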




JavaScript: use the finally clause in generators with caution

This is about the guarantees provided by the finally clause, and therefore about its suitability for critical resource cleanup tasks, especially after unexpected failures.

In 2013, Axel Rauschmayer (love his material about JS on the web!) wrote:

The finally clause is always executed, no matter what happens inside the try clause (return, exception, break, normal exit).

Mozilla MDN web docs say

Statements that are executed after the try statement completes. These statements execute regardless of whether an exception was thrown or caught.

Sounds like that’s a guarantee.

In Python, we love using the finally clause precisely because of its guarantees. The code you put in there is run, and that guarantee can be translated into an incredibly powerful and easy-to-reason-about technique for building solid resource cleanup and shutdown procedures. This is also true when used from within generators, as I have done for example here.

I am still new to the JavaScript ecosystem, and the question whether I can rely on the same level of guarantee in NodeJS/JavaScript came up. I found the following discussion to be incredibly insightful:

Andreas Rossberg said about

function* getResults() {
     try {
         var resource = acquire();
         for (const item of resource) yield process(item);
     } finally {
         release(resource);
     }
}

that it is a

“bogus form of resource management”. This is an anti-pattern in ES6. It won’t work correctly. We should never have given the illusion that it does.

He added that

Python’s idea is just confused and crazy.

Benjamin Gruenbaum (@benjamingr) replied, more or less representing my perspective in this discussion, saying that

try/finally is commonly used for resource management and garbage collection is a form of automatic resource management. There is also the assumption that finally is always run. Two other languages have opted into the “run finally” behavior so I wouldn’t call it crazy

and that

I’m very curious why languages like Python have chosen to call finally blocks in this case – this was not a hindsight and according to the PEP. They debated it and explicitly decided to call release.

The fundamental difference between the two approaches seems to come down to garbage collection, as Mark S. Miller says:

JavaScript GC is not specified to be precise, and so should be assumed conservative. […] C++ RAII and Python refcounting are completely different: they are precise, prompt, predictable, and deterministic.

but on the other hand Andreas Rossberg said

Try-finally has nothing to do with GC, it’s just control flow

I don’t think the discussion was concluded in a satisfying way, but I nevertheless enjoyed it very much (a big thank you to the protagonists in this discussion!).

My big takeaway was that in JavaScript one should not generally rely on the finally clause in generators, but instead one needs to carefully look at how exactly it is being used in every special case, and whether or not one should rely on its execution! That makes things difficult. Have I said that I like simplicity? Makes me a little sad, but good to know!

After all, I also found this explicit remark in a nice Mozilla blog post about ES6 generators (emphasis mine):

Note that the .return() is not called automatically by the language in all contexts, only in cases where the language uses the iteration protocol. So it is possible for a generator to be garbage collected without ever running its finally block.
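A small sketch of that difference. The generator numbers() and the log array are made up for this example; the finally block only records that it ran:

```javascript
const log = [];

function* numbers(label) {
  try {
    yield 1;
    yield 2;
  } finally {
    // Stands in for a resource cleanup step.
    log.push(`${label}: cleanup`);
  }
}

// Case 1: for...of uses the iteration protocol, so leaving the loop
// early calls .return() on the generator and the finally block runs.
for (const n of numbers('for-of')) {
  break;
}

// Case 2: manual iteration that is simply abandoned. Nothing calls
// .return(), so the finally block does not run here.
const abandoned = numbers('abandoned');
abandoned.next();

// Case 3: manual iteration with an explicit .return() call.
const explicit = numbers('explicit');
explicit.next();
explicit.return();

console.log(log);
```

Only the 'for-of' and 'explicit' entries end up in the log. The abandoned generator is suspended inside its try block and stays that way until it is garbage collected.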

NodeJS http.ClientRequest “finish” event: has the request body been flushed out?

The popular got HTTP client from the NodeJS ecosystem uses the http-timer package, which itself uses the “finish” event on an http.ClientRequest to determine the point in time when the HTTP request body has been written to the remote end (has been “uploaded”). Code.

This point in time is then used as the reference for measuring the time it takes for the HTTP server on the other end to generate a response: often called “time to first byte” (TTFB), this client-side metric measures the duration between sending out the last byte of the request to receiving the first byte of the response. TTFB is often used as a server performance metric, indicating the time it took the server to process the request.
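To make the measurement concrete, here is a minimal sketch of a TTFB measurement based on the “finish” and “response” events. It is not the actual http-timer code; the helper name measureTtfb is hypothetical, and the sketch only illustrates the general technique described above.

```javascript
const http = require('http');

// Hypothetical helper (not http-timer's implementation): resolves with the
// number of milliseconds between the 'finish' and 'response' events, or
// with null if 'response' was observed before 'finish'.
function measureTtfb(options, body) {
  return new Promise((resolve, reject) => {
    const req = http.request(options);
    let finishAt = null;

    // Taken as "the request body has been written" in this technique.
    req.once('finish', () => {
      finishAt = process.hrtime.bigint();
    });

    // Emitted once the response headers have arrived.
    req.once('response', (res) => {
      const responseAt = process.hrtime.bigint();
      res.resume(); // discard the response body
      resolve(finishAt === null ? null : Number(responseAt - finishAt) / 1e6);
    });

    req.once('error', reject);
    req.end(body);
  });
}
```

A call could look like measureTtfb({ host: 'example.org', method: 'POST' }, payload).then(console.log), where the host and payload are placeholders. The resulting number is only as meaningful as the assumption that “finish” marks the moment the last request byte left the machine.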

My suspicion was that the TTFB numbers I saw in my scenario were pretty off.

I then ran a quick verification experiment in which I sent an HTTP request with a body of about 10 MB to an HTTP server under my control. I confirmed that the TCP upload takes roughly 30 seconds (reproducibly, through my slowish but stable Internet connection), and that the HTTP server sends a response immediately once it has consumed the request body. For all intents and purposes of this quick sanity check, the actual request upload duration is therefore ~30 seconds, and the actual TTFB is practically zero.

What did http-timer measure? In one attempt, the “finish” event on the http.ClientRequest fired after about 17 seconds, resulting in an alleged TTFB of about 13 seconds. Repetitions yielded 20 s / 10 s, 15 s / 15 s, and more samples in the same ballpark. That is, the method seems to significantly underestimate the request upload duration (it places the point in time when the HTTP request body has been written to the remote end too early), resulting in an overestimated TTFB (many seconds instead of ~zero seconds).

I suspected that the presence of some big buffer(s) architecturally skews the numbers. So I did this verification experiment with small TCP write buffers on my host operating system:

$ echo 'net.ipv4.tcp_wmem = 4096 16384 65536' >> /etc/sysctl.conf
$ echo 'net.core.wmem_max=65536' >> /etc/sysctl.conf
$ sysctl -p
$ cat /proc/sys/net/core/wmem_max
$ cat /proc/sys/net/ipv4/tcp_wmem
4096    16384   65536

From here I thought that most probably there is some kind of request buffering going on within the got/NodeJS system, the internals of which are largely unknown to me.

At least, I believe that a big buffer between the client code and the host’s TCP stack would make the problem of “determining the point in time when the HTTP request body has been written to the remote end” kind of an ill-posed problem, explaining the discrepancy between the actual timings and the measured timings.

In NodeJS, an http.ClientRequest is a Writable stream. It has the concept of a highWaterMark (which I understand to be a buffer capacity, roughly), which by default seems to be set to 16 kB and can be configured upon stream construction.
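A small sketch of how highWaterMark behaves on a plain Writable stream. The stream below is a made-up slow sink, not an http.ClientRequest, and the value 16384 is passed explicitly so that the sketch does not depend on the runtime's default:

```javascript
const { Writable } = require('stream');

// A deliberately slow sink: it accepts chunks but never acknowledges
// them, so everything written stays accounted for in the stream buffer.
const pendingCallbacks = [];
const sink = new Writable({
  highWaterMark: 16384,
  write(chunk, encoding, callback) {
    pendingCallbacks.push(callback); // intentionally not called
  },
});

// Below the mark: write() returns true.
const first = sink.write(Buffer.alloc(1000));

// This pushes the buffered amount past the mark: write() returns false,
// which is only a hint to pause until 'drain'. The data is still accepted.
const second = sink.write(Buffer.alloc(16384));

console.log(first, second);
console.log(sink.writableLength, sink.writableHighWaterMark);
```

In other words, highWaterMark acts as a threshold for the back-pressure signal rather than a hard capacity: a producer that ignores the return value of write() can queue far more than 16 kB in the stream.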

In the NodeJS standard library I have found that the highWaterMark is not explicitly set in the routine which largely implements http.ClientRequest construction. Is it set somewhere else? It does not seem to be set in the got code base either. grep has revealed that _http_outgoing.js is a place worth looking:

_http_outgoing.js:65:const HIGH_WATER_MARK = getDefaultHighWaterMark();

All I found here is that this simply sets the default (16 kB). I fiddled with the code a bit to be really sure, and found that an http.ClientRequest object in the context of got indeed uses the default of 16 kB (16384 bytes) for writableHighWaterMark.

From here I am pretty clueless. I am reasonably confident that my quick verification experiment has shown that there is a significant deviation between reality and what’s measured, but as of today, with limited knowledge about NodeJS inner workings, I cannot explain this deviation. Can you? Is the TCP stack of my Linux system tricking me? Is there some additional buffering going on in the inner workings of libuv (the event loop library underlying NodeJS)? Let me know!

An interesting thing I noticed is that similar timing measurement work happens in the established HTTP client request. In their timing measurement code they however do not even attempt to determine the point in time when the HTTP request body has been written to the remote end. The timing measurement code was introduced with request/pull/2452, and there is some lovely discussion about the details. This mildly suggests that the problem is indeed ill-posed in NodeJS (and I’d still love to understand why!).

NPM: retry package download upon “429 Too Many Requests”

Sometimes the NPM package registry is a bit loaded, and individual HTTP GET requests emitted by NPM are answered with 429 (Too Many Requests) responses.

It seems that by default NPM might actually retry once or twice, but in the end it errors out quickly with an error message like this:

npm ERR! 429 Too Many Requests - GET

Especially in CI systems you want NPM to retry more often with an appropriate back-off, to increase the chance of recovering from transient problems on its own.

To achieve that you can tune four retry-related configuration parameters documented here. For starters, I now simply set fetch-retries upon installing dependencies:

npm install . --fetch-retries 10
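The four retry-related options are fetch-retries, fetch-retry-factor, fetch-retry-mintimeout and fetch-retry-maxtimeout. To get a feeling for what a given combination means in terms of waiting time, here is a rough sketch. It assumes a plain exponential back-off of the form min(mintimeout * factor^attempt, maxtimeout) without randomization, which may not match NPM's implementation exactly; the function name backoffSchedule is hypothetical, and the example values are what I believe to be the documented defaults for the three timing options, so check them against your NPM version.

```javascript
// Assumption: delay before retry number `attempt` (starting at 0) is
// min(minTimeout * factor^attempt, maxTimeout), all in milliseconds.
function backoffSchedule({ retries, factor, minTimeout, maxTimeout }) {
  const delays = [];
  for (let attempt = 0; attempt < retries; attempt++) {
    delays.push(Math.min(minTimeout * Math.pow(factor, attempt), maxTimeout));
  }
  return delays;
}

// Example: --fetch-retries 10 combined with a factor of 10, a minimum
// timeout of 10 s and a maximum timeout of 60 s.
const delays = backoffSchedule({
  retries: 10,
  factor: 10,
  minTimeout: 10000,
  maxTimeout: 60000,
});
const totalSeconds = delays.reduce((sum, ms) => sum + ms, 0) / 1000;
console.log(delays, totalSeconds);
```

Under these assumptions the delay hits the maximum timeout from the second retry on, so raising fetch-retries mostly adds further waits of the maximum length.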