Filter web server logs for missing file errors (a grep, sort, uniq example)

I suspected that some (image) files belonging to pages of my website were no longer in the proper place. To find these cases systematically, I looked into my nginx error logs and saw various missing file errors such as

2014/01/23 09:18:18 [error] 22000#0: *3901754 open() "/XXX/apple-touch-icon.png" failed (2: No such file or directory), client: 173.245.53.224, server: gehrcke.de, request: "GET /apple-touch-icon.png HTTP/1.1", host: "gehrcke.de"

This particular error is expected and harmless (Apple devices request apple-touch-icon* files by default). To find real problems, I wanted to extract all missing file errors and sort them by frequency. This one-liner helps:

cat nginx_error.log | grep -Eo 'open\(\) "/.+" failed' | sort | uniq --count | sort -nk 1 | less

At the tail of the output I found that two important image files belonging to one blog post are indeed missing on the server:

2681 open() "/XXX/wp/blog_content/websocket_test_empty.png" failed
2688 open() "/XXX/wp/blog_content/websocket_test.png" failed

The above one-liner works in the following way:

  • First, cat writes the nginx error log to stdout.
  • Then, grep reads these lines from stdin and processes them one by one, looking for a pattern interpreted as an extended regular expression (-E option). For each match, it writes only the matching part of the line to stdout, not the entire line (-o option).
  • sort brings these lines into order, so that duplicate lines become adjacent to each other. This matters because uniq only merges duplicates that are adjacent.
  • uniq --count merges each run of duplicate lines into a single line and prefixes that line with the number of occurrences.
  • sort -nk 1 sorts these merged lines by the first whitespace-separated field (-k 1 option) in numerical mode (-n option), in ascending order.
  • The final less displays the result in a pager. Go to the end of the output (press G) to find the most frequent missing file errors.
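
The steps above can also be tried out on a small scale. The following is a minimal sketch of a variant, not the exact command I ran: grep reads the file directly instead of receiving it from cat, the pattern uses [^"]+ so that a match cannot run past the closing quote of the path, and sort -rn puts the most frequent errors first so that head can cut off the rest. The function name top_missing, the sample log lines, the IP addresses and the host example.org are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample log, written to a temporary file for illustration only.
log=$(mktemp)
cat > "$log" <<'EOF'
2014/01/23 09:18:18 [error] 22000#0: *1 open() "/XXX/a.png" failed (2: No such file or directory), client: 192.0.2.1, server: example.org, request: "GET /a.png HTTP/1.1", host: "example.org"
2014/01/23 09:18:19 [error] 22000#0: *2 open() "/XXX/b.png" failed (2: No such file or directory), client: 192.0.2.2, server: example.org, request: "GET /b.png HTTP/1.1", host: "example.org"
2014/01/23 09:18:20 [error] 22000#0: *3 open() "/XXX/a.png" failed (2: No such file or directory), client: 192.0.2.3, server: example.org, request: "GET /a.png HTTP/1.1", host: "example.org"
2014/01/23 09:18:21 [error] 22000#0: *4 open() "/XXX/c.png" failed (2: No such file or directory), client: 192.0.2.4, server: example.org, request: "GET /c.png HTTP/1.1", host: "example.org"
2014/01/23 09:18:22 [error] 22000#0: *5 open() "/XXX/a.png" failed (2: No such file or directory), client: 192.0.2.5, server: example.org, request: "GET /a.png HTTP/1.1", host: "example.org"
2014/01/23 09:18:23 [error] 22000#0: *6 open() "/XXX/b.png" failed (2: No such file or directory), client: 192.0.2.6, server: example.org, request: "GET /b.png HTTP/1.1", host: "example.org"
2014/01/23 09:18:24 [error] 22000#0: *7 directory index of "/XXX/" is forbidden, client: 192.0.2.7, server: example.org, request: "GET / HTTP/1.1", host: "example.org"
EOF

# Print missing file errors of the given log file, most frequent first.
top_missing() {
    # grep -Eo : extended regex, print only the matching part of each line
    # sort     : make duplicate lines adjacent
    # uniq -c  : collapse duplicates and prefix each line with its count
    # sort -rn : numeric sort on the leading count, descending
    grep -Eo 'open\(\) "[^"]+" failed' "$1" | sort | uniq -c | sort -rn
}

# Keep only the three most frequent entries.
result=$(top_missing "$log" | head -n 3)
printf '%s\n' "$result"

rm -f "$log"
```

With the most frequent entries at the top, there is no need to page to the end of the output. For a real log, call top_missing with the path to the nginx error log and adjust the head limit as needed.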