Setting up quotas on a local Linux file system

This is a short guide for setting up disk quotas on a local Linux file system. In this example, I use Debian Wheezy and configure a quota on the root partition (ext4 in this case) that affects a single user. I note down the important details that I missed in other references when searching the web.

1) Modify partition entry in /etc/fstab

Add the following options for journaled quotas to the partition of interest in /etc/fstab:

usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0

For me, the entire line looks like this:

UUID=e5aff151-ddf7-4d43-a318-7e97afcfd78e / ext4 errors=remount-ro,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0 0 1

This activates quotas by user and/or group ID using the "vfsv0" quota format (32 bits for UIDs/GIDs, 64 bits for space usage, and 32 bits for inode usage in this case).
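Before remounting, it can help to double-check that the fstab line really carries all three quota options. The following is a minimal sketch, not part of the original setup: the function names are made up for illustration, and it only inspects a line passed in as a string rather than reading /etc/fstab.

```python
# Mount options required for journaled quotas, as listed above.
REQUIRED = {
    "usrjquota": "aquota.user",
    "grpjquota": "aquota.group",
    "jqfmt": "vfsv0",
}


def parse_mount_options(fstab_line):
    """Return the mount options (fourth field) of one fstab line as a dict."""
    fields = fstab_line.split()
    options = {}
    for item in fields[3].split(","):
        key, _, value = item.partition("=")
        options[key] = value
    return options


def missing_quota_options(fstab_line):
    """Return the names of required quota options that are absent or differ."""
    options = parse_mount_options(fstab_line)
    return sorted(key for key, value in REQUIRED.items() if options.get(key) != value)


line = ("UUID=e5aff151-ddf7-4d43-a318-7e97afcfd78e / ext4 "
        "errors=remount-ro,usrjquota=aquota.user,grpjquota=aquota.group,jqfmt=vfsv0 0 1")
print(missing_quota_options(line))  # prints []
```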

2) Remount the partition (no reboot required):

# mount -o remount /


3) Install packages:

# apt-get install quota quotatool

The package setup turns on quotas right away:

Setting up quota (4.00-4+deb7u1) ...
[ ok ] Checking quotas...done.
[ ok ] Turning on quotas...done.


4) Setting an actual quota: preparations

There are multiple ways to actually set a quota. Currently, quotatool in Wheezy is affected by a bug, so I decided to use the canonical setquota command, which has the signature

setquota [ -rm ] [ -u | -g ] [ -F quotaformat ] name block-softlimit block-hardlimit inode-softlimit inode-hardlimit -a | filesystem

Setting a disk usage limit in common units such as MB or GB requires knowing the block size of the file system. Given the device file of the partition in question, the block size can be determined with dumpe2fs:

# dumpe2fs /dev/vda2 | head -n 50 | grep "Block size"
dumpe2fs 1.42.5 (29-Jul-2012)
Block size:               4096

With a simple calculation, a desired disk usage limit in gigabytes can be translated to the number of blocks:

# python -c "print 2 *1024**3 / 4096"
524288

This means that 524288 blocks on the partition in question correspond to 2 GB of disk usage.
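The same calculation as a small reusable function, written for Python 3 (the one-liner above uses Python 2 print syntax). This is only a sketch: the function name is made up, and the default block size of 4096 is the value reported by dumpe2fs above. If the quota tools on your system count limits in 1 KiB blocks instead of file system blocks, pass block_size=1024.

```python
GIB = 1024 ** 3  # bytes per gibibyte


def size_to_blocks(size_bytes, block_size=4096):
    """Translate a size in bytes to a whole number of blocks (rounded down)."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return size_bytes // block_size


print(size_to_blocks(2 * GIB))        # prints 524288
print(size_to_blocks(2 * GIB, 1024))  # prints 2097152
```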

When setting a quota, we also need to define a maximum number of inodes (basically the number of file and directory entries). If disk usage should be the limiting factor, set this to a moderately large number, e.g. 1000000.


5) Setting a quota for a specific user

According to the setquota command signature above, execute

setquota -u -F vfsv0 bob 524288 524288 1000000 1000000 /

in order to

  • affect the root file system (/)
  • limit disk usage for the user bob to 2 GB (as calculated above)
  • limit the number of inodes for the user bob to 1000000
  • not distinguish between hard and soft limits
  • use the vfsv0 quota format (as specified in the mount options)
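When quotas have to be set for several users, the command line can be assembled programmatically. The sketch below only builds and prints the argument list according to the signature above; the helper name is hypothetical, and actually running the command (for example via subprocess.run as root) is left out on purpose.

```python
def setquota_command(user, block_limit, inode_limit, filesystem, quota_format="vfsv0"):
    """Build a setquota argument list with identical soft and hard limits."""
    return [
        "setquota", "-u", "-F", quota_format, user,
        str(block_limit), str(block_limit),   # block soft limit, block hard limit
        str(inode_limit), str(inode_limit),   # inode soft limit, inode hard limit
        filesystem,
    ]


cmd = setquota_command("bob", 524288, 1000000, "/")
print(" ".join(cmd))
# prints: setquota -u -F vfsv0 bob 524288 524288 1000000 1000000 /
```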


6) Verify whether quotas are activated

Running quotaon on the file system should print the following error if quotas are already in effect:

# quotaon /
quotaon: using //aquota.group on /dev/disk/by-uuid/e5aff151-ddf7-4d43-a318-7e97afcfd78e [/]: Device or resource busy
quotaon: using //aquota.user on /dev/disk/by-uuid/e5aff151-ddf7-4d43-a318-7e97afcfd78e [/]: Device or resource busy


7) Check quota report

# repquota  /
*** Report for user quotas on device /dev/disk/by-uuid/e5aff151-ddf7-4d43-a318-7e97afcfd78e
Block grace time: 7days; Inode grace time: 7days
                        Block limits                File limits
User            used    soft    hard  grace    used  soft  hard  grace
----------------------------------------------------------------------
[...]
bob  --  273264 524288 524288          13976 1000000 1000000

The output is as expected: the limits are in effect. Currently, bob is using 273264 of 524288 blocks and 13976 of 1000000 inodes.
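For monitoring, a report line like the one for bob can be turned into usage percentages. This is a rough sketch with a made-up function name. It assumes the simple case shown above, in which both grace columns are empty so that the line splits into exactly eight fields; lines with a grace value would need extra handling.

```python
def parse_repquota_line(line):
    """Parse one user line of repquota output that has empty grace columns."""
    fields = line.split()
    if len(fields) != 8:
        raise ValueError("unexpected number of columns (grace period set?)")
    user = fields[0]  # fields[1] holds the two status flags, e.g. "--"
    keys = ("blocks_used", "blocks_soft", "blocks_hard",
            "inodes_used", "inodes_soft", "inodes_hard")
    return user, dict(zip(keys, (int(value) for value in fields[2:])))


user, usage = parse_repquota_line(
    "bob  --  273264 524288 524288          13976 1000000 1000000")
block_percent = 100.0 * usage["blocks_used"] / usage["blocks_hard"]
inode_percent = 100.0 * usage["inodes_used"] / usage["inodes_hard"]
print("{}: {:.1f}% of blocks, {:.1f}% of inodes".format(user, block_percent, inode_percent))
# prints: bob: 52.1% of blocks, 1.4% of inodes
```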

Order of arguments for GNU find: be careful where to specify actions such as -print0.

Today, I used a find | xargs ls | awk combination for summing up the sizes of files ending with a certain suffix:

$ find . -print0 -name "*.rst" -type f | xargs -0 /bin/ls -l | awk '{t += $5} END {print t, "bytes."}'
1612918975712 bytes.

I knew that this number was too large and finally got it “right”:

$ find . -name "*.rst" -type f -print0 | xargs -0 /bin/ls -l | awk '{t += $5} END {print t, "bytes."}'
5789476750 bytes.

The only difference between the two commands is the position of the -print0 argument to the find command. One could think of -print0 as just an option determining the output format. In that case, it would be pretty counter-intuitive that its position relative to other arguments should matter at all. However, find is quite a complex application and behaves differently from many other programs with respect to its command line interface. So, does the observed behavior make sense? Why does it matter where -print0 is specified? This is (more or less implicitly) explained on the man page. Below, I try to explain it systematically.

How does find evaluate the command line arguments? The main scheme is the following:

     find [search_path1 [search_path2] ...] [expression]

Hence, the arguments comprise a search path or multiple search paths and the expression. In case of find . -print0 -name "*.rst" -type f, the search path is . and the expression is -print0 -name "*.rst" -type f. Important facts to know about the expression:

  • The expression consists of one or more so-called primaries, each of which is a separate command line argument to find. find evaluates the expression each time it processes an item (file/directory).
  • The expression may consist of four types of primaries: options, tests, actions, and operators connecting them. Where no operator is given, the logical AND is assumed.
  • For each item in the search path, the entire expression is evaluated from left to right, step by step, i.e. primary by primary.
  • Every test, option, or action returns either True or False upon evaluation.
  • As soon as it is clear that the entire expression evaluates to False, the evaluation of the expression is aborted (this concept is called short-circuit evaluation). In this case, the remaining primaries are not executed, and find continues with the next item in the search path.
  • While the expression is evaluated step by step from left to right, actions are performed right away. Actions have side effects. Consequently, these side effects become visible while the expression is evaluated, even if the evaluation is aborted in the next step.
  • The default action, when no other action is specified, is -print. It prints the current item to stdout (newline-terminated). It is executed when the expression has been evaluated entirely and has returned True.
  • Tests are the natural way to filter files/directories, i.e. to abort the evaluation of the expression before the default action (normal print) is performed.

The magic behind the observation made above is that -print0 is an action, not an option. The "side effect" of this action is that the file path is printed to stdout (terminated by a NUL character). When specified as the first primary of the expression, it is executed for each item in the search path, before any test has had a chance to abort the evaluation. The subsequent filter tests therefore have no effect on the output. That's why the reported file size sum was higher than expected in the first case.
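The evaluation scheme can be mimicked in a few lines of Python. This is a toy model, not how find is implemented: the helper names and the item list are made up, all primaries are connected by the implicit AND, and the "action" just appends to a list instead of writing to stdout.

```python
import fnmatch
import os


def name(pattern):
    """Test: True if the base name of the item matches the shell pattern."""
    return lambda path, output: fnmatch.fnmatchcase(os.path.basename(path), pattern)


def print0(path, output):
    """Action: record the item right away (the side effect) and return True."""
    output.append(path)
    return True


def evaluate(primaries, path, output):
    """Evaluate AND-connected primaries left to right with short-circuiting."""
    for primary in primaries:
        if not primary(path, output):
            return False  # abort: the remaining primaries are never executed
    return True


def run(primaries, items):
    """Evaluate the expression once per item and collect what was 'printed'."""
    output = []
    for path in items:
        evaluate(primaries, path, output)
    return output


items = ["./a.rst", "./b.txt", "./docs/c.rst"]
print(run([print0, name("*.rst")], items))  # prints ['./a.rst', './b.txt', './docs/c.rst']
print(run([name("*.rst"), print0], items))  # prints ['./a.rst', './docs/c.rst']
```

With the action first, every item is recorded before the test runs; with the test first, the evaluation is aborted for non-matching items before the action is reached.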

Btw, an alternative way for summing up the file sizes based on du (in --apparent-size --block-size=1 mode) would be:

find . -name "*.rst" -type f -print0 | du -b --files0-from=- | awk '{t += $1} END {print "Total:", t, "bytes."}'
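For comparison, the same sum can be computed without any pipeline. The sketch below uses only the Python standard library; the function name is made up. It is meant to approximate -name "*.rst" -type f by counting regular files only and skipping symbolic links, and it adds up apparent sizes in bytes.

```python
import os
import stat


def total_size(root, suffix):
    """Sum the apparent sizes (in bytes) of regular files below root ending with suffix."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            if not filename.endswith(suffix):
                continue
            info = os.lstat(os.path.join(dirpath, filename))  # lstat: do not follow symlinks
            if stat.S_ISREG(info.st_mode):
                total += info.st_size
    return total


print(total_size(".", ".rst"), "bytes.")
```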

A few awesome tools: chartjs, droptask, dbpatterns

Just found out about these three convenient tools:

  • A JavaScript library for creating good-looking plots/graphs with the HTML5 canvas element: chartjs.org
  • Free visual task management: droptask.com
  • Visualize, create, and share database models on the web: dbpatterns.com

Worth sharing, isn’t it?

Speed and space: SSD and classic hard disk in a notebook via HDD caddy

In this article, I describe how I replaced the optical drive in my Lenovo ThinkPad L430 with an "HDD caddy" in order to run an HDD and an SSD in the notebook instead of an HDD and an optical drive.
  »» Continue reading »»

Save some CSS traffic: round percentages with LESS

It just occurred to me that the bootstrap-responsive.css from Twitter Bootstrap 2.3 contains several percentage numbers with unnecessarily high precision, such as

width:31.570740134569924%

By rounding all percentages in bootstrap-responsive.min.css to at most two digits after the decimal point, the size of the file is reduced by 11 %.
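The rounding itself is easy to illustrate. The following sketch is not the LESS-based approach of this post; it is a plain Python regular expression substitution with made-up names, applied to a CSS string, and it only touches percentages that have three or more digits after the decimal point.

```python
import re

# A percentage with at least three digits after the decimal point.
LONG_PERCENTAGE = re.compile(r"(\d+\.\d{3,})%")


def round_percentages(css, digits=2):
    """Round overly precise percentage values in a CSS string."""
    def replace(match):
        rounded = round(float(match.group(1)), digits)
        return "{:g}%".format(rounded)  # :g drops trailing zeros
    return LONG_PERCENTAGE.sub(replace, css)


print(round_percentages("width:31.570740134569924%"))  # prints width:31.57%
```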

  »» Continue reading »»

Python: generic lazy lambda function via argument expansion

Without doubt, the *-operator for argument gathering/scattering/expansion/unpacking/splat in Python is a very useful tool (as in any other language). Today, I used this language feature in a rather dirty context: combined with the lambda keyword, we can use it for creating an anonymous function accepting any combination of arguments:

>>> noop = lambda *a, **b: None

*a collects positional arguments in variable a, **b collects keyword arguments in b. The function does not use any of the arguments. It always returns None.

noop() can now be called in any way:

>>> print noop(1, 2, 3, arg=1, peter="wurzel")
None
>>> print noop()
None

This can be useful for disabling a certain functionality during run-time of a Python program (aka monkey-patching), as in e.g.:

>>> import multiprocessing.forking
>>> multiprocessing.forking.Popen.poll = lambda *a, **b: None

In many situations, I agree,

def noop(*a, **b):
    pass

might be cleaner.
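To see the effect in isolation, here is a self-contained sketch using Python 3 syntax. The Reporter class is purely hypothetical and stands in for whatever functionality should be switched off at run-time.

```python
noop = lambda *a, **b: None


class Reporter:
    """Hypothetical class whose send() method we want to disable."""

    def __init__(self):
        self.sent = []

    def send(self, message, urgent=False):
        self.sent.append(message)


reporter = Reporter()
reporter.send("first")

# Monkey-patch the class: from now on, send() accepts anything and does nothing.
# The instance itself is passed as the first positional argument and ends up in *a.
Reporter.send = noop
reporter.send("second", urgent=True)

print(reporter.sent)  # prints ['first']
```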



Concurrent connections to Redis with gevent and redis-py

Redis is a powerful, lightning-fast key-value store. Gevent (1.0) is an event-driven concurrency framework for Python (based on greenlet and libev) allowing for highly concurrent networking and tackling the C10K problem. Redis-py, the well-established Redis client for Python, makes Python talk to Redis. The communication takes place through sockets and is request-response-based. A typical Redis-based Python application is therefore I/O-bound rather than CPU-bound. Furthermore, with various features of Redis, a request is not immediately followed by a response, so certain requests block the calling thread for an arbitrary amount of time. This is where gevent comes into play: it allows for concurrent execution of these blocking requests within coroutines. In this blog post, I present a short code example integrating gevent with redis-py.
  »» Continue reading »»

Email through custom domain without setting up your own mail server: the MX record is the key (and services like Zoho)

Setting up an email stack on a Linux box is a challenge. While it is quite possible to get things running, it requires a tremendous amount of care and overview to harden the system from the security point of view and to maintain it. I spent the last few days configuring and re-configuring and re-re-configuring postfix, dovecot, squirrelmail, and the whole MySQL-databases-unix-user-permissions tail. I believe I managed to do this properly (at this point I would like to acknowledge this particularly well-written and complete article). But will I still understand the configuration in half a year? Would I be able to maintain it properly? At the moment, I don't think so. That's why I do not like having this mail stack on my Linux box. It frightens me. For very simple cases like mine, there is a more elegant solution: letting an external service provider manage the mail sent to the custom domain.
  »» Continue reading »»

WordPress deployment: super simple and super fast with nginx caching

WordPress (+plugins) is not exactly the most resource-efficient content management system. In order to go easy on CPU and memory, to guarantee short website load times, and to be able to stand up to page requests with high frequency, WordPress must be served from a cache rather than answering each request via PHP and from the database. In this post, you can find my nginx config for deploying PHP-FPM-based WordPress behind nginx whereas most of the requests are served from the nginx FastCGI cache. No changes to WordPress and no WordPress plugins are required.
  »» Continue reading »»

Change password for MySQL user

Changing the password of an (unprivileged) MySQL user is very simple in principle, but most of the methods I have seen in blog posts etc. are not simple and complete at the same time. This is what I generally do:
  »» Continue reading »»

Solid State Drive story.

Are consumer solid state drives ready to be used in production environments? The newest generations possibly are. During the last years, however, in my opinion they clearly were not. Let me share my SSD horror story, beginning in 2009.
  »» Continue reading »»

Does my notebook support dual-link DVI?

Recently, I tried to drive a Samsung S27A850D with WQHD resolution (2560×1440) on a Clevo M860TU with a built-in NVIDIA GeForce 9600M GT.
  »» Continue reading »»

Python WebSocket server powered by gevent and ws4py

I am following the emerging WebSocket standard with a lot of interest. Today, I would like to update the tool recommendations I gave in the article "The best and simplest tools to create a basic WebSocket application with Flash fallback and Python on the server side". ws4py (WebSocket for Python) by Sylvain Hellegouarch deserves to have the word spread.
  »» Continue reading »»

LaTeX: jump/skip to the bottom of a page

I could not remember the command to “jump” to the bottom of a page in LaTeX.
  »» Continue reading »»

The best and simplest tools to create a basic WebSocket application with Flash fallback and Python on the server side

Currently, I am playing around with WebSockets. This is due to an application idea I have in mind which requires a bidirectional, low-latency connection between browser and server. The communication will happen in a stream-like fashion at low bandwidth. Real network sockets using TCP/UDP are often the desired optimum for things like that, but within a browser they can only be provided by Java or Flash plugins. The future belongs to WebSockets. Implemented directly in the browser, they provide a much lower-level network connection between the browser and the server than HTTP, without any plugin. WebSockets still work on a layer above TCP, but low latency and efficient data transport in both directions are ensured in a TCP-like fashion. Therefore, real-time application developers are very keen to use WebSockets. As this is still a young development, browser support is limited and many client/server libraries are immature. Most importantly, there is a huge lack of documentation on how to use them.

In this blog post, I present a very simple "echo application" where the user sends a message from the browser to the server, which in turn sends this message back to the client. Simple so far, but the main focus is on selecting the right tools for the task, as there is already a lot of stuff out there: some very good but undocumented and hidden things, some totally overloaded things, and some bad things. I tried to fulfill the following conditions:

  • make use of a "Flashbridge" to realize a fallback when WebSockets are not available in a browser (which is true for Firefox at the moment)
  • use Python on the server side
  • use the best / most solid tools available
  • at the same time, use the simplest tools available, which do not bring along loads of stuff that you do not need for simple applications or if you want to design your own communication protocol anyway


  »» Continue reading »»