
Songkick events for Google’s Knowledge Graph

Google can display upcoming concert events in the Knowledge Graph of musical artists (as announced in March 2014). This is a great feature, and probably many people in the field of music marketing and especially record labels aim to get this kind of data into the Knowledge Graph for their artists. However, Google does not magically find this data on its own. It needs to be informed, with a special kind of data structure (in the recently standardized JSON-LD format) contained within the artist’s website.

While of great interest to record labels, finding a proper technical solution to create and provide this data to Google can still be a challenge. I have prepared a web service that greatly simplifies the process of generating the required data structure. It pulls concert data from Songkick and translates it into the JSON-LD representation required by Google. In the next section I explain the process by means of an example.

Web service usage example

The concert data of the band Milky Chance is published and maintained via Songkick, a service that many artists use. The following website shows, among other things, all upcoming events of Milky Chance:

My web service translates the data held by Songkick into the data structure that Google requires in order to make this concert data appear in their Knowledge Graph. This is the corresponding service URL that needs to be called to retrieve the data:

That URL is made of the base URL of the web service, the Songkick ID of the artist (6395144 in this case), the artist name, and the artist website URL. Try accessing this service URL in your browser. It currently yields this:

[
  {
    "@context": "",
    "@type": "MusicEvent",
    "name": "Milky Chance",
    "startDate": "2014-12-12",
    "url": "",
    "location": {
      "address": {
        "addressLocality": "Kiel",
        "postalCode": "24116",
        "streetAddress": "Eichhofstra\u00dfe 1",
[ ... SNIP ~ 1000 lines of data ... ]
    "performer": {
      "sameAs": "",
      "@type": "MusicGroup",
      "name": "Milky Chance"
    }
  }
]

This piece of data needs to be included in the HTML source code of the artist website. Google then automatically finds this data and eventually displays the concert data in the Knowledge Graph (within a couple of days). That’s it — pretty simple, right? The good thing is that this method does not require layout changes to your website. This data can literally be included in any website, right now.
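As far as I know, search engines pick up JSON-LD data when it sits inside a script element of type application/ld+json. The following is a minimal sketch of how such an element could be generated from the service response. The function name toJsonLdScriptTag and the shortened example data are made up for illustration; they are not part of the web service.

```javascript
// Wrap JSON-LD data in a <script type="application/ld+json"> element.
// `events` is the (already parsed) data returned by the web service.
function toJsonLdScriptTag(events) {
  // Escape "<" so that a string such as "</script>" inside the data
  // cannot terminate the element early. JSON parsers read \u003c as "<".
  var json = JSON.stringify(events).replace(/</g, "\\u003c");
  return '<script type="application/ld+json">' + json + "</script>";
}

// Hypothetical, heavily shortened event list, for illustration only.
var exampleEvents = [
  {"@type": "MusicEvent", "name": "Milky Chance", "startDate": "2014-12-12"}
];
console.log(toJsonLdScriptTag(exampleEvents));
```

The resulting string can be placed anywhere in the HTML source of the artist website, which is why no layout change is needed.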

That is what happened in the case of Milky Chance: some time ago, the data created by the web service was fed into the Milky Chance website. Consequently, their concert data is displayed in their Knowledge Graph. See for yourself: access and look out for upcoming events on the right-hand side. Screenshot:


Google Knowledge Graph generated for Milky Chance. Note the upcoming events section: for this to appear, Google needs to find the event data in a special markup within the artist’s website.

So, in summary, when would you want to use this web service?

  • You have an interest in presenting the concert data of an artist in Google’s Knowledge Graph (you are a record label or otherwise interested in improved marketing and user experience).
  • You have access to the artist website or know someone who has access.
  • The artist concert data is already present on Songkick or will be present in the future.

Then all you need is a specialized service URL, which you can generate with a small form I have prepared for you here:

Background: why Songkick?

Of course, the event data shown in the Knowledge Graph should be up to date and in sync with presentations of the same data in other places (bands usually display their concert data in many places: on Facebook, on their website, within third-party services, …). Fortunately, a lot of bands actually do manage this data in a central place (any other solution would be tedious). This central place/platform/service often is Songkick, because Songkick really did a nice job in providing people with what they need. My web service reflects recent changes made within Songkick.

Technical detail

The core of the web service is a piece of software that translates the data provided by Songkick into the JSON-LD data as required and specified by Google. The Songkick data is retrieved via Songkick’s JSON API (I applied for and got a Songkick API key). Large parts of this software deal with the unfortunate business of data format translation while handling certain edge cases.

The service is implemented in Python and hosted on Google App Engine. Its architecture is quite well thought-through (for instance, it uses memcache and asynchronous urlfetch wherever possible). It is ready to scale, so to say. Some technical highlights:

  • The web service enforces transport encryption (HTTPS).
  • Songkick back-end is queried via HTTPS only.
  • Songkick back-end is queried concurrently whenever possible.
  • Songkick responses are cached for several hours in order to reduce load on their service.
  • Responses of this web service are cached for several hours. These are served within milliseconds.
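To illustrate the caching idea behind the last two points, here is a minimal sketch of a time-based cache. It is a generic in-memory model, not the memcache-based code of the service; the name createTtlCache, the key format, and the three-hour lifetime are assumptions chosen for the example.

```javascript
// Minimal time-based cache. `ttlMs` is how long an entry stays valid.
// `now` is an injectable clock (defaults to Date.now), which eases testing.
function createTtlCache(ttlMs, now) {
  now = now || Date.now;
  var entries = Object.create(null);
  return {
    get: function(key) {
      var entry = entries[key];
      if (!entry || now() - entry.storedAt >= ttlMs) {
        // Unknown or expired: the caller has to query the back-end again.
        delete entries[key];
        return undefined;
      }
      return entry.value;
    },
    set: function(key, value) {
      entries[key] = {value: value, storedAt: now()};
    }
  };
}

// Usage: keep a (hypothetical) venue response for three hours.
var venueCache = createTtlCache(3 * 60 * 60 * 1000);
venueCache.set("venue:123", {city: "Kiel"});
console.log(venueCache.get("venue:123"));
```

A cache hit skips the round trip to Songkick entirely, which is what makes responses within milliseconds possible.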

This is an overview of the data flow:

  1. Incoming request, specifying Songkick artist ID, artist name, and artist website.
  2. Using the Songkick API (SKA), all upcoming events are queried for this artist (one or more SKA requests, depending on number of events).
  3. For each event, the venue ID is extracted, if possible.
  4. All venues are queried for further details (this entails as many SKA requests as venue IDs extracted).
  5. A JSON-LD representation of an event is constructed from a combination of
    • event data
    • venue data
    • user-given data (artist name and artist website)
  6. All event representations are combined and returned.
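Step 5 of the list above can be sketched roughly as follows. The output field names are taken from the sample response shown earlier in this post. Everything else is an assumption: the input shapes (event.date, venue.street and so on) are simplified and hypothetical rather than Songkick's actual API fields, the function name buildMusicEvent is made up, and the "@context" value is my guess because it is not visible in the sample output.

```javascript
// Build one MusicEvent representation from event data, venue data and
// user-given data. Input shapes are hypothetical and simplified.
function buildMusicEvent(event, venue, artistName, artistUrl) {
  var result = {
    "@context": "http://schema.org", // assumption, not taken from the sample
    "@type": "MusicEvent",
    "name": artistName,
    "startDate": event.date,
    "url": event.url,
    "performer": {
      "@type": "MusicGroup",
      "name": artistName,
      "sameAs": artistUrl
    }
  };
  // Graceful degradation: only emit the address fields that are known.
  if (venue) {
    var address = {};
    if (venue.city) address.addressLocality = venue.city;
    if (venue.zip) address.postalCode = venue.zip;
    if (venue.street) address.streetAddress = venue.street;
    result.location = {"address": address};
  }
  return result;
}

// Usage with made-up URLs and the venue from the sample output.
var kielEvent = buildMusicEvent(
  {date: "2014-12-12", url: "https://example.com/events/1"},
  {city: "Kiel", zip: "24116", street: "Eichhofstraße 1"},
  "Milky Chance",
  "https://example.com/"
);
console.log(JSON.stringify(kielEvent, null, 2));
```

Step 6 would then simply collect the results for all events into one array.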

Some notable points in this context:

  • A single request to this web service might entail many requests to the Songkick API. This is why SKA responses are aggressively cached:
    • An example artist with 54 upcoming events requires 2 upcoming events API requests (two pages, cannot be requested concurrently) and roughly 50 venue API requests (can be requested concurrently). In sum, this means that my web service cannot respond in less than three SKA round-trip times.
    • If none of the SKA responses has been cached before, the retrieval of about 2 + 50 SKA responses might easily take about 2 seconds.
    • This web service cannot be faster than SK delivers.
  • This web service applies graceful degradation when extracting data from Songkick (many special cases are handled, which is especially relevant for the venue address).
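The round-trip count from the example above can be written down as a small calculation. This is a simplified model under the assumptions stated in the list: event pages are fetched one after another, and all venue requests are issued concurrently in a single batch. The function name is made up.

```javascript
// Lower bound on sequential Songkick round trips for one uncached request.
function minRoundTrips(eventPages, venueCount) {
  // All venue requests run concurrently, so they cost one round trip.
  var venueBatches = venueCount > 0 ? 1 : 0;
  return eventPages + venueBatches;
}

console.log(minRoundTrips(2, 50)); // 3
```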

Generate your service URL

This blog post is just an introduction, and sheds some light on the implementation and decision-making. For general reference, I have prepared this document to get you started:

It contains a web form where you can enter the (currently) three input parameters required for using the service. It returns a service URL for you. This URL points to my application hosted on Google App Engine. Using this URL, the service returns the JSON data that is to be included in an artist’s website. That’s all, it’s really pretty simple.

So, please go ahead and use this tool. I’d love to receive some feedback. Look closely at the data it returns, and keep your eyes open for subtle bugs. If you see something weird, report it, please. I am very open to suggestions, and also interested in your questions regarding future plans, release cycle etc. Also, if you need support for (dynamically) including this kind of data in your artist’s website, feel free to contact me.

CSS: Crispy downscaled images

A quick note about modern CSS directives having huge impact on the display quality of downscaled images. The following picture is a screenshot of Firefox 34, displaying two roundish icons (mail and Facebook icons, PNG images) downscaled to 50 % of their original size:

Firefox 34, PNGs downscaled to 50 %, default rendering algorithm


The edges are all blurry, although the original PNG files are not blurry at all (believe me). This sucks.

The correct question at this point is: Why would we want to downscale these images instead of displaying them at their original size?

The answer: mobile device screens have a much higher pixel density than classical desktop screens. Consequently, compared to the desktop appearance, a bitmap image (as opposed to a vector graphic) embedded into a website either has to appear much smaller on mobile screens (relative to other elements on the website) or the mobile browser has to upscale the image. We do not want the former to happen, because this cripples the layout. The latter, however, is equally bad: upscaling requires adding information that is not present in the image, so it always makes the image look worse than before. A solution is to give the browser enough pixels so that it is not forced to upscale the image in the mobile environment. This, however, requires the browser to downscale the image (at least in the desktop environment). Downscaling is fine in general, given the right downscaling algorithm.
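The pixel arithmetic behind this argument is simple. Here is a small sketch, assuming the usual notion of a device pixel ratio (physical pixels per CSS pixel). The function name and the 50 px icon width are made up for the example.

```javascript
// Number of source pixels needed so that the browser never has to upscale
// an image displayed at `cssWidthPx` CSS pixels.
function requiredSourceWidth(cssWidthPx, devicePixelRatio) {
  return Math.ceil(cssWidthPx * devicePixelRatio);
}

console.log(requiredSourceWidth(50, 1)); // 50: classical desktop screen
console.log(requiredSourceWidth(50, 2)); // 100: high-density mobile screen
```

An image file prepared for the second case is twice as wide as needed in the first case, so the desktop browser has to downscale it to 50 %.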

A badly chosen downscaling algorithm may produce blurry edges where crystal clear edges were in the original version of the image. The default downscaling algorithm of current Firefox and Chrome versions produces such blurry edges, which is what is shown in the screenshot above. Such an algorithm might be advantageous for photo-like images, but for line art / icons other techniques fit better, optimized for retaining contrast and edges. The important insight at this point is: a browser can never be intelligent enough to automatically judge, based on the image data, which algorithm to use best. Fortunately, the type of algorithm can manually be specified using CSS, as described here. Using the recommendations given in the linked article, the result looks much better:

Firefox 34, PNGs downscaled to 50 %, moz-crisp-edges rendering algorithm


As shown in the screenshot, the CSS code in use is:

    image-rendering: -moz-crisp-edges;
    image-rendering: -o-crisp-edges;
    image-rendering: -webkit-optimize-contrast;
    -ms-interpolation-mode: nearest-neighbor;

I have observed the same difference in Chrome 39. So, go ahead and use this or take it one step further and provide different image files for different devices using media queries. Internet Explorer 11 showed crispy images in both cases (its default rendering algorithm does not blur line art upon downscaling).

Sharing state in AngularJS: be aware of $watch issues and race conditions during app initialization

This article is about concise and precise communication of shared state updates from AngularJS services to AngularJS controllers. It warns about race conditions upon AngularJS application bootstrap, and points out advantages of $broadcast over $watch. The topics discussed in this article are supported by minimal working code examples. Finally, this article provides code that can hopefully serve as a best-practice snippet for your own application.

Note: this article has been written with AngularJS version 1.3.X in mind. Future versions of Angular, especially the announced version 2.0, might behave differently.

Introduction to the problem

I have worked with AngularJS for a couple of days now, designing an application that needs to interact with a web service. In this application, I use a small local database (basically a large JavaScript object) that is used by different views in different ways. From time to time, this database object needs to be updated from a remote resource. In the AngularJS ecosystem it seems obvious that such data should be part of an application-wide shared state object and that it needs to be managed by a central entity: an AngularJS service (remember: services in AngularJS can be considered globally available entities, i.e. they are the perfect choice for communicating between controllers and for sharing state). The two main questions that came to my mind considering this scenario:

  1. How should I handle the automatic initial retrieval of remote data upon application startup?
  2. How should I communicate updates of this piece of shared data to controllers?

The answers to these questions must make sure that the following boundary conditions are fulfilled: controllers need to be informed about all state updates (including the initial one) independently of

  • the application startup time (which is defined by the computing power of the device and the application complexity) and independently of
  • the latency between request and response when querying the remote resource.

An obvious solution (with not-so-obvious issues)


Let us get right into code and discuss a possible solution, by means of a small working example. This is the HTML:

<!DOCTYPE html>
<html data-ng-app="testApp">
  <head>
    <script data-require="angular.js@1.3.1" data-semver="1.3.1" src="//"></script>
  </head>
  <body data-ng-controller="Ctrl">
    Please watch the JavaScript console.<br>
    <button ng-click="buttonclick(false)">updateState(constant)</button>
    <button ng-click="buttonclick(true)">updateState(random)</button>
    <script src="script.js"></script>
  </body>
</html>

It includes the AngularJS framework and custom JavaScript code from script.js. The ng main module is called testApp and the body is subject to the ng controller called Ctrl. There are two buttons whose meaning is explained later.

The service ‘StateService’

So, what do we have in script.js? There is the obligatory line for defining the application’s main module:

var app = angular.module('testApp', []);

And there is the definition of a service for this application:

app.factory('StateService', ['$rootScope', '$timeout',
function($rootScope, $timeout) {
  console.log('StateService: startup.');
  // Delay (in ms) after which the pseudo remote data comes in.
  var UPDATE_STATE_DELAY = 1000;
  var service = {state: {data: null}};
  service.updateState = function(rnd) {
    console.log("StateService: updateState(). Retrieving data...");
    $timeout(function() {
      console.log("StateService: data, assign it to");
      if (rnd) service.state.data = Math.floor(Math.random()*1000);
      else service.state.data = "constantpayload";
    }, UPDATE_STATE_DELAY);
  };
  // Update state automatically once upon service (app) startup.
  service.updateState();
  return service;
}]);

I have called it 'StateService' because this service should just be responsible for sharing state between controllers. The property service.state.data is what simulates the shared data (this is what controllers are interested in!). This piece of data is first initialized with null.

Subsequently, an updateState() method is defined. It simulates delayed retrieval of data from a remote resource via a timeout-controlled async call which eventually results in assignment of “new” data to service.state.data. This method can be called in two ways:

  • One way results in service.state.data being set to a hard-coded string.
  • The other results in service.state.data being set to a random number.

The length of the delay after which the pseudo remote data comes in is set to about 1 second, as defined by var UPDATE_STATE_DELAY = 1000.

The service factory (that piece of code shown above) is automatically executed by AngularJS when loading the application. It is important to note that before the service factory returns the service object, service.updateState() is called. That is, when the application bootstraps and this service becomes initialized, it automatically performs one state update. This triggers “the automatic initial retrieval of remote data upon application startup” I talked about in the introduction.

Consequently, about 1 second after this service has been initialized, service.state.data is updated with pseudo remote data. Subsequent calls to updateState() can only be triggered externally, as I will show later.

The controller ‘Ctrl’

The StateService is in place. So far, so good. This is what a controller that makes use of it can look like:

app.controller('Ctrl', ['$scope', 'StateService',
function($scope, stateService) {
  function useStateData() {
    console.log("Ctrl: useStateData(): " + stateService.state.data);
  }
  function init() {
    console.log('Ctrl: init. Install watcher for');
    $scope.$watch(
      function() {return stateService.state.data;},
      function(newValue, oldValue) {
        console.log("Ctrl: watcher: triggered.");
        if (newValue !== oldValue) {
          console.log("Ctrl: watcher: data changed, use data.");
          useStateData();
        }
        else
          console.log("Ctrl: watcher: data did not change: " + oldValue);
      }
    );
  }
  init();
  $scope.buttonclick = function(random) {
    console.log("Ctrl: Call stateService.updateState() due to button click.");
    stateService.updateState(random);
  };
}]);

To be able to communicate state changes from the StateService to the controller, the service is injected into the controller as the stateService object. That just means: we can use this object within the code body of the controller to access service properties, including stateService.state.data.

In the controller, first of all, I define a dummy function called useStateData(). Its sole purpose is to simulate complex usage of the shared state data. In this case, if the function is called, the data is simply logged to the console.

Subsequently, an init() function is defined and called right after that (I could have put that code right into the body of the controller, but further below in the article the call to init() is wrapped with a timeout, and that is why I already separate it here).

Now we come to the essential part. In summary, the basic idea is to have a mechanism applied in the controller that automatically calls useStateData() after stateService.state.data has changed.

For automatic communication of state changes from the service to the controller, AngularJS provides different mechanisms. In very simplistic scenarios we could just bind to any of the model properties in the controller’s scope and rely on Angular’s “automatic” two-way binding. However, in this article the goal is to discuss more complex scenarios where we need to take absolute control of the state update and where we want to react to a state update in a more general way, i.e. by calling a function in response to the update (here, this is useStateData()).

That is what Angular’s $scope.$watch() is good for. It takes (at least) two arguments. A “watcher function” is defined with the first argument. In this case here, this watcher function just returns the value of stateService.state.data. This watcher function is called in every Angular event loop iteration (upon each call to $digest()). If the value that it watches changes between two iterations, the listener function is called. The listener is defined by the second argument to $scope.$watch(). In our simple example here, the purpose of the listener function is to just use the data, i.e. to call useStateData().
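To make the watcher/listener mechanics more tangible, here is a strongly simplified model of dirty checking in plain JavaScript. It is not AngularJS code: createMiniScope, watch and digest are made-up names, and a real digest does more (for instance, it repeats until no watcher reports a change). The sketch only mimics two things: the listener is called once for initialization with identical old and new values, and afterwards only when a strict comparison detects a change.

```javascript
// Simplified model of dirty checking; NOT the AngularJS implementation.
function createMiniScope() {
  var watchers = [];
  var UNINITIALIZED = {}; // marker: this watcher has never run
  return {
    watch: function(watchFn, listenerFn) {
      watchers.push({watchFn: watchFn, listenerFn: listenerFn, last: UNINITIALIZED});
    },
    digest: function() {
      watchers.forEach(function(w) {
        var value = w.watchFn();
        if (w.last === UNINITIALIZED) {
          // First run: listener is called with newValue === oldValue.
          w.last = value;
          w.listenerFn(value, value);
        } else if (value !== w.last) {
          var old = w.last;
          w.last = value;
          w.listenerFn(value, old);
        }
      });
    }
  };
}

var state = {data: null};
var calls = [];
var scope = createMiniScope();
scope.watch(
  function() { return state.data; },
  function(newValue, oldValue) { calls.push([newValue, oldValue]); }
);
scope.digest();                  // initialization call: [null, null]
state.data = "constantpayload";
scope.digest();                  // change detected: ["constantpayload", null]
scope.digest();                  // nothing changed: listener is not called
console.log(calls);
```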

The controller contains some additional code that gives a purpose to the two buttons included in the HTML shown before. One button calls updateState(true), triggering a state update in which the data is set to a random number. The other button calls updateState(false) where the data is set to a hard-coded string (a constant).

Fine, sounds good so far, the controller is ready to respond to state updates. But wait …

Three traps with $scope.$watch()

Run the example shown above via this plunk and watch your JavaScript console. This is the output right after (< 1 s) loading the application:

StateService: startup.
StateService: updateState(). Retrieving data...
Ctrl: init. Install watcher for
Ctrl: watcher: triggered.
Ctrl: watcher: data did not change: null

Trap 1: $watch() listener requires case analysis

Let us go through things in order. First, the service is initialized and triggers updateState(), as planned. We expect a state update about 1 second after that. Next thing in the log is output emitted by the controller code: it installs the watcher via $scope.$watch(). Immediately after that the watcher already calls the listener function. The pseudo remote update has not happened yet, so why is that function being called? This is explained in the Angular docs:

After a watcher is registered with the scope, the listener fn is called asynchronously to initialize the watcher. In rare cases, this is undesirable because the listener is called when the result of watchExpression didn’t change. To detect this scenario within the listener fn, you can compare the newVal and oldVal.

Wuah, what? I did not explain this before, but this is the reason why the listener function code shown above requires a case analysis. We need to manually compare the old value to the new value via

function(newValue, oldValue) {
  console.log("Ctrl: watcher: triggered.");
  if (newValue !== oldValue) {
    console.log("Ctrl: watcher: data changed, use data.");
    useStateData();
  }
  else
    console.log("Ctrl: watcher: data did not change: " + oldValue);
}

If you prefer to simply rely on the trigger and forget to do the case analysis, you may already have a hard-to-debug issue in your code:

function() {
  console.log("Ctrl: watcher: triggered, use data.");
  // Wait, maybe that here was just called due to the watcher init, oops!
  useStateData();
}

Okay, looking at the log output above again, indeed, the first invocation of the listener function resulted in “data did not change: null”. That is, newValue !== oldValue was false. I have not put timestamps into the log, but the following lines are the remaining output of the application (they appeared after about 1 second):

StateService: data, assign it to
Ctrl: watcher: triggered.
Ctrl: watcher: data changed, use data.
Ctrl: useStateData(): constantpayload

As expected, the StateService retrieves its pseudo remote data and re-assigns its service.state.data property. The $timeout service triggers an Angular event loop iteration, i.e. the assignment is wrapped by an Angular-internal call to $digest(). Consequently, the change is observed by Angular and the listener function of the installed watcher gets called. This time, the (annoying) case analysis causes useStateData() to be called. It prints the updated data.

Up to here, we have found a way to communicate a state change from a service to a controller, via $watch(). Sounds great. However, this method involves potential false-positive calls to the listener function. To properly deal with this awkward situation, a case analysis is required within the listener itself. This case analysis is, in my opinion, either a mean trap if you forget to implement it, or unnecessarily bloated code. It simply should not be required.

Trap 2: $watch() might swallow special updates

Let us proceed with the same minimal working example. The application is initialized. The state service retrieved its initial update from a pseudo remote source and notified the controller about this update. Now, you can go ahead and play with the button “updateState(random)” of the minimal working example. The console log should display something along these lines for each button click:

Ctrl: Call stateService.updateState() due to button click.
StateService: updateState(). Retrieving data...
StateService: data, assign it to
Ctrl: watcher: triggered.
Ctrl: watcher: data changed, use data.
Ctrl: useStateData(): 148

The chain is working: a button click results in a timeout being set. After about 1 second the data property of the state service gets assigned a new (random number) value. The watcher detects the change and immediately calls the listener function which, in turn, calls the useStateData() method of the controller.

Now, please press “updateState(constant)”, two times at least. What is happening? This is the log (after the second click):

Ctrl: Call stateService.updateState() due to button click.
StateService: updateState(). Retrieving data...
StateService: data, assign it to

The button click is logged. The StateService invokes its update function. After about 1 second, the string “constantpayload” is again assigned to the data property of the state object. As expected, so far. And….? The listener function in the controller does not get called. Never. Why? Because before the update and after the update the watched property, data, held the very same value. In my code example, the same primitive string (created from one single string literal) is re-assigned to data upon every click on this button. According to the AngularJS docs, the $watch()-internal comparison is done by reference (that is the default, at least), which for primitive strings boils down to comparing their values. Hence, from the watcher's point of view, nothing changed. If I had written

service.state.data = new String("constantpayload");

in the service.updateState() function, the listener function would be triggered upon each click on this button, because a new String object would be created each time and data‘s reference would change.
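The difference can be checked in plain JavaScript, assuming that the default comparison behaves like the strict inequality operator (which is also what the listener code above uses). The variable names are made up for the example.

```javascript
var primitiveA = "constantpayload";
var primitiveB = "constant" + "payload";       // built differently, same value
var wrappedA = new String("constantpayload");  // String wrapper object
var wrappedB = new String("constantpayload");  // another, distinct object

console.log(primitiveA !== primitiveB); // false: no change would be detected
console.log(wrappedA !== wrappedB);     // true: a change would be detected
```

Primitive strings with the same content always compare as equal, no matter how they were created. String wrapper objects are compared by identity.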

Let us reflect. Just a minute ago, in the case discussed before, special $watch() behavior forced us to do a manual case analysis in the listener function in order to decide whether there was a real update or not. Now we found a situation in which we do not even get into the position to manually process an update event in the listener function, because Angular’s $watch() mechanism decided internally that this was not an update. Discussing whether not changing the value during an update can be considered an update or not is a philosophical question. Meaning: it should not be answered for you, this is too much of artificial intelligence. You might want to deal with this question yourself in your controller, e.g. for knowing when the last update occurred, even if the data did not change. If you have hard-coded objects in your application and combine these with $watch(), you might end up with rather complex code paths that you possibly did not expect to even exist. All of this is documented, but it is a trap.

Hence, my opinion is that this behavior of $watch() is too subtle to be considered for concise event transmission.

(At the same time, I appreciate that in many situations developers are not interested in propagating such updates that are no real updates, and are just fine with how $watch() behaves, be it by accident or by strategy).

Trap 3: $watch() might seriously affect application performance

This one is really important for architectural decisions. Consider a scenario in which the shared state object is just a “container object” with a rather complex internal structure with many properties that can potentially change during an update. Then, as we have learned before, $watch() cannot simply detect changes in this object. The watched property always points to the container object, i.e. this reference does not change when the internals change. AngularJS provides two solutions to this: $watchCollection() and $watch() with the third argument (objectEquality) set to true. In both cases, the computational complexity of change detection depends on the complexity of the watched object. $watch(watcher, listener, true) performs a thorough analysis of the watched object, it “compares for object equality using angular.equals instead of comparing for reference equality.” The docs warn:

This therefore means that watching complex objects will have adverse memory and performance implications.

You can read more about the internals of $watch() in the “Scope” part of the AngularJS developer guide. In fact, this analysis requires the container object to be deeply inspected for changes. This entails saving a deep copy of the container object and comparing many values. This is costly on its own. But the important thing is: this is executed upon each $digest() round-trip of the framework. That is: often. And definitely upon each user interaction. Consequently, I would say that one should never watch complex objects in such fashion, because the associated complexity usually is not required. In a software project, the complexity of watched objects might grow from release to release, and developers might not be aware of the performance implications, especially in collaborative works. I find that the computational complexity for detecting an update and sending a notification about it should ideally never depend on the size of an object, it should just be O(1). Let’s face it: people use $watch() for getting notified, they might forget about its performance implications, and that is why $watch() should be O(1) or throw an error, in my opinion. But this questions the entire dirty-checking approach of Angular, so this is out of scope right now. Anyway, the associated complexity is hidden behind the scenes and will only become visible upon profiling. Just be aware of it.
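To get a feeling for how the work of a deep comparison grows with the watched object, here is a rough model in plain JavaScript. It is not angular.equals, just a sketch that counts how many primitive values a deep comparison has to look at. The names countComparedValues and makeDatabase as well as the record layout are made up.

```javascript
// Counts the primitive values a deep comparison of `a` and `b` must visit.
// Simplified model: assumes that both arguments have the same shape.
function countComparedValues(a, b) {
  var aIsObject = a !== null && typeof a === "object";
  var bIsObject = b !== null && typeof b === "object";
  if (!aIsObject || !bIsObject) return 1; // a single primitive comparison
  return Object.keys(a).reduce(function(sum, key) {
    return sum + countComparedValues(a[key], b[key]);
  }, 0);
}

// Hypothetical "database" with `n` records of two fields each.
function makeDatabase(n) {
  var db = {};
  for (var i = 0; i < n; i++) db["record" + i] = {id: i, name: "name" + i};
  return db;
}

var small = makeDatabase(10);
var large = makeDatabase(1000);
// A deep watch compares the current object against a saved deep copy.
console.log(countComparedValues(small, JSON.parse(JSON.stringify(small)))); // 20
console.log(countComparedValues(large, JSON.parse(JSON.stringify(large)))); // 2000
```

In this model the work grows linearly with the number of records, and it has to be repeated in every digest, whereas a reference comparison is always a single operation.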

In the beginning of the article I stated that I want to have a database-like object as part of the shared state. Clearly, a $watch()-based method for automatic change detection is not a good option, because of trap number 3. But traps number 1 and 2 also make me not like $watch() too much. You can feel it: we are working our way toward simple event broadcasts, and we will get to those further down in the article. But before getting there, let us discuss another crucial issue with the architecture shown so far: a race condition.

Race condition: initial remote resource query vs. application startup time

Upon application start, the little database mentioned above needs to be populated with data from a remote resource. It makes sense to request this data from within the service initialization code, as shown above, via the automatic call to updateState(). Obviously, the point in time when the corresponding response arrives over the wire is not predictable. That is a racer. Let us name him racer A. We do not know how long it takes for him to arrive.

An AngularJS application starts up piece-wise. Various services, controllers and directives need to be initialized. The exact order and timing of actions depends on (at least)

  • the complexity of the application,
  • the order in which things are coded,
  • the way in which Angular is designed to bootstrap itself,
  • the computational performance of the device loading the application, and
  • the load on the device loading the application.

Hence, it is unpredictable at which point in time some controller code is executed which registers a watcher/listener for a certain event. Formally speaking, we cannot predict how much time T passes between

  • initial code execution of the shared state service and
  • initial code execution of any given controller consuming this service.

That is racer B. Racer B needs the unknown time T to arrive. Clearly, racer A and B compete. And that is the race condition: depending on the outcome of the race, the status update event might be available before or after certain view controllers register corresponding event listeners.

The code shown so far assumes that T is small compared to the time required for the service to obtain its initial update from the remote resource: the first update event is expected to fly in after the watcher has been installed. Clearly, if that assumption is wrong, the first update event is simply missed by the controller.


I have prepared this plunk for demonstrating the race condition:

It contains the same code as shown before, with two small modifications: the pseudo remote resource delay is reduced to half a second, and the controller initialization is artificially delayed by one second. That is, service.state.data is changed before the watcher is installed via $scope.$watch(). The controller is not automatically notified about the initial state update.
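The outcome of the race can also be reproduced without a browser. The following sketch is a deterministic simulation with virtual time stamps instead of real timeouts. It abstracts from $watch() and only models the ordering problem: a notification that happens before the listener is registered is lost. The function name is made up; the delays are the ones used in the two examples of this article.

```javascript
// Deterministic simulation of the race using virtual time (milliseconds).
// Returns true if the controller's listener observes the initial update.
function controllerSeesInitialUpdate(updateDelayMs, controllerInitDelayMs) {
  var listeners = [];
  var seen = false;
  var tasks = [
    {at: updateDelayMs, run: function() {
      // Racer A: the pseudo remote response arrives, listeners are notified.
      listeners.forEach(function(listener) { listener(); });
    }},
    {at: controllerInitDelayMs, run: function() {
      // Racer B: the controller initializes and registers its listener.
      listeners.push(function() { seen = true; });
    }}
  ];
  // Run the tasks in the order of their virtual time stamps.
  tasks.sort(function(a, b) { return a.at - b.at; });
  tasks.forEach(function(task) { task.run(); });
  return seen;
}

console.log(controllerSeesInitialUpdate(1000, 0));   // true: listener was ready
console.log(controllerSeesInitialUpdate(500, 1000)); // false: update was missed
```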

This race condition and all discussed $watch-related traps are fixed/non-existing in the solution provided in the next section.

A better solution

$broadcast() / $emit() instead of $watch()

Many on-line resources discourage overusing $broadcast / $emit in AngularJS applications. While that may be good advice in principle, I want to use this opportunity to speak in favor of $broadcast. I think that in my described use case this technique is a perfect fit. Compared to the $watch-based solution discussed above, the simple $broadcast / $emit event semantics have clear advantages. Why is that? Because $broadcast allows for cleanly decoupling three processes:

  1. Construction/modification of the shared data.
  2. Update detection.
  3. Event transmission.

These three processes are inseparably intertwined when one uses $watch(). Having them decoupled provides flexibility. This flexibility can be translated into the following advantages:

  1. “Change detection” code is not executed upon each $digest() cycle. It needs to be explicitly invoked and can usually be derived from foreign triggers (such as an AJAX call done callback/promise).
  2. Event transmission is of constant complexity (O(1)). It will always be, even if the “watched object” changes.
  3. There is no artificial intelligence working behind the scenes that re-interprets what a data change might have meant. The situation becomes as simple as possible: one event has one meaning. If that is what is wanted, then the event is emitted. Both event emission and event handling are under precise control of the developer.

I have therefore modified the architecture shown before:

  • After having retrieved data from the remote resource, the service now broadcasts the event state_updated through the $rootScope. This event gets emitted to all scopes, and is therefore visible to all controllers (although in our example there is only one controller).
  • The controller installs a listener for this event and simply calls useStateData() when the event flies in. No case analysis required — we know what this event means, its emission is under our precise control, and we react to it always in the same way.

This is the code:

var app = angular.module('testApp', []);

app.factory('StateService', ['$rootScope', '$timeout',
function($rootScope, $timeout) {
  console.log('StateService: startup.');
  var service = {state: {data: null}};
  service.updateState = function() {
    // Simulate data retrieval from a remote resource: data assignment (and
    // event broadcast) happens some time after service initialization.
    console.log("StateService: updateState(). Retrieving data...");
    $timeout(function() {
      console.log("StateService: data, broadcast state_updated");
      // Assign the data and emit the event within one execution unit.
      service.state.data = "payload";
      $rootScope.$broadcast('state_updated');
    }, 500); // Pseudo remote delay, as in the race condition demo.
  };
  // Update state automatically once upon service (app) startup.
  service.updateState();
  return service;
}]);

app.controller('Ctrl', ['$scope', '$timeout', 'StateService',
function($scope, $timeout, stateService) {
  function useStateData() {
    console.log("Ctrl: useStateData(): " + stateService.state.data);
  }
  function init() {
    console.log('Ctrl: init. Install event handler for state_updated');
    // Install event handler, for being responsive to future state updates.
    // Handler is attached to local $scope, so it gets automatically destroyed
    // upon controller destruction.
    $scope.$on('state_updated', function () {
      console.log("Ctrl: state_updated event retrieved. Use data.");
      useStateData();
    });
    // If there have been state updates in the past (between application start
    // and controller initialization), handle the last one of those updates.
    if (stateService.state.data !== null) {
      console.log("Ctrl: init: there is some data already. Use it!");
      useStateData();
    }
  }
  // Simulate longish app init time: delay execution of this controller init.
  $timeout(init, 1000); // One second, as in the race condition demo.
  // Provide the user with a method to trigger updateState() via button click.
  $scope.buttonclick = function () {
    console.log("Ctrl: Call stateService.updateState() due to button click.");
    stateService.updateState();
  };
}]);

$broadcast event handlers created in controllers and listening on $rootScope need to be destroyed manually if not needed anymore, otherwise they survive as long as the application lives, possibly resulting in a memory leak. This can be prevented by destroying such event listeners upon controller destruction. As noted in the code right above, this is not necessary when listening on the child scope: Controller destruction triggers destruction of its child scope, which itself triggers destruction of all event handlers. Great.

Strictly speaking, the complexity of calling $broadcast() depends on the number of child scopes existing in the application at the time of event emission. This number usually is not large and roughly constant. Using $emit(), event emission can be made a true O(1) operation. When called on the root scope, it notifies just the root scope and therefore does not require iterating through the child scopes. However, when doing so, one needs to inject the root scope into controllers and attach event handlers to it. As stated before, such handlers should be manually removed upon controller destruction. This benchmark shows that for 100 child scopes, $emit() is significantly faster than $broadcast().
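For the $emit() variant, the manual cleanup mentioned above could look like the following sketch. It relies on two AngularJS behaviors: $on() returns a function that deregisters the listener, and a scope receives the $destroy event before it is torn down. The helper name listenOnRoot is hypothetical and not part of AngularJS.

```javascript
// Hypothetical helper (not an AngularJS API): attach a handler to the root
// scope and remove it again when the controller's own scope is destroyed.
function listenOnRoot($rootScope, $scope, eventName, handler) {
  // $on() returns a function that deregisters this listener.
  var deregister = $rootScope.$on(eventName, handler);
  // The controller's scope receives '$destroy' right before it goes away.
  $scope.$on('$destroy', deregister);
  return deregister;
}
```

Inside a controller that has both $rootScope and $scope injected, a call such as listenOnRoot($rootScope, $scope, 'state_updated', useStateData) would take the place of the $scope.$on() call in the example, and the service would call $rootScope.$emit('state_updated') instead of $broadcast().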

Race condition eliminated

The race condition discussed before is eliminated in the last code example by simply calling useStateData() in the controller, right after installing the event handler, if the state data is not null. Why does this work, and doesn’t this introduce even more subtle race conditions? Can’t this cause useStateData() to be called twice on the same data?

The main reason why that works is that we can make certain assumptions about the execution flow, as discussed in the following paragraph. Let us have a careful look at init() in the controller code:

  1.   function init() {
  2.     $scope.$on('state_updated', function () {
  3.       useStateData();
  4.     });
  5.     if (stateService.state.data !== null) {
  6.       useStateData();
  7.     }
  8.   }

The first action is that the event handler is installed. The essential insight is: the handler function will certainly not be invoked before init() returns. Why? JavaScript can be considered single-threaded (there is no simultaneous code execution, there is only one (virtual) execution thread). Moreover, JavaScript functions run to completion: a running function is never interrupted in order to execute another callback. That is, once the execution flow enters init(), it does not switch to any pending event handler until init()'s end is reached. There is simply no time slice for the registered event handler to be invoked before init() returns. That means: if there have been state updates in the past (before init() was invoked),

  • the event listener is installed after the last update event was emitted by the service,
  • the state data is not null anymore when init() reaches line 5 (the developer needs to guarantee that no update ever resets that property to null) and, consequently,
  • useStateData() in line 6 becomes invoked.

Any (previous or future) foreign call to StateService.updateState() from elsewhere in the application results — at some point in time — in execution of this function (defined in the service code):

function() { service.state.data = "payload"; $rootScope.$broadcast('state_updated'); }

This itself is an atomic execution unit in which data modification and event emission are condensed into a single transaction (either both happen or neither does). Given the above considerations, this execution unit is not invoked before the end of the init() function is reached. Consequently, the code in init() guarantees that the two calls to useStateData() (lines 3 & 6) are always separated by an assignment (via the = operator) to the state data.
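The argument can be checked with a small framework-free model. This is only a sketch under assumptions: Node's EventEmitter stands in for the AngularJS scope event system, and runScenario, update and the updateBeforeInit flag are names made up here. Whichever side wins the race, each update is used exactly once.

```javascript
// Model of the "install handler, then check existing state" pattern.
// EventEmitter stands in for AngularJS scopes; all names are illustrative.
var EventEmitter = require('events').EventEmitter;

function runScenario(updateBeforeInit) {
  var bus = new EventEmitter();
  var state = { data: null };
  var used = [];

  // Service side: assignment and emission form one uninterrupted unit.
  function update(value) {
    state.data = value;
    bus.emit('state_updated');
  }

  // Controller side: same structure as init() above.
  function init() {
    bus.on('state_updated', function () { used.push(state.data); });
    if (state.data !== null) {
      used.push(state.data);
    }
  }

  if (updateBeforeInit) {
    update('first');   // the event is missed, but init() finds the data
    init();
  } else {
    init();
    update('first');   // the handler catches the event
  }
  update('second');    // later updates always reach the handler
  return used;
}

console.log(runScenario(true));
console.log(runScenario(false));
```

Both scenarios yield the same list: 'first' followed by 'second', with no duplicates.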

Best-practice MWE

The following piece of code is based on all considerations made above and stripped of comments and console output. Play with it (run it using the “Preview” tab) and feel free to reuse it:

(Download plunk)


I hope to have shown that in certain cases a $watch()-based solution may result in undesired code behavior, and that using $broadcast()- or $emit()-based communication of state updates might yield simpler and yet more reliable code. Also, please remember that $watch() has the potential to produce a severe performance regression. In the last part of the article I pointed out that one should not accidentally make startup code depend on the difference between application loading time and remote resource query latency. This introduces race conditions which usually are difficult to reproduce and debug.

Thanks for reading, and of course I’d be happy to receive some feedback.

Discourse on Debian Wheezy via Docker


The makers of the StackExchange network have been working on Discourse for quite a while now. It is a modern communication platform, with the goal to be a nice mixture of classical internet forums, mailing lists, and established social media features:

Discourse is a from-scratch reboot, an attempt to reimagine what a modern, sustainable, fully open-source Internet discussion platform should be today – both from a technology standpoint and a sociology standpoint.

Technically, Discourse is based on various individual components with Ruby on Rails at its core, thin as a lightweight web server, Redis for caching and job control, Sidekiq for micro job management, and PostgreSQL as a persistent database backend. Consequently, deploying Discourse right from the git repository can be a time-consuming task, not to mention the implications of maintaining a Discourse instance or running multiple Discourse instances on the same host. The Discourse team realized that the complicated deployment certainly prevents a couple of people from using it. Fortunately, the awesome “lightweight VM” container system Docker became production-ready in June 2014:

This release’s “1.0” label signifies a level of quality, feature completeness, backward compatibility and API stability to meet enterprise IT standards. In addition, to provide a full solution for using Docker in production we’re also delivering complete documentation, training programs, professional services, and enterprise support.

… and Discourse consequently decided to work on a Docker-based deployment system, which is the default by now, and even the only supported method (as you can see, Discourse evolves quite quickly). However, Docker is officially supported only on very modern Linux distributions, because Docker’s libcontainer as well as AuFS require kernel features that were only recently introduced. Consequently, Docker (and with that Discourse) is not supported on e.g. Debian 7 (Wheezy), which is the current “stable” Debian as sysadmins love it. But there is something we can do about it.

Although not officially supported or recommended, Docker and Discourse can be run on Debian 7 by backporting a more modern Linux kernel. Currently, Wheezy comes with a kernel from the 3.2 branch. The wheezy-backports (or Debian 8) kernel currently is from the 3.14 branch, which is new enough. Is using a kernel from backports safe? I can only say that my system runs without quirks, and as you will find on the web, a couple of other people are also successfully running a kernel from backports.

Hence, if you want to deploy Discourse on Wheezy, you can do it, and it will work perfectly fine. I will quickly go through the required steps in the next sections.

Using a kernel from backports

Follow the official instructions for using the backports repository:

# echo "deb wheezy-backports main" > \
# apt-get update

Get the newer kernel:

# apt-get -t wheezy-backports install linux-image-amd64 linux-headers-amd64


# reboot


$ uname -a
Linux gurke2 3.14-0.bpo.1-amd64 #1 SMP Debian 3.14.12-1~bpo70+1 (2014-07-13) x86_64 GNU/Linux

Install Docker

Following the official installation instructions for Ubuntu (which Debian is very close to):

# apt-get install apt-transport-https
# apt-key adv --keyserver hkp:// --recv-keys 36A1D7869245C8950F966E92D8576A8BA88D21E9
# sh -c "echo deb docker main \
   > /etc/apt/sources.list.d/docker.list"
# apt-get update
# apt-get install lxc-docker

Install Discourse ecosystem

Following the instructions from here and here:

# install -g docker -m 2775 -d /var/docker
# adduser discourseuser
# usermod -a -G docker discourseuser

Now, as user discourseuser:

$ git clone /var/docker

Configure Discourse app

As user discourseuser:

$ cd /var/docker
$ cp samples/standalone.yml containers/app.yml

Edit app.yml to your needs. In my case, I made the Discourse container’s web server and SSH server not listen on an external interface, by changing the expose section to:

  - ""
  - ""

That means that web and SSH servers of the container are reachable through localhost, but not from outside. This has important security advantages:

  • We can proxy the HTTP traffic of the Discourse web application through a web server running on the host (e.g. nginx), and encrypt the connection (the Discourse Docker container does not have TLS support built-in, duh).
  • Why would you want to expose the Discourse container via SSH to the internet, anyway? That’s a severe mistake, in my opinion, and you should disable that unless you have a very good reason not to.

I have set DISCOURSE_SMTP_ADDRESS to the IP address of the host in the private virtual Docker network. On my Debian host, I am running an Exim4 MTA, which is configured via dc_local_interfaces and dc_relay_nets so that it listens on the local Docker network and relays mails incoming from that network. That way, the Discourse instance running within a Docker container can send mail through the Exim MTA running on the Debian host. I have described that in more detail in another blog post.

Each change of the configuration file app.yml requires a re-build via ./launcher rebuild app. A re-build involves stopping, destroying, bootstrapping, and launching the container. What I have learned just recently: don’t worry too much about the word “destruction”. It does not imply data loss. You can re-build your Discourse container at any time. Persistent data is stored in the PostgreSQL database and in the shared volume, neither of which is affected by a re-build.

Hope that helps!

Discourse Docker container: send mail through Exim


The Discourse deployment was greatly simplified by introducing Docker support (as I have written about before). Discourse heavily depends on e-mail, and its ability to send mail to arbitrary recipients is essential. While the recommended way is to use an external service like Mandrill, it is also possible to use a local MTA, such as Exim. However, when you set up the vanilla Discourse Docker container, it does not contain a pre-configured MTA, which is fine, since many have a well-configured MTA running on the host already. The question is how to use that MTA for letting Discourse send mail.

Usually, MTAs on smaller machines are configured to listen on localhost only, so that they are not exposed to the Internet and cannot be misused for spam. localhost on the host itself, however, is different from localhost within a Docker container. The network within the container is a virtual one, and it is cleanly separated from the host. That is, when Discourse running in a container tries to reach an SMTP server on localhost, it cannot reach an MTA listening on localhost outside of the container. There is a straightforward solution: Docker comes along with a network bridge. In fact, it provides a private network (in the 172.17.x.x range) that connects individual containers with the host. This network can be used for establishing connectivity between a network application within a Docker container and the host.

Exim’s network configuration

Accordingly, I have set up Exim4 on the Debian host for relaying mails that are incoming from localhost or from the local virtual Docker network. First I looked up the IP address of the Docker bridge on the host (I got that from /sbin/ifconfig). I then instructed Exim to treat this as a local interface and listen on it. Also, Exim was explicitly told to relay mail incoming from the Docker subnet, otherwise it would reject incoming mails from that network. These are the relevant keys in /etc/exim4/update-exim4.conf.conf:


The config update is in place after calling update-exim4.conf and restarting Exim via service exim4 restart.

Testing SMTP access from within container

I tested if Exim’s SMTP server can be reached from within the container. I used the bare-bones SMTP implementation of Python’s smtplib for that. First of all, I SSHd into the container by calling launcher ssh app. I then called python. The following Python session demonstrates how I attempted to establish an SMTP connection right to the host via its IP address in Docker’s private network:

>>> import smtplib
>>> server = smtplib.SMTP('')
>>> server.set_debuglevel(1)
>>> server.sendmail("", "", "test")
send: 'ehlo []\r\n'
reply: '250-localhost Hello [] []\r\n'
reply: '250-SIZE 52428800\r\n'
reply: '250-8BITMIME\r\n'
reply: '250-PIPELINING\r\n'
reply: '250 HELP\r\n'
reply: retcode (250); Msg: localhost Hello [] []
SIZE 52428800
send: 'mail FROM:<> size=4\r\n'
reply: '250 OK\r\n'
reply: retcode (250); Msg: OK
send: 'rcpt TO:<>\r\n'
reply: '250 Accepted\r\n'
reply: retcode (250); Msg: Accepted
send: 'data\r\n'
reply: '354 Enter message, ending with "." on a line by itself\r\n'
reply: retcode (354); Msg: Enter message, ending with "." on a line by itself
data: (354, 'Enter message, ending with "." on a line by itself')
send: 'test\r\n.\r\n'
reply: '250 OK id=1X9bpF-0000st-Od\r\n'
reply: retcode (250); Msg: OK id=1X9bpF-0000st-Od
data: (250, 'OK id=1X9bpF-0000st-Od')

Indeed, the mail arrived at my Google Mail account. This test shows that the Exim4 server running on the host is reachable via SMTP from within the Discourse Docker instance. Until I got the configuration right, I observed essentially two different classes of errors:

  • socket.error: [Errno 111] Connection refused in case there is no proper network routing or connectivity established.
  • smtplib.SMTPRecipientsRefused: {'': (550, 'relay not permitted')} in case the Exim4 SMTP server is reachable, but rejecting your mail (to solve this I had to add the dc_relay_nets setting to the config shown above).

Obviously, in order to make Discourse use that SMTP server, it needs to be configured with DISCOURSE_SMTP_ADDRESS being set to the IP address of the host in the Docker network.

Hope that helps!