Category Archives: Web development

Structured data for Google: how to add the ‘updated’ hentry field

This is a WordPress-specific post. I am using a modified TwentyTwelve theme and Google Webmaster Tools report missing structured data for all of my posts:

Missing "updated" field in the microformats hatom markup

Missing “updated” field in the microformats hatom markup.

In particular, it is the updated hentry field that seems to be missing. TwentyTwelve, like many themes, uses the microformats approach to communicate structured data to Google (also to others, it’s just that Google is a popular and important consumer of this data). How to correctly present date/time information to Google? A quote from their microdata docs:

To specify dates and times unambiguously, use the time element with the datetime attribute. […] The value in the datetime attribute is specified using the ISO date format.

And a quote from their microformats docs:

In general, microformats use the class attribute in HTML tags

It appears that we can combine the approaches. Might be dirty, but it works. So, what you might want to have in the HTML source of your blog post looks like this:

<time class="updated" datetime="2015-02-28T18:09:49+00:00" pubdate>
February 28, 2015
</time>

The value of the datetime attribute is in ISO 8601 format and not shown to the user. It should contain the point in time the article/blog post was last modified (updated). It is parsed by Google as the updated property, because of the class="updated" attribute. The string content of the time tag is what is displayed to your users (February 28, 2015 in this case). There, you usually want to display the point in time when the article was first published.

So, how do you get this into the HTML source code of all of your blog posts? A simple solution is to create a custom “byline” (that is what the author and date information string is often called in the context of WordPress themes), for instance with a PHP function like this:

function modbyline() {
    $datecreated = esc_html(get_the_date());
    $author = esc_html(get_the_author());
    $datemodifiedISO = esc_html(get_the_modified_time("c"));
    echo '<div class="bylinemod"><time class="entry-date updated" datetime="'.$datemodifiedISO.'" pubdate>'.$datecreated.'</time> &mdash; by '.$author.'</div>';
}

This creates HTML code for a custom byline, in my case rendered like so:

<div class="bylinemod">
    <time class="entry-date updated" datetime="2015-02-28T18:09:49+00:00" pubdate>
        February 28, 2015
    </time>
    &mdash; by Jan-Philip Gehrcke
</div>

The user-visible date is the article publication date, and the machine-readable datetime attribute encodes the modification time of the article. Note that WordPress’ get_the_modified_time() by default returns a date string with a human-readable default format. In order to make it machine-readable by ISO 8601 standard, you need to provide it the "c" format specifier argument (I have done this in the function above).

You want to define this custom byline function in your (child) theme’s functions.php. It should be called from within content.php.

After inclusion use Google’s structured data testing tool for validation of the approach. It should show updated entry, containing the correct date.

WP-GeSHi-Highlight 1.2.3 released

I have released version 1.2.3 of WP-GeSHi-Highlight. WP-GeSHi-Highlight is a popular code syntax highlighting plugin for WordPress, based on the established PHP highlighting library GeSHi.

The only change in this realease:

  • The bundled GeSHi library was updated to version 1.0.8.12.

So far, there are no official release notes for GeSHi 1.0.8.12. However, the commit history clarifies that this update involves various language file improvements as well as newcomers. A list of those languages that are affected by the update (manually crafted):

  • PHP
  • Rust
  • TCL
  • CSS
  • Haskell
  • PostScript
  • QML
  • NSIS
  • Nimrod
  • Oxygene
  • LSL2
  • StandardML
  • Oxygene

With this update in place, WP-GeSHi-Highlight supports syntax highlighting for 240 languages.

Documentation, FAQ, and comments can be found at gehrcke.de/wp-geshi-highlight.

Official WordPress themes should have an official change log

Officially supported themes: TwentyXXX

My website is WordPress-backed. WordPress front-ends are called “themes”. There are official themes, released by WordPress/Automattic. And there are thousands of themes released by third parties. While the WordPress project has released many themes, not all of them are equally “important”. There is only one specific series of WordPress themes that is so-to-say most official: themes from the TwentyXXX series.

The issue: no update release notes

In this series, WordPress releases one theme per year (there was TwentyEleven, TwentyTwelve, TwentyThirteen, you get the point). The most recent one of these themes is included with every major release of WordPress. In other words: it does not get more official. Correspondingly, themes from this series enjoy long-term support by the WordPress project. That is, they retrieve maintenance updates even years after their initial release (TwentyEleven was last updated by the end of 2014, for instance). That is great, really! However, there is one very negative aspect with these updates: there are no official release notes. That’s horrible, thinking in engineering terms, and considering release ethics applied in other serious open source software projects.

Background: dependency hell

TwentyXXX theme updates are released rather silently: suddenly, the WordPress dashboard shows that there is an update. But there is no official change log or release note which one could base a decision on. Nothing, apart from an increased version number. That is different from updating WordPress plugins, where the change log usually is only one click away from the WordPress dashboard. Also, the theme version number can not be relied upon to be semantically expressive (AFAIK WordPress themes are not promised to follow semantic versioning, right?)

Now, some of you may think that newer always is better. Just update and trust the developers. But that is not how things work in real life. Generally, we should stick to the paradigm of “never change a running system”, unless […]: sometimes, an update might change behavior, which might not be desired. Sometimes an update might fix a security issue, which one should know about and update immediately. Or the update resolves a usability issue. Such considerations are true for updates for any kind of software. But, in the context of WordPress, there is an even more important topic to consider when updating a theme: an update might break child themes. Or, as expressed by xkcd: “Every change breaks someones workflow”:

http://xkcd.com/1172

http://xkcd.com/1172

A theme can be used by other developers, as a so-called parent theme, in a library fashion — it provides a programming interface. This affects many websites, like mine: a couple of years ago I have decided to base the theme used on my website (here) on the TwentyTwelve theme. I went ahead and created a child theme, which inherits most of its code from TwentyTwelve and changes layout and behavior only in a few aspects. I definitely cannot blindly press the “update” button when TwentyTwelve retrieves an update. This might immediately change the interface I developed my child against, and can consequently break any component of my child theme. Obviously, I cannot just try this out with my live/public website. So, I have to test this update before, in a development environment which is not public.

If proper release notes were available, I could possibly skip that testing and apply such an update right away if it’s just a minor one. Or, I would be alerted that there is a security hole fixed with a breaking change in the parent theme, and I’d know that I have to quickly react and re-work my child theme so that I can safely apply the update to the parent. These things need to be communicated, like in any other open source project with a decent release policy.

Concluding remarks

Yes, there are ways to reconstruct and analyze the code changes that were made. This URL structure actually is quite helpful for generating diffs between theme versions: https://themes.trac.wordpress.org/changeset?old_path=/twentytwelve/1.4&new_path=/twentytwelve/1.6. That URL shows differences between TwentyTwelve 1.4 and 1.6. The same structure can be used for other official themes and version combinations. However, this does not replace a proper change log. WordPress is a mature, large-scale open source project with a huge developer community. Themes from the TwentyXXX series are a major component of this project. The project should provide change logs and/or release notes for every update — for compliance with expectations, and for enabling sound engineering decisions. Others want this, too:

Can any one point me to the release notes for 1.2 or a list of the applied changes? Updating from 1.1 has caused some minor, but unexpected presentation changes on one of my child themes, and I’d like to know what else has changed and what to test for before I upgrade further sites.

Songkick events for Google’s Knowledge Graph

Google can display upcoming concert events in the Knowledge Graph of musical artists (as announced in March 2014). This is a great feature, and probably many people in the field of music marketing and especially record labels aim to get this kind of data into the Knowledge Graph for their artists. However, Google does not magically find this data on its own. It needs to be informed, with a special kind of data structure (in the recently standardized JSON-LD format) contained within the artist’s website.

While of great interest to record labels, finding a proper technical solution to create and provide this data to Google still might be a challenge. I have prepared a web service that greatly simplifies the process of generating the required data structure. It pulls concert data from Songkick and translates them into the JSON-LD representation as required by Google. In the next section I explain the process by means of an example.

Web service usage example

The concert data of the band Milky Chance is published and maintained via Songkick, a service that many artists use. The following website shows — among others — all upcoming events of Milky Chance: http://www.songkick.com/artists/6395144-milky-chance. My web service translates the data held by Songkick into the data structure that Google requires in order to make this concert data appear in their Knowledge Graph. This is the corresponding service URL that needs to be called to retrieve the data:

https://jsonld-events.appspot.com/api/songkick/artist?skid=6395144&name=Milky+Chance&weburl=http%3A%2F%2Fmilkychanceofficial.com

That URL is made of the base URL of the web service, the songkick ID of the artist (6395144 in this case), the artist name and the artist website URL. Try accessing named service URL in your browser. It currently yields this:

[
  {
    "@context": "http://schema.org", 
    "@type": "MusicEvent", 
    "name": "Milky Chance", 
    "startDate": "2014-12-12", 
    "url": "http://www.songkick.com/concerts/21926613-milky-chance-at-max-nachttheater?utm_source=30793&utm_medium=partner", 
    "location": {
      "address": {
        "addressLocality": "Kiel", 
        "postalCode": "24116", 
        "streetAddress": "Eichhofstra\u00dfe 1", 
 
[ ... SNIP ~ 1000 lines of data ... ]
 
    "performer": {
      "sameAs": "http://milkychanceofficial.com", 
      "@type": "MusicGroup", 
      "name": "Milky Chance"
    }
  }
]

This piece of data needs to be included in the HTML source code of the artist website. Google then automatically finds this data and eventually displays the concert data in the Knowledge Graph (within a couple of days). That’s it — pretty simple, right? The good thing is that this method does not require layout changes to your website. This data can literally be included in any website, right now.

That is what happened in case of Milky Chance: some time ago, the data created by the web service was fed into the Milky Chance website. Consequently, their concert data is displayed in their Knowledge Graph. See for yourself: access https://www.google.com/search?q=milky+chance and look out for upcoming events on the right hand side. Screenshot:

milkychance_google_knowledgegraph

Google Knowledge Graph generated for Milky Chance. Note the upcoming events section: for this to appear, Google needs to find the event data in a special markup within the artist’s website.

So, in summary, when would you want to use this web service?

  • You have an interest in presenting the concert data of an artist in Google’s Knowledge Graph (you are record label or otherwise interested in improved marketing and user experience).
  • You have access to the artist website or know someone who has access.
  • The artist concert data already is present on Songkick or will be present in the future.

Then all you need is a specialized service URL, which you can generate with a small form I have prepared for you here: http://gehrcke.de/google-jsonld-events

Background: why Songkick?

Of course, the event data shown in the Knowledge Graph should be up to date and in sync with presentations of the same data in other places (bands usually display their concert data in many places: on Facebook, on their website, within third-party services, …). Fortunately, a lot of bands actually do manage this data in a central place (any other solution would be tedious). This central place/platform/service often is Songkick, because Songkick really made a nice job in providing people with what they need. My web service reflects recent changes made within Songkick.

Technical detail

The core of the web service is a piece of software that translates the data provided by Songkick into the JSON-LD data as required and specified by Google. The Songkick data is retrieved via Songkick’s JSON API (I applied for and got a Songkick API key). Large parts of this software deal with the unfortunate business of data format translation while handling certain edge cases.

The service is implemented in Python and hosted on Google App Engine. Its architecture is quite well thought-through (for instance, it uses memcache and asynchronous urlfetch wherever possible). It is ready to scale, so to say. Some technical highlights:

  • The web service enforces transport encryption (HTTPS).
  • Songkick back-end is queried via HTTPS only.
  • Songkick back-end is queried concurrently whenever possible.
  • Songkick responses are cached for several hours in order to reduce load on their service.
  • Responses of this web service are cached for several hours. These are served within milliseconds.

This is an overview of the data flow:

  1. Incoming request, specifying Songkick artist ID, artist name, and artist website.
  2. Using the Songkick API (SKA), all upcoming events are queried for this artist (one or more SKA requests, depending on number of events).
  3. For each event, the venue ID is extracted, if possible.
  4. All venues are queried for further details (this implicates as many SKA requests as venue IDs extracted).
  5. A JSON-LD representation of an event is constructed from a combination of
    • event data
    • venue data
    • user-given data (artist name and artist website)
  6. All event representations are combined and a returned.

Some notable points in this context:

  • A single request to this web service might implicate many requests to the Songkick API. This is why SKA responses are aggressively cached:
    • An example artist with 54 upcoming events requires 2 upcoming events API requests (two pages, cannot be requested concurrently) and requires roundabout 50 venue API requests (can be requested concurrently). Summed up, this implicates that my web service cannot respond earlier than three SKA round trip times take.
    • If none of the SKA responses has been cached before, the retrieval of about 2 + 50 SKA responses might easily take about 2 seconds.
    • This web services cannot be faster than SK delivers.
  • This web service applies graceful degradation when extracting data from Songkick (many special cases are handled, which is especially relevant for the venue address).

Generate your service URL

This blog post is just an introduction, and sheds some light on the implementation and decision-making. For general reference, I have prepared this document to get you started:

http://gehrcke.de/google-jsonld-events

It contains a web form where you can enter the (currently) three input parameters required for using the service. It returns a service URL for you. This URL points to my application hosted on Google App Engine. Using this URL, the service returns the JSON data that is to be included in an artist’s website. That’s all, it’s really pretty simple.

So, please go ahead and use this tool. I’d love to retrieve some feedback. Closely look at the data it returns, and keep your eyes open for subtle bugs. If you see something weird, report it, please. I am very open for suggestions, and also interested in your questions regarding future plans, release cycle etc. Also, if you need support for (dynamically) including this kind of data in your artist’s website, feel free to contact me.

CSS: Crispy downscaled images

A quick note about modern CSS directives having huge impact on the display quality of downscaled images. The following picture is a screenshot of Firefox 34, displaying two roundish icons (mail and Facebook icons, PNG images) downscaled to 50 % of their original size:

Firefox 34, PNGs downscaled to 50 %, default rendering algorithm

Firefox 34, PNGs downscaled to 50 %, default rendering algorithm

The edges are all blurry, although the original PNG files are not blurry at all (believe me). This sucks.

The correct question at this point is: Why would we want to downscale these images instead of displaying them at their original size?

The answer: mobile device screens have a much higher pixel density than classical desktop screens. Consequently, compared to the desktop appearance, a bitmap image (as opposed to a vector graphic) embedded into a website either has to appear much smaller on mobile screens (relative to other elements on the website) or the mobile browser has to upscale the image. We do not want the former to happen, because this cripples the layout. The latter, however, is equally bad: upscaling requires adding information that was previously lost, it always makes the image worse than before. A solution is to give the browser a sufficient amount of pixels to not being forced to upscale the image in the mobile environment. This, however, requires the browser to downscale the image (at least in the desktop environment). Downscaling is fine in general, given the right downscaling algorithm.

A badly chosen downscaling algorithm may produce blurry edges where crystal clear edges were in the original version of the image. The default downscaling algorithm of current Firefox and Chrome versions produces such blurry edges, which is what is shown in the screenshot above. Such an algorithm might be advantageous for photo-like images, but for line art / icons other techniques fit better, optimized for retaining contrast and edges. The important insight at this point is: a browser can never be intelligent enough to automatically judge — based on the image data — which algorithm to use best. Fortunately, the type of algorithm can manually be specified using CSS, as described here. Using the recommendations given in linked article, the result looks much better:

Firefox 34, PNGs downscaled to 50 %, moz-crisp-edges rendering algorithm

Firefox 34, PNGs downscaled to 50 %, moz-crisp-edges rendering algorithm

As shown in the screenshot, the CSS code in use is:

    image-rendering: -moz-crisp-edges;
    image-rendering: -o-crisp-edges;
    image-rendering: -webkit-optimize-contrast;
    -ms-interpolation-mode: nearest-neighbor;

I have observed the same difference in Chrome 39. So, go ahead and use this or take it one step further and provide different image files for different devices using media queries. Internet Explorer 11 showed crispy images in both cases (its default rendering algorithm does not blur line art upon downscaling).