<?xml version="1.0" encoding="UTF-8"?> <rss
version="2.0"
xmlns:content="http://purl.org/rss/1.0/modules/content/"
xmlns:wfw="http://wellformedweb.org/CommentAPI/"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:atom="http://www.w3.org/2005/Atom"
xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
> <channel><title>gehrcke.de &#187; Python</title> <atom:link href="http://gehrcke.de/category/technical-stuff/python/feed/" rel="self" type="application/rss+xml" /><link>http://gehrcke.de</link> <description>Jan-Philip Gehrcke&#039;s website</description> <lastBuildDate>Sun, 13 May 2012 17:17:35 +0000</lastBuildDate> <language>en</language> <sy:updatePeriod>hourly</sy:updatePeriod> <sy:updateFrequency>1</sy:updateFrequency> <generator>http://wordpress.org/?v=3.3.2</generator> <item><title>Python WebSocket server powered by gevent and ws4py</title><link>http://gehrcke.de/2011/10/python-websocket-server-powered-by-gevent-and-ws4py/</link> <comments>http://gehrcke.de/2011/10/python-websocket-server-powered-by-gevent-and-ws4py/#comments</comments> <pubDate>Fri, 07 Oct 2011 15:56:21 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <category><![CDATA[WebSockets]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1761</guid> <description><![CDATA[<p>I am following the emerging WebSocket standard with a lot of interest. Today, I would like to update my recommendation of tools presented in the article &#8220;The best and simplest tools to create a basic WebSocket application with Flash fallback and Python on the server side&#8221;. ws4py (WebSocket for Python) by Sylvain Hellegouarch is worth [...]]]></description> <content:encoded><![CDATA[<p>I am following the emerging <a
href="http://en.wikipedia.org/wiki/WebSocket">WebSocket</a> standard with a lot of interest. Today, I would like to update my recommendation of tools presented in the article <a
href="http://gehrcke.de/2011/06/the-best-and-simplest-tools-to-create-a-basic-websocket-application-with-flash-fallback-and-python-on-the-server-side/">&#8220;The best and simplest tools to create a basic WebSocket application with Flash fallback and Python on the server side&#8221;</a>. <a
href="https://github.com/Lawouach/WebSocket-for-Python"><strong>ws4py</strong></a> (WebSocket for Python) by <a
href="http://www.defuze.org/">Sylvain Hellegouarch</a> deserves wider attention.<span
id="more-1761"></span></p><p>ws4py is still under development and may undergo considerable structural changes in the future. But what we can see so far is a cleanly written, generic, pure-Python WebSocket protocol implementation that can be used for servers as well as for clients. I would like to point out that &#8212; among others &#8212; a <a
href="http://gevent.org/">gevent</a>-driven server is already included. I have not seen or run any benchmarks of it, but in theory this server should perform very well. Overall, ws4py looks very promising.</p><p>In my playground setups, I&#8217;ve replaced <a
href="https://bitbucket.org/Jeffrey/gevent-websocket/">gevent-websocket</a> with ws4py. This way, I successfully ran some WebSocket communication tests between the clients implemented in Firefox 7, Chrome 14, Chrome 15 and the ws4py/gevent-based server &#8212; making use of the <a
href="http://tools.ietf.org/html/draft-ietf-hybi-thewebsocketprotocol-10">HyBI WebSocket protocol version 10</a>. Together with a recent version of the perfect <a
href="https://github.com/gimite/web-socket-js">web-socket-js</a>, the communication also works with older browsers like Firefox 3.</p><p>Standard conformance and performance of ws4py are tested via the <a
href="http://www.tavendo.de/autobahn/testsuite.html">Autobahn WebSocket test suite</a>. Having a test suite like that is extremely important for pushing the development of high-quality WebSocket libraries. You should definitely look at it if you are a WebSocket client/server developer yourself.</p><p>Another thing I would like to show you is the table at <a
href="http://en.wikipedia.org/wiki/User:Oberstet/Comparison_of_WebSocket_implementations">http://en.wikipedia.org/wiki/User:Oberstet/Comparison_of_WebSocket_implementations</a>. If maintained properly, this will always be a very good reference for getting an overview of WebSocket implementations on different platforms.</p> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2011/10/python-websocket-server-powered-by-gevent-and-ws4py/feed/</wfw:commentRss> <slash:comments>1</slash:comments> </item> <item><title>The best and simplest tools to create a basic WebSocket application with Flash fallback and Python on the server side</title><link>http://gehrcke.de/2011/06/the-best-and-simplest-tools-to-create-a-basic-websocket-application-with-flash-fallback-and-python-on-the-server-side/</link> <comments>http://gehrcke.de/2011/06/the-best-and-simplest-tools-to-create-a-basic-websocket-application-with-flash-fallback-and-python-on-the-server-side/#comments</comments> <pubDate>Sun, 26 Jun 2011 15:56:46 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Javascript]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <category><![CDATA[WebSockets]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1694</guid> <description><![CDATA[<p>Currently, I am playing around with WebSockets. This is due to an application idea I have in my mind which requires a bidirectional connection between browser and server with low latency. The communication will happen in a stream-like fashion at low bandwidth. Real network sockets using TCP/UDP are often the desired optimum for things like [...]]]></description> <content:encoded><![CDATA[<p>Currently, I am playing around with <a
href="http://en.wikipedia.org/wiki/WebSockets">WebSockets</a>. This is due to an application idea I have in mind which requires a <strong>bidirectional connection</strong> between browser and server with <strong>low latency</strong>. The communication will happen in a stream-like fashion at low bandwidth. Real network sockets using TCP/UDP are often the desired optimum for things like that, but within a browser they can only be provided by Java or Flash plugins. <a
href="http://www.slideshare.net/rmoriz/pushing-the-web-websockets">The future belongs to WebSockets</a>. Implemented directly in the browser, they provide a much lower-level network connection between the browser and the server than HTTP &#8212; without any plugin. WebSockets still work on a layer above TCP, but low latency and efficient data transport in both directions are provided in a TCP-like fashion. Therefore, real-time application developers are very keen to use WebSockets. As this is still a young development, there is only little browser support and there are many immature client/server libraries. Most importantly, there is a huge lack of documentation on how to use them.</p><p>In this blog post, I present a very simple &#8220;echo application&#8221; where the user sends a message from the browser to the server &#8212; which in turn sends this message back to the client. Simple so far, but <strong>the main focus while realizing this is on selecting the right tools for the task</strong>, as there is already a lot of stuff out there &#8212; some very good, undocumented and hidden things, some totally overloaded things, and some bad things. I tried to fulfill the following conditions:</p><ul><li>Make use of a &#8220;<strong>Flashbridge</strong>&#8221; to realize a fallback when WebSockets are not available in a browser (which is true for Firefox at the moment)</li><li>use <strong>Python</strong> on the server side</li><li>use the <strong>best / most solid tools</strong> available</li><li>at the same time, use the <strong>simplest tools</strong> available that do not bring along loads of stuff that you do not need for simple applications or if you want to design your own communication protocol anyway.</li></ul><p><span
id="more-1694"></span></p><h3>Tools for the client side</h3><p>First thing, the client running in the browser has to be implemented in JavaScript, of course. I do not like JavaScript, but this is the language of the browsers <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> (is there a chance to revolutionize this to Python syntax?). So, when you search the web you will at first find <a
href="http://socket.io/">Socket.IO</a> and <a
href="http://jwebsocket.org/">jWebSocket</a>. Socket.IO consists of the client &#8220;Socket.IO&#8221; and the server &#8220;Socket.IO-node&#8221; &#8212; designed to be used together. Yes, it is great &#8212; it seems mature, it already provides a lot of different protocols on top of WebSockets and it provides loads of fallback mechanisms if there are no WebSockets available on the client side. There even is a Python implementation of the server side. But my feeling was: Nope. I have no experience so far with WebSockets, so I would like to start by implementing a basic core. I don&#8217;t want to use any communication protocol yet. Just sending messages as designed by the specification of WebSockets. Furthermore, I only want Flash fallback as an <strong>equivalently fast</strong> replacement, no slow backup techniques like xhr-multipart, xhr-polling, and jsonp-polling. Socket.IO overwhelms with a lot of functionality and little documentation &#8212; making it difficult for newcomers to grasp the essentials.</p><p>Much of the above also applies to jWebSocket, so I decided against both for largely the same reasons. And as it turned out while browsing through their sources, they have one very special thing in common:</p><h4>web-socket-js</h4><p>Socket.IO and jWebSocket both base their core WebSocket-Flash-switch on a very neat and small JavaScript library called <a
href="https://github.com/simb/web-socket-js">web-socket-js</a> written by <a
href="http://gimite.net/en/">Hiroshi Ichikawa</a>. You can check this yourself by looking at <a
href="https://github.com/LearnBoost/Socket.IO/tree/master/lib/vendor/web-socket-js">Socket.IO&#8217;s repository</a> and <a
href="http://code.google.com/p/jwebsocket/source/browse/trunk/jWebSocket/jWebSocketClient/web/res/js/flash-bridge/web_socket.js">jWebSocket&#8217;s repository</a>. If required, web-socket-js replaces the standard JavaScript <code>new WebSocket</code> constructor with its own implementation. This does the trick:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="javascript"><pre class="de1"><span class="br0">&#40;</span><span class="kw2">function</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="br0">&#123;</span>
&nbsp;
  <span class="kw1">if</span> <span class="br0">&#40;</span>window.<span class="me1">WebSocket</span><span class="br0">&#41;</span> <span class="kw1">return</span><span class="sy0">;</span>
 <span class="br0">&#91;</span>...<span class="br0">&#93;</span>
<span class="br0">&#125;</span></pre></div></div></div></div></div></div></div><p> If WebSockets are supported by the browser natively, then they are used natively and web-socket-js does not do anything more than <code>return;</code>. If the browser lacks support, a Flash plugin Hiroshi wrote is loaded, providing <strong>the same API</strong> as the original JavaScript WebSocket. This API is implemented in the <code>[...]</code> of the code snippet above.</p><p>This is the way I want to start learning WebSockets: using the original API and with a lot of browser support. <strong>Decision made: on the client side, I use web-socket-js.</strong></p><h3>Tools for the server side</h3><p>Answer upfront this time: <a
href="http://www.gevent.org/">gevent</a> in combination with <a
href="https://bitbucket.org/Jeffrey/gevent-websocket">gevent-websocket</a>. Gevent is a <a
href="http://nichol.as/benchmark-of-python-web-servers">nicely performing</a> networking library for Python, based on the<a
href="http://www.stackless.com/pipermail/stackless-dev/2004-March/000022.html"> brilliant idea</a> of <a
href="http://packages.python.org/greenlet/">greenlets</a>. It allows you to write extremely simple server code. Gevent-websocket is just a very small piece of code on top of this, basically saving you from implementing the WebSocket handshake yourself.</p><h3>The code</h3><p>Let&#8217;s start with <code>websocket_echoserver.py</code>:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="python"><pre class="de1"><span class="kw1">from</span> geventwebsocket.<span class="me1">handler</span> <span class="kw1">import</span> WebSocketHandler
<span class="kw1">from</span> gevent <span class="kw1">import</span> pywsgi
<span class="kw1">import</span> gevent
&nbsp;
&nbsp;
<span class="kw1">def</span> handle_echo_client<span class="br0">&#40;</span>ws<span class="br0">&#41;</span>:
    <span class="kw1">while</span> <span class="kw2">True</span>:
        msg <span class="sy0">=</span> ws.<span class="me1">wait</span><span class="br0">&#40;</span><span class="br0">&#41;</span>
        <span class="kw1">if</span> msg <span class="sy0">==</span> <span class="st0">&quot;quit&quot;</span>:
            ws.<span class="me1">close_connection</span><span class="br0">&#40;</span><span class="br0">&#41;</span>
            <span class="kw1">break</span>
        ws.<span class="me1">send</span><span class="br0">&#40;</span>msg<span class="br0">&#41;</span>
&nbsp;
&nbsp;
<span class="kw1">def</span> app<span class="br0">&#40;</span>environ<span class="sy0">,</span> start_response<span class="br0">&#41;</span>:
    <span class="kw1">if</span> environ<span class="br0">&#91;</span><span class="st0">'PATH_INFO'</span><span class="br0">&#93;</span> <span class="sy0">==</span> <span class="st0">&quot;/echo&quot;</span>:
        handle_echo_client<span class="br0">&#40;</span>environ<span class="br0">&#91;</span><span class="st0">'wsgi.websocket'</span><span class="br0">&#93;</span><span class="br0">&#41;</span>
    <span class="kw1">else</span>:
        <span class="kw1">print</span> <span class="st0">&quot;404, PATH_INFO: %s&quot;</span> %  environ<span class="br0">&#91;</span><span class="st0">&quot;PATH_INFO&quot;</span><span class="br0">&#93;</span>
        start_response<span class="br0">&#40;</span><span class="st0">&quot;404 Not Found&quot;</span><span class="sy0">,</span> <span class="br0">&#91;</span><span class="br0">&#93;</span><span class="br0">&#41;</span>
        <span class="kw1">return</span> <span class="br0">&#91;</span><span class="br0">&#93;</span>
&nbsp;
&nbsp;
server <span class="sy0">=</span> pywsgi.<span class="me1">WSGIServer</span><span class="br0">&#40;</span><span class="br0">&#40;</span><span class="st0">''</span><span class="sy0">,</span> <span class="nu0">8087</span><span class="br0">&#41;</span><span class="sy0">,</span> app<span class="sy0">,</span> handler_class<span class="sy0">=</span>WebSocketHandler<span class="br0">&#41;</span>
server.<span class="me1">serve_forever</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></div></div></div></div></div></div></div><p>It listens on port 8087 on all network interfaces (the empty host string) for incoming connections. WebSocket connections are accepted and handled. If the requested resource is <code>/echo</code>, the function <code>handle_echo_client()</code>, running in the greenlet gevent spawned for this connection, deals with the client: it sends every received message back until the client sends <code>quit</code>, and then closes the connection. To reach this server, the client has to connect to <code>ws://yourhost:8087/echo</code>.</p><p>Now the client side. Open this HTML code in your browser (make sure to set the paths to the JavaScript sources and SWF file correctly and keep everything <em>under the same domain</em>):</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="html4strict"><pre class="de1"><span class="sc0">&lt;!DOCTYPE HTML PUBLIC &quot;-//W3C//DTD HTML 4.01 Transitional//EN&quot;</span>
<span class="sc0">&quot;http://www.w3.org/TR/html4/transitional.dtd&quot;&gt;</span>
<span class="sc2">&lt;<span class="kw2">html</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">head</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">title</span>&gt;</span>websocket client: send/receive test<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">title</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">script</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;text/javascript&quot;</span> <span class="kw3">src</span><span class="sy0">=</span><span class="st0">&quot;.........../web-socket-js/swfobject.js&quot;</span>&gt;&lt;<span class="sy0">/</span><span class="kw2">script</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">script</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;text/javascript&quot;</span> <span class="kw3">src</span><span class="sy0">=</span><span class="st0">&quot;.........../web-socket-js/web_socket.js&quot;</span>&gt;&lt;<span class="sy0">/</span><span class="kw2">script</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">script</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;text/javascript&quot;</span>&gt;</span>
WEB_SOCKET_SWF_LOCATION = &quot;............/web-socket-js/WebSocketMain.swf&quot;;
function log(msg) {
    var d = new Date();
    var hour = d.getHours();
    var min = d.getMinutes();
    var sec = d.getSeconds();
    var msec = d.getMilliseconds();
    var time = hour + &quot;:&quot; + min + &quot;:&quot; + sec + &quot;:&quot; + msec;
    document.getElementById('foo').innerHTML += time + &quot;: &quot; + msg + &quot;<span class="sc2">&lt;<span class="kw2">br</span> <span class="sy0">/</span>&gt;</span>&quot;;
    }
&nbsp;
function ws_init(url) {
    log(&quot;connecting to &quot; + url + &quot;...&quot;);
    ws = new WebSocket(url);
    ws.onopen = function() {
        log(&quot;connection established.&quot;);
        };
    ws.onmessage = function(e) {
        log(&quot;message received: '&quot; + e.data + &quot;'&quot;);
        };
    ws.onclose = function() {
        log(&quot;connection closed.&quot;);
        };
    }
&nbsp;
function ws_send(msg) {
    log(&quot;send: &quot; + msg);
    ws.send(msg);
    }
&nbsp;
function ws_close() {
    log(&quot;closing connection..&quot;);
    ws.close();
    }
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">script</span>&gt;</span>
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">head</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">body</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">form</span> <span class="kw3">name</span><span class="sy0">=</span><span class="st0">&quot;input_form&quot;</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">table</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">tr</span>&gt;</span>
    <span class="sc2">&lt;<span class="kw2">td</span>&gt;</span>host/resource:<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">td</span>&gt;</span>
    <span class="sc2">&lt;<span class="kw2">td</span>&gt;&lt;<span class="kw2">input</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;text&quot;</span> <span class="kw3">name</span><span class="sy0">=</span><span class="st0">&quot;host&quot;</span> <span class="kw3">size</span><span class="sy0">=</span><span class="st0">&quot;50&quot;</span> <span class="kw3">value</span><span class="sy0">=</span><span class="st0">&quot;ws://host:port/resource&quot;</span>&gt;&lt;<span class="sy0">/</span><span class="kw2">td</span>&gt;</span>
    <span class="sc2">&lt;<span class="kw2">td</span>&gt;&lt;<span class="kw2">input</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;button&quot;</span> <span class="kw3">value</span><span class="sy0">=</span><span class="st0">&quot;connect&quot;</span> <span class="kw3">onclick</span><span class="sy0">=</span><span class="st0">&quot;ws_init(document.input_form.host.value)&quot;</span>&gt;&lt;<span class="sy0">/</span><span class="kw2">td</span>&gt;</span>
    <span class="sc2">&lt;<span class="kw2">td</span>&gt;&lt;<span class="kw2">input</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;button&quot;</span> <span class="kw3">value</span><span class="sy0">=</span><span class="st0">&quot;disconnect&quot;</span> <span class="kw3">onclick</span><span class="sy0">=</span><span class="st0">&quot;ws_close()&quot;</span>&gt;&lt;<span class="sy0">/</span><span class="kw2">td</span>&gt;</span>
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">tr</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">tr</span>&gt;</span>
    <span class="sc2">&lt;<span class="kw2">td</span>&gt;</span>msg:<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">td</span>&gt;</span>
    <span class="sc2">&lt;<span class="kw2">td</span>&gt;&lt;<span class="kw2">input</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;text&quot;</span> <span class="kw3">name</span><span class="sy0">=</span><span class="st0">&quot;msg&quot;</span> <span class="kw3">size</span><span class="sy0">=</span><span class="st0">&quot;50&quot;</span>&gt;&lt;<span class="sy0">/</span><span class="kw2">td</span>&gt;</span>
    <span class="sc2">&lt;<span class="kw2">td</span>&gt;&lt;<span class="kw2">input</span> <span class="kw3">type</span><span class="sy0">=</span><span class="st0">&quot;button&quot;</span> <span class="kw3">value</span><span class="sy0">=</span><span class="st0">&quot;send&quot;</span> <span class="kw3">onclick</span><span class="sy0">=</span><span class="st0">&quot;ws_send(document.input_form.msg.value)&quot;</span>&gt;&lt;<span class="sy0">/</span><span class="kw2">td</span>&gt;</span>
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">tr</span>&gt;</span>
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">table</span>&gt;</span>
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">form</span>&gt;</span>
<span class="sc2">&lt;<span class="kw2">div</span> <span class="kw3">id</span><span class="sy0">=</span><span class="st0">&quot;foo&quot;</span>&gt;</span> <span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">div</span>&gt;</span>
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">body</span>&gt;</span>
<span class="sc2">&lt;<span class="sy0">/</span><span class="kw2">html</span>&gt;</span></pre></div></div></div></div></div></div></div><p>What is used here is the standard JavaScript WebSocket API. For me, this HTML renders as:<br
/> <img
src="http://gehrcke.de/wp/blog_content/websocket_test_empty.png" alt="websocket test" /></p><p>Now&#8230; maybe you opened this in Chrome. Then, WebSocket support is natively included and activated. In Firefox, it is disabled. There the Flashbridge would come into play. But Flash does not just send and receive data to and from anywhere. It first asks the host that delivered the SWF file with whom it is allowed to exchange data: it requests a so-called socket policy file from that server on port 843. Yeah, just accept this &#8212; it is not a bad idea that the internet communication of Flash plugins is restricted. Anyway, you have to set up a server listening on port 843 to provide this policy. I have written a <a
href="http://gehrcke.de/2011/06/flash-socket-policy-server-in-python-based-on-gevent/">blog post about setting up a Flash socket policy server in Python based on gevent</a>. I will simply assume that you have set up such a server. Then, our HTML code from above works (at least for me) under Firefox 4+, Chrome 6+ and Internet Explorer 9. Enter your echo server&#8217;s details (<code>ws://yourhost:8087/echo</code>) and play around with it. For me, it looks like this:<br
/> <img
src="http://gehrcke.de/wp/blog_content/websocket_test.png" alt="websocket test" /></p><p>Isn&#8217;t that great? It works. The latencies between send and receive are pretty low (at least for such short messages): they are comparable with the ping latencies I measured against the same server. The Flashbridge transparently does its job in Firefox and Internet Explorer and introduces around 5 ms more latency.</p><h3>Conclusion</h3><p>The tools used for this test (gevent, gevent-websocket and web-socket-js) are solid, perform very well and are very simple to use. The code executed in the browser uses the standard WebSocket API. But, due to web-socket-js and its nice Flashbridge, it works in all browsers supporting Flash. gevent-websocket and web-socket-js do not provide more functionality than simple message passing over WebSockets, which is exactly what I wanted.</p> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2011/06/the-best-and-simplest-tools-to-create-a-basic-websocket-application-with-flash-fallback-and-python-on-the-server-side/feed/</wfw:commentRss> <slash:comments>5</slash:comments> </item> <item><title>Flash socket policy server in Python based on gevent</title><link>http://gehrcke.de/2011/06/flash-socket-policy-server-in-python-based-on-gevent/</link> <comments>http://gehrcke.de/2011/06/flash-socket-policy-server-in-python-based-on-gevent/#comments</comments> <pubDate>Sat, 18 Jun 2011 16:25:30 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <category><![CDATA[WebSockets]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1612</guid> <description><![CDATA[<p>Currently, I am looking into gevent &#8212; a nicely performing networking library for Python, based on the brilliant idea of greenlets. I try to use this for WebSocket communication with browsers. As of today, WebSockets are still not available or disabled in some browsers. That&#8217;s why there are Javascript implementations like web-socket-js providing a transparent [...]]]></description> <content:encoded><![CDATA[<p>Currently, I am looking into <a
href="http://www.gevent.org/">gevent</a> &#8212; a <a
href="http://nichol.as/benchmark-of-python-web-servers">nicely performing</a> networking library for Python, based on the<a
href="http://www.stackless.com/pipermail/stackless-dev/2004-March/000022.html"> brilliant idea</a> of <a
href="http://packages.python.org/greenlet/">greenlets</a>. I am trying to use this for <a
href="http://de.wikipedia.org/wiki/WebSocket">WebSocket</a> communication with browsers. As of today, WebSockets are still not available or <a
href="http://hacks.mozilla.org/2010/12/websockets-disabled-in-firefox-4/">disabled</a> in some browsers. That&#8217;s why there are JavaScript implementations like <a
href="https://github.com/gimite/web-socket-js">web-socket-js</a> providing a transparent &#8220;Flashbridge&#8221; as a fallback for the WebSocket communication. But the Flash plugin in the client&#8217;s browser does not simply communicate with any server on any port in the world. As a first step, it always tries to receive a so-called <a
href="http://www.adobe.com/devnet/flashplayer/articles/fplayer9_security.html">socket policy file</a> from the same server that delivered the SWF file, on port 843. This policy file tells the client <a
href="http://www.adobe.com/devnet/flashplayer/articles/cross_domain_policy.html">from whom it is allowed to receive data</a>. Today, I present a simple server based on gevent providing this policy file for connecting Flash clients.<span
id="more-1612"></span></p><p>By using gevent, you automatically get one of the highest performing Python servers you can get for this purpose. Due to the greenlet concept, it is able to handle a huge number of concurrent connections without any threading overhead. According to benchmarks <a
href="http://oddments.org/?p=494">like this</a>, the server should be faster than comparable Python solutions (based on e.g. <a
href="http://twistedmatrix.com">Twisted</a> or <a
href="http://eventlet.net/">Eventlet</a>) while almost reaching the level of optimized C/C++ solutions. Of course, all this performance is not really necessary for a Flash policy server, but it is good to know what we are dealing with.</p><p>Now, the code. It is already &#8220;bloated&#8221; with comments and logging functionality. Due to the use of gevent, the server code itself is very simple:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="python"><pre class="de1"><span class="co1">#!/usr/bin/env python</span>
&nbsp;
<span class="co1"># Flash access policy server based on gevent</span>
<span class="co1"># Jan-Philip Gehrcke, June 2011</span>
&nbsp;
<span class="co1"># Listen on port 843; send access policy to client; disconnect.</span>
&nbsp;
&nbsp;
<span class="kw1">from</span> gevent.<span class="me1">server</span> <span class="kw1">import</span> StreamServer
<span class="kw1">import</span> <span class="kw3">datetime</span>
<span class="kw1">import</span> <span class="kw3">socket</span>
&nbsp;
&nbsp;
<span class="co1"># Should we log something? Where to?</span>
LOG <span class="sy0">=</span> <span class="nu0">1</span>
LOGFILE <span class="sy0">=</span> <span class="st0">&quot;flash_access_policy_server.log&quot;</span>
&nbsp;
<span class="co1"># The policy that is sent to the clients.</span>
POLICY <span class="sy0">=</span> <span class="st0">&quot;&quot;&quot;&lt;cross-domain-policy&gt;&lt;allow-access-from domain=&quot;*&quot; to-ports=&quot;*&quot; /&gt;&lt;/cross-domain-policy&gt;<span class="es0">\0</span>&quot;&quot;&quot;</span>
&nbsp;
<span class="co1"># The string the client has to send in order to receive the policy.</span>
POLICYREQUEST <span class="sy0">=</span> <span class="st0">&quot;&lt;policy-file-request/&gt;&quot;</span>
&nbsp;
&nbsp;
<span class="co1"># This function is called for each incoming connection</span>
<span class="co1"># (in a non-blocking fashion in a greenlet)</span>
<span class="kw1">def</span> client_handle<span class="br0">&#40;</span>sock<span class="sy0">,</span> address<span class="br0">&#41;</span>:
    log<span class="br0">&#40;</span><span class="st0">&quot;%s:%s: Connection accepted.&quot;</span> % address<span class="br0">&#41;</span>
    <span class="co1"># send and read functions should not wait longer than three seconds</span>
    sock.<span class="me1">settimeout</span><span class="br0">&#40;</span><span class="nu0">3</span><span class="br0">&#41;</span>
    <span class="kw1">try</span>:
        <span class="co1"># try to receive at most 128 bytes (`POLICYREQUEST` is shorter)</span>
        data <span class="sy0">=</span> sock.<span class="me1">recv</span><span class="br0">&#40;</span><span class="nu0">128</span><span class="br0">&#41;</span>
        <span class="kw1">if</span> data.<span class="me1">startswith</span><span class="br0">&#40;</span>POLICYREQUEST<span class="br0">&#41;</span>:
            sock.<span class="me1">sendall</span><span class="br0">&#40;</span>POLICY<span class="br0">&#41;</span>
            log<span class="br0">&#40;</span><span class="st0">&quot;%s:%s: Policy sent. Closing connection.&quot;</span> % address<span class="br0">&#41;</span>
        <span class="kw1">else</span>:
            log<span class="br0">&#40;</span><span class="st0">&quot;%s:%s: Crap received. Closing connection.&quot;</span> % address<span class="br0">&#41;</span>
    <span class="kw1">except</span> <span class="kw3">socket</span>.<span class="me1">timeout</span>:
        log<span class="br0">&#40;</span><span class="st0">&quot;%s:%s: Timed out. Closing.&quot;</span> % address<span class="br0">&#41;</span>
    sock.<span class="me1">close</span><span class="br0">&#40;</span><span class="br0">&#41;</span>
&nbsp;
&nbsp;
<span class="co1"># Write `msg` to file and stdout, prepended by a date/time string</span>
<span class="kw1">def</span> log<span class="br0">&#40;</span>msg<span class="br0">&#41;</span>:
    <span class="kw1">if</span> LOG:
        l <span class="sy0">=</span> <span class="st0">&quot;%s: %s&quot;</span> % <span class="br0">&#40;</span><span class="kw3">datetime</span>.<span class="kw3">datetime</span>.<span class="me1">now</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">isoformat</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="sy0">,</span> msg<span class="br0">&#41;</span>
        lf.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;%s<span class="es0">\n</span>&quot;</span> % l<span class="br0">&#41;</span>
        <span class="kw1">print</span> l
&nbsp;
&nbsp;
<span class="kw1">if</span> __name__ <span class="sy0">==</span> <span class="st0">'__main__'</span>:
    <span class="kw1">if</span> LOG:
        lf <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">&#40;</span>LOGFILE<span class="sy0">,</span> <span class="st0">&quot;a&quot;</span><span class="br0">&#41;</span>
    server <span class="sy0">=</span> StreamServer<span class="br0">&#40;</span><span class="br0">&#40;</span><span class="st0">'0.0.0.0'</span><span class="sy0">,</span> <span class="nu0">843</span><span class="br0">&#41;</span><span class="sy0">,</span> client_handle<span class="br0">&#41;</span>
    log<span class="br0">&#40;</span><span class="st0">'Starting server...'</span><span class="br0">&#41;</span>
    server.<span class="me1">serve_forever</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></div></div></div></div></div></div></div><p>Note that the policy in this case is not very sophisticated and basically allows everything;-)</p> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2011/06/flash-socket-policy-server-in-python-based-on-gevent/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Average date strings using Python</title><link>http://gehrcke.de/2011/02/average-date-strings-using-python/</link> <comments>http://gehrcke.de/2011/02/average-date-strings-using-python/#comments</comments> <pubDate>Sun, 06 Feb 2011 19:31:23 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1501</guid> <description><![CDATA[<p>Assume you&#8217;ve measured values over time and now want to average your data. This means you have to a) average your measured values (a trivial task) and b) average your points in time. Here, I present a way to average arbitrary date strings in Python.</p><p>Let&#8217;s talk about a [...]]]></description> <content:encoded><![CDATA[<p>Assume you&#8217;ve measured values over time and now want to average your data. This means you have to a) average your measured values (a trivial task) and <strong>b) average your points in time</strong>. Here, I present a way to average arbitrary date strings in Python.<span
id="more-1501"></span></p><p>Let&#8217;s talk about a specific example. This is the content of <code>dates.dat</code>, our input:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="text"><pre class="de1">20110130_195243
20110130_200003
20110130_200803
20110130_200909
20110130_201003
20110130_202004
20110130_203003
20110130_204003
20110130_205003
20110130_210003
20110130_211003
20110130_212004
20110130_213003
20110130_214003
20110130_215003
20110130_220003
20110130_221003</pre></div></div></div></div></div></div></div><p>Each of the 17 lines contains a string representing a point in time in the format <code>YYYYMMDD_HHMMSS</code>.</p><p><strong>Now, let&#8217;s say that the goal is to compute the mean of every 3 consecutive points in time</strong>. An output date string representing a mean time should have the same format as the date strings in <code>dates.dat</code>. Hence, the output file <code>dates_meanof3values.dat</code> should look like this:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="text"><pre class="de1">20110130_200016
20110130_201305
20110130_204003
20110130_211003
20110130_214003</pre></div></div></div></div></div></div></div><p>These 5 date strings represent the average points in time of the first 5*3 date strings in our input. The remaining 2 of the 17 lines do not fill a complete group of 3 and are dropped.</p><p>The following Python code accomplishes this:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="python"><ol><li
class="li1"><pre class="de1"><span class="kw1">import</span> <span class="kw3">time</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">meanof <span class="sy0">=</span> <span class="nu0">3</span> <span class="co1"># number of dates taken into account for averaging</span></pre></li><li
class="li1"><pre class="de1">inputfile <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">'dates.dat'</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">outputfile <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;dates_meanof%svalues.dat&quot;</span> % meanof<span class="sy0">,</span><span class="st0">'w'</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> datestring_to_timestamp<span class="br0">&#40;</span><span class="kw2">str</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="st0">&quot;&quot;&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    Assume `str` representing a time in local time and convert it to a timestamp</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    (time as a floating point number expressed in seconds since the epoch, in</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    UTC) using the format given below.</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    &quot;&quot;&quot;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">return</span> <span class="kw3">time</span>.<span class="me1">mktime</span><span class="br0">&#40;</span><span class="kw3">time</span>.<span class="me1">strptime</span><span class="br0">&#40;</span><span class="kw2">str</span><span class="sy0">,</span> <span class="st0">&quot;%Y%m%d_%H%M%S&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> timestamp_to_datestring<span class="br0">&#40;</span>timestamp<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="st0">&quot;&quot;&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    Inverse of the function `datestring_to_timestamp`</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    &quot;&quot;&quot;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">return</span> <span class="kw3">time</span>.<span class="me1">strftime</span><span class="br0">&#40;</span><span class="st0">&quot;%Y%m%d_%H%M%S&quot;</span><span class="sy0">,</span> <span class="kw3">time</span>.<span class="me1">localtime</span><span class="br0">&#40;</span>timestamp<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> chunks<span class="br0">&#40;</span><span class="kw2">list</span><span class="sy0">,</span> n<span class="sy0">,</span> strict<span class="sy0">=</span><span class="kw2">False</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="st0">&quot;&quot;&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    Split `list` in sub-lists: yield successive `n`-sized chunks from `list`.</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    If `strict` is True, the last chunk is only yielded if its length is `n`.</span></pre></li><li
class="li1"><pre class="de1"><span class="st0">    &quot;&quot;&quot;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">for</span> i <span class="kw1">in</span> <span class="kw2">xrange</span><span class="br0">&#40;</span><span class="nu0">0</span><span class="sy0">,</span> <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#41;</span><span class="sy0">,</span> n<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">if</span> <span class="kw1">not</span> strict <span class="kw1">or</span> <span class="kw2">len</span><span class="br0">&#40;</span><span class="kw2">list</span><span class="br0">&#91;</span>i:i+n<span class="br0">&#93;</span><span class="br0">&#41;</span> <span class="sy0">==</span> n:</pre></li><li
class="li1"><pre class="de1">            <span class="kw1">yield</span> <span class="kw2">list</span><span class="br0">&#91;</span>i:i+n<span class="br0">&#93;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># read lines from file and strip surrounding whitespace; skip empty lines</span></pre></li><li
class="li1"><pre class="de1">cleanlines <span class="sy0">=</span> <span class="br0">&#91;</span>line.<span class="me1">strip</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="kw1">for</span> line <span class="kw1">in</span> inputfile.<span class="me1">readlines</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="kw1">if</span> line.<span class="me1">strip</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#93;</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># divide data into chunks and iterate over them</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">for</span> <span class="kw3">chunk</span> <span class="kw1">in</span> chunks<span class="br0">&#40;</span>cleanlines<span class="sy0">,</span> meanof<span class="sy0">,</span> <span class="kw2">True</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="co1"># build timestamp of each datestring in current chunk and build the mean</span></pre></li><li
class="li1"><pre class="de1">    mean_timestamp <span class="sy0">=</span> <span class="kw2">sum</span><span class="br0">&#40;</span><span class="kw2">map</span><span class="br0">&#40;</span>datestring_to_timestamp<span class="sy0">,</span> <span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span> / meanof</pre></li><li
class="li1"><pre class="de1">    <span class="co1"># convert mean timestamp back to datestring and write this to file</span></pre></li><li
class="li1"><pre class="de1">    outputfile.<span class="me1">write</span><span class="br0">&#40;</span><span class="st0">&quot;%s<span class="es0">\n</span>&quot;</span> % timestamp_to_datestring<span class="br0">&#40;</span>mean_timestamp<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li></ol></div></div></div></div></div></div></div><p>A timestamp is a plain number of seconds since the epoch, so points in time can be averaged by summing their timestamps and dividing by their count. The functions <code>datestring_to_timestamp()</code> and <code>timestamp_to_datestring()</code> convert date strings to timestamps and back, using a user-given date string format (you can edit lines 13 and 19 according to <a
href="http://docs.python.org/library/time.html#time.strftime"> these format specifiers</a>).</p><p>The function <code>chunks()</code>, which makes use of <a
href="http://wiki.python.org/moin/Generators">Python generators</a>, divides a given list into sub-lists (&#8220;chunks&#8221;). This is very useful here: the mean date string of a chunk is then calculated as simply as</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="python"><pre class="de1">timestamp_to_datestring<span class="br0">&#40;</span><span class="kw2">sum</span><span class="br0">&#40;</span><span class="kw2">map</span><span class="br0">&#40;</span>datestring_to_timestamp<span class="sy0">,</span> <span class="kw3">chunk</span><span class="br0">&#41;</span><span class="br0">&#41;</span> / meanof<span class="br0">&#41;</span></pre></div></div></div></div></div></div></div> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2011/02/average-date-strings-using-python/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>MySQL backup script with email support and lzma compression (for cron)</title><link>http://gehrcke.de/2010/11/mysql-backup-script-with-email-support-and-lzma-compression-for-cron/</link> <comments>http://gehrcke.de/2010/11/mysql-backup-script-with-email-support-and-lzma-compression-for-cron/#comments</comments> <pubDate>Tue, 16 Nov 2010 23:23:45 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Linux]]></category> <category><![CDATA[MySQL]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1235</guid> <description><![CDATA[<p>Let me share a shell script that backs up a MySQL database. It is intended to run regularly (e.g. as a cron job). I wrote it carefully, and it may distinguish itself from similar scripts in the following points:</p> It stores the backup file locally. Additionally, it sends an email with the backup file via [...]]]></description> <content:encoded><![CDATA[<p>Let me share a shell script that backs up a MySQL database. It is intended to run regularly (e.g. as a cron job). I wrote it carefully, and it may distinguish itself from similar scripts in the following points:<span
id="more-1235"></span></p><ul><li>It stores the backup file locally. Additionally, it sends an email with the backup file via any SMTP server you like (localhost:25 by default). STARTTLS/SSL is supported. For this, I use <code>mailsend.py</code> (an SMTP-based email sender I&#8217;ve written in Python). If <code>mysqldump</code> does not exit with return code <code>0</code>, a special email is sent. <code>mysqldump</code>&#8217;s stderr output is always captured and sent via mail.</li></ul><ul><li>It does not use your database password directly in the <code>mysqldump</code> command (in which case other users would be able to see your password in the process list while the command is running). This mistake seems to be made often.</li></ul><ul><li>It uses <a
href="http://en.wikipedia.org/wiki/Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm">lzma</a> compression, which produces higher compression ratios than gzip/bz2 (at the cost of computing power).</li></ul><ul><li>It automatically deletes local backup files older than a user-given number of days.</li></ul><h5>The script:</h5><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="bash"><ol><li
class="li1"><pre class="de1"><span class="co0">#!/bin/bash</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co0"># database backup settings. edit these accordingly.</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">EMAIL</span>=<span class="st0">&quot;user@host.com&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">BACKUP_NAME</span>=<span class="st0">&quot;FORUM_DB_BACKUP&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">BACKUP_DIR</span>=<span class="st0">&quot;/path/to/backup/dir&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">DB_NAME</span>=<span class="st0">&quot;dbname&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">DB_HOST</span>=<span class="st0">&quot;127.0.0.1&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">DB_USER</span>=<span class="st0">&quot;username&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">DB_PASSWORDFILE</span>=<span class="st0">&quot;/path_to/.db_password_file&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">DEL_OLDER_THAN_X_DAYS</span>=<span class="st0">&quot;30&quot;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> <span class="br0">&#91;</span><span class="br0">&#91;</span> <span class="sy0">!</span> <span class="re5">-d</span> <span class="co1">${BACKUP_DIR}</span> <span class="br0">&#93;</span><span class="br0">&#93;</span>; <span class="kw1">then</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw2">mkdir</span> <span class="co1">${BACKUP_DIR}</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">fi</span></pre></li><li
class="li1"><pre class="de1"><span class="re2">BACKUP_PATH</span>=<span class="co1">${BACKUP_DIR}</span><span class="sy0">/</span>$<span class="br0">&#40;</span><span class="kw2">date</span> +<span class="sy0">%</span>Y<span class="sy0">%</span>m<span class="sy0">%</span>d-<span class="sy0">%</span>H<span class="sy0">%</span>M<span class="sy0">%</span>S<span class="br0">&#41;</span>_<span class="co1">${BACKUP_NAME}</span>.sql.lzma</pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co0"># dump database content to stdout. read db password from file (never put it</span></pre></li><li
class="li1"><pre class="de1"><span class="co0"># on the commandline directly, as other users may read it in the process</span></pre></li><li
class="li1"><pre class="de1"><span class="co0"># list). forward both stdout and stderr to lzma, which is more efficient</span></pre></li><li
class="li1"><pre class="de1"><span class="co0"># than gzip/bz2. store the data in a file with a meaningful name.</span></pre></li><li
class="li1"><pre class="de1">mysqldump \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">--user</span>=<span class="co1">${DB_USER}</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">--password</span>=$<span class="br0">&#40;</span><span class="kw2">cat</span> <span class="co1">${DB_PASSWORDFILE}</span><span class="br0">&#41;</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">--host</span>=<span class="co1">${DB_HOST}</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">--opt</span> <span class="re5">--quote-names</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="co1">${DB_NAME}</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="sy0">|&amp;</span> lzma <span class="re5">-c</span> <span class="sy0">&gt;</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="co1">${BACKUP_PATH}</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co0"># send the backup file to your email address using the SMTP server listening</span></pre></li><li
class="li1"><pre class="de1"><span class="co0"># on localhost:25. if you wish to use another SMTP server (localhost:465 or </span></pre></li><li
class="li1"><pre class="de1"><span class="co0"># smtp.gmail.com:587 or whatever), see mailsend.py --help.</span></pre></li><li
class="li1"><pre class="de1"><span class="co0"># note: PIPESTATUS and |&amp; are bash features and do not work in plain sh.</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> <span class="br0">&#91;</span><span class="br0">&#91;</span> <span class="co1">${PIPESTATUS[0]}</span> <span class="re5">-eq</span> <span class="nu0">0</span> <span class="br0">&#93;</span><span class="br0">&#93;</span>; <span class="kw1">then</span></pre></li><li
class="li1"><pre class="de1">    python mailsend.py <span class="re5">-s</span> <span class="co1">${BACKUP_NAME}</span> <span class="re5">-f</span> <span class="co1">${BACKUP_PATH}</span> <span class="co1">${EMAIL}</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">else</span></pre></li><li
class="li1"><pre class="de1">    python mailsend.py <span class="re5">-s</span> <span class="co1">${BACKUP_NAME}</span>_ERROR <span class="re5">-f</span> <span class="co1">${BACKUP_PATH}</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">-m</span> <span class="st0">&quot;mysqldump threw an error. Its stderr is within the attached archive.&quot;</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="co1">${EMAIL}</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">fi</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co0"># delete backup files older than X days (as set above)</span></pre></li><li
class="li1"><pre class="de1"><span class="kw2">find</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="co1">${BACKUP_DIR}</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">-name</span> \<span class="sy0">*</span>.sql.lzma \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">-mtime</span> +<span class="co1">${DEL_OLDER_THAN_X_DAYS}</span> \</pre></li><li
class="li1"><pre class="de1">    <span class="re5">-exec</span> <span class="kw2">rm</span> <span class="re5">-f</span> <span class="st_h">'{}'</span> \;</pre></li></ol></div></div></div></div></div></div></div><h5> How to use it:</h5><ul><li>Put the lines above in a file on your server and make it executable.</li><li>Create the password file containing your database password in clear text (apply <code>chmod 600</code> to the file!).</li><li>Adjust the settings within the header of the script.</li><li>If you like the email feature, store <code>mailsend.py</code> (download link <a
href="#mailsend">below</a>) in the same directory as the backup script (or adjust the paths). If you don&#8217;t want the email feature, comment out the corresponding lines 35 to 41.</li><li>If your server does not have the <code>lzma</code> command, replace it with <code>gzip</code> (line 28).</li><li>Configure your new backup script as a <a
href="http://en.wikipedia.org/wiki/Cron">cronjob</a>.</li></ul><h5><a
name="mailsend">Mailsend.py:</a></h5><p>I will publish a blog post introducing <code>mailsend.py</code> soon. This is what I can provide until then:<br
/> <strong>View the help:</strong> <a
href="http://paste.pocoo.org/show/Isj8E6vsIWvWc9OP9ogh/">here</a><br
/> <strong>View the code:</strong> <a
href="http://paste.pocoo.org/show/PN270ZINhKftblc4Hj4n/">here</a></p><p><strong>Download it:</strong><br
/> <a
href="http://gehrcke.de/files/perm/py/mailsend_2010_11_16.zip">mailsend_2010_11_16.zip</a><br
/> <a
href="http://gehrcke.de/files/perm/py/mailsend_2010_11_16.tar.gz">mailsend_2010_11_16.tar.gz</a></p> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2010/11/mysql-backup-script-with-email-support-and-lzma-compression-for-cron/feed/</wfw:commentRss> <slash:comments>0</slash:comments> </item> <item><title>Python backup/snapshot script with compression (BZ2, 7zip)</title><link>http://gehrcke.de/2010/02/python-backupsnapshot-script-with-compression-bz2-7zip/</link> <comments>http://gehrcke.de/2010/02/python-backupsnapshot-script-with-compression-bz2-7zip/#comments</comments> <pubDate>Sat, 13 Feb 2010 18:18:49 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1134</guid> <description><![CDATA[<p>While writing my master&#8217;s thesis with LaTeX, I often change source code files, graphics etc. It is important for me to keep track of the major changes and to prevent losing the whole work due to some data loss issue. To solve these two problems, I wrote a convenient Python backup [...]]]></description> <content:encoded><![CDATA[<p>While writing my master&#8217;s thesis with LaTeX, I often change source code files, graphics etc. It is important for me to <strong>keep track of the major changes</strong> and to <strong>prevent losing the whole work due to some data loss issue</strong>. To solve these two problems, I wrote a convenient Python backup script.<span
id="more-1134"></span></p><p>On invocation, it creates a snapshot of my thesis directory&#8217;s content. It writes everything to another directory, with a <strong>timestamp</strong> in its name. With these snapshots, I can simply look into older versions of my work later on, just to be prepared if worst comes to worst.</p><p>But&#8230; what if the hard drive with the snapshots on it crashes? For that case, I added the option to simply copy the whole directory tree <strong>to an additional destination as well</strong>. If this is on another physical storage device, it&#8217;s pretty unlikely to lose both snapshot copies at the same time. Feels good.</p><p>Over time, I created more and more files within my project. For convenience, I added an option to <strong>compress</strong> the directory&#8217;s snapshot into an archive. At this point, the user can either use Python&#8217;s built-in <strong>.tar.bz2</strong> compression method or let the script use <strong>7zip</strong>, which provides ultra strong compression and is very fast.</p><p>After <strong>only a few seconds of configuration</strong> (directory definition, backup method selection), you only have to double-click the script file every time you want to feel safe <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /></p><p><small>For Windows users:<br
/> 0) Download and install Python 2.6.x: <a
href="http://python.org/download/">http://python.org/download/</a><br
/> 1) Copy the script (below), edit the &#8220;SETTINGS&#8221; section of the script, and save it as a .py file<br
/> 2) Double-click as often as you want.</small></p><div
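><p>Before the full script, here is a minimal sketch of the core idea in Python 3 (the script itself targets Python 2.6). It only covers the plain directory copy and the <strong>.tar.bz2</strong> method; 7zip is left out. The function names <code>snapshot_basename</code> and <code>make_snapshot</code>, the <code>when</code> parameter and the timestamp format are assumptions made for illustration and are not taken from the script.</p>

```python
import os
import shutil
import tarfile
import time


def snapshot_basename(prefix, when=None):
    """Build a name such as 'thesis_bckp_20100213_181849'.

    The timestamp format is an assumption made for this sketch.
    """
    moment = time.localtime() if when is None else when
    return "%s_%s" % (prefix, time.strftime("%Y%m%d_%H%M%S", moment))


def make_snapshot(orig_dir, backup_dir, prefix, compress=False, when=None):
    """Copy orig_dir into backup_dir as a directory tree or a .tar.bz2 archive.

    Returns the path of the created directory or archive.
    """
    os.makedirs(backup_dir, exist_ok=True)
    name = snapshot_basename(prefix, when)
    if compress:
        archive_path = os.path.join(backup_dir, name + ".tar.bz2")
        with tarfile.open(archive_path, "w:bz2") as archive:
            # Store members under the snapshot name instead of the source path.
            archive.add(orig_dir, arcname=name)
        return archive_path
    target_dir = os.path.join(backup_dir, name)
    # copytree refuses to overwrite, so an existing snapshot is never clobbered.
    shutil.copytree(orig_dir, target_dir)
    return target_dir
```

<p>Calling <code>make_snapshot("C:/project", "C:/backups", "thesis_bckp")</code> and then calling it again with a <code>backup_dir</code> on another drive would give the additional copy described above.</p></div><div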
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="python"><ol><li
class="li1"><pre class="de1"><span class="co1">#!/usr/bin/env python</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#********************************* README **************************************</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># SNAPSHOT / BACKUP script for Windows/Linux</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># script by Jan-Philip Gehrcke -- jgehrcke@gmail.com -- http://gehrcke.de</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># 0) DESCRIPTION:</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># ===============</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># On invocation, the script creates a snapshot of ORIG_DIR's contents and writes</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># it to BACKUP_DIR into 1) a new subdirectory or 2) a .tar.bz2 archive or 3) a</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># 7zip archive (choose it!). The time of snapshot creation is written into the</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># subdirectory's name / archive file name. An optional second location can</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># be defined to which the snapshot will be written additionally.</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># This script is useful to manually and quickly create snapshots of a multi-file</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># project you're working on, enabling _rollbacks_ to an older version of your</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># project's files. Furthermore, using the additional backup location on another</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># physical storage, the script prevents _data loss_.</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># Primarily, this script is written for Windows users: simple double-click .py</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># file invocation is considered. Should work on Linux systems, too, but the</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># &quot;press any key to continue&quot; dialogue is quite un-unixoid.</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># 1) USAGE:</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># =========</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># Download and install Python 2.6.x: http://python.org/download/</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># For 7zip method, download http://www.7-zip.org/download.html</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># Put the script file into the directory containing the directory you want to</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># back up, adjust settings (below) and then run the script (doubleclick on Win).</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># The snapshot/backup of</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#  ./ORIG_DIR/*</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># will go to</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#  ./BACKUP_DIR/BACKUP_PREFIX_timestring/*       (SIMPLE method, built-in)</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># OR to the archive</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#  ./BACKUP_DIR/BACKUP_PREFIX_timestring.tar.bz2 (BZ2 method, built-in)</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># OR to the archive</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#  ./BACKUP_DIR/BACKUP_PREFIX_timestring.7z      (7zip method; ultra strong</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#                                                 compression; faster than BZ2;</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#                                                 requires 7zip to be available)</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># Of course, ORIG_DIR and BACKUP_DIR can be absolute paths, too. Then, the</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># location of this script does not matter.</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># 2) SETTINGS:</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># ============</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># always use SLASHES (&quot;/&quot;) in paths, even on Windows -&gt; don't use &quot;\&quot;</span></pre></li><li
class="li1"><pre class="de1">ORIG_DIR <span class="sy0">=</span> <span class="st0">&quot;C:/project&quot;</span>          <span class="co1"># e.g. &quot;.&quot; or &quot;C:/project&quot;</span></pre></li><li
class="li1"><pre class="de1">BACKUP_DIR <span class="sy0">=</span> <span class="st0">&quot;C:/backups&quot;</span>        <span class="co1"># e.g. &quot;C:/backups&quot;</span></pre></li><li
class="li1"><pre class="de1">BACKUP_PREFIX <span class="sy0">=</span> <span class="st0">&quot;thesis_bckp&quot;</span>      <span class="co1"># e.g. &quot;thesis_bckp&quot;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># choose backup method: 'SIMPLE' OR 'BZ2' OR '7zip':</span></pre></li><li
class="li1"><pre class="de1">METHOD <span class="sy0">=</span> <span class="st0">'7zip'</span>     <span class="co1"># quick and really strong compression, 7zip.exe required</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#METHOD = 'SIMPLE'  # copy directory tree; e.g. if you don't have many files</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#METHOD = 'BZ2'     # builtin method; if you like compression, but no 7zip.</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># in case of 7zip, specify 7z executable path:</span></pre></li><li
class="li1"><pre class="de1">SEVENZIPPATH <span class="sy0">=</span> <span class="st0">&quot;c:/Programs/7-Zip/7z.exe&quot;</span> <span class="co1"># e.g. &quot;c:/Programs/7-Zip/7z.exe&quot;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># set ADDITIONAL_BACKUP_DIR to double-save backup (e.g. on another hard disk)</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># (comment out the line if this is undesired behavior)</span></pre></li><li
class="li1"><pre class="de1">ADDITIONAL_BACKUP_DIR <span class="sy0">=</span> <span class="st0">&quot;F:/backups&quot;</span>   <span class="co1"># e.g. &quot;F:/backups&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="co1">#*******************************************************************************</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">import</span> <span class="kw3">os</span><span class="sy0">,</span> <span class="kw3">time</span><span class="sy0">,</span> <span class="kw3">shutil</span><span class="sy0">,</span> <span class="kw3">sys</span><span class="sy0">,</span> <span class="kw3">tarfile</span><span class="sy0">,</span> <span class="kw3">subprocess</span><span class="sy0">,</span> <span class="kw3">traceback</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> backup_directory_simple<span class="br0">&#40;</span>srcdir<span class="sy0">,</span>dstdir<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">exists</span><span class="br0">&#40;</span>dstdir<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        exit_stop<span class="br0">&#40;</span><span class="st0">&quot;backup path %s already exists!&quot;</span> % dstdir<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">try</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw3">shutil</span>.<span class="me1">copytree</span><span class="br0">&#40;</span>srcdir<span class="sy0">,</span>dstdir<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">except</span> <span class="kw2">Exception</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Error while copying tree in %s to %s&quot;</span> % <span class="br0">&#40;</span>srcdir<span class="sy0">,</span>dstdir<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Traceback:<span class="es0">\n</span>%s&quot;</span>%<span class="kw3">traceback</span>.<span class="me1">format_exc</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">return</span> dstdir</pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> backup_directory_bz2<span class="br0">&#40;</span>srcdir<span class="sy0">,</span>tarpath<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">exists</span><span class="br0">&#40;</span>tarpath<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        exit_stop<span class="br0">&#40;</span><span class="st0">&quot;backup path %s already exists!&quot;</span> % tarpath<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">try</span>:</pre></li><li
class="li1"><pre class="de1">        tar <span class="sy0">=</span> <span class="kw3">tarfile</span>.<span class="kw2">open</span><span class="br0">&#40;</span>tarpath<span class="sy0">,</span> <span class="st0">&quot;w:bz2&quot;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">for</span> filedir <span class="kw1">in</span> <span class="kw3">os</span>.<span class="me1">listdir</span><span class="br0">&#40;</span>srcdir<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">            tar.<span class="me1">add</span><span class="br0">&#40;</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">join</span><span class="br0">&#40;</span>srcdir<span class="sy0">,</span>filedir<span class="br0">&#41;</span><span class="sy0">,</span>arcname<span class="sy0">=</span>filedir<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        tar.<span class="me1">close</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">except</span> <span class="kw2">Exception</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Error while creating tar archive: %s&quot;</span> % tarpath</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Traceback:<span class="es0">\n</span>%s&quot;</span>%<span class="kw3">traceback</span>.<span class="me1">format_exc</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">return</span> tarpath</pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> backup_directory_7zip<span class="br0">&#40;</span>srcdir<span class="sy0">,</span>arcpath<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">exists</span><span class="br0">&#40;</span>arcpath<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        exit_stop<span class="br0">&#40;</span><span class="st0">&quot;backup path %s already exists!&quot;</span> % arcpath<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">try</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="co1"># -mx9 means maximum compression</span></pre></li><li
class="li1"><pre class="de1">        arglist <span class="sy0">=</span> <span class="br0">&#91;</span>SEVENZIPPATH<span class="sy0">,</span><span class="st0">&quot;a&quot;</span><span class="sy0">,</span>arcpath<span class="sy0">,</span><span class="st0">&quot;*&quot;</span><span class="sy0">,</span><span class="st0">&quot;-r&quot;</span><span class="sy0">,</span><span class="st0">&quot;-mx9&quot;</span><span class="br0">&#93;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="br0">&#40;</span><span class="st0">&quot;try running cmd:<span class="es0">\n</span> %s<span class="es0">\n</span>in directory<span class="es0">\n</span> %s&quot;</span> %</pre></li><li
class="li1"><pre class="de1">            <span class="br0">&#40;</span><span class="st0">' '</span>.<span class="me1">join</span><span class="br0">&#40;</span>arglist<span class="br0">&#41;</span><span class="sy0">,</span>srcdir<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="co1"># run 7zip (in the directory to be backed up!)</span></pre></li><li
class="li1"><pre class="de1">        sp <span class="sy0">=</span> <span class="kw3">subprocess</span>.<span class="me1">Popen</span><span class="br0">&#40;</span></pre></li><li
class="li1"><pre class="de1">            args<span class="sy0">=</span>arglist<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            stdout<span class="sy0">=</span><span class="kw3">subprocess</span>.<span class="me1">PIPE</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            stderr<span class="sy0">=</span><span class="kw3">subprocess</span>.<span class="me1">PIPE</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            cwd<span class="sy0">=</span>srcdir<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">except</span> <span class="kw2">Exception</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Error while running 7zip subprocess.&quot;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Traceback:<span class="es0">\n</span>%s&quot;</span>%<span class="kw3">traceback</span>.<span class="me1">format_exc</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">    <span class="co1"># wait for process to terminate, get stdout and stderr</span></pre></li><li
class="li1"><pre class="de1">    stdout<span class="sy0">,</span> stderr <span class="sy0">=</span> sp.<span class="me1">communicate</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> stdout:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="br0">&#40;</span><span class="st0">&quot;<span class="es0">\n</span>&gt;&gt;&gt; 7zip subprocess STDOUT START:<span class="es0">\n</span>%s&quot;</span></pre></li><li
class="li1"><pre class="de1">                <span class="st0">&quot;&gt;&gt;&gt; 7zip subprocess STDOUT END<span class="es0">\n</span>&quot;</span> % stdout<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> stderr:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;7zip STDERR:<span class="es0">\n</span>%s&quot;</span> % stderr</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">return</span> arcpath</pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> any_key<span class="br0">&#40;</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">print</span> <span class="st0">&quot;Press any key to continue.&quot;</span></pre></li><li
class="li1"><pre class="de1">    getch<span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> exit_stop<span class="br0">&#40;</span>exitstring<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">print</span> exitstring</pre></li><li
class="li1"><pre class="de1">    any_key<span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw3">sys</span>.<span class="me1">exit</span><span class="br0">&#40;</span>exitstring<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> so_flushwr<span class="br0">&#40;</span><span class="kw3">string</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw3">sys</span>.<span class="me1">stdout</span>.<span class="me1">write</span><span class="br0">&#40;</span><span class="kw3">string</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw3">sys</span>.<span class="me1">stdout</span>.<span class="me1">flush</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># provide getch() method</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># (http://stackoverflow.com/questions/1394956/how-to-do-hit-any-key-in-python)</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">try</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="co1"># Win32</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">from</span> <span class="kw3">msvcrt</span> <span class="kw1">import</span> getch</pre></li><li
class="li1"><pre class="de1"><span class="kw1">except</span> <span class="kw2">ImportError</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="co1"># UNIX</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">def</span> getch<span class="br0">&#40;</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">import</span> <span class="kw3">sys</span><span class="sy0">,</span> <span class="kw3">tty</span><span class="sy0">,</span> <span class="kw3">termios</span></pre></li><li
class="li1"><pre class="de1">        fd <span class="sy0">=</span> <span class="kw3">sys</span>.<span class="me1">stdin</span>.<span class="me1">fileno</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        old <span class="sy0">=</span> <span class="kw3">termios</span>.<span class="me1">tcgetattr</span><span class="br0">&#40;</span>fd<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">try</span>:</pre></li><li
class="li1"><pre class="de1">            <span class="kw3">tty</span>.<span class="me1">setraw</span><span class="br0">&#40;</span>fd<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">            <span class="kw1">return</span> <span class="kw3">sys</span>.<span class="me1">stdin</span>.<span class="me1">read</span><span class="br0">&#40;</span><span class="nu0">1</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">finally</span>:</pre></li><li
class="li1"><pre class="de1">            <span class="kw3">termios</span>.<span class="me1">tcsetattr</span><span class="br0">&#40;</span>fd<span class="sy0">,</span> <span class="kw3">termios</span>.<span class="me1">TCSADRAIN</span><span class="sy0">,</span> old<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># build timestring, check settings and invoke corresponding backup function</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">print</span> <span class="st0">&quot;*********************************************************************&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">print</span> <span class="st0">&quot;* snapshot backup script by Jan-Philip Gehrcke -- http://gehrcke.de *&quot;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">print</span> <span class="st0">&quot;*********************************************************************<span class="es0">\n</span>&quot;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">timestr <span class="sy0">=</span> <span class="kw3">time</span>.<span class="me1">strftime</span><span class="br0">&#40;</span><span class="st0">&quot;_%y%m%d_%H%M%S&quot;</span><span class="sy0">,</span><span class="kw3">time</span>.<span class="me1">localtime</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> METHOD <span class="kw1">not</span> <span class="kw1">in</span> <span class="br0">&#91;</span><span class="st0">&quot;SIMPLE&quot;</span><span class="sy0">,</span> <span class="st0">&quot;BZ2&quot;</span><span class="sy0">,</span> <span class="st0">&quot;7zip&quot;</span><span class="br0">&#93;</span>:</pre></li><li
class="li1"><pre class="de1">    exit_stop<span class="br0">&#40;</span><span class="st0">&quot;METHOD not 'SIMPLE' OR 'BZ2' OR '7zip'&quot;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> <span class="kw1">not</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">exists</span><span class="br0">&#40;</span>ORIG_DIR<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    exit_stop<span class="br0">&#40;</span><span class="st0">&quot;ORIG_DIR does not exist: %s&quot;</span> % <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span>ORIG_DIR<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> <span class="kw1">not</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">exists</span><span class="br0">&#40;</span>BACKUP_DIR<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    exit_stop<span class="br0">&#40;</span><span class="st0">&quot;BACKUP_DIR does not exist: %s&quot;</span> % <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span>BACKUP_DIR<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">else</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">print</span> <span class="br0">&#40;</span><span class="st0">&quot;write snapshot of<span class="es0">\n</span>  %s<span class="es0">\n</span>to<span class="es0">\n</span>  %s<span class="es0">\n</span>using the %s method...<span class="es0">\n</span>&quot;</span> %</pre></li><li
class="li1"><pre class="de1">            <span class="br0">&#40;</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span>ORIG_DIR<span class="br0">&#41;</span><span class="sy0">,</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span>BACKUP_DIR<span class="br0">&#41;</span><span class="sy0">,</span>METHOD<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> METHOD <span class="sy0">==</span> <span class="st0">&quot;SIMPLE&quot;</span>:</pre></li><li
class="li1"><pre class="de1">        rv <span class="sy0">=</span> backup_directory_simple<span class="br0">&#40;</span>srcdir<span class="sy0">=</span>ORIG_DIR<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            dstdir<span class="sy0">=</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">join</span><span class="br0">&#40;</span>BACKUP_DIR<span class="sy0">,</span> BACKUP_PREFIX + timestr<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">elif</span> METHOD <span class="sy0">==</span> <span class="st0">&quot;BZ2&quot;</span>:</pre></li><li
class="li1"><pre class="de1">        rv <span class="sy0">=</span> backup_directory_bz2<span class="br0">&#40;</span>srcdir<span class="sy0">=</span>ORIG_DIR<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            tarpath<span class="sy0">=</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">join</span><span class="br0">&#40;</span>BACKUP_DIR<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                BACKUP_PREFIX + timestr + <span class="st0">&quot;.tar.bz2&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">else</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">try</span>:</pre></li><li
class="li1"><pre class="de1">            <span class="kw1">if</span> <span class="kw1">not</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">exists</span><span class="br0">&#40;</span>SEVENZIPPATH<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">                exit_stop<span class="br0">&#40;</span><span class="st0">&quot;7zip executable not found: %s&quot;</span> % SEVENZIPPATH<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">except</span> <span class="kw2">NameError</span>:</pre></li><li
class="li1"><pre class="de1">            exit_stop<span class="br0">&#40;</span><span class="st0">&quot;variable SEVENZIPPATH not defined&quot;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        rv <span class="sy0">=</span> backup_directory_7zip<span class="br0">&#40;</span>srcdir<span class="sy0">=</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span>ORIG_DIR<span class="br0">&#41;</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            arcpath<span class="sy0">=</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">join</span><span class="br0">&#40;</span>BACKUP_DIR<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                BACKUP_PREFIX + timestr + <span class="st0">&quot;.7z&quot;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> rv:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">print</span> <span class="st0">&quot;Snapshot successfully written to<span class="es0">\n</span>  %s&quot;</span> % <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span>rv<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">else</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">print</span> <span class="st0">&quot;Failure during backup :-(&quot;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> <span class="st0">'ADDITIONAL_BACKUP_DIR'</span> <span class="kw1">in</span> <span class="kw2">globals</span><span class="br0">&#40;</span><span class="br0">&#41;</span> <span class="kw1">and</span> rv:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> <span class="kw1">not</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">exists</span><span class="br0">&#40;</span>ADDITIONAL_BACKUP_DIR<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        exit_stop<span class="br0">&#40;</span><span class="br0">&#40;</span><span class="st0">&quot;ADDITIONAL_BACKUP_DIR does not exist: %s&quot;</span></pre></li><li
class="li1"><pre class="de1">            % <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">abspath</span><span class="br0">&#40;</span>ADDITIONAL_BACKUP_DIR<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    so_flushwr<span class="br0">&#40;</span><span class="st0">&quot;<span class="es0">\n</span>write additional backup to %s..&quot;</span> % ADDITIONAL_BACKUP_DIR<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">try</span>:</pre></li><li
class="li1"><pre class="de1">        dst <span class="sy0">=</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">join</span><span class="br0">&#40;</span>ADDITIONAL_BACKUP_DIR<span class="sy0">,</span><span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">basename</span><span class="br0">&#40;</span>rv<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">if</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">isdir</span><span class="br0">&#40;</span>rv<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">            <span class="kw3">shutil</span>.<span class="me1">copytree</span><span class="br0">&#40;</span>rv<span class="sy0">,</span>dst<span class="br0">&#41;</span> <span class="co1"># simple method, copy directory tree</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">else</span>:</pre></li><li
class="li1"><pre class="de1">            <span class="kw3">shutil</span>.<span class="kw3">copy</span><span class="br0">&#40;</span>rv<span class="sy0">,</span>dst<span class="br0">&#41;</span> <span class="co1"># copy 7zip or bz2 archive</span></pre></li><li
class="li1"><pre class="de1">        so_flushwr<span class="br0">&#40;</span><span class="st0">&quot;success<span class="es0">\n</span>&quot;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">except</span> <span class="kw2">Exception</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Traceback:<span class="es0">\n</span>%s&quot;</span>%<span class="kw3">traceback</span>.<span class="me1">format_exc</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Additional backup not written. For diagnosis look above.&quot;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">any_key<span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li></ol></div></div></div></div></div></div></div> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2010/02/python-backupsnapshot-script-with-compression-bz2-7zip/feed/</wfw:commentRss> <slash:comments>7</slash:comments> </item> <item><title>Just for fun: binary to ascii in Python</title><link>http://gehrcke.de/2010/01/just-for-fun-binary-to-ascii-in-python/</link> <comments>http://gehrcke.de/2010/01/just-for-fun-binary-to-ascii-in-python/#comments</comments> <pubDate>Tue, 26 Jan 2010 17:08:57 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1118</guid> <description><![CDATA[<p>0110100001110100011101000111000000111010001011110010111101111000011010110110001101100100001011100110001101101111011011010010111100110101001100010011100000101111</p><p>Seriously, you really want to know what&#8217;s behind this!</p><p>BuHa.info&#8217;s New Year&#8217;s Challenge gave me the feeling that I quickly have to write a Python script, decoding ones and zeros to text, using ASCII code</p> import sys&#160;# define file pointers and bytelengthfile_in = open&#40;&#34;bin2txt_in.txt&#34;&#41;file_out = open&#40;&#34;bin2txt_out.txt&#34;,&#34;wb&#34;&#41;bl = 8&#160;# read input file; remove whitespaces# [...]]]></description> <content:encoded><![CDATA[<blockquote><p>0110100001110100011101000111000000111010001011110010111101111000011010110110001101100100001011100110001101101111011011010010111100110101001100010011100000101111</p></blockquote><p>Seriously, you really want to know what&#8217;s behind this!<span
id="more-1118"></span></p><p><a
href="http://www.buha.info/permalink/101/auf-die-plaetze-fertig">BuHa.info&#8217;s New Year&#8217;s Challenge</a> made me feel that I quickly had to write a Python script that decodes ones and zeros to text using ASCII <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /></p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="python"><ol><li
class="li1"><pre class="de1"><span class="kw1">import</span> <span class="kw3">sys</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># define file pointers and bytelength</span></pre></li><li
class="li1"><pre class="de1">file_in <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;bin2txt_in.txt&quot;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">file_out <span class="sy0">=</span> <span class="kw2">open</span><span class="br0">&#40;</span><span class="st0">&quot;bin2txt_out.txt&quot;</span><span class="sy0">,</span><span class="st0">&quot;wb&quot;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">bl <span class="sy0">=</span> <span class="nu0">8</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># read input file; remove whitespaces</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># make list of bits (ints) from input</span></pre></li><li
class="li1"><pre class="de1">bitlist <span class="sy0">=</span> <span class="kw2">map</span><span class="br0">&#40;</span><span class="kw2">int</span><span class="sy0">,</span><span class="st0">''</span>.<span class="me1">join</span><span class="br0">&#40;</span>file_in.<span class="me1">read</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">split</span><span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> <span class="kw2">len</span><span class="br0">&#40;</span>bitlist<span class="br0">&#41;</span>%bl <span class="sy0">!=</span> <span class="nu0">0</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw3">sys</span>.<span class="me1">exit</span><span class="br0">&#40;</span><span class="st0">&quot;length of bitlist not integer multiple of %s&quot;</span> % bl<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># assemble resultstring:</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># - make list of bytes from `bitlist`</span></pre></li><li
class="li1"><pre class="de1"><span class="co1"># - evaluate each byte -&gt; int -&gt; ascii char</span></pre></li><li
class="li1"><pre class="de1">rs <span class="sy0">=</span> <span class="st0">''</span>.<span class="me1">join</span><span class="br0">&#40;</span><span class="br0">&#91;</span><span class="kw2">chr</span><span class="br0">&#40;</span><span class="kw2">sum</span><span class="br0">&#40;</span>bit<span class="sy0">&lt;&lt;</span><span class="kw2">abs</span><span class="br0">&#40;</span>idx-bl<span class="br0">&#41;</span>-<span class="nu0">1</span> <span class="kw1">for</span> idx<span class="sy0">,</span>bit <span class="kw1">in</span> <span class="kw2">enumerate</span><span class="br0">&#40;</span>y<span class="br0">&#41;</span><span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">               <span class="kw1">for</span> y <span class="kw1">in</span> <span class="kw2">zip</span><span class="br0">&#40;</span>*<span class="br0">&#91;</span>bitlist<span class="br0">&#91;</span>x::bl<span class="br0">&#93;</span> <span class="kw1">for</span> x <span class="kw1">in</span> <span class="kw2">range</span><span class="br0">&#40;</span>bl<span class="br0">&#41;</span><span class="br0">&#93;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">              <span class="br0">&#93;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="co1"># write output</span></pre></li><li
class="li1"><pre class="de1">file_out.<span class="me1">write</span><span class="br0">&#40;</span>rs<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">file_out.<span class="me1">close</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li></ol></div></div></div></div></div></div></div><p>Put the data above in <code>bin2txt_in.txt</code>, run the script(*) in the same directory and read <code>bin2txt_out.txt</code>&#8217;s content. <strong>You won&#8217;t regret it</strong>.</p><p><small><br
/> (*)For Windows users not knowing how to run the script:<br
/> - <a
href="http://www.python.org/ftp/python/2.6.4/python-2.6.4.msi">Download Python</a> &#038; install it.<br
/> - Copy the script above into a plain text file with the extension <code>.py</code> and double-click it to run.</p><p>And, yeah, you&#8217;re right, this is pointless (except as a programming exercise). And, true, you can decode this using online converters <a
href="http://www.roubaixinteractive.com/PlayGround/Binary_Conversion/Binary_To_Text.asp">like this one here</a>.</small></p> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2010/01/just-for-fun-binary-to-ascii-in-python/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>Convert all PDF files in a directory to PNG images</title><link>http://gehrcke.de/2009/10/convert-all-pdfs-in-a-directory-to-pngs/</link> <comments>http://gehrcke.de/2009/10/convert-all-pdfs-in-a-directory-to-pngs/#comments</comments> <pubDate>Thu, 29 Oct 2009 17:40:45 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Ghostscript]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Technical Stuff]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=1015</guid> <description><![CDATA[<p>I just needed to convert several PDF graphics to PNG raster graphics. The cleanest way to do this is via Ghostscript (example: gs -sDEVICE=png16m -sOutputFile=tiger.png tiger.pdf). For convenience I made a Python script that converts all PDF files in the working directory to PNG files via Ghostscript. I use it under Windows, but this works [...]]]></description> <content:encoded><![CDATA[<p>I just needed to convert several PDF graphics to PNG raster graphics. The cleanest way to do this is via <a
href="http://www.ghostscript.com/">Ghostscript</a> (example: <code>gs -sDEVICE=png16m -sOutputFile=tiger.png tiger.pdf</code>). For convenience I made a <a
href="http://python.org/">Python</a> script that converts all PDF files in the working directory to PNG files via Ghostscript. I use it under Windows, but this works for &#8220;all&#8221; operating systems. Let me share it with you.<span
id="more-1015"></span></p><h6>Python script:</h6><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="python"><ol><li
class="li1"><pre class="de1"><span class="kw1">import</span> <span class="kw3">subprocess</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">import</span> <span class="kw3">os</span></pre></li><li
class="li1"><pre class="de1"><span class="kw1">import</span> <span class="kw3">traceback</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">GHOSTSCRIPTPATH <span class="sy0">=</span> <span class="st0">&quot;C:/Program Files (x86)/gs/gs8.70/bin/gswin32c.exe&quot;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    cwd <span class="sy0">=</span> <span class="kw3">os</span>.<span class="me1">getcwd</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">for</span> filename <span class="kw1">in</span> <span class="kw3">os</span>.<span class="me1">listdir</span><span class="br0">&#40;</span>cwd<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="br0">&#40;</span>name<span class="sy0">,</span> ext<span class="br0">&#41;</span> <span class="sy0">=</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">splitext</span><span class="br0">&#40;</span>filename<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">if</span> ext.<span class="me1">lower</span><span class="br0">&#40;</span><span class="br0">&#41;</span>.<span class="me1">endswith</span><span class="br0">&#40;</span><span class="st0">&quot;pdf&quot;</span><span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">            <span class="kw1">print</span> <span class="st0">&quot;*** FOUND %s&quot;</span> % filename</pre></li><li
class="li1"><pre class="de1">            convert_pdf_file_to_png<span class="br0">&#40;</span>GHOSTSCRIPTPATH<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">join</span><span class="br0">&#40;</span>cwd<span class="sy0">,</span> filename<span class="br0">&#41;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">def</span> convert_pdf_file_to_png<span class="br0">&#40;</span>ghostscriptpath<span class="sy0">,</span> pdffilepath<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> <span class="kw1">not</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">isfile</span><span class="br0">&#40;</span>pdffilepath<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;%s is not a file&quot;</span> % pdffilepath</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> <span class="kw1">not</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">isfile</span><span class="br0">&#40;</span>ghostscriptpath<span class="br0">&#41;</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;%s is not a file&quot;</span> % ghostscriptpath</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">    pdffiledir <span class="sy0">=</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">dirname</span><span class="br0">&#40;</span>pdffilepath<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    pdffilename <span class="sy0">=</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">basename</span><span class="br0">&#40;</span>pdffilepath<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    pdfname<span class="sy0">,</span> ext <span class="sy0">=</span> <span class="kw3">os</span>.<span class="me1">path</span>.<span class="me1">splitext</span><span class="br0">&#40;</span>pdffilepath<span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">try</span>:    </pre></li><li
class="li1"><pre class="de1">        <span class="co1"># change the &quot;-rXXX&quot; option to set the PNG's resolution.</span></pre></li><li
class="li1"><pre class="de1">        <span class="co1"># -&gt; http://ghostscript.com/doc/current/Devices.htm#File_formats</span></pre></li><li
class="li1"><pre class="de1">        <span class="co1"># for other commandline options see</span></pre></li><li
class="li1"><pre class="de1">        <span class="co1"># http://ghostscript.com/doc/current/Use.htm#Options</span></pre></li><li
class="li1"><pre class="de1">        arglist <span class="sy0">=</span> <span class="br0">&#91;</span>ghostscriptpath<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                  <span class="st0">&quot;-dBATCH&quot;</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                  <span class="st0">&quot;-dNOPAUSE&quot;</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                  <span class="st0">&quot;-sOutputFile=%s.png&quot;</span> % pdfname<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                  <span class="st0">&quot;-sDEVICE=png16m&quot;</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                  <span class="st0">&quot;-r800&quot;</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">                  pdffilepath<span class="br0">&#93;</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;try running cmd:<span class="es0">\n</span>%s&quot;</span> % arglist</pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">        sp <span class="sy0">=</span> <span class="kw3">subprocess</span>.<span class="me1">Popen</span><span class="br0">&#40;</span></pre></li><li
class="li1"><pre class="de1">            args<span class="sy0">=</span>arglist<span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            stdout<span class="sy0">=</span><span class="kw3">subprocess</span>.<span class="me1">PIPE</span><span class="sy0">,</span></pre></li><li
class="li1"><pre class="de1">            stderr<span class="sy0">=</span><span class="kw3">subprocess</span>.<span class="me1">PIPE</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">except</span>:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Error while running Ghostscript subprocess.&quot;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Traceback:<span class="es0">\n</span>%s&quot;</span>%<span class="kw3">traceback</span>.<span class="me1">format_exc</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">    <span class="co1"># wait for process to terminate, get stdout and stderr</span></pre></li><li
class="li1"><pre class="de1">    stdout<span class="sy0">,</span> stderr <span class="sy0">=</span> sp.<span class="me1">communicate</span><span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">print</span> <span class="st0">&quot;Ghostscript subprocess STDOUT:<span class="es0">\n</span>%s&quot;</span> % stdout</pre></li><li
class="li1"><pre class="de1">    <span class="kw1">if</span> stderr:</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">print</span> <span class="st0">&quot;Ghostscript STDERR:<span class="es0">\n</span>%s&quot;</span> % stderr</pre></li><li
class="li1"><pre class="de1">        <span class="kw1">return</span> <span class="kw2">False</span></pre></li><li
class="li1"><pre class="de1">    <span class="kw1">return</span> <span class="kw2">True</span></pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1">&nbsp;</pre></li><li
class="li1"><pre class="de1"><span class="kw1">if</span> __name__ <span class="sy0">==</span> <span class="st0">&quot;__main__&quot;</span>:</pre></li><li
class="li1"><pre class="de1">    main<span class="br0">&#40;</span><span class="br0">&#41;</span></pre></li></ol></div></div></div></div></div></div></div><h6>Usage:</h6><ul><li>Download &#038; Install Python 2.6.x: <a
href="http://python.org/download/">http://python.org/download/</a> (should work with Python 2.4 and above, but not with Python 3)</li><li>Download &#038; Install Ghostscript: <a
href="http://mirror.cs.wisc.edu/pub/mirrors/ghost/GPL/current/">http://mirror.cs.wisc.edu/pub/mirrors/ghost/GPL/current/</a></li><li>Copy the script source from above and save it as, e.g., <code>make_all_pdfs_to_png.pyw</code></li><li>Adjust the variable <code>GHOSTSCRIPTPATH</code> in line 5 of the script to the location of your Ghostscript executable. Even on Windows, it&#8217;s okay to use normal slashes in the path (or escaped backslashes: <code>\\</code>).</li><li>If needed, change the resolution option &#8220;<code>-rXXX</code>&#8221; in line 38 of the script. This determines the resolution of the PNGs (see the Ghostscript documentation for the exact meaning of the options &#8212; links to the docs are given within the source above).</li><li>Now the Python script is ready to use. Copy it to the directory containing the PDF files you want to convert and start it (under Windows: simply double-click the script file). The corresponding PNG files should be created now.</li></ul> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2009/10/convert-all-pdfs-in-a-directory-to-pngs/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> <item><title>Google Summer of Code end: code upload and acknowledgement</title><link>http://gehrcke.de/2009/08/google-summer-of-code-end-code-upload-and-acknowledgement/</link> <comments>http://gehrcke.de/2009/08/google-summer-of-code-end-code-upload-and-acknowledgement/#comments</comments> <pubDate>Mon, 24 Aug 2009 19:17:01 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Amazon Web Services]]></category> <category><![CDATA[CernVM]]></category> <category><![CDATA[Clobi]]></category> <category><![CDATA[General]]></category> <category><![CDATA[GSoC 2009]]></category> <category><![CDATA[Nimbus]]></category> <category><![CDATA[Personal Stuff]]></category> <category><![CDATA[Python]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=929</guid> <description><![CDATA[<p>The Google Summer of Code 2009 final evaluation deadline is today at 19:00 UTC. I don&#8217;t have time to summarize my summer here right now, but there are two things I want to say to the world. First, I want to thank many people for enriching my summer. Second, I would like to announce the Clobi project [...]]]></description> <content:encoded><![CDATA[<p>The <em>Google Summer of Code 2009</em> final evaluation deadline is today at 19:00 UTC. I don&#8217;t have time to summarize my summer here right now, but there are two things I want to say to the world. First, I want to thank many people for enriching my summer. Second, I would like to announce the <em>Clobi</em> project on <em>Google Code</em>.<span
id="more-929"></span></p><h4>Acknowledgement</h4><ul><li><a
href="http://www.mcs.anl.gov/~keahey/">Kate Keahey</a> (<a
href="http://www.anl.gov/">Argonne National Laboratory</a>/<a
href="http://globus.org/">The Globus Alliance</a>/<a
href="http://workspace.globus.org/">Nimbus team</a>).<br
/> She offered me a fabulous mentorship. I appreciate its true worth when I look at how much I&#8217;ve learned throughout the summer just from these great conversations <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . Thank you for all your support and dedication and for pushing me and the project forwards!</li></ul><ul><li><a
href="http://www.mcs.anl.gov/~tfreeman/">Tim Freeman</a> and <a
href="http://www.linkedin.com/pub/david-labissoniere/b/888/663">David LaBissoniere</a> (<a
href="http://www.anl.gov/">Argonne National Laboratory</a>/<a
href="http://globus.org/">The Globus Alliance</a>/<a
href="http://workspace.globus.org/">Nimbus team</a>).<br
/> Wonderful, extensive and patient extra-premium-support-with-special-treatment regarding <em>Nimbus/Workspace</em> in the MUD. Thank you so much! You smoothed my technical way through the project.</li></ul><ul><li><a
href="https://excess.org/">Ian Ward</a> (creator of <a
href="http://excess.org/urwid">urwid</a>, a console user interface library for <em>Python</em>).<br
/> He gave me great support while I was implementing the user interface for <em>Clobi&#8217;s Resource Manager</em> using <em>urwid</em>&#8217;s new <em>SelectEventLoop</em> technology. Talking to him saved me so much valuable time. Thank you!</li></ul><ul><li><a
href="http://www.elastician.com/">Mitchell Garnaat</a> (the creator of <a
href="http://code.google.com/p/boto/">boto</a>, a Python interface to Amazon Web Services) and the <a
href="http://groups.google.com/group/boto-users">boto users mailing list</a>.<br
/> Great and essential support for almost a year now. Thank you very much for answering many questions!</li></ul><ul><li><a
href="https://savannah.cern.ch/users/pbuncic">Predrag Buncic</a> (<a
href="http://cernvm.cern.ch/cernvm/">CernVM</a>) and <a
href="http://www.linkedin.com/in/artemharutyunyan">Artem Harutyunyan</a> (<a
href="http://aliceinfo.cern.ch/Collaboration/index.html">ALICE@LHC</a>).<br
/> Thank you <a
href="http://gehrcke.de/2009/06/cernvm-on-nimbusec2-public-key-injection-problem/">for your support</a> regarding <em>CernVM</em> on <em>Nimbus</em>!</li></ul><ul><li><a
href="http://www.linkedin.com/pub/jakub-moscicki/0/629/b79">Jakub Moscicki</a>, <a
href="http://www3.imperial.ac.uk/people/u.egede">Ulrik Egede</a>, <a
href="http://homepages.physik.uni-muenchen.de/~Johannes.Elmsheuser/contact.html">Johannes Elmsheuser</a> and the rest of the <a
href="http://ganga.web.cern.ch/ganga/">Ganga</a> crew.<br
/> Thanks a lot for very important, effective and efficient support concerning the system behind <em>Clobi</em> and <em>Clobi&#8217;s Ganga backend</em>. You are great!</li></ul><ul><li><a
href="http://www.linkedin.com/pub/stefan-kluth/4/72a/971">Stefan Kluth</a> and <a
href="http://www.linkedin.com/profile?viewProfile=&#038;key=15670615">Stefan Stonjek</a> (<a
href="http://atlas.ch/">ATLAS@LHC</a>, <a
href="http://www.mpp.mpg.de/">Max-Planck-Institut für Physik, Munich</a>)<br
/> Thank you for many helpful discussions, support and beautiful times in Munich!</li></ul><ul><li><a
href="http://people.cs.uchicago.edu/~borja/">Borja Sotomayor</a> (<a
href="http://www.uchicago.edu/">University of Chicago</a>, <a
href="http://globus.org">Globus Alliance</a>).<br
/> He did a great job as the GSoC mentoring organization administrator. Thank you for this and for introducing the MUD <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</li></ul><ul><li>Paul D Marshall.<br
/> Thank you for the discussions regarding dynamic deployment of computing resources.</li></ul><ul><li><a
href="http://www.linkedin.com/pub/xiaoming-gao/9/a90/348">Xiaoming Gao</a>.<br
/> Thank you for the exciting cooperation regarding <em>Virtual Block Stores</em> for Nimbus.</li></ul><ul><li><a
href="http://en.wikipedia.org/wiki/Alex_Martelli">Alex Martelli</a>.<br
/> He answered <a
href="http://stackoverflow.com/questions/1185660/python-is-os-read-os-write-on-an-os-pipe-threadsafe">an important Python question</a> that most people could not answer. This smoothed the way to implementing inter-thread communication in <em>Clobi&#8217;s Resource Manager</em>.</li></ul><h4>Clobi @ Google Code</h4><p>In <a
href="http://gehrcke.de/2009/08/distribute-high-performance-computing-jobs-among-multiple-computing-clouds/">this blog post</a> I introduced <strong>Clobi</strong>, the result of this <em>Google Summer of Code</em> project. Today, I created a new project page for <em>Clobi</em> on <em>Google Code</em>: <a
href="http://code.google.com/p/clobi/">http://code.google.com/p/clobi/</a></p><p>As a first action, I pushed my local <a
href="http://en.wikipedia.org/wiki/Mercurial_%28software%29">mercurial</a> code repository into the online repository. You can browse the code <a
href="http://code.google.com/p/clobi/source/browse/#hg">here</a> and you can look through the development history (the commits I&#8217;ve made) <a
href="http://code.google.com/p/clobi/source/list">here</a>.</p><p>Then I prepared a test release of all <em>Clobi</em> components. You can get it <a
href="http://code.google.com/p/clobi/downloads/list">in the download section</a>.</p><p>That&#8217;s all about <em>Clobi</em> for the time being. From tomorrow on, I will continue with my master&#8217;s thesis project in Physics (about <em>Magnetic Particle Imaging</em>). Next week, the <a
href="http://www.icmrm10.montana.edu/">ICMRM</a> conference in Montana starts and I have to make a poster for my contribution (abstract <a
href="gehrcke_ICMRM09_abstr_MPI_MMF2vsSPM_090713.pdf">here</a>).</p><p><strong>I will go on with <em>Clobi</em></strong> when there is more free time <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> .</p> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2009/08/google-summer-of-code-end-code-upload-and-acknowledgement/feed/</wfw:commentRss> <slash:comments>2</slash:comments> </item> <item><title>new system successfully tested: &#8220;Distribution of High Performance Computing Jobs among Multiple Computing Clouds&#8221;</title><link>http://gehrcke.de/2009/08/distribute-high-performance-computing-jobs-among-multiple-computing-clouds/</link> <comments>http://gehrcke.de/2009/08/distribute-high-performance-computing-jobs-among-multiple-computing-clouds/#comments</comments> <pubDate>Tue, 18 Aug 2009 12:14:26 +0000</pubDate> <dc:creator>Jan-Philip Gehrcke</dc:creator> <category><![CDATA[Amazon Web Services]]></category> <category><![CDATA[ATLAS Software]]></category> <category><![CDATA[CernVM]]></category> <category><![CDATA[Clobi]]></category> <category><![CDATA[GSoC 2009]]></category> <category><![CDATA[Linux]]></category> <category><![CDATA[Nimbus]]></category> <category><![CDATA[Python]]></category> <category><![CDATA[Science]]></category> <category><![CDATA[Technical Stuff]]></category> <guid
isPermaLink="false">http://gehrcke.de/?p=794</guid> <description><![CDATA[<p>Hello you out there!</p><p>I just started running the first serious test of the system I&#8217;ve developed during this year&#8217;s Google Summer of Code. If I wanted to put it in sensational words, the test could be called &#8220;Distribution of Particle Physics High Performance Computing Jobs among Multiple Computing Clouds&#8221;; just to get some readers [...]]]></description> <content:encoded><![CDATA[<p>Hello you out there!</p><p>I just started running the first serious test of the system I&#8217;ve developed during this year&#8217;s <a
href="http://code.google.com/soc/">Google Summer of Code</a>. If I wanted to put it in sensational words, the test could be called <strong>&#8220;Distribution of Particle Physics High Performance Computing Jobs among Multiple Computing Clouds&#8221;</strong>; just to get some readers <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . During the test, there will be some time when I just sit around and watch my monitor, so I decided to share my experience with the new system and keep a record of the test progress in this blog post.<span
id="more-794"></span></p><p><strong>Content</strong> (don&#8217;t worry, the sections are short):</p><ul><li><a
href="#intro">0 Introduction</a></li><li><a
href="#prep">1 Preparation</a></li><li><a
href="#start">2 VM startup</a></li><li><a
href="#sessmon">3 Session monitoring (number of Job Agents)</a></li><li><a
href="#submit">4 Job submission</a></li><li><a
href="#sessmon2">5 Session monitoring (number of jobs)</a></li><li><a
href="#jobmon">6 Job monitoring</a></li><li><a
href="#jobcomplete">7 Job completion, output receipt</a></li><li><a
href="#output">8 Examine output</a></li><li><a
href="#shutdown">9 VM shutdown</a></li><li><a
href="#appendix">10 Appendix</a></li></ul><h4><a
name="intro">0 Introduction</a></h4><p>First of all, I have to introduce the system. This is the longest section.</p><p>It&#8217;s a job scheduling system supporting Virtual Machines (VMs) in multiple <a
href="http://en.wikipedia.org/wiki/Infrastructure_as_a_service">Infrastructure-as-a-Service computing clouds</a>. Because of this, I came up with the name <strong>&#8220;Clobi&#8221;</strong>, which is loosely derived from &#8220;cloud&#8221; and &#8220;combination&#8221;. If you know a better name, then please let me know <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_wink.gif' alt=';-)' class='wp-smiley' /> . Currently, the system supports <a
href="http://workspace.globus.org/">Nimbus</a> (and it&#8217;s prepared for Cumulus, Nimbus&#8217; storage service) and <a
href="http://aws.amazon.com/">Amazon Web Services</a> (more precisely <a
href="http://aws.amazon.com/ec2/">EC2</a>, <a
href="http://aws.amazon.com/sqs/">SQS</a>, <a
href="http://aws.amazon.com/simpledb/">SimpleDB</a>, <a
href="http://aws.amazon.com/s3/">S3</a>).</p><p>It&#8217;s an &#8220;elastic&#8221; and &#8220;scalable&#8221; job system that can set up a huge computing resource pool almost instantly. VMs are added to or removed from the pool dynamically, based on demand. Jobs are submitted and processed using an asynchronous queueing system. Any number of clients can submit jobs to an existing resource pool. The system&#8217;s basic reliability is inherited from the core messaging components, Amazon SQS (for queueing) and Amazon SimpleDB (for bookkeeping): to prevent data from being lost or becoming unavailable, these services store it redundantly and geographically dispersed across multiple datacenters.</p><p>Furthermore, the job system&#8217;s components are highly decoupled, which allows individual components to fail or be re-initialized without affecting the others.</p><p>The motivating application for this system is <a
href="https://twiki.cern.ch/twiki/bin/view/Atlas/AtlasComputing">ATLAS Computing</a> (for the <a
href="http://atlas.ch/">ATLAS experiment</a> at <a
href="http://lhc.web.cern.ch/lhc/">LHC</a>, <a
href="http://public.web.cern.ch/public/">CERN</a> (Geneva)): a common ATLAS Computing application (the so-called &#8220;full chain&#8221;) will be run during this test.</p><p>But the basic system is entirely generic and can be used whenever it&#8217;s convenient to distribute jobs among different clouds. This is the case, for example, when one covers the baseline computing demand at low cost by operating one&#8217;s own Nimbus cloud, but wants to be able to instantly absorb peaks in demand by simply adding Amazon’s EC2 to the resource pool for a certain amount of time. With <strong>Clobi</strong>, combining different clouds into one big resource pool becomes very easy.</p><p>These are the main components used during this test:</p><ul><li>a <strong>special ATLAS Software Virtual Machine image</strong> based on <a
href="http://cernvm.cern.ch/cernvm/">CernVM</a>. I placed it on <a
href="http://workspace.globus.org/clouds/nimbus.html">Nimbus Teraport Cloud</a> and (as Amazon Machine Image) on S3 for EC2</li></ul><ul><li>the <strong>Clobi Resource Manager</strong> (observing job queues, starting/killing VMs, &#8230;)</li></ul><ul><li>the <strong>Clobi Job Agent</strong> (running on VMs, polling &#038; running jobs, bookkeeping, &#8230;)</li></ul><ul><li>the <strong>Clobi Job Management Interface</strong> (providing methods to submit / remove / kill / monitor / &#8230; jobs)</li></ul><ul><li>a <a
href="http://ganga.web.cern.ch/ganga/">Ganga</a> <strong>Clobi Backend</strong> (integrates <strong>Clobi</strong> into <a
href="http://ganga.web.cern.ch/ganga/">Ganga</a>, which is &#8220;an easy-to-use frontend for job definition and management&#8221;)</li></ul><p>The role of these components will become clearer in the following sections.</p><p>Let me show step by step &#8212; but only very roughly &#8212; how I use <strong>Clobi</strong> in this first serious test. Many details are left out, but you will get the idea. After reading this blog post, you will have an overview of what the system does and what I&#8217;ve actually done during the summer.</p><h4><a
name="prep">1 Preparation</a></h4><p>I&#8217;ve prepared session/cloud configuration files. Using them, I started a new <strong>Clobi session</strong> (a resource pool) with the <strong>Clobi Resource Manager</strong> (it&#8217;s a <a
href="http://www.python.org/">Python</a> application and I use it locally, here on my desktop machine). First, it performs a lot of configuration and initialization, including setting up SQS queues and SimpleDB domains. Interaction with Amazon Web Services is done via the <a
href="http://code.google.com/p/boto/">boto</a> module for Python. After initialization, the Resource Manager offers an interactive mode (it&#8217;s a multi-threaded console application with user interface, built using the <a
href="http://excess.org/urwid">urwid</a> module for Python).</p><h4><a
name="start">2 VM startup</a></h4><p>Using the <strong>Resource Manager</strong>, I started one VM on EC2 and one on Nimbus Teraport Cloud, both based on &#8220;the special ATLAS Software VM&#8221;, containing <a
href="http://atlas-computing.web.cern.ch/atlas-computing/projects/releases/status/">ATLAS Software 15.2.0</a> and the <strong>Clobi Job Agent</strong>. Starting VMs manually is done with a very simple command. The driving forces in the background are boto in case of EC2 and the <a
href="http://workspace.globus.org/vm/TP2.2/dev/reference.html">Nimbus cloud reference client</a>, which I&#8217;ve wrapped and control via Python&#8217;s subprocess module. The main loop thread of the <strong>Resource Manager</strong> periodically polls the states of freshly started EC2 instances and of the Nimbus client subprocesses to figure out whether the requested actions succeeded.</p><p>The following screenshot shows the <strong>Resource Manager</strong> in action (you basically see a terminal window). Follow a bit of the log. As you can see, it&#8217;s very easy to run VMs and &#8212; after a certain amount of time &#8212; the <strong>Resource Manager</strong> detects that both VMs have successfully started booting:</p><div
id="attachment_796" class="wp-caption aligncenter" style="width: 222px"><a
href="http://gehrcke.de/wp/wp-content/uploads/001_RM_VMs_started.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/001_RM_VMs_started-212x300.png" alt="Clobi Resource Manager showing two started Virtual Machines" title="001_RM_VMs_started" width="212" height="300" class="size-medium wp-image-796" /></a><p
class="wp-caption-text"><strong>Clobi Resource Manager</strong> showing two started Virtual Machines</p></div><p>The EC2 VM needed around 10 minutes to start up, while the Nimbus VM needed 20 minutes. Reason: the AMI is ~10 GB in size; the image on Nimbus ~20 GB (wasted space &#038; time, but it&#8217;s just a test..).</p><p>You might say: &#8220;Only two VMs? Boring..&#8221;. I say: I could have taken several hundred. The point is: it would not make any difference, except in cost and in the amount of space used for log files. <strong>Clobi</strong> is built on technology that is designed to work reliably across different orders of magnitude. This is often called &#8220;scalable&#8221; or &#8220;elastic&#8221;. Basically, this positive characteristic is inherited from Amazon&#8217;s SQS, SDB and S3, which <strong>Clobi</strong> uses for management and control of the system.</p><h4><a
name="sessmon">3 Session monitoring (number of Job Agents)</a></h4><p>I am leaving out almost all the details of how the components exchange information. But I have to tell you the following, so as not to confuse you completely:</p><ul><li>the <strong>Resource Manager</strong> gave some bootstrap information to the VMs.</li></ul><ul><li>the <strong>Job Agent</strong> is automatically invoked on VM operating system startup.</li></ul><p>Using the bootstrap information, each VM&#8217;s <strong>Job Agent</strong> &#8220;registers with SimpleDB&#8221;. The <strong>Resource Manager</strong> has a monitoring function that checks SimpleDB for running Job Agents:</p><div
id="attachment_803" class="wp-caption aligncenter" style="width: 310px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/002_RM_JAs_running.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/002_RM_JAs_running-300x169.png" alt="Clobi Resource Manager showing two started Job Agents" title="002_RM_JAs_running" width="300" height="169" class="size-medium wp-image-803" /></a><p
class="wp-caption-text"><strong>Clobi Resource Manager</strong> showing two started <strong>Clobi Job Agents</strong></p></div><p>Voilà, now it&#8217;s definitely known that both VMs successfully started their <strong>Job Agents</strong>. These start in a &#8220;watching/lurking&#8221; state, periodically polling SQS for jobs.</p><h4><a
name="submit">4 Job submission</a></h4><p>The SimpleDB / SQS / S3 data structure together with <strong>Clobi&#8217;s Job Management Interface</strong> makes it possible to submit jobs with different priorities, to remove jobs, to kill running jobs and to monitor jobs. Furthermore, an input/output sandbox archive can be sent and received. This is needed to deliver executables and small input data and to receive small output data as well as stdout/err and other logs.</p><p>I&#8217;ve downloaded Ganga and installed it on my local machine. Then, I started developing a new &#8220;<strong>Clobi backend</strong>&#8221; to integrate <strong>Clobi&#8217;s Job Management Interface</strong> into Ganga. Using this new backend, it&#8217;s possible to submit/kill/monitor/.. jobs directly from the Ganga interface, using Ganga&#8217;s common job description and management commands.</p><p>To test the system, I&#8217;ve prepared some shell scripts that run <a
href="http://gehrcke.de/2009/06/atlas-software-how-to-run-the-full-chain/">&#8220;The Full Chain&#8221;</a> on the worker nodes. This is a very good test to validate the whole system: it needs some very small input files, only works if the ATLAS Software was set up properly (uoooh.. not trivial!), stresses the VM (the simulation step consumes a lot of CPU power) and leaves some small output files for the output sandbox.</p><p>At Ganga startup, I provided a configuration file containing a small amount of essential information about the <strong>Clobi session</strong> that I&#8217;ve set up before via the <strong>Resource Manager</strong>. From this configuration file, Ganga&#8217;s <strong>Clobi backend</strong> knows, for example, which SimpleDB domain to query and to which SQS queues the job messages must be submitted. Using this bootstrap information, <strong>an arbitrary number of Ganga instances could be used from anywhere to submit and manage jobs</strong> within this particular <strong>Clobi session</strong>.</p><p>I will now submit the same job (the &#8220;full chain&#8221; thing) three times: the Nimbus VM has two virtual cores and its <strong>Job Agent</strong> will try to receive and run two jobs at the same time. The EC2 VM (m1.small) only has one virtual core. Hence, three jobs are needed to use the VMs to full capacity <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> . This is a screenshot from the Ganga terminal session where I submitted the jobs:</p><div
id="attachment_805" class="wp-caption aligncenter" style="width: 227px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/003_Ganga_submitted.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/003_Ganga_submitted-217x300.png" alt="Ganga: job submission via Clobi backend" title="003_Ganga_submitted" width="217" height="300" class="size-medium wp-image-805" /></a><p
class="wp-caption-text">Ganga: job submission via <strong>Clobi backend</strong></p></div><p>The <strong>Clobi</strong> backend successfully did its job: it created three <strong>Clobi</strong> job IDs, submitted three SQS messages and uploaded three input sandbox archives.</p><h4><a
name="sessmon2">5 Session monitoring (number of jobs)</a></h4><p>The <strong>Resource Manager</strong> is able to observe the queues and to determine the number of jobs submitted to them. It recognizes two jobs in the queue for priority 2:<br
/><div
id="attachment_809" class="wp-caption aligncenter" style="width: 310px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/004_RM_2jobs.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/004_RM_2jobs-300x170.png" alt="Clobi Resource Manager detected two jobs in the queues" title="004_RM_2jobs" width="300" height="170" class="size-medium wp-image-809" /></a><p
class="wp-caption-text"><strong>Clobi Resource Manager</strong> detected two jobs in the queues</p></div></p><p>Only two? Maybe one <strong>Job Agent</strong> polled a job right after submission, or maybe the SQS measurement was not exact (this is possible, too). Anyway, a short time later there is only one job left in the queues and then they are empty. This means that the <strong>Job Agent</strong> on the Nimbus VM successfully grabbed two jobs and the one on the EC2 VM grabbed one:<br
/><div
id="attachment_812" class="wp-caption aligncenter" style="width: 310px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/005_RM_0jobs.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/005_RM_0jobs-300x169.png" alt="Clobi Resource Manager detects zero jobs in the queues" title="005_RM_0jobs" width="300" height="169" class="size-medium wp-image-812" /></a><p
class="wp-caption-text"><strong>Clobi Resource Manager</strong> detects zero jobs in the queues</p></div></p><p>While Ganga and I are waiting for the jobs to finish (this takes some time and Ganga periodically polls the state of the jobs via <strong>Clobi backend / Clobi Job Management Interface</strong>), I use the time to point out an important fact: the goal is to automate the observe-queues-and-start/kill-VMs-as-required process in the future. The current <strong>Resource Manager</strong> is well prepared for this. Let me show you the monitoring loop:<br
/><div
id="attachment_814" class="wp-caption aligncenter" style="width: 301px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/006_RM_monitoring_loop.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/006_RM_monitoring_loop-291x300.png" alt="Clobi Resource Manager showing its monitoring loop" title="006_RM_monitoring_loop" width="291" height="300" class="size-medium wp-image-814" /></a><p
class="wp-caption-text"><strong>Clobi Resource Manager</strong> showing its monitoring loop</p></div></p><p>It periodically observes the number of jobs in the queues and the number of running <strong>Job Agents</strong>. Based on this information, the <strong>Resource Manager</strong> could easily start / kill virtual machines (I&#8217;ve already demonstrated how easy starting is; killing is described later). I have not implemented this algorithm yet, because a) I had no time and b) I could have done it quick and dirty, but I really did not need this feature to develop and test the rest of the system. But this feature will come, because if it&#8217;s implemented properly with intelligent policies, it&#8217;s just great.</p><h4><a
name="jobmon">6 Job monitoring</a></h4><p>As I&#8217;ve already mentioned, Ganga periodically checks the jobs&#8217; states. For this purpose, the <strong>Ganga Clobi backend</strong> provides a special method that Ganga calls from time to time from one of its monitoring threads. Normally this happens quietly, but I&#8217;ve put some debug output into this method. Let&#8217;s check it out:<br
/><div
id="attachment_815" class="wp-caption aligncenter" style="width: 310px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/007_Ganga_monitoring_jobs.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/007_Ganga_monitoring_jobs-300x97.png" alt="Ganga receives monitoring information via the Clobi backend" title="007_Ganga_monitoring_jobs" width="300" height="97" class="size-medium wp-image-815" /></a><p
class="wp-caption-text">Ganga receives monitoring information via the <strong>Clobi backend</strong></p></div></p><h4><a
name="jobcomplete">7 Job completion, output receipt</a></h4><p>After some more time, Ganga discovered that one of the three jobs finished successfully. This means that the <strong>Job Agent</strong> detected a return code of 0 from the job shell script and could successfully store the output sandbox archive on S3. At this point, the <strong>Clobi backend</strong> triggers the download and extraction of the output sandbox archive. This looks like:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="text"><pre class="de1">Clobi                              : INFO     status for job-090818044445-3585-3088: completed_success
Ganga.GPIDev.Lib.Job               : INFO     job 60 status changed to &quot;completed&quot;
Clobi                              : INFO     download atlassessions/0907210728-testsess-0c7e/jobs/out_sndbx_job-090818044445-3585-3088.tar.bz2 from S3
Clobi                              : INFO     store key 0907210728-testsess-0c7e/jobs/out_sndbx_job-090818044445-3585-3088.tar.bz2 as file /home/gurke/gangadir/workspace/gurke/LocalAMGA/60/output/out_sndbx_job-090818044445-3585-3088.tar.bz2 from bucket atlassessions
Clobi                              : INFO     Download of output sandbox archive successfull.</pre></div></div></div></div></div></div></div><p>Did you doubt that this is my first serious test, given that everything has worked so far? Some parts of the system have already been tested extensively, of course. But the <strong>Clobi backend</strong> for Ganga made the transition from vitally-important-features-missing to just-scratch-along-usability only a few hours ago. I&#8217;m really very happy that everything has worked so far, but the output sandbox archive extraction could be improved:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="text"><pre class="de1">Clobi                              : CRITICAL Error while extracting output sandbox
Clobi                              : CRITICAL Traceback:
Traceback (most recent call last):
  File &quot;/mnt/hgfs/E/gsoc_code_repo/ganga_clobi_backend/Clobi/Clobi.py&quot;, line 243, in clobi_dl_extrct_outsandbox_arc
    sp = subprocess.Popen(
NameError: global name 'subprocess' is not defined</pre></div></div></div></div></div></div></div><p>Yoooah, I (want to) use Python&#8217;s subprocess module to extract the <code>tar.bz2</code> archive with the system&#8217;s <code>tar</code>, but I forgot to <code>import subprocess</code> <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_sad.gif' alt=':-(' class='wp-smiley' /> . This is easily forgotten and easily fixed <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /> . Btw: of course I got three downloaded output archives and three extraction errors <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> .</p><h4><a
name="output">8 Examine output</a></h4><p>The output sandbox archive files were stored on my local machine by Ganga&#8217;s <strong>Clobi backend</strong>. I take a look into one of them by extracting it manually:</p><div
class="wp-geshi-highlight-wrap5"><div
class="wp-geshi-highlight-wrap4"><div
class="wp-geshi-highlight-wrap3"><div
class="wp-geshi-highlight-wrap2"><div
class="wp-geshi-highlight-wrap"><div
class="wp-geshi-highlight"><div
class="text"><pre class="de1">$ ls
out_sndbx_job-090818044451-3585-01c2.tar.bz2
$ tar xjf out_sndbx_job-090818044451-3585-01c2.tar.bz2
$ ls
AOD_007410_00001.pool.root  evgen.log  joblog_job-090818044451-3585-01c2  recoAOD.log
EVGEN_007410_00001.pool.root  jobagent_log  out_sndbx_job-090818044451-3585-01c2.tar.bz2</pre></div></div></div></div></div></div></div><p><strong>Great! The AOD file is there</strong>. This means that 1) <strong>Clobi</strong> controlled and managed the job perfectly and 2) the ATLAS Computing part (&#8220;The Full Chain&#8221;) worked perfectly:</p><ul><li>the interaction between a certain particle (which was defined within the input sandbox) and the ATLAS detector was successfully simulated.</li></ul><ul><li>the ATLAS detector output (basically times and voltages) was calculated successfully.</li></ul><ul><li>particle tracks and energy deposits were successfully reconstructed from times and voltages.</li></ul><ul><li>an event summary was successfully built from tracks and energy deposits.</li></ul><p>&#8220;The summary&#8221; is saved within the AOD file, which was successfully returned to my local machine. Cool. Every single part of the system worked as it should (psss, don&#8217;t think of the extraction&#8230;).</p><h4><a
name="shutdown">9 VM shutdown</a></h4><p>You may have asked yourself how to dynamically kill VMs. Besides the &#8220;hard kill&#8221; (invoking Nimbus/EC2 API calls to shut down a VM), I&#8217;ve implemented a mechanism that I call &#8220;soft kill&#8221;: the <strong>Resource Manager</strong> sets the &#8220;soft kill flag&#8221; for a specific VM (in SimpleDB) and the corresponding <strong>Job Agent</strong> checks it from time to time. When the flag is set, the <strong>Job Agent</strong> waits until all currently running jobs are done and then shuts down the VM. Let&#8217;s watch it in action (I had to look up the command of my own application, too little sleep recently!):<br
/><div
id="attachment_820" class="wp-caption aligncenter" style="width: 288px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/008_RM_softkill.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/008_RM_softkill-278x300.png" alt="Clobi Resource Manager setting up the softkill flag for both VMs" title="008_RM_softkill" width="278" height="300" class="size-medium wp-image-820" /></a><p
class="wp-caption-text"><strong>Clobi Resource Manager</strong> setting up the softkill flag for both VMs</p></div></p><p>After waiting some time we see the number of running <strong>Job Agents</strong> decrease to zero..<br
/><div
id="attachment_823" class="wp-caption aligncenter" style="width: 288px"><a
href="http://gehrcke.de/wp/wp-content/uploads/2009/08/009_RM_softkill_0jobagents.png" rel="lightbox[794]"><img
src="http://gehrcke.de/wp/wp-content/uploads/2009/08/009_RM_softkill_0jobagents-278x300.png" alt="Clobi Resource Manager detected that both Job Agents / VMs have shut down" title="009_RM_softkill_0jobagents" width="278" height="300" class="size-medium wp-image-823" /></a><p
class="wp-caption-text"><strong>Clobi Resource Manager</strong> detected that both Job Agents / VMs have shut down</p></div></p><h4><a
name="appendix">10 Appendix</a></h4><p>If you are very interested, you can find some additional material:</p><ul><li>My earlier work on this topic (from last year), &#8220;Amazon Web Services for ATLAS Computing&#8221; (AWSAC) can be found here: <a
href="http://gehrcke.de/awsac">http://gehrcke.de/awsac</a>.</li></ul><ul><li>The original GSoC project description <a
href="http://gehrcke.de/projects/google-summer-of-code/">can be found here</a>.</li></ul><ul><li>I&#8217;ve already written <a
href="http://gehrcke.de/category/technical-stuff/google-summer-of-code/">some blog posts about this project during Google Summer of Code</a>.</li></ul><ul><li>The latest visualizations of the system (from the planning period) are these two diagrams: <a
href="http://gehrcke.de/gsoc/jobsystem_scheme.jpeg" rel="lightbox[794]">one</a>, <a
href="http://gehrcke.de/gsoc/jobsystem_scheme_two_sessions.jpeg" rel="lightbox[794]">two</a>.</li></ul><p>A detailed, up-to-date and exact description of the system (&#8220;<strong>Clobi</strong>&#8221;) is planned for the future.</p><p>I will need the last days of GSoC to implement some missing and important features, to find and fix bugs and to clean everything up to make it presentable. I will definitely work on this project after GSoC (as time allows, of course). Currently I am thinking about pushing the project to either <a
href="http://bitbucket.org/">bitbucket</a> or <a
href="http://code.google.com/">Google code</a>. Both support <a
href="http://en.wikipedia.org/wiki/Mercurial_%28software%29">mercurial</a> repositories and this is what I have used for my code so far (locally).</p><p>If you like this, spread it! Every question and/or comment is much appreciated!</p><p>Thanks for listening <img
src='http://gehrcke.de/wp/wp-includes/images/smilies/icon_smile.gif' alt=':-)' class='wp-smiley' /></p> ]]></content:encoded> <wfw:commentRss>http://gehrcke.de/2009/08/distribute-high-performance-computing-jobs-among-multiple-computing-clouds/feed/</wfw:commentRss> <slash:comments>3</slash:comments> </item> </channel> </rss>
