Average date strings using Python

Assume you’ve measured values over time and now want to average your data. This means you have to a) average your measured values, which is a trivial task, and b) average your points in time. Here, I present a way to average arbitrary date strings in Python.

Let’s talk about a specific example. This is the content of dates.dat, our input:

20110130_195243
20110130_200003
20110130_200803
20110130_200909
20110130_201003
20110130_202004
20110130_203003
20110130_204003
20110130_205003
20110130_210003
20110130_211003
20110130_212004
20110130_213003
20110130_214003
20110130_215003
20110130_220003
20110130_221003

Each of the 17 lines contains a string representing a point in time in a fixed format (here: YYYYMMDD_HHMMSS).

Now, let’s say that the goal is to compute the mean of every 3 consecutive points in time. An output date string representing a mean time should have the same format as the date strings in dates.dat. Hence, the output file dates_meanof3values.dat should look like this:

20110130_200016
20110130_201305
20110130_204003
20110130_211003
20110130_214003

These 5 date strings represent the average points in time of the first 5*3 = 15 date strings in our input. The remaining 2 date strings do not form a complete group of 3 and are therefore ignored.

The following Python code accomplishes this:

  1. import time
  2.  
  3. meanof = 3 # number of dates taken into account for averaging
  4. inputfile = open('dates.dat')
  5. outputfile = open("dates_meanof%svalues.dat" % meanof, 'w')
  6.  
  7. def datestring_to_timestamp(datestring):
  8.     """
  9.     Assume `datestring` represents a time in local time and convert it to a
  10.     timestamp (time as a floating point number expressed in seconds since the
  11.     epoch, in UTC) using the format given below.
  12.     """
  13.     return time.mktime(time.strptime(datestring, "%Y%m%d_%H%M%S"))
  14.  
  15. def timestamp_to_datestring(timestamp):
  16.     """
  17.     Inverse of the function `datestring_to_timestamp`
  18.     """
  19.     return time.strftime("%Y%m%d_%H%M%S", time.localtime(timestamp))
  20.  
  21. def chunks(seq, n, strict=False):
  22.     """
  23.     Split `seq` into sub-lists: yield successive `n`-sized chunks from `seq`.
  24.     If `strict` is True, the last chunk is only yielded if its length is `n`.
  25.     """
  26.     for i in range(0, len(seq), n):
  27.         if not strict or len(seq[i:i+n]) == n:
  28.             yield seq[i:i+n]
  29.  
  30. # read lines from file and strip surrounding whitespace; skip empty lines
  31. cleanlines = [line.strip() for line in inputfile.readlines() if line.strip()]
  32. # divide data into chunks and iterate over them
  33. for chunk in chunks(cleanlines, meanof, True):
  34.     # build timestamp of each datestring in current chunk and build the mean
  35.     mean_timestamp = sum(map(datestring_to_timestamp, chunk)) / meanof
  36.     # convert mean timestamp back to datestring and write this to file
  37.     outputfile.write("%s\n" % timestamp_to_datestring(mean_timestamp))
  38.  
  39. # close both files so that the output is flushed to disk
  40. inputfile.close()
  41. outputfile.close()

A timestamp is just a number of seconds elapsed since the epoch, so points in time can be averaged by summing their timestamps and dividing by their count. The functions datestring_to_timestamp() and timestamp_to_datestring() convert date strings to timestamps and back, using a user-defined date string format (edit the format specifiers in lines 13 and 19 to match your data).
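To make the conversion a bit more tangible, here is a minimal sketch that reuses the two conversion functions on the first two input lines. The names FMT, t1, t2, elapsed, midpoint and roundtrip are only chosen for this illustration. Since time.mktime() interprets the date string as local time, the absolute timestamps depend on your time zone; the difference and the midpoint should not, as long as no daylight saving transition lies in between.

```python
import time

FMT = "%Y%m%d_%H%M%S"  # same format as in the script above

def datestring_to_timestamp(datestring):
    # local time date string -> seconds since the epoch (float)
    return time.mktime(time.strptime(datestring, FMT))

def timestamp_to_datestring(timestamp):
    # seconds since the epoch -> local time date string
    return time.strftime(FMT, time.localtime(timestamp))

t1 = datestring_to_timestamp("20110130_195243")
t2 = datestring_to_timestamp("20110130_200003")

# timestamps are plain numbers, so ordinary arithmetic works on them
elapsed = t2 - t1                                   # seconds between both points
midpoint = timestamp_to_datestring((t1 + t2) / 2)   # mean of the two points in time
roundtrip = timestamp_to_datestring(t1)             # should give back the input string

print(elapsed, midpoint, roundtrip)
```

Converting a date string to a timestamp and back should return the original string, which is a quick way to check that the format specifier matches your data.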

The function chunks(), which makes use of Python generators, divides a given list into sub-lists (“chunks”). This is very useful at this point: the mean date string of a chunk can then be calculated simply as

timestamp_to_datestring(sum(map(datestring_to_timestamp, chunk)) / meanof)
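If you want to experiment without touching any files, the same idea can be sketched in memory. Note that this is a variation, not the script from above: it uses the datetime module instead of time.mktime(), so the result does not depend on the local time zone. The helper mean_datestring() and the names dates, strict_means and loose_chunks are made up for this example; only the first 7 input lines are used to keep it short.

```python
from datetime import datetime, timedelta

FMT = "%Y%m%d_%H%M%S"

def chunks(seq, n, strict=False):
    # yield successive n-sized chunks; with strict=True an incomplete
    # last chunk is dropped
    for i in range(0, len(seq), n):
        chunk = seq[i:i + n]
        if not strict or len(chunk) == n:
            yield chunk

def mean_datestring(datestrings, fmt=FMT):
    # hypothetical helper: average the offsets (in seconds) relative to the
    # first date, then add the mean offset back to that date
    times = [datetime.strptime(s, fmt) for s in datestrings]
    reference = times[0]
    offsets = [(t - reference).total_seconds() for t in times]
    mean_offset = sum(offsets) / len(offsets)
    # strftime() drops the fractional seconds, as the format has no field for them
    return (reference + timedelta(seconds=mean_offset)).strftime(fmt)

dates = [
    "20110130_195243", "20110130_200003", "20110130_200803",
    "20110130_200909", "20110130_201003", "20110130_202004",
    "20110130_203003",
]

# strict=True: only the two complete chunks of 3 are averaged
strict_means = [mean_datestring(c) for c in chunks(dates, 3, strict=True)]
# strict=False: the leftover single date forms a third chunk
loose_chunks = list(chunks(dates, 3))

print(strict_means)
print(len(loose_chunks), loose_chunks[-1])
```

With strict=True the two resulting strings match the first two lines of the expected output file; without it, the leftover date would be "averaged" on its own, which is usually not what you want.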