# Average date strings using Python

Suppose you have measured values over time and now want to average your data. That means you have to a) average the measured values, which is trivial, and b) average the corresponding points in time. Here, I present a way to average arbitrary date strings in Python.

Let’s talk about a specific example. This is the content of `dates.dat`, our input:

```
20110130_195243
20110130_200003
20110130_200803
20110130_200909
20110130_201003
20110130_202004
20110130_203003
20110130_204003
20110130_205003
20110130_210003
20110130_211003
20110130_212004
20110130_213003
20110130_214003
20110130_215003
20110130_220003
20110130_221003
```

Each of the 17 lines contains a string representing a point in time in a fixed format, here `YYYYMMDD_HHMMSS`.
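To make the format concrete, here is a minimal sketch that parses the first input line with `time.strptime()` and the same format string that the script further down uses. The variable name `parsed` is only chosen for this illustration.

```python
import time

# Parse the first line of dates.dat into its calendar fields.
parsed = time.strptime("20110130_195243", "%Y%m%d_%H%M%S")

print(parsed.tm_year, parsed.tm_mon, parsed.tm_mday)  # 2011 1 30
print(parsed.tm_hour, parsed.tm_min, parsed.tm_sec)   # 19 52 43
```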

Now, let’s say the goal is to compute the mean of every 3 consecutive points in time. An output date string representing a mean time should have the same format as the date strings in `dates.dat`. Hence, the output file `dates_meanof3values.dat` should look like this:

```
20110130_200016
20110130_201305
20110130_204003
20110130_211003
20110130_214003
```

These 5 date strings represent the average points in time of the first 5*3 = 15 date strings in our input. The remaining 2 lines do not fill a complete group of 3 and are left out.

The following Python code accomplishes this:

1. `import time`
2. ` `
3. `meanof = 3 # number of dates taken into account for averaging`
4. `inputfile = open('dates.dat')`
5. `outputfile = open("dates_meanof%svalues.dat" % meanof,'w')`
6. ` `
7. `def datestring_to_timestamp(datestring):`
8. `    """`
9. `    Interpret datestring as local time and convert it to a timestamp`
10. `    (time as a floating point number expressed in seconds since the epoch, in`
11. `    UTC) using the format given below.`
12. `    """`
13. `    return time.mktime(time.strptime(datestring, "%Y%m%d_%H%M%S"))`
14. ` `
15. `def timestamp_to_datestring(timestamp):`
16. `    """`
17. `    Inverse of the function datestring_to_timestamp().`
18. `    """`
19. `    return time.strftime("%Y%m%d_%H%M%S", time.localtime(timestamp))`
20. ` `
21. `def chunks(items, n, strict=False):`
22. `    """`
23. `    Split items into sub-lists: yield successive n-sized chunks from items.`
24. `    If strict is True, the last chunk is only yielded if its length is n.`
25. `    """`
26. `    for i in range(0, len(items), n):`
27. `        if not strict or len(items[i:i+n]) == n:`
28. `            yield items[i:i+n]`
29. ` `
30. `# read lines from file and strip surrounding whitespace; skip empty lines`
31. `cleanlines = [line.strip() for line in inputfile.readlines() if line.strip()]`
32. `# divide data into chunks and iterate over them`
33. `for chunk in chunks(cleanlines, meanof, True):`
34. `    # build timestamp of each datestring in current chunk and build the mean`
35. `    mean_timestamp = sum(map(datestring_to_timestamp, chunk)) / meanof`
36. `    # convert mean timestamp back to datestring and write this to file`
37. `    outputfile.write("%s\n" % timestamp_to_datestring(mean_timestamp))`
38. `inputfile.close()`
39. `outputfile.close()  # make sure everything is flushed to disk`

Timestamps are plain numbers (seconds since the epoch), so points in time can be averaged like any other numbers: sum the timestamps and divide by their count. The functions `datestring_to_timestamp()` and `timestamp_to_datestring()` convert date strings to timestamps and back, using a user-defined date string format (to change it, edit the format specifiers in lines 13 and 19).
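As a quick check of this idea, the following minimal sketch averages the first three date strings of `dates.dat`. It deliberately uses `datetime` arithmetic instead of `time.mktime()`, an assumption made for this example so that the result does not depend on the local time zone of the machine. The names `FORMAT` and `mean_datestring` are hypothetical and not part of the script above.

```python
from datetime import datetime, timedelta

FORMAT = "%Y%m%d_%H%M%S"  # same format as the lines in dates.dat


def mean_datestring(datestrings, fmt=FORMAT):
    """Return the mean point in time of `datestrings`, formatted with `fmt`."""
    times = [datetime.strptime(s, fmt) for s in datestrings]
    reference = times[0]
    # Offsets from a reference time are plain durations and can be summed.
    total = sum((t - reference for t in times), timedelta(0))
    # Fractions of a second are dropped by strftime().
    return (reference + total / len(times)).strftime(fmt)


print(mean_datestring(["20110130_195243", "20110130_200003", "20110130_200803"]))
# 20110130_200016
```

The result matches the first line of `dates_meanof3values.dat`.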

The function `chunks()`, a Python generator, divides a given list into sub-lists (“chunks”). This is very useful here: computing the mean date string of a chunk then becomes as simple as

`timestamp_to_datestring(sum(map(datestring_to_timestamp, chunk)) / meanof)`
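How `chunks()` treats an incomplete last chunk is easiest to see on a small list. The sketch below restates the generator in a self-contained form and runs it on seven integers. It assumes Python 3, and the sample list `data` is made up for the illustration.

```python
def chunks(items, n, strict=False):
    """Yield successive n-sized chunks; with strict=True, drop a short last chunk."""
    for i in range(0, len(items), n):
        chunk = items[i:i + n]
        if not strict or len(chunk) == n:
            yield chunk


data = list(range(7))
print(list(chunks(data, 3)))               # [[0, 1, 2], [3, 4, 5], [6]]
print(list(chunks(data, 3, strict=True)))  # [[0, 1, 2], [3, 4, 5]]
```

With `strict=True`, as used in the script, the 17 input lines yield 5 complete chunks of 3 and the last 2 lines are skipped.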