Average date strings using Python

Suppose you have measured values over time and now want to average your data. This means you have to a) average the measured values, which is trivial, and b) average the corresponding points in time. Here, I present a way to average arbitrary date strings in Python.

Let’s talk about a specific example. This is the content of dates.dat, our input:

20110130_195243
20110130_200003
20110130_200803
20110130_200909
20110130_201003
20110130_202004
20110130_203003
20110130_204003
20110130_205003
20110130_210003
20110130_211003
20110130_212004
20110130_213003
20110130_214003
20110130_215003
20110130_220003
20110130_221003

Each of the 17 lines contains a string representing a point in time in a fixed format (YYYYMMDD_HHMMSS).

Now, let’s say the goal is to compute the mean of every 3 consecutive points in time. An output date string representing a mean time should have the same format as the date strings in dates.dat. Hence, the output file dates_meanof3values.dat should look like this:

20110130_200016
20110130_201305
20110130_204003
20110130_211003
20110130_214003

These 5 date strings are the mean points in time of the first 5*3 = 15 date strings in our input. The remaining two input lines do not form a complete group of three and are ignored.
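As a quick sanity check, the first output line can be reproduced by hand. The sketch below converts the time part of the first three input lines to seconds since midnight, averages them, and formats the result. The helper names `seconds_since_midnight` and `format_hms` are made up for this illustration, and the shortcut only works because all three dates fall on the same day.

```python
def seconds_since_midnight(hhmmss):
    """Convert an 'HHMMSS' string to seconds since midnight."""
    hours, minutes, seconds = int(hhmmss[0:2]), int(hhmmss[2:4]), int(hhmmss[4:6])
    return hours * 3600 + minutes * 60 + seconds


def format_hms(total_seconds):
    """Format seconds since midnight as 'HHMMSS' (fractions of a second are dropped)."""
    total_seconds = int(total_seconds)
    return "%02d%02d%02d" % (
        total_seconds // 3600,
        total_seconds % 3600 // 60,
        total_seconds % 60,
    )


# time parts of the first three input lines
first_chunk = ["195243", "200003", "200803"]
mean_seconds = sum(map(seconds_since_midnight, first_chunk)) / len(first_chunk)
print(format_hms(mean_seconds))  # 200016, the time part of the first output line
```

The sum is 71563 + 72003 + 72483 = 216049 seconds, and a third of that is about 72016.3 seconds, which is 20:00:16.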

The following Python script produces this output file from dates.dat:

```python
import time

meanof = 3  # number of dates taken into account for averaging
inputname = "dates.dat"
outputname = "dates_meanof%svalues.dat" % meanof

def datestring_to_timestamp(datestring):
    """
    Interpret `datestring` as local time and convert it to a timestamp
    (a floating point number of seconds since the epoch, in UTC) using
    the format given below.
    """
    return time.mktime(time.strptime(datestring, "%Y%m%d_%H%M%S"))

def timestamp_to_datestring(timestamp):
    """
    Inverse of the function `datestring_to_timestamp`.
    """
    return time.strftime("%Y%m%d_%H%M%S", time.localtime(timestamp))

def chunks(items, n, strict=False):
    """
    Split `items` into sub-lists: yield successive `n`-sized chunks.
    If `strict` is True, the last chunk is only yielded if its length is `n`.
    """
    for i in range(0, len(items), n):
        if not strict or len(items[i:i+n]) == n:
            yield items[i:i+n]

# read lines from file and strip surrounding whitespace; skip empty lines
with open(inputname) as inputfile:
    cleanlines = [line.strip() for line in inputfile if line.strip()]

with open(outputname, "w") as outputfile:
    # divide data into chunks and iterate over them
    for chunk in chunks(cleanlines, meanof, True):
        # convert each date string in the current chunk to a timestamp and build the mean
        mean_timestamp = sum(map(datestring_to_timestamp, chunk)) / meanof
        # convert the mean timestamp back to a date string and write it to the file
        outputfile.write("%s\n" % timestamp_to_datestring(mean_timestamp))
```

Timestamps are plain numbers counting seconds on a linear scale, so points in time can be averaged simply by summing timestamps and dividing by their count. The functions datestring_to_timestamp() and timestamp_to_datestring() convert date strings to timestamps and back, using a user-defined date string format (to change it, edit the format string passed to time.strptime() and time.strftime() in these two functions).
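The same idea can also be written with the `datetime` module, which sidesteps the dependence on the machine's local time zone that `time.mktime()` has. This is only a sketch of an alternative, not the script used in this post. The function name `mean_datestring` and its `fmt` parameter are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def mean_datestring(datestrings, fmt="%Y%m%d_%H%M%S"):
    """Average date strings given in format `fmt`; return the mean in the same format."""
    dates = [datetime.strptime(s, fmt) for s in datestrings]
    reference = dates[0]
    # average the offsets (in seconds) from the first date, then add the mean offset back
    offsets = [(d - reference).total_seconds() for d in dates]
    mean_offset = sum(offsets) / len(offsets)
    return (reference + timedelta(seconds=mean_offset)).strftime(fmt)


# first three lines of dates.dat
print(mean_datestring(["20110130_195243", "20110130_200003", "20110130_200803"]))
# prints 20110130_200016

# a different date string format only requires a different `fmt`
print(mean_datestring(["2011-01-30 20:30:03", "2011-01-30 20:50:03"],
                      fmt="%Y-%m-%d %H:%M:%S"))
# prints 2011-01-30 20:40:03
```

Passing the format as a parameter keeps parsing and formatting in sync, so input and output always share one format.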

The function chunks(), which makes use of Python generators, divides a given list into sub-lists (“chunks”). This is very useful here: the mean date string of a chunk is then calculated as simply as

timestamp_to_datestring(sum(map(datestring_to_timestamp, chunk)) / meanof)
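To see what the `strict` flag does, here is a small self-contained sketch that repeats the `chunks()` generator from the script and applies it to 17 placeholder items, matching the number of lines in dates.dat. The integers are stand-ins for the date strings, chosen only to keep the output short.

```python
def chunks(items, n, strict=False):
    """Yield successive n-sized chunks; with strict=True, drop a shorter last chunk."""
    for i in range(0, len(items), n):
        chunk = items[i:i + n]
        if not strict or len(chunk) == n:
            yield chunk


items = list(range(17))  # placeholders standing in for the 17 date strings

loose = list(chunks(items, 3))
strict = list(chunks(items, 3, strict=True))

print(len(loose), loose[-1])    # 6 [15, 16]
print(len(strict), strict[-1])  # 5 [12, 13, 14]
```

With `strict=True` the two leftover items are skipped, which is why the output file has 5 lines rather than 6. Skipping them also keeps the division by `meanof` correct, since every chunk that gets averaged really has `meanof` entries.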

Jan-Philip Gehrcke, PhD
