Suppose you’ve measured values over time and now want to average your data. This means you have to a) average your measured values, which is a trivial task, and b) average your points in time. Here, I present a way to average arbitrary date strings in Python.
Let’s talk about a specific example. This is the content of dates.dat, our input:
20110130_195243
20110130_200003
20110130_200803
20110130_200909
20110130_201003
20110130_202004
20110130_203003
20110130_204003
20110130_205003
20110130_210003
20110130_211003
20110130_212004
20110130_213003
20110130_214003
20110130_215003
20110130_220003
20110130_221003
Each of the 17 lines contains a string representing a point in time in the fixed format YYYYMMDD_HHMMSS.
Now, let’s say that the goal is to compute the mean of every 3 points in time. An output date string representing a mean time should have the same format as the date strings in dates.dat. Hence, the output file dates_meanof3values.dat should look like this:
20110130_200016
20110130_201305
20110130_204003
20110130_211003
20110130_214003
These 5 date strings represent the average points in time of the first 5*3 date strings in our input. The remaining two input lines do not form a complete group of three and are therefore ignored.
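As a quick sanity check, the first output value can be reproduced by hand. The following is only a sketch: the helper `seconds_since_midnight` is a hypothetical name introduced for this illustration, and it assumes that all three times fall on the same day, so only the HHMMSS part needs to be averaged.

```python
def seconds_since_midnight(hhmmss):
    """Hypothetical helper: convert an HHMMSS string to seconds since midnight."""
    hours, minutes, seconds = int(hhmmss[0:2]), int(hhmmss[2:4]), int(hhmmss[4:6])
    return 3600 * hours + 60 * minutes + seconds

# time parts of the first three date strings in dates.dat
first_chunk = ["195243", "200003", "200803"]

# integer mean; the fractional second is dropped
mean_seconds = sum(seconds_since_midnight(t) for t in first_chunk) // len(first_chunk)

# convert the mean back to HHMMSS
hours, rest = divmod(mean_seconds, 3600)
minutes, seconds = divmod(rest, 60)
mean_time = "%02d%02d%02d" % (hours, minutes, seconds)
print(mean_time)  # 200016
```

The three times correspond to 71563, 72003 and 72483 seconds since midnight. Their mean is 72016 seconds (rounded down), that is 20:00:16, which matches the first line of the expected output.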
The following Python code accomplishes this:
import time

meanof = 3  # number of dates taken into account for averaging
inputfile = open('dates.dat')
outputfile = open("dates_meanof%svalues.dat" % meanof, 'w')

def datestring_to_timestamp(datestring):
    """Assume `datestring` represents a time in local time and convert it
    to a timestamp (time as a floating point number expressed in seconds
    since the epoch, in UTC) using the format given below."""
    return time.mktime(time.strptime(datestring, "%Y%m%d_%H%M%S"))

def timestamp_to_datestring(timestamp):
    """Inverse of the function `datestring_to_timestamp`."""
    return time.strftime("%Y%m%d_%H%M%S", time.localtime(timestamp))

def chunks(items, n, strict=False):
    """Split `items` into sub-lists: yield successive `n`-sized chunks from
    `items`. If `strict` is True, the last chunk is only yielded if its
    length is `n`."""
    for i in range(0, len(items), n):
        if not strict or len(items[i:i+n]) == n:
            yield items[i:i+n]

# read lines from file and strip surrounding whitespace; skip empty lines
cleanlines = [line.strip() for line in inputfile if line.strip()]

# divide data into chunks and iterate over them
for chunk in chunks(cleanlines, meanof, True):
    # build timestamp of each datestring in current chunk and build the mean
    mean_timestamp = sum(map(datestring_to_timestamp, chunk)) / meanof
    # convert mean timestamp back to datestring and write this to file
    outputfile.write("%s\n" % timestamp_to_datestring(mean_timestamp))

inputfile.close()
outputfile.close()
A timestamp is a single number (seconds since the epoch), so points in time can be averaged like any other numbers: sum the timestamps and divide by their count. The functions datestring_to_timestamp() and timestamp_to_datestring() convert date strings to and from timestamps using a user-defined date string format. To handle a different format, edit the format string "%Y%m%d_%H%M%S" in both functions.
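If you need several formats, one option is to pass the format string as an argument instead of editing it in place. The following is a minimal sketch of that idea, not part of the script above: the `fmt` parameter and the function `mean_datestring` are hypothetical additions, and the example format "%Y-%m-%d %H:%M:%S" is only an assumption about what your data might look like.

```python
import time

def datestring_to_timestamp(datestring, fmt):
    """Interpret `datestring` as local time in format `fmt`; return a timestamp."""
    return time.mktime(time.strptime(datestring, fmt))

def timestamp_to_datestring(timestamp, fmt):
    """Format `timestamp` as a local-time date string in format `fmt`."""
    return time.strftime(fmt, time.localtime(timestamp))

def mean_datestring(datestrings, fmt):
    """Hypothetical helper: average date strings by averaging their timestamps."""
    timestamps = [datestring_to_timestamp(d, fmt) for d in datestrings]
    return timestamp_to_datestring(sum(timestamps) / len(timestamps), fmt)

# assumed alternative format, for illustration only
iso_like = "%Y-%m-%d %H:%M:%S"
result = mean_datestring(["2011-01-30 20:00:00", "2011-01-30 20:10:00"], iso_like)
print(result)
```

As long as the local clock does not change between the two input times, the result is the midpoint, 2011-01-30 20:05:00. Because both conversions go through local time, averaging across a daylight saving transition may give a result that is off from what you expect.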
The function chunks(), which makes use of Python generators, divides a given list into sub-lists (“chunks”). This is very useful here: the mean date string of a chunk can then be calculated as simply as
timestamp_to_datestring(sum(map(datestring_to_timestamp, chunk)) / meanof)
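To illustrate what the `strict` flag does, here is a small sketch that applies a Python 3 version of chunks() to plain integers instead of date strings. The variable names are chosen for this example only.

```python
def chunks(items, n, strict=False):
    """Yield successive `n`-sized chunks from `items`; with `strict=True`,
    skip a final chunk that is shorter than `n`."""
    for i in range(0, len(items), n):
        chunk = items[i:i + n]
        if not strict or len(chunk) == n:
            yield chunk

values = list(range(7))

all_chunks = list(chunks(values, 3))
full_chunks = list(chunks(values, 3, strict=True))

print(all_chunks)   # [[0, 1, 2], [3, 4, 5], [6]]
print(full_chunks)  # [[0, 1, 2], [3, 4, 5]]

# 17 input lines in groups of 3 give 5 complete chunks
print(len(list(chunks(list(range(17)), 3, strict=True))))  # 5
```

With `strict=True`, an incomplete trailing chunk is dropped. This is why the 17 input lines yield only 5 mean values, and why dividing by `meanof` is safe for every chunk that is processed.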