Assume you’ve measured values over time and now want to average your data. This means you have to a) average your measured values, which is a trivial task, and b) average your points in time. Here, I present a way to average arbitrary date strings in Python.
Let’s talk about a specific example. This is the content of dates.dat, our input:
20110130_195243
20110130_200003
20110130_200803
20110130_200909
20110130_201003
20110130_202004
20110130_203003
20110130_204003
20110130_205003
20110130_210003
20110130_211003
20110130_212004
20110130_213003
20110130_214003
20110130_215003
20110130_220003
20110130_221003
Each of the 17 lines contains a string representing a point in time in a fixed format (here: year, month, day, an underscore, then hour, minute, second).
Now, let’s say that the goal is to build the mean of every 3 points in time. An output date string representing a mean time should have the same format as the date strings in dates.dat. Hence, the output file dates_meanof3values.dat should look like this:
20110130_200016
20110130_201305
20110130_204003
20110130_211003
20110130_214003
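These 5 date strings represent the average points in time of the first 5*3 = 15 date strings in our input. The remaining two date strings do not form a complete group of three and are therefore dropped.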
The following Python code accomplishes this:
import time

meanof = 3  # number of dates taken into account for averaging


def datestring_to_timestamp(datestring):
    """
    Interpret `datestring` as a time in local time and convert it to a
    timestamp (seconds since the epoch, as a floating point number) using
    the format given below.
    """
    return time.mktime(time.strptime(datestring, "%Y%m%d_%H%M%S"))


def timestamp_to_datestring(timestamp):
    """
    Inverse of the function `datestring_to_timestamp`.
    """
    return time.strftime("%Y%m%d_%H%M%S", time.localtime(timestamp))


def chunks(items, n, strict=False):
    """
    Split `items` into sub-lists: yield successive `n`-sized chunks from `items`.
    If `strict` is True, the last chunk is only yielded if its length is `n`.
    """
    for i in range(0, len(items), n):
        chunk = items[i:i + n]
        if not strict or len(chunk) == n:
            yield chunk


# read lines from file and strip surrounding whitespace; skip empty lines
with open('dates.dat') as inputfile:
    cleanlines = [line.strip() for line in inputfile if line.strip()]

with open("dates_meanof%svalues.dat" % meanof, 'w') as outputfile:
    # divide data into chunks and iterate over them
    for chunk in chunks(cleanlines, meanof, True):
        # convert each date string in the current chunk to a timestamp
        # and build the mean
        mean_timestamp = sum(map(datestring_to_timestamp, chunk)) / meanof
        # convert mean timestamp back to a date string and write it to the file
        outputfile.write("%s\n" % timestamp_to_datestring(mean_timestamp))
A timestamp is just a number of seconds since the epoch, so points in time can be averaged by summing their timestamps and dividing by their count. The functions datestring_to_timestamp() and timestamp_to_datestring() perform the conversion of date strings from/to timestamps, using a user-given date string format (to adapt it to your data, edit the "%Y%m%d_%H%M%S" format specifier, which appears once in each of the two functions).
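As a quick sanity check of this round trip, here is a minimal sketch that re-declares the two conversion functions and averages two of the date strings from dates.dat. The `DATE_FORMAT` constant is introduced only for this illustration, and the sketch assumes that no daylight saving transition falls between the two points in time.

```python
import time

DATE_FORMAT = "%Y%m%d_%H%M%S"  # illustrative constant, not part of the script above


def datestring_to_timestamp(datestring):
    # interpret the string as local time, return seconds since the epoch
    return time.mktime(time.strptime(datestring, DATE_FORMAT))


def timestamp_to_datestring(timestamp):
    # format a timestamp as a local-time date string again
    return time.strftime(DATE_FORMAT, time.localtime(timestamp))


first = datestring_to_timestamp("20110130_200003")
second = datestring_to_timestamp("20110130_201003")

# the two points in time are 10 minutes apart
print(second - first)  # 600.0

# converting there and back again yields the original string
print(timestamp_to_datestring(first))  # 20110130_200003

# the mean of both timestamps lies exactly halfway between them
mean_datestring = timestamp_to_datestring((first + second) / 2)
print(mean_datestring)  # 20110130_200503
```

Because both directions use local time, the result does not depend on the time zone of the machine, as long as all date strings refer to the same zone.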
The function chunks(), which is a Python generator, divides a given list into sub-lists (“chunks”). This is very useful here: the mean date string of a chunk is then calculated as simply as
timestamp_to_datestring(sum(map(datestring_to_timestamp, chunk)) / meanof)
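To illustrate what the `strict` flag of chunks() does, here is a small sketch on made-up sample data (the `values` and `lines` lists are hypothetical stand-ins, not taken from dates.dat). It re-declares chunks() so that it runs on its own.

```python
def chunks(items, n, strict=False):
    # yield successive n-sized slices; with strict=True an incomplete
    # trailing slice is skipped
    for i in range(0, len(items), n):
        chunk = items[i:i + n]
        if not strict or len(chunk) == n:
            yield chunk


values = list(range(1, 8))  # hypothetical sample data: 1..7

loose = list(chunks(values, 3))
tight = list(chunks(values, 3, strict=True))

print(loose)  # [[1, 2, 3], [4, 5, 6], [7]]
print(tight)  # [[1, 2, 3], [4, 5, 6]]

# 17 input lines in groups of 3 give 5 complete chunks, 2 lines are left over
lines = list(range(17))
print(len(list(chunks(lines, 3, strict=True))))  # 5
```

With `strict=True`, every chunk is guaranteed to hold exactly `meanof` entries, which is why dividing the sum by `meanof` is safe in the averaging step.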