API Reference

TimeSeries

In traces, a TimeSeries is similar to a dictionary that contains measurements of something at different times. One difference is that you can ask for the value at any time – it doesn’t need to be at a measurement time. Let’s say you’re tracking the contents of a grocery cart, keyed by the number of minutes into a shopping trip.

>>> cart = traces.TimeSeries()
>>> cart[1.2] = {'broccoli'}
>>> cart[1.7] = {'broccoli', 'apple'}
>>> cart[2.2] = {'apple'}
>>> cart[3.5] = {'apple', 'beets'}

If you want to know what’s in the cart at 2 minutes, you can simply get the value using cart[2] and you’ll see {'broccoli', 'apple'}. By default, if you ask for a time before the first measurement, you’ll get the first measurement value.

>>> cart[-1]
set(['broccoli'])

If, however, you set the default when creating the TimeSeries, you’ll get that instead:

>>> cart = traces.TimeSeries(default=set())
>>> cart[-1]
set([])

In this case, it might also make sense to add the t=0 point as a measurement with cart[0] = set().

If you know the time span over which the measurements are taken and you want your code to fail on any lookup outside that range, you can set a domain on the TimeSeries.

>>> cart = traces.TimeSeries(domain=(0, 120))
>>> cart[121]
Traceback (most recent call last):
  File "bunga.py", line 4, in <module>
    cart[121]
  File "/Users/stringer/Projects/traces/traces/timeseries.py", line 772, in __getitem__
    return self.get(time)
  File "/Users/stringer/Projects/traces/traces/timeseries.py", line 124, in get
    raise KeyError(msg)
KeyError: '121 is outside of the domain.'

Setting an explicit domain can help avoid pesky bugs with dirty sensor data.

Performance note

Traces is not designed for maximal performance, but it’s no slouch since it uses the excellent sortedcontainers.SortedDict under the hood to store sparse time series.

class traces.TimeSeries(data=None, default=<object object>)

A class to help manipulate and analyze time series that are the result of taking measurements at irregular points in time. For example, here is a simple time series that starts at 8am and goes to 9:59am:

>>> ts = TimeSeries()
>>> ts['8:00am'] = 0
>>> ts['8:47am'] = 1
>>> ts['8:51am'] = 0
>>> ts['9:15am'] = 1
>>> ts['9:59am'] = 0

The value of the time series is the last recorded measurement: for example, at 8:05am the value is 0 and at 8:48am the value is 1. So:

>>> ts['8:05am']
0
>>> ts['8:48am']
1
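The "last recorded measurement" rule can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the helper name value_at and the list-of-pairs representation are assumptions, and minutes after 8:00am stand in for the clock times above.

```python
from bisect import bisect_right

def value_at(measurements, time):
    """Return the most recent measurement at or before `time`.

    `measurements` is a list of (time, value) pairs sorted by time.
    Before the first measurement, fall back to the first value.
    """
    times = [t for t, _ in measurements]
    index = bisect_right(times, time) - 1
    return measurements[max(index, 0)][1]

# Hypothetical data: minutes after 8:00am instead of clock strings.
ts = [(0, 0), (47, 1), (51, 0), (75, 1), (119, 0)]
print(value_at(ts, 5))   # 8:05am -> 0
print(value_at(ts, 48))  # 8:48am -> 1
```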

There are also methods for combining a time series with another one: sums, differences, logical operators and such.

compact()

Convert this instance to a compact version: the value will be the same at all times, but repeated measurements are discarded.
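As a rough sketch of what compacting means, assuming measurements are held as a sorted list of (time, value) pairs (a stand-in for the real storage), a measurement can be dropped whenever it repeats the value already in effect:

```python
def compact(measurements):
    """Drop measurements that repeat the value of the previous one kept."""
    result = []
    for time, value in measurements:
        if not result or result[-1][1] != value:
            result.append((time, value))
    return result

points = [(0, 'a'), (1, 'a'), (2, 'b'), (3, 'b'), (4, 'a')]
compacted = compact(points)
print(compacted)  # [(0, 'a'), (2, 'b'), (4, 'a')]
```

The value looked up at any time is unchanged, because the removed points only restated the value already in effect.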

default

Return the default value of the time series.

difference(other)

difference(t) = self(t) - other(t).

distribution(start=None, end=None, normalized=True, mask=None)

Calculate the distribution of values over the given time range from start to end.

Parameters:
  • start (orderable, optional) – The lower time bound of when to calculate the distribution. By default, the first time point will be used.
  • end (orderable, optional) – The upper time bound of when to calculate the distribution. By default, the last time point will be used.
  • normalized (bool) – If True, distribution will sum to one. If False and the time values of the TimeSeries are datetimes, the units will be seconds.
  • mask (Domain or TimeSeries, optional) – A Domain on which to calculate the distribution.
Returns:

Histogram with the results.
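One way to picture the distribution is as the amount of time the series spends at each value. The sketch below is an illustration under assumptions, not the library's code: it uses a plain list of (time, value) pairs and a plain dict in place of Histogram, ignores mask, and assumes start is not earlier than the first measurement and start < end.

```python
from collections import defaultdict

def distribution(measurements, start, end, normalized=True):
    """Total time spent at each value between start and end."""
    durations = defaultdict(float)
    # Each measurement holds until the next one; the last holds until `end`.
    next_times = [t for t, _ in measurements[1:]] + [end]
    for (t0, value), t1 in zip(measurements, next_times):
        lo, hi = max(t0, start), min(t1, end)
        if hi > lo:
            durations[value] += hi - lo
    if normalized:
        total = sum(durations.values())
        return {value: d / total for value, d in durations.items()}
    return dict(durations)

ts = [(0, 0), (47, 1), (51, 0), (75, 1), (119, 0)]
print(distribution(ts, 0, 119, normalized=False))  # {0: 71.0, 1: 48.0}
```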

first_item()

Returns the first (time, value) pair of the time series.

get(time, interpolate='previous')

Get the value of the time series, even in-between measured values.

get_item_by_index(index)

Get the (t, value) pair of the time series by index.

is_floating()

An empty TimeSeries with no specific default value is said to be “floating”, since the value of the TimeSeries is undefined. Any operation that needs to look up the value of the TimeSeries is not defined on a floating TimeSeries.

items()

Return a list of the (key, value) pairs in ts, as 2-tuples.

classmethod iter_merge(timeseries_list)

Iterate through several time series in order, yielding (time, list) tuples where list is the values of each individual TimeSeries in the list at time t.

iterintervals(n=2)

Iterate over groups of n consecutive measurement points in the time series.
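A small sketch of the idea, assuming the groups overlap (a sliding window) and that measurements are a sorted list of (time, value) pairs; both are assumptions made for illustration:

```python
def iterintervals(measurements, n=2):
    """Yield tuples of n consecutive (time, value) points, sliding by one."""
    for i in range(len(measurements) - n + 1):
        yield tuple(measurements[i:i + n])

points = [(0, 'a'), (2, 'b'), (5, 'c'), (9, 'd')]
for (t0, v0), (t1, v1) in iterintervals(points):
    print(t0, t1, v0)  # start, end, and the value in effect between them
```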

iterperiods(start=None, end=None, value=None)

Iterate over the periods (optionally within a given time span), yielding (interval start, interval end, value) tuples.

TODO: add mask argument here.
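To make the shape of the yielded tuples concrete, here is a minimal sketch that clips each period to the requested span. It assumes a sorted list of (time, value) pairs and required start and end arguments, and it leaves out the value argument; none of this is taken from the library source.

```python
def iterperiods(measurements, start, end):
    """Yield (period start, period end, value), clipped to [start, end]."""
    for i, (t0, value) in enumerate(measurements):
        # A period runs until the next measurement, or until `end` for the last.
        t1 = measurements[i + 1][0] if i + 1 < len(measurements) else end
        lo, hi = max(t0, start), min(t1, end)
        if lo < hi:
            yield lo, hi, value

ts = [(0, 0), (47, 1), (51, 0), (75, 1), (119, 0)]
periods = list(iterperiods(ts, 30, 60))
print(periods)  # [(30, 47, 0), (47, 51, 1), (51, 60, 0)]
```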

last_item()

Returns the last (time, value) pair of the time series.

logical_and(other)

logical_and(t) = self(t) and other(t).

logical_or(other)

logical_or(t) = self(t) or other(t).

logical_xor(other)

logical_xor(t) = self(t) ^ other(t).

mean(start=None, end=None, mask=None)

Calculate the average value of the time series over the given time range from start to end.
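Because measurements are irregularly spaced, a natural reading of "average" is a time-weighted one: each value counts in proportion to how long it was in effect. The sketch below assumes that reading, numeric values, a sorted list of (time, value) pairs, and a start that is not earlier than the first measurement. The function name is hypothetical.

```python
def time_weighted_mean(measurements, start, end):
    """Average value over [start, end], weighting by time in effect."""
    total = 0.0
    for i, (t0, value) in enumerate(measurements):
        t1 = measurements[i + 1][0] if i + 1 < len(measurements) else end
        duration = min(t1, end) - max(t0, start)
        if duration > 0:
            total += value * duration
    return total / (end - start)

ts = [(0, 0), (47, 1), (51, 0), (75, 1), (119, 0)]
# The value is 1 for 4 + 44 = 48 of the 119 minutes.
average = time_weighted_mean(ts, 0, 119)
```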

classmethod merge(ts_list, compact=True, operation=None, default=None)

Iterate through several time series in order, yielding (time, value) where value is either the list of the values of each individual TimeSeries in the list at time t (in the same order as in ts_list) or the result of the optional operation on that list of values.
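The merging behavior can be sketched as follows, with each series represented as a sorted list of (time, value) pairs. The helper value_at and the handling of default (used before a series has its first measurement) are assumptions for illustration, and the compact argument is left out.

```python
from bisect import bisect_right

def value_at(points, time, default):
    """Most recent value at or before `time`, or `default` if there is none."""
    times = [t for t, _ in points]
    index = bisect_right(times, time) - 1
    return points[index][1] if index >= 0 else default

def merge(series_list, operation=None, default=None):
    """Yield (time, values) at every measurement time of any input series."""
    times = sorted({t for points in series_list for t, _ in points})
    for t in times:
        values = [value_at(points, t, default) for points in series_list]
        yield t, (operation(values) if operation else values)

a = [(0, 1), (2, 3)]
b = [(1, 10)]
print(list(merge([a, b], default=0)))
# [(0, [1, 0]), (1, [1, 10]), (2, [3, 10])]
print(list(merge([a, b], operation=sum, default=0)))
# [(0, 1), (1, 11), (2, 13)]
```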

moving_average(sampling_period, window_size=None, start=None, end=None, placement='center', pandas=False)

Average the time series over regular intervals.

multiply(other)

multiply(t) = self(t) * other(t).

n_measurements()

Return the number of measurements in the time series.

n_points(start=-inf, end=inf, mask=None, include_start=True, include_end=False, normalized=False)

Calculate the number of points over the given time range from start to end.

Parameters:
  • start (orderable, optional) – The lower time bound of the range in which to count points. By default, start is -infinity.
  • end (orderable, optional) – The upper time bound of the range in which to count points. By default, end is +infinity.
  • mask (Domain or TimeSeries, optional) – A Domain on which to count points.
Returns:

int with the result

operation(other, function, **kwargs)

Calculate an “elementwise” operation, either between this TimeSeries and another one, i.e.

operation(t) = function(self(t), other(t))

or between this timeseries and a constant:

operation(t) = function(self(t), other)

If it’s another time series, the measurement times in the resulting TimeSeries will be the union of the sets of measurement times of the input time series. If it’s a constant, the measurement times will not change.
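Both cases can be sketched in a few lines. This is an illustration only: series are plain sorted lists of (time, value) pairs, the helper value_at is hypothetical, and lookups before a series' first measurement fall back to its first value.

```python
from bisect import bisect_right
import operator

def value_at(points, time):
    """Most recent value at or before `time` (first value if earlier)."""
    times = [t for t, _ in points]
    index = bisect_right(times, time) - 1
    return points[max(index, 0)][1]

def operation(points, other, function):
    """Apply `function` pointwise against another series or a constant."""
    if isinstance(other, list):
        # Another series: evaluate at the union of measurement times.
        times = sorted({t for t, _ in points} | {t for t, _ in other})
        return [(t, function(value_at(points, t), value_at(other, t)))
                for t in times]
    # A constant: measurement times are unchanged.
    return [(t, function(value, other)) for t, value in points]

x = [(0, 1), (2, 3)]
y = [(1, 10), (3, 20)]
print(operation(x, y, operator.add))  # [(0, 11), (1, 11), (2, 13), (3, 23)]
print(operation(x, 2, operator.mul))  # [(0, 2), (2, 6)]
```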

remove(time)

Remove the measurement at the given time from the time series. This raises an error if the given time is not actually a measurement point.

remove_points_from_interval(start, end)

Remove all points from the time series within an interval [start:end].

sample(sampling_period, start=None, end=None, interpolate='previous')

Sample the time series at regular time periods.
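Regular sampling with 'previous' interpolation amounts to looking up the value in effect at each tick. The sketch below assumes integer times, a sorted list of (time, value) pairs, and that both start and end are included when they fall on a tick; the helper names are hypothetical.

```python
from bisect import bisect_right

def value_at(points, time):
    """Most recent value at or before `time` (first value if earlier)."""
    times = [t for t, _ in points]
    index = bisect_right(times, time) - 1
    return points[max(index, 0)][1]

def sample(points, sampling_period, start, end):
    """Return (time, value) pairs every `sampling_period` from start to end."""
    result = []
    t = start
    while t <= end:
        result.append((t, value_at(points, t)))
        t += sampling_period
    return result

ts = [(0, 0), (47, 1), (51, 0), (75, 1), (119, 0)]
print(sample(ts, 30, 0, 120))
# [(0, 0), (30, 0), (60, 0), (90, 1), (120, 0)]
```

Note that the short spell at value 1 between 47 and 51 falls between ticks and does not show up in the samples.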

set(time, value, compact=False)

Set the value for the time series. If compact is True, only set the value if it’s different from what it would be anyway.

set_interval(start, end, value, compact=False)

Set the value for the time series on an interval. If compact is True, only set the value if it’s different from what it would be anyway.
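One plausible reading of setting a value on an interval, sketched below, is that the value applies from start up to end, and the value that was in effect at end is restored there. That reading, the list-of-pairs representation, and the helper value_at are all assumptions; the compact argument is left out.

```python
from bisect import bisect_right

def value_at(points, time):
    """Most recent value at or before `time` (first value if earlier)."""
    times = [t for t, _ in points]
    index = bisect_right(times, time) - 1
    return points[max(index, 0)][1]

def set_interval(points, start, end, value):
    """Return a new series equal to `value` on [start, end)."""
    end_value = value_at(points, end)  # value to restore from `end` onward
    kept = [(t, v) for t, v in points if t < start or t > end]
    return sorted(kept + [(start, value), (end, end_value)])

points = [(0, 'a'), (5, 'b'), (10, 'c')]
print(set_interval(points, 3, 7, 'x'))
# [(0, 'a'), (3, 'x'), (7, 'b'), (10, 'c')]
```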

slice(start, end)

Return an equivalent TimeSeries that only has points between start and end (always starting at start).

sum(other)

sum(t) = self(t) + other(t).

threshold(value, inclusive=False)

Return True if the value is greater than the threshold value (or greater than or equal to the threshold value if inclusive=True).

to_bool(invert=False)

Return the truth value of each element.

Domain

class traces.Domain(data=None)

Initialize with:

>>> Domain(1, 4)
>>> Domain([1, 4])
>>> Domain((1, 4))
>>> Domain([[1, 4]])
>>> Domain([(1, 4)])
>>> Domain((1, 4), (5, 8))
>>> Domain([1, 4], [5, 8])
>>> Domain([(1, 4), (5, 8)])
>>> Domain([[1, 4], [5, 8]])

A Domain must be made up of closed intervals, although it can be open toward -inf or inf. For example, Domain(-inf, 3) means a domain from -inf to 3 inclusive.
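The closed-interval rule can be illustrated with a small membership check. This sketch represents a domain as a plain list of (lower, upper) pairs and uses a hypothetical helper in_domain; it does not use the Domain class itself.

```python
import math

def in_domain(intervals, time):
    """True if `time` lies in any closed interval [lower, upper]."""
    return any(lower <= time <= upper for lower, upper in intervals)

two_spans = [(1, 4), (5, 8)]
print(in_domain(two_spans, 4))    # True: endpoints are included
print(in_domain(two_spans, 4.5))  # False: falls in the gap

open_below = [(-math.inf, 3)]
print(in_domain(open_below, 3))    # True
print(in_domain(open_below, 3.1))  # False
```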

Histogram

class traces.Histogram(data=(), **kwargs)

max()

Maximum observed value.

mean()

Mean of the distribution.

min()

Minimum observed value.

normalized()

Return a normalized version of the histogram where the values sum to one.

standard_deviation()

Standard deviation of the distribution.

total()

Sum of values.

variance()

Variance of the distribution.
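To show how these statistics relate to one another, here is a sketch that treats a histogram as a plain dict mapping each observed value to its weight (a count or a duration). The dict representation, the function name, and the use of the population (not sample) variance are assumptions for illustration.

```python
import math

def histogram_stats(hist):
    """Return (total, mean, variance, standard deviation) of a weighted histogram."""
    total = sum(hist.values())
    mean = sum(value * weight for value, weight in hist.items()) / total
    variance = sum(weight * (value - mean) ** 2
                   for value, weight in hist.items()) / total
    return total, mean, variance, math.sqrt(variance)

hist = {1: 2, 3: 2}  # value 1 seen with weight 2, value 3 with weight 2
total, mean, variance, std = histogram_stats(hist)
print(total, mean, variance, std)  # 4 2.0 1.0 1.0
print(min(hist), max(hist))        # 1 3
```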