API Reference
TimeSeries
In traces, a TimeSeries is similar to a dictionary that contains measurements of something at different times. One difference is that you can ask for the value at any time: it doesn’t need to be at a measurement time. For example, suppose you record the contents of a grocery cart at several points during a shopping trip, with time measured in minutes from the start of the trip.
>>> cart = traces.TimeSeries()
>>> cart[1.2] = {'broccoli'}
>>> cart[1.7] = {'broccoli', 'apple'}
>>> cart[2.2] = {'apple'}
>>> cart[3.5] = {'apple', 'beets'}
If you want to know what’s in the cart at 2 minutes, you can simply
get the value using cart[2] and you’ll see {'broccoli',
'apple'}. By default, if you ask for a time before the first
measurement, you’ll get None.
>>> cart = traces.TimeSeries()
>>> print(cart[-1])
None
If, however, you set the default when creating the TimeSeries, you’ll get that instead:
>>> cart = traces.TimeSeries(default=set())
>>> cart[-1]
set()
In this case, it might also make sense to add the t=0 point as a
measurement with cart[0] = set().
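The lookup rule described above (use the most recent measurement at or before the requested time, otherwise the default) can be illustrated in plain Python with the standard bisect module. This is a minimal sketch, not the traces implementation: StepSeries is a hypothetical class that assumes measurements are stored as two parallel lists kept in time order.

```python
import bisect


class StepSeries:
    """Hypothetical minimal step-function series (not part of traces)."""

    def __init__(self, default=None):
        self.default = default
        self._times = []   # measurement times, kept sorted
        self._values = []  # values aligned with self._times

    def __setitem__(self, time, value):
        i = bisect.bisect_left(self._times, time)
        if i < len(self._times) and self._times[i] == time:
            self._values[i] = value  # overwrite an existing measurement
        else:
            self._times.insert(i, time)
            self._values.insert(i, value)

    def __getitem__(self, time):
        # Index of the last measurement at or before `time` (-1 if none).
        i = bisect.bisect_right(self._times, time) - 1
        return self._values[i] if i >= 0 else self.default


cart = StepSeries(default=set())
cart[1.2] = {'broccoli'}
cart[1.7] = {'broccoli', 'apple'}
cart[2.2] = {'apple'}
cart[3.5] = {'apple', 'beets'}
print(cart[2])   # the value recorded at 1.7
print(cart[-1])  # before the first measurement, so the default
```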
Visualizing Time Series
The TimeSeries class provides a plot() method for easy visualization of your data:
>>> ts = traces.TimeSeries()
>>> ts[0] = 0
>>> ts[1] = 2
>>> ts[3] = 1
>>> ts[5] = 0
>>>
>>> # Create a basic plot with default settings
>>> fig, ax = ts.plot()
You can customize the plot appearance with various parameters:
>>> # Create a plot with linear interpolation and custom styling
>>> fig, ax = ts.plot(
... interpolate="linear", # Use linear interpolation between points
... figure_width=10, # Set figure width in inches
... linewidth=2, # Set line thickness
... marker="s", # Use square markers
... markersize=5, # Set marker size
... color="#FF5733" # Use custom color
... )
The plot method returns matplotlib objects that you can further customize or save to a file:
>>> # Add title and labels
>>> ax.set_title("My Time Series Data")
>>> ax.set_xlabel("Time")
>>> ax.set_ylabel("Value")
>>>
>>> # Save the plot to a file
>>> fig.savefig("my_timeseries.png")
- class traces.TimeSeries(data=None, default=None)
A class to help manipulate and analyze time series that are the result of taking measurements at irregular points in time. For example, here is a simple time series that starts at 8am and goes to 9:59am:
>>> ts = TimeSeries()
>>> ts['8:00am'] = 0
>>> ts['8:47am'] = 1
>>> ts['8:51am'] = 0
>>> ts['9:15am'] = 1
>>> ts['9:59am'] = 0
The value of the time series at any time is the last recorded measurement: for example, at 8:05am the value is 0 and at 8:48am the value is 1. So:
>>> ts['8:05am']
0
>>> ts['8:48am']
1
TimeSeries also supports operations that combine it with another time series, such as sums, differences, and logical operators.
- compact()
Convert this instance to a “compact” version: the value will be the same at all times, but repeated measurements are discarded.
Compacting the time series can significantly reduce the length and memory usage for data with many repeated values.
No arguments are required for this method, and it modifies the time series in place.
Example
>>> ts = TimeSeries(data=[(1, 5), (2, 5), (5, 5), (6, 1)])
>>> ts.compact()
>>> ts
TimeSeries({1: 5, 6: 1})
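To make the compaction rule concrete, here is a hypothetical compact helper that works on a plain list of (time, value) pairs sorted by time. It only illustrates the rule and is not the traces code; in particular it always keeps the first point and ignores the series default.

```python
def compact(points):
    """Return `points` without measurements that repeat the previous value.

    Hypothetical helper: `points` is a list of (time, value) pairs sorted by
    time. The value at every time is unchanged, because a dropped point
    would only have restated the value already in effect.
    """
    result = []
    for time, value in points:
        if not result or result[-1][1] != value:
            result.append((time, value))
    return result


print(compact([(1, 5), (2, 5), (5, 5), (6, 1)]))  # [(1, 5), (6, 1)]
```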
- classmethod count_by_value(ts_list)
Return a dict mapping each state value to a TimeSeries that counts how many of the input timeseries are in that state at each point in time.
Efficient for many timeseries with few discrete states (e.g. boolean on/off, ticket open/closed). Uses iter_merge_transitions so that each transition is processed in O(1) time, independent of the number of timeseries K. See docs/merge_strategies.rst for how this avoids the O(K) list copies that iter_merge/merge require.
- Parameters:
ts_list – An iterable of TimeSeries objects.
- Returns:
A dict where keys are the distinct values found across all input timeseries (including defaults), and values are TimeSeries objects whose value at any time t is the count of input timeseries equal to that state at time t.
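The counting scheme can be sketched without traces by representing each series as a hypothetical (default, points) pair. Each change of value is one transition that moves one series from one state to another, so updating the counts costs O(1) per transition. The return shape used here (initial counts plus a {time: count} dict per state) is an assumption made for illustration; traces returns TimeSeries objects instead.

```python
from collections import Counter, defaultdict


def count_by_value(series_list):
    """Hypothetical sketch: count series per state at each transition time.

    Each series is a (default, points) pair, with points a list of
    (time, value) sorted by time. Returns (initial, changes): `initial`
    counts states before any measurement, and `changes` maps each state to
    {time: count} at the times its count changes.
    """
    initial = Counter(default for default, _ in series_list)
    # Collect every real change of value into one flat list, sorted once.
    transitions = []
    for index, (default, points) in enumerate(series_list):
        previous = default
        for time, value in points:
            if value != previous:
                transitions.append((time, index, previous, value))
                previous = value
    # Sort on (time, index) only, so values never need to be comparable.
    transitions.sort(key=lambda tr: (tr[0], tr[1]))

    counts = Counter(initial)
    changes = defaultdict(dict)
    for time, _index, previous, new in transitions:
        counts[previous] -= 1  # constant work per transition
        counts[new] += 1
        changes[previous][time] = counts[previous]
        changes[new][time] = counts[new]
    return initial, dict(changes)


initial, changes = count_by_value([
    (False, [(0, True), (5, False)]),
    (False, [(2, True), (8, False)]),
])
print(changes[True])  # {0: 1, 2: 2, 5: 1, 8: 0}
```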
- distribution(start=None, end=None, normalized=True, mask=None, interpolate='previous')
Calculate the distribution of values over the given time range from start to end.
- Parameters:
start (orderable, optional) – The lower time bound of when to calculate the distribution. By default, the first time point will be used.
end (orderable, optional) – The upper time bound of when to calculate the distribution. By default, the last time point will be used.
normalized (bool) – If True, distribution will sum to one. If False and the time values of the TimeSeries are datetimes, the units will be seconds.
mask (TimeSeries, optional) – A domain on which to calculate the distribution.
interpolate (str, optional) – Method for interpolating between measurement points: either “previous” (default) or “linear”. Note: if “previous” is used, then the resulting histogram is exact. If “linear” is given, then the values used for the histogram are the average value for each segment – the mean of this histogram will be exact, but higher moments (variance) will be approximate.
- Returns:
Histogram with the results.
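For interpolate='previous', the distribution is the total time the series spends at each value, divided by the total time when normalized. The sketch below assumes a plain sorted list of (time, value) pairs whose first point is at or before start; distribution here is a hypothetical helper that ignores mask and the linear option.

```python
from collections import defaultdict


def distribution(points, start, end, normalized=True):
    """Hypothetical sketch: time spent at each value on [start, end).

    `points` is a list of (time, value) pairs sorted by time, with the first
    point at or before `start` (no default or mask handling).
    """
    durations = defaultdict(float)
    for i, (time, value) in enumerate(points):
        next_time = points[i + 1][0] if i + 1 < len(points) else end
        # Clip this constant segment to [start, end).
        left, right = max(time, start), min(next_time, end)
        if right > left:
            durations[value] += right - left
    if normalized:
        total = sum(durations.values())
        return {value: d / total for value, d in durations.items()}
    return dict(durations)


print(distribution([(0, "off"), (2, "on"), (5, "off")], 0, 10))
# {'off': 0.7, 'on': 0.3}
```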
- exists()
Returns a new TimeSeries where values are True when the original value is not None.
Deprecated: Use is_not_none() instead, which has a clearer name.
- Returns:
- A new TimeSeries with boolean values (True where original values are not None, False where they are None)
- Return type:
TimeSeries
Examples
>>> ts = TimeSeries()
>>> ts[0] = "data"
>>> ts[1] = None
>>> ts[2] = 42
>>> exists_ts = ts.exists()
>>> exists_ts[0]
True
>>> exists_ts[1]
False
>>> exists_ts[2]
True
- classmethod from_csv(filename, time_column=0, value_column=1, time_transform=None, value_transform=None, skip_header=True, default=None, delimiter=',')
Load time series data from a CSV file.
- Parameters:
filename (str) – Path to the CSV file to read
time_column (int) – Index of the column containing time values (default: 0)
value_column (int) – Index of the column containing measurement values (default: 1)
time_transform (callable, optional) – Function to transform time strings to desired format. Default converts strings like “2020-01-01 12:00:00” to datetime objects.
value_transform (callable, optional) – Function to transform value strings. Default leaves values as strings.
skip_header (bool) – Whether to skip the first row of the file (default: True)
default (any) – Default value for the time series
delimiter (str) – CSV delimiter character (default: “,”)
- Returns:
A new TimeSeries object with the data from the CSV
- Return type:
TimeSeries
Examples
>>> # Basic usage with default settings
>>> ts = TimeSeries.from_csv("data.csv")
>>>
>>> # Custom time parsing
>>> import datetime
>>> ts = TimeSeries.from_csv(
...     "data.csv",
...     time_transform=lambda s: datetime.datetime.strptime(s, "%Y-%m-%dT%H:%M:%S")
... )
>>>
>>> # Convert values to integers
>>> ts = TimeSeries.from_csv(
...     "data.csv",
...     value_transform=int,
...     default=0
... )
- classmethod from_json(filename=None, json_string=None, time_key='time', value_key='value', time_transform=None, value_transform=None, default=None)
Load time series data from a JSON file or string.
The JSON should be either:
1. A list of objects/dictionaries with time and value keys, or
2. A single object/dictionary mapping times to values
- Parameters:
filename (str, optional) – Path to the JSON file
json_string (str, optional) – JSON string (used if filename not provided)
time_key (str) – The key for time values in each record (default: “time”)
value_key (str) – The key for measurement values in each record (default: “value”)
time_transform (callable, optional) – Function to transform time values to the desired format. Default converts ISO format strings to datetime objects.
value_transform (callable, optional) – Function to transform measurement values
default (any) – Default value for the time series
- Returns:
A new TimeSeries object with the data from the JSON
- Return type:
TimeSeries
Examples
>>> # From a list of records
>>> ts = TimeSeries.from_json('data.json')
>>>
>>> # From a JSON string with custom keys
>>> ts = TimeSeries.from_json(
...     json_string='[{"timestamp": "2020-01-01T00:00:00", "temp": 20.5}]',
...     time_key="timestamp",
...     value_key="temp"
... )
>>>
>>> # With custom time parsing
>>> import datetime
>>> ts = TimeSeries.from_json(
...     'data.json',
...     time_transform=lambda t: datetime.datetime.fromtimestamp(float(t))
... )
- get(time, interpolate='previous')
Get the value of the time series at any time point.
This method retrieves the value at any time point, even between actual measurement times. The interpolation method determines how values between measurements are calculated.
- Parameters:
time – The time at which to get the value
interpolate (str) – The interpolation method to use. Available options:
- “previous”: Use the value from the most recent measurement time (step function / zero-order hold)
- “linear”: Use linear interpolation between adjacent measurements
- Returns:
The interpolated value at the specified time
- Raises:
ValueError – If an invalid interpolation method is specified
Examples
>>> ts = TimeSeries()
>>> ts[0] = 0
>>> ts[10] = 10
>>>
>>> # Previous value interpolation (default)
>>> ts.get(5)  # Returns 0
>>>
>>> # Linear interpolation
>>> ts.get(5, interpolate="linear")  # Returns 5
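The two interpolation rules can be sketched on parallel sorted lists of times and values. This hypothetical get helper is an illustration only; its boundary behaviour (None before the first measurement, the last value after the final one) is an assumption and may differ from traces.

```python
import bisect


def get(times, values, time, interpolate="previous"):
    """Hypothetical sketch of value lookup on parallel sorted lists.

    Assumed boundary behaviour: None before the first measurement, and the
    last value at or after the final measurement, for both methods.
    """
    if interpolate not in ("previous", "linear"):
        raise ValueError(f"unknown interpolation method: {interpolate!r}")
    i = bisect.bisect_right(times, time) - 1  # last measurement <= time
    if i < 0:
        return None
    if interpolate == "previous" or i == len(times) - 1 or times[i] == time:
        return values[i]
    # Straight line between the two measurements that bracket `time`.
    t0, t1 = times[i], times[i + 1]
    v0, v1 = values[i], values[i + 1]
    return v0 + (v1 - v0) * (time - t0) / (t1 - t0)


print(get([0, 10], [0, 10], 5))                        # 0
print(get([0, 10], [0, 10], 5, interpolate="linear"))  # 5.0
```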
- is_not_none()
Returns a new TimeSeries where values are True when the original value is not None.
This method checks for None values in the TimeSeries and creates a new TimeSeries with boolean values indicating presence (not None) or absence (None) of values.
- Returns:
- A new TimeSeries with boolean values (True where original values
are not None, False where they are None)
- Return type:
TimeSeries
Examples
>>> ts = TimeSeries()
>>> ts[0] = "data"
>>> ts[1] = None
>>> ts[2] = 42
>>> not_none_ts = ts.is_not_none()
>>> not_none_ts[0]
True
>>> not_none_ts[1]
False
>>> not_none_ts[2]
True
- classmethod iter_merge(timeseries_list)
Iterate through several time series in order, yielding (time, list) tuples, where list holds the value of each individual TimeSeries at time t.
Note: yields a full K-element list copy at each unique time. For large K, consider iter_merge_transitions which yields individual O(1) transitions instead. See docs/merge_strategies.rst for details.
- static iter_merge_transitions(timeseries_list)
Yield (time, index, previous_value, new_value) for each transition across all timeseries.
This is more memory-efficient than iter_merge for large numbers of timeseries because it yields individual transitions instead of copying the full state list at each step.
Uses a flat-sort strategy: all transitions are extracted into a single list and sorted once, rather than using a priority queue. See docs/merge_strategies.rst for a detailed comparison of merge implementation approaches with benchmarks.
- Parameters:
timeseries_list – An iterable of TimeSeries objects.
- Yields:
Tuples of (time, index, previous_value, new_value), where:
- time: the time of the transition
- index: which timeseries changed
- previous_value: the value before the transition
- new_value: the value after the transition
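A minimal sketch of the flat-sort strategy, assuming each series is given as a hypothetical (default, points) pair. Every measurement point is emitted as a transition here, even when the value does not change; whether traces filters such points is not assumed.

```python
def iter_merge_transitions(series_list):
    """Hypothetical sketch: yield (time, index, previous_value, new_value).

    Each series is a (default, points) pair with points sorted by time.
    All transitions go into one flat list that is sorted once, rather than
    being merged through a priority queue.
    """
    transitions = []
    for index, (default, points) in enumerate(series_list):
        previous = default
        for time, value in points:
            transitions.append((time, index, previous, value))
            previous = value
    # Sort on (time, index) only, so values never need to be comparable.
    transitions.sort(key=lambda tr: (tr[0], tr[1]))
    yield from transitions


a = (0, [(1, 5), (4, 0)])
b = (None, [(2, 'x')])
print(list(iter_merge_transitions([a, b])))
# [(1, 0, 0, 5), (2, 1, None, 'x'), (4, 0, 5, 0)]
```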
- iterintervals(n=2)
Iterate over groups of n consecutive measurement points in the time series.
- iterperiods(start=None, end=None, value=None)
This iterates over the periods (optionally, within a given time span) and yields (interval start, interval end, value) tuples.
- mean(start=None, end=None, mask=None, interpolate='previous')
Calculate the average value of the time series over the given time range from start to end, considering only the times when mask is truthy.
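Assuming the mean of a step function is weighted by how long each value holds (the natural reading for interpolate='previous'), it can be sketched as follows. The helper is hypothetical and ignores mask and the series default; it expects the first point at or before start. The data are the points from the plotting example above.

```python
def mean(points, start, end):
    """Hypothetical sketch: time-weighted average of a step function.

    `points` is a list of (time, value) pairs sorted by time; the first
    point is assumed to be at or before `start` (no mask or default).
    """
    total = 0.0
    for i, (time, value) in enumerate(points):
        next_time = points[i + 1][0] if i + 1 < len(points) else end
        left, right = max(time, start), min(next_time, end)
        if right > left:
            total += value * (right - left)  # value times how long it holds
    return total / (end - start)


print(mean([(0, 0), (1, 2), (3, 1), (5, 0)], 0, 5))  # 1.2
```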
- classmethod merge(ts_list, compact=True, operation=None)
Iterate through several time series in order, yielding (time, value) tuples, where value is either the list of values of each individual TimeSeries at time t (in the same order as in ts_list) or the result of the optional operation applied to that list of values.
- moving_average(sampling_period, window_size=None, start=None, end=None, placement='center', pandas=False)
Compute averages of the time series over regular intervals.
- n_points(start=-inf, end=inf, mask=None, include_start=True, include_end=False, normalized=False)
Calculate the number of points over the given time range from start to end.
- Parameters:
start (orderable, optional) – The lower time bound of when to count points. By default, start is -infinity.
end (orderable, optional) – The upper time bound of when to count points. By default, end is +infinity.
mask (TimeSeries, optional) – A domain on which to count points.
- Returns:
int with the result
- operation(other, function, default=None)
Calculate an “elementwise” operation, either between this TimeSeries and another one, i.e.
operation(t) = function(self(t), other(t))
or between this timeseries and a constant:
operation(t) = function(self(t), other)
If it’s another time series, the measurement times in the resulting TimeSeries will be the union of the sets of measurement times of the input time series. If it’s a constant, the measurement times will not change.
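The union-of-times rule can be illustrated with two series stored as plain {time: value} dicts. elementwise and value_at are hypothetical helpers, not part of traces; both use the previous-value rule and treat times before a series' first measurement as None.

```python
import bisect
import operator


def value_at(times, values, t):
    """Previous-value lookup in parallel sorted lists (None before the start)."""
    i = bisect.bisect_right(times, t) - 1
    return values[i] if i >= 0 else None


def elementwise(a, b, function):
    """Hypothetical sketch: combine two {time: value} step series.

    The result is measured at the union of both sets of measurement times.
    """
    a_times, b_times = sorted(a), sorted(b)
    a_values = [a[t] for t in a_times]
    b_values = [b[t] for t in b_times]
    result = {}
    for t in sorted(set(a) | set(b)):
        result[t] = function(value_at(a_times, a_values, t),
                             value_at(b_times, b_values, t))
    return result


print(elementwise({0: 1, 2: 3}, {0: 10, 1: 20}, operator.add))
# {0: 11, 1: 21, 2: 23}
```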
- plot(interpolate='previous', figure_width=12, linewidth=1, marker='o', markersize=3, color='#222222')
Create a plot of the time series data.
Creates a visualization of the time series using matplotlib. The plot shows data points at each measurement time and connects them with lines using the specified interpolation method.
- Parameters:
interpolate (str) – Interpolation method between points. Options are:
- “previous”: Step-like plot where each value stays constant until the next value (default)
- “linear”: Straight lines between data points
figure_width (float) – Width of the figure in inches (default: 12)
linewidth (float) – Width of the connecting lines (default: 1)
marker (str) – Marker style for data points, using matplotlib marker notation (default: “o” for circular markers)
markersize (float) – Size of the markers for data points (default: 3)
color (str) – Color of the line and markers (default: “#222222”)
- Returns:
A tuple containing (figure, axes) matplotlib objects that can be further customized or saved to a file.
- Return type:
tuple
- Raises:
ImportError – If matplotlib is not installed
ValueError – If an invalid interpolation method is specified
Examples
>>> ts = TimeSeries()
>>> ts[0] = 0
>>> ts[1] = 2
>>> ts[3] = 1
>>>
>>> # Basic plot with default settings
>>> fig, ax = ts.plot()
>>>
>>> # Custom plot with linear interpolation
>>> fig, ax = ts.plot(
...     interpolate="linear",
...     figure_width=10,
...     linewidth=2,
...     marker="s",
...     markersize=5,
...     color="#FF5733"
... )
>>>
>>> # Save the plot to a file
>>> fig.savefig("my_timeseries.png")
- remove(time)
Remove the measurement at the given time. Raises KeyError if the given time is not a measurement point.
- remove_points_from_interval(start, end)
Remove all measurement points within a specified time interval.
This method removes all measurement points that fall within the interval [start, end), not including the end point. Unlike remove(), this method won’t raise KeyError if there are no points in the interval.
- Parameters:
start – The start time of the interval (inclusive)
end – The end time of the interval (exclusive)
Examples
>>> ts = TimeSeries()
>>> ts[0] = 0
>>> ts[5] = 5
>>> ts[10] = 10
>>> ts.remove_points_from_interval(4, 7)
>>> # Now ts contains only points at t=0 and t=10
- sample(sampling_period, start=None, end=None, interpolate='previous', mask=None)
Sample the time series at regular time periods.
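Regular sampling amounts to evaluating the step function on an evenly spaced grid. This hypothetical sample helper uses previous-value interpolation on parallel sorted lists and builds the grid from integer multiples of the period to avoid accumulating floating-point error. Whether traces includes end itself is not assumed; this sketch includes it when it falls on the grid.

```python
import bisect


def sample(times, values, sampling_period, start, end):
    """Hypothetical sketch: sample a step function at start, start + period, ...

    `times` and `values` are parallel lists sorted by time. Grid points are
    start + n * period rather than repeated additions, and `end` is included
    when it lands on the grid (an assumption of this sketch).
    """
    if sampling_period <= 0:
        raise ValueError("sampling_period must be positive")
    samples = []
    n = 0
    while start + n * sampling_period <= end:
        t = start + n * sampling_period
        i = bisect.bisect_right(times, t) - 1  # last measurement at or before t
        samples.append((t, values[i] if i >= 0 else None))
        n += 1
    return samples


print(sample([0, 1, 3, 5], [0, 2, 1, 0], 2, 0, 6))
# [(0, 0), (2, 2), (4, 1), (6, 0)]
```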
- sample_interval(sampling_period=None, start=None, end=None, idx=None, operation='mean')
Sample on intervals by applying an operation (mean, max, or min) over each interval.
It can be called either with sampling_period and optionally start and end, or with idx as a pandas DatetimeIndex.
The returned pandas.Series will be indexed either on pandas.date_range(start, end, freq=sampling_period) or on idx.
- Parameters:
sampling_period – the sampling period
start – the start time of the sampling
end – the end time of the sampling
idx – a DatetimeIndex with the start times of the intervals
operation – “mean”, “max” or “min”
- Returns:
a pandas Series with the time series sampled
- set(time, value, compact=False)
Set the value for the time series. If compact is True, only set the value if it’s different from what it would be anyway.
- set_interval(start, end, value, compact=False)
Sets the value for the time series within a specified time interval.
- Parameters:
start – The start time of the interval, inclusive
end – The end time of the interval, exclusive.
value – The value to set within the interval.
compact (optional) – If compact is True, only set the value if it’s different from what it would be anyway. Defaults to False.
- Raises:
ValueError – If the start time is equal to or after the end time, indicating an invalid interval.
Example
>>> ts = TimeSeries(data=[(1, 5), (3, 2), (5, 4), (6, 1)])
>>> ts.set_interval(2, 6, 3)
>>> ts
TimeSeries({1: 5, 2: 3, 6: 1})
Note
The method sets the value over the interval by removing measurement points from the time series between start and end (exclusive), rather than changing the value of any intermediate points to equal the value.
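Under the assumption that set_interval preserves the value in effect at end by pinning it with an explicit point (consistent with the example above, where the point at 6 keeps the value 1), the behaviour can be sketched on a plain dict. set_interval here is a hypothetical helper, not the traces code, and None stands in for the series default.

```python
def set_interval(points, start, end, value):
    """Hypothetical sketch of set_interval on a {time: value} dict (in place).

    Measurements in [start, end) are removed, `value` is written at `start`,
    and the value previously in effect at `end` is pinned at `end` (None if
    no measurement precedes `end`, standing in for the default).
    """
    if start >= end:
        raise ValueError("start must be before end")
    times = sorted(points)
    at_or_before_end = [t for t in times if t <= end]
    end_value = points[at_or_before_end[-1]] if at_or_before_end else None
    for t in times:  # iterate over a sorted copy, so deleting is safe
        if start <= t < end:
            del points[t]
    points[start] = value
    points[end] = end_value
    return points


print(sorted(set_interval({1: 5, 3: 2, 5: 4, 6: 1}, 2, 6, 3).items()))
# [(1, 5), (2, 3), (6, 1)]
```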
- set_many(data, compact=False)
Set many values at once from an iterable of (time, value) pairs or a dictionary mapping times to values.
This is more efficient than calling set() in a loop because it avoids per-element bisect.insort calls.
- Parameters:
data – An iterable of (time, value) pairs or a dictionary.
compact – If True, discard consecutive entries with the same value. The first entry is kept if it differs from the current default. Only meaningful when data is in time-sorted order.
- slice(start, end)
Return an equivalent TimeSeries that only has points between start and end (always starting at start)
- threshold(value, inclusive=False)
Return True where the value is greater than the threshold value (or greater than or equal to it if inclusive=True).
- to_bool(invert=False, default=<object object>)
Return the truth value of each element.
- Parameters:
invert – If True, return the opposite truth values
default – If default is not explicitly given, keep it as None if it’s None (which often means “undefined” rather than “false”), otherwise cast it to bool
- Returns:
TimeSeries with the results.
- to_json(filename=None, time_transform=None, value_transform=None, dict_format=False)
Export time series data to a JSON file or return as a JSON string.
- Parameters:
filename (str, optional) – Path where JSON file will be written. If None, returns a JSON string instead.
time_transform (callable, optional) – Function to transform time values before serializing. Default converts datetime objects to ISO format strings.
value_transform (callable, optional) – Function to transform values before serializing.
dict_format (bool) – If True, uses a dictionary format with times as keys. If False (default), uses a list of objects with time and value keys.
- Returns:
- If filename is None, returns the JSON string.
Otherwise, writes to the file and returns None.
- Return type:
str or None
Examples
>>> # Export to a file using default settings
>>> ts.to_json('output.json')
>>>
>>> # Get JSON as a string and customize time formatting
>>> json_str = ts.to_json(
...     time_transform=lambda dt: dt.timestamp()
... )
>>>
>>> # Use dictionary format instead of list format
>>> ts.to_json('output.json', dict_format=True)
Histogram
EventSeries
An EventSeries represents a sequence of events that occur at specific times. Unlike TimeSeries, which tracks measurements (values) over time, EventSeries only tracks when events occur, without associated values.
>>> # Track website login events
>>> logins = traces.EventSeries([
... "2023-05-01 08:15",
... "2023-05-01 09:30",
... "2023-05-01 10:45",
... "2023-05-01 12:00"
... ])
EventSeries is useful for analyzing:
Event frequencies and patterns
Time intervals between events
Event counts within specific time ranges
Active cases over time (like support tickets, hospital stays)
>>> # Count events in a time range
>>> logins.events_between("2023-05-01 08:00", "2023-05-01 10:00")
2
>>> # Get cumulative count of events over time
>>> cumulative = logins.cumulative_sum()
>>> cumulative["2023-05-01 11:00"]
3
- class traces.EventSeries(data=None)
A sorted collection of event times.
EventSeries is a specialized data structure for representing a sequence of events that occur at specific times. It subclasses list and keeps event times in chronological order.
This class is useful for:
- Tracking when events occur (like clicks, logins, system events, etc.)
- Counting events over time periods
- Analyzing inter-event times (time between consecutive events)
- Tracking cases that open and close (like support tickets, hospital visits)
Unlike TimeSeries, EventSeries doesn’t track values associated with each time point, only the times themselves. It may contain duplicate times if multiple events occur simultaneously.
- Parameters:
data (iterable, optional) – An optional iterable of time points to initialize the series. Times can be any comparable type (datetime, float, int, string, etc.) but should be consistent throughout the series.
Examples
>>> # Create an empty EventSeries
>>> es = EventSeries()
>>>
>>> # Create from a list of timestamps
>>> from datetime import datetime
>>> events = [
...     datetime(2023, 1, 1, 12, 0, 0),
...     datetime(2023, 1, 1, 14, 30, 0),
...     datetime(2023, 1, 2, 9, 15, 0)
... ]
>>> es = EventSeries(events)
>>>
>>> # Create from string times (requires consistent format)
>>> es = EventSeries(["08:00", "09:30", "13:15", "15:45"])
- static count_active(es_open, es_closed)
Calculate the number of active cases over time from open and close events.
This method is useful for tracking the number of concurrent active cases at any point in time, such as:
- Open support tickets
- Active sessions or logins
- Hospital patients
- Ongoing processes
It takes two event series: one for when cases open/start, and another for when cases close/end. It then computes the difference between cumulative open and close events to determine how many cases are active at each point.
- Parameters:
es_open (EventSeries) – Series of times when cases open or start
es_closed (EventSeries) – Series of times when cases close or end
- Returns:
- A TimeSeries with the number of active cases at any point in time.
The default value is 0 (representing times before any cases opened).
- Return type:
TimeSeries
Examples
>>> # Track support tickets opened and closed
>>> ticket_opened = EventSeries(["08:00", "09:00", "13:00", "07:00"])
>>> ticket_closed = EventSeries(["08:30", "12:00", "12:00"])
>>> active_tickets = EventSeries.count_active(ticket_opened, ticket_closed)
>>>
>>> active_tickets["07:00"]  # 1 ticket opened
1
>>> active_tickets["08:15"]  # 2 opened, 0 closed
2
>>> active_tickets["08:45"]  # 2 opened, 1 closed
1
>>> active_tickets["13:15"]  # 4 opened, 3 closed
1
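The 'opens minus closes' rule can be checked directly with two binary searches. count_active below is a hypothetical helper on plain lists, not the EventSeries method; it counts events at exactly t as already having happened, which matches the ticket example above.

```python
import bisect


def count_active(opened, closed, t):
    """Hypothetical sketch: number of cases open at time t.

    Opens at or before t minus closes at or before t. Both lists are sorted
    on every call for simplicity; a real implementation would sort once.
    """
    opened, closed = sorted(opened), sorted(closed)
    return bisect.bisect_right(opened, t) - bisect.bisect_right(closed, t)


opened = ["08:00", "09:00", "13:00", "07:00"]
closed = ["08:30", "12:00", "12:00"]
for t in ["07:00", "08:15", "08:45", "13:15"]:
    print(t, count_active(opened, closed, t))
```

The loop prints the same counts as the doctest above: 1, 2, 1 and 1.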
- cumsum()
Alias for cumulative_sum()
- Returns:
A cumulative count of events over time.
- Return type:
TimeSeries
- cumulative_sum()
Create a TimeSeries of cumulative event counts over time.
Generates a TimeSeries where each unique time in the EventSeries becomes a measurement point, and the value is the total number of events that have occurred up to and including that time.
- Returns:
- A new TimeSeries with times from the EventSeries as keys
and cumulative counts as values. The default value is 0.
- Return type:
TimeSeries
Examples
>>> es = EventSeries([1, 1, 4, 5, 9, 6, 3, 9, 15])
>>> cumulative = es.cumulative_sum()
>>> cumulative[1]  # Two events at time 1
2
>>> cumulative[5]  # Five events up to and including time 5
5
>>> cumulative[15]  # All nine events by time 15
9
>>> cumulative[0]  # No events before time 1
0
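The cumulative count at each distinct event time can be computed with collections.Counter and itertools.accumulate. cumulative_counts is a hypothetical helper returning a plain dict; looking up a time between the keys would follow the previous-value rule, with 0 before the first event.

```python
from collections import Counter
from itertools import accumulate


def cumulative_counts(event_times):
    """Hypothetical sketch: events up to and including each distinct time."""
    counts = Counter(event_times)  # duplicates collapse into one count
    times = sorted(counts)
    return dict(zip(times, accumulate(counts[t] for t in times)))


print(cumulative_counts([1, 1, 4, 5, 9, 6, 3, 9, 15]))
# {1: 2, 3: 3, 4: 4, 5: 5, 6: 6, 9: 8, 15: 9}
```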
- events_between(start, end)
Count events occurring within a specific time interval.
Counts the number of events that occurred between the specified start and end times, inclusive of both endpoints (closed interval).
- Parameters:
start – The start time of the interval (inclusive)
end – The end time of the interval (inclusive)
- Returns:
The number of events that occurred within the specified interval.
- Return type:
int
Examples
>>> es = EventSeries([1, 1, 4, 5, 9, 6, 3, 9, 15])
>>> es.events_between(1, 5)  # Events at times 1, 1, 3, 4, 5
5
>>> es.events_between(7, 10)  # Events at times 9, 9
2
>>> es.events_between(16, 20)  # No events in this range
0
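Because an EventSeries keeps its times sorted, a closed-interval count can be done with two binary searches. This is a sketch under the assumption of a plain sorted list; the documentation does not state how events_between is implemented.

```python
import bisect


def events_between(sorted_times, start, end):
    """Hypothetical sketch: count events with start <= time <= end."""
    # bisect_left keeps events at `start`; bisect_right keeps events at `end`.
    count = bisect.bisect_right(sorted_times, end) - bisect.bisect_left(sorted_times, start)
    return max(0, count)  # an empty or reversed interval contains no events


events = sorted([1, 1, 4, 5, 9, 6, 3, 9, 15])
print(events_between(events, 1, 5))    # 5
print(events_between(events, 7, 10))   # 2
print(events_between(events, 16, 20))  # 0
```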
- iter_interevent_times()
Iterate through the time intervals between consecutive events.
Yields the time difference between each consecutive pair of events in the series. This is useful for analyzing event arrival patterns, wait times, or service intervals.
- Yields:
The time difference between consecutive events. The type will depend on the time type used in the EventSeries (timedelta for datetime objects, numeric difference for numbers, etc.)
Examples
>>> from datetime import datetime, timedelta
>>> events = [
...     datetime(2023, 1, 1, 12, 0, 0),
...     datetime(2023, 1, 1, 14, 30, 0),  # 2.5 hours after first
...     datetime(2023, 1, 1, 14, 45, 0),  # 15 minutes after second
... ]
>>> es = EventSeries(events)
>>> intervals = list(es.iter_interevent_times())
>>> intervals[0] == timedelta(hours=2, minutes=30)  # first to second event
True
>>> intervals[1] == timedelta(minutes=15)  # second to third event
True
