# traces

A Python library for unevenly-spaced time series analysis.

## Why?

Taking measurements at irregular intervals is common, but most tools are primarily designed for evenly-spaced measurements. Also, in the real world, time series often have missing observations, or you may have multiple series with different frequencies: it can be useful to model these as unevenly-spaced.

Traces aims to make it simple to write *readable code* to:

- **Wrangle**. Read, write, and manipulate unevenly-spaced time series data.
- **Explore**. Perform basic analyses of unevenly-spaced time series data without making an awkward / lossy transformation to evenly-spaced representations.
- **Convert**. Gracefully transform unevenly-spaced time series data to evenly-spaced representations.

Traces was designed by the team at Datascope based on several practical applications in different domains, because it turns out unevenly-spaced data is actually pretty great, particularly for sensor data analysis.

## Quickstart: using traces

To see a basic use of traces, let’s look at these data from a light
switch, also known as *Big Data from the Internet of Things*.

The main object in traces is a TimeSeries, which you create just like a dictionary, adding the five measurements at 6:00am, 7:45:56am, etc.

```
>>> from datetime import datetime
>>> import traces
>>> time_series = traces.TimeSeries()
>>> time_series[datetime(2042, 2, 1, 6, 0, 0)] = 0 # 6:00:00am
>>> time_series[datetime(2042, 2, 1, 7, 45, 56)] = 1 # 7:45:56am
>>> time_series[datetime(2042, 2, 1, 8, 51, 42)] = 0 # 8:51:42am
>>> time_series[datetime(2042, 2, 1, 12, 3, 56)] = 1 # 12:03:56pm
>>> time_series[datetime(2042, 2, 1, 12, 7, 13)] = 0 # 12:07:13pm
```

What if you want to know if the light was on at 11am? Unlike a Python dictionary, you can look up the value at any time, even if it’s not one of the measurement times.

```
>>> time_series[datetime(2042, 2, 1, 11, 0, 0)] # 11:00am
0
```
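The lookup above is consistent with treating the series as a step function, where each value holds until the next measurement. The following is a minimal sketch of that idea in plain Python, not the traces implementation; `measurements` and `value_at` are hypothetical names introduced for illustration.

```python
from bisect import bisect_right
from datetime import datetime

# The light switch measurements from above, sorted by time.
measurements = [
    (datetime(2042, 2, 1, 6, 0, 0), 0),
    (datetime(2042, 2, 1, 7, 45, 56), 1),
    (datetime(2042, 2, 1, 8, 51, 42), 0),
    (datetime(2042, 2, 1, 12, 3, 56), 1),
    (datetime(2042, 2, 1, 12, 7, 13), 0),
]


def value_at(measurements, when, default=None):
    """Return the most recent measured value at or before `when`."""
    times = [t for t, _ in measurements]
    index = bisect_right(times, when)
    if index == 0:
        return default  # `when` is earlier than the first measurement
    return measurements[index - 1][1]


# The last change before 11am was at 8:51:42am, when the light went off.
light_at_11am = value_at(measurements, datetime(2042, 2, 1, 11, 0, 0))
```

Returning a default before the first measurement is a choice made for this sketch; the library may handle that case differently.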

The `distribution` function gives you the fraction of time that the `TimeSeries` is in each state.

```
>>> time_series.distribution(
...     start=datetime(2042, 2, 1, 6, 0, 0), # 6:00am
...     end=datetime(2042, 2, 1, 13, 0, 0) # 1:00pm
... )
Histogram({0: 0.8355952380952381, 1: 0.16440476190476191})
```

The light was on about 16% of the time between 6am and 1pm.
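To see where the 16% comes from, the calculation can be reproduced by hand: add up the seconds spent in each state and divide by the length of the window. This is a rough sketch in plain Python, assuming each value holds until the next measurement; `time_fractions` is a hypothetical helper, not part of traces.

```python
from collections import defaultdict
from datetime import datetime

# The light switch measurements from above, sorted by time.
measurements = [
    (datetime(2042, 2, 1, 6, 0, 0), 0),
    (datetime(2042, 2, 1, 7, 45, 56), 1),
    (datetime(2042, 2, 1, 8, 51, 42), 0),
    (datetime(2042, 2, 1, 12, 3, 56), 1),
    (datetime(2042, 2, 1, 12, 7, 13), 0),
]


def time_fractions(measurements, start, end):
    """Fraction of the window [start, end] spent in each state.

    Time before the first measurement is ignored (an assumption of this sketch).
    """
    seconds = defaultdict(float)
    # Each value lasts until the next measurement; the last one lasts until `end`.
    next_times = [t for t, _ in measurements[1:]] + [end]
    for (t, value), t_next in zip(measurements, next_times):
        lo, hi = max(t, start), min(t_next, end)
        if hi > lo:
            seconds[value] += (hi - lo).total_seconds()
    total = sum(seconds.values())
    return {value: s / total for value, s in seconds.items()}


fractions = time_fractions(
    measurements,
    start=datetime(2042, 2, 1, 6, 0, 0),   # 6:00am
    end=datetime(2042, 2, 1, 13, 0, 0),    # 1:00pm
)
# The light is on for 4143 of the 25200 seconds in the window, about 16%.
```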

### Adding more data…

Now let’s get a little more complicated and look at the sensor readings from forty lights in a building.

How many lights are on throughout the day? The merge function takes the forty individual `TimeSeries` and efficiently merges them into one `TimeSeries` where each value is a list of the states of all lights.

```
>>> trace_list = [... list of forty traces.TimeSeries ...]
>>> count = traces.TimeSeries.merge(trace_list, operation=sum)
```

We also applied a `sum` operation to the list of states to get the `TimeSeries` of the number of lights that are on.
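Conceptually, merging walks through every measurement in time order, keeps the latest state of each light, and records the combined value whenever any light changes. The sketch below illustrates this with plain lists; it is not how traces implements `merge`, and `merge_sum` and the three example lights are hypothetical.

```python
def merge_sum(series_list):
    """Merge step-function series into one series of summed values.

    Each series is a time-sorted list of (time, value) pairs. A series
    counts as 0 before its first measurement (an assumption of this sketch).
    """
    # Collect every measurement as (time, series index, value), in time order.
    events = sorted(
        (t, i, value)
        for i, series in enumerate(series_list)
        for t, value in series
    )
    current = [0] * len(series_list)  # latest known value of each series
    merged = []
    for t, i, value in events:
        current[i] = value
        total = sum(current)
        if merged and merged[-1][0] == t:
            merged[-1] = (t, total)  # several series changed at the same time
        else:
            merged.append((t, total))
    return merged


# Three hypothetical lights, with times in hours since midnight.
lights = [
    [(6, 0), (8, 1), (17, 0)],
    [(6, 0), (9, 1), (12, 0)],
    [(6, 0), (8, 1), (18, 0)],
]
count = merge_sum(lights)
# count == [(6, 0), (8, 2), (9, 3), (12, 2), (17, 1), (18, 0)]
```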

How many lights are typically on during business hours, from 8am to 6pm?

```
>>> histogram = count.distribution(
...     start=datetime(2042, 2, 1, 8, 0, 0), # 8:00am
...     end=datetime(2042, 2, 1, 12 + 6, 0, 0) # 6:00pm
... )
>>> histogram.median()
17
```

The `distribution` function returns a `Histogram` that can be used to get summary metrics such as the mean or quantiles.
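Because each state is weighted by the fraction of time spent in it, these summary metrics are time-weighted. As an illustration only, here is one simple way to compute a weighted mean and quantile from a plain dictionary of fractions; the helper names and the example fractions are hypothetical, and the library's `Histogram` may define quantiles differently.

```python
def weighted_mean(fractions):
    """Mean of the values, weighted by the fraction of time in each."""
    return sum(value * weight for value, weight in fractions.items())


def weighted_quantile(fractions, q):
    """Smallest value whose cumulative time fraction reaches `q`."""
    cumulative = 0.0
    for value in sorted(fractions):
        cumulative += fractions[value]
        if cumulative >= q:
            return value
    return max(fractions)  # guard against floating-point round-off


# Hypothetical fractions of time with 0, 1, 2 or 3 lights on (they sum to 1).
fractions = {0: 0.1, 1: 0.2, 2: 0.5, 3: 0.2}
median = weighted_quantile(fractions, 0.5)
mean = weighted_mean(fractions)
```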

### It’s flexible

The measurement points (keys) in a `TimeSeries` can be in any units as long as they can be ordered. The values can be anything.

For example, you can use a `TimeSeries` to keep track of the contents of a grocery basket by the number of minutes within a shopping trip.

```
>>> time_series = traces.TimeSeries()
>>> time_series[1.2] = {'broccoli'}
>>> time_series[1.7] = {'broccoli', 'apple'}
>>> time_series[2.2] = {'apple'} # puts broccoli back
>>> time_series[3.5] = {'apple', 'beets'} # mmm, beets
```

To learn more, check the examples and the detailed reference.

## More info

## Contributing

Contributions are welcome and greatly appreciated! Please visit the repository for more info.