Data Management

Data module for the Experimentalis library.

experimentalis.data.calculate_uncertainty(raw_data, method='default', indices_range=None, y_range=None, plot=False, graphing_options=None)

Calculates the uncertainty \(dy\) in a measurement series. Implicitly assumes that the uncertainty is not yet known (\(dy = \vec{0}\)); if \(dy \neq \vec{0}\), an error is raised.

experimentalis.data.dataset_apply_selector(dataset, selector)

Applies a selector to all dimensions of the dataset simultaneously. These selectors can be filters, sorters, ranges, etc.

import numpy as np
from experimentalis.data import Dataset, dataset_apply_selector

data = Dataset(
    x  = [ 1, 2, 3 ],
    y  = [ 2, 4, 6 ],
    dx = [ 1/7, 1/6, 1/6 ],
    dy = [ 1/5, 1/9, 1/5 ]
)

last_two_selector = slice(1, 3)
odd_x_selector    = data.x % 2 != 0   # boolean mask: True where x is odd

last_two_values = dataset_apply_selector(data, last_two_selector)
odd_x_values    = dataset_apply_selector(data, odd_x_selector)
Parameters:
  • dataset (Dataset) – Dataset to be transformed

  • selector (slice or numpy.ndarray) – A NumPy-compatible selector. This can be a slice, an index array, or a boolean mask.

Returns:

A new dataset containing only the selected entries.

Return type:

Dataset

experimentalis.data.isolate_noise_uncertainty(raw_data)

Calculates the uncertainty due to noise in a tuple dataset, taken as the standard deviation over a period where the data should be uniform.

Returns:

The standard deviation over the uniform region.

Return type:

float
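The computation described above can be sketched in plain NumPy. The window bounds below are illustrative assumptions for this sketch; how the function locates the uniform region in the raw data is not specified here.

```python
import numpy as np

# Hypothetical raw signal: flat (uniform) for the first 50 samples,
# then a ramp. Only the flat region reflects pure noise.
rng = np.random.default_rng(0)
flat = np.full(50, 2.0) + rng.normal(0.0, 0.1, 50)
ramp = np.linspace(2.0, 5.0, 50) + rng.normal(0.0, 0.1, 50)
y = np.concatenate([flat, ramp])

# Noise uncertainty: standard deviation over the region where the
# data should be uniform (here, the first 50 samples).
noise_uncertainty = float(np.std(y[:50], ddof=1))
```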

experimentalis.data.pack_dataset(dataset, packing_factor=100)

Downsamples a dataset by averaging consecutive blocks of data (packing).

Returns a new Dataset where consecutive blocks of size \(p\) are averaged. The uncertainty in \(y\) is reduced accordingly. The function implicitly ignores \(dx\), so it issues a warning for any dataset where \(dx \neq \vec{0}\).

Mathematically, if our dataset \(D\) is

\[\vec{D} = (x_i, y_i, dx_i, dy_i), \quad i = 1, 2, \dots, N\]

and we have a packing factor \(p\) (the block size), then each packed point \(D'_j\), for \(j = 1, 2, \dots, N/p\), is

\[D'_j = (x'_j, y'_j, 0, dy'_j)\]

where

\[ \begin{align}\begin{aligned}x'_j = \frac{1}{p} \sum_{k=0}^{p-1} x_{(j-1)p + k},\\y'_j = \frac{1}{p} \sum_{k=0}^{p-1} y_{(j-1)p + k},\end{aligned}\end{align} \]

and, assuming a uniform per-point y-uncertainty \(dy\), the packed y-uncertainty becomes

\[dy'_j = \frac{dy}{\sqrt{p}}.\]
Parameters:
  • dataset (experimentalis.data.Dataset) – The dataset to be packed.

  • packing_factor (int) – The “packing factor” or block size

Returns:

A new dataset of the packed data

Return type:

experimentalis.data.Dataset
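The equations above can be sketched with NumPy's reshape trick. This is an illustrative stand-in for `pack_dataset`, not the library's implementation; it assumes a uniform per-point uncertainty `dy`, matching the \(dy/\sqrt{p}\) formula.

```python
import numpy as np

def pack(x, y, dy, p):
    """Average consecutive blocks of size p (a sketch of the packing math)."""
    n = (len(x) // p) * p            # drop any incomplete trailing block
    x_packed = x[:n].reshape(-1, p).mean(axis=1)
    y_packed = y[:n].reshape(-1, p).mean(axis=1)
    # Averaging p points with uncertainty dy reduces it by sqrt(p).
    dy_packed = dy[:n].reshape(-1, p).mean(axis=1) / np.sqrt(p)
    return x_packed, y_packed, dy_packed

x = np.arange(1.0, 9.0)              # 1, 2, ..., 8
y = 2.0 * x
dy = np.full(8, 0.4)

xp, yp, dyp = pack(x, y, dy, p=4)
# xp → [2.5, 6.5], yp → [5.0, 13.0], dyp → [0.2, 0.2]
```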

experimentalis.data.shear_dataset(dataset, n)

Automatically shears off the last n entries of the dataset.

Parameters:
  • dataset (experimentalis.data.Dataset) – Dataset to be sheared.

  • n (int) – The number of elements to remove.

Returns:

A new dataset with the last n elements removed.

Return type:

Dataset
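In NumPy terms, shearing amounts to dropping the last n entries from each component array; a minimal sketch of the equivalent operation:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 6, 8, 10])

n = 2
x_sheared, y_sheared = x[:-n], y[:-n]   # drop the last n entries
# x_sheared → [1, 2, 3]
```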

experimentalis.data.sort_dataset(dataset)

Automatically sorts a dataset along the x-axis.

Parameters:

dataset (Dataset) – Dataset to be sorted

Returns:

A new dataset of the original sorted along the x-axis.

Return type:

Dataset
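Sorting along the x-axis can be sketched with `numpy.argsort`: the permutation that orders x is applied to every component array so the coupled points stay aligned. An equivalent NumPy sketch (not the library's implementation):

```python
import numpy as np

x = np.array([3.0, 1.0, 2.0])
y = np.array([6.0, 2.0, 4.0])
dy = np.array([0.3, 0.1, 0.2])

order = np.argsort(x)                # permutation that sorts x ascending
x_sorted, y_sorted, dy_sorted = x[order], y[order], dy[order]
# x_sorted → [1., 2., 3.], y_sorted → [2., 4., 6.]
```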

experimentalis.data.trim_dataset(dataset, trim_range, graphing_options=None, plot=False)

Trims a dataset (optionally with a visual helper) within a selected range. The trim range is the data designated to be kept, not the data to remove.

Parameters:
  • dataset (Dataset) – Dataset to be trimmed.

  • trim_range ((int, int)) – The trimming range; entries inside this range are kept.

Returns:

A copy of the original dataset containing only the trimmed data.

Return type:

Dataset
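Trimming can be sketched as a boolean mask over the dataset. Whether the range is interpreted over x-values or indices, and whether its endpoints are inclusive, are assumptions made for this sketch:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.0, 2.0, 4.0, 6.0, 8.0])

lo, hi = 1, 3                        # hypothetical trim range: data to KEEP
keep = (x >= lo) & (x <= hi)         # assumes an inclusive range over x
x_trimmed, y_trimmed = x[keep], y[keep]
# x_trimmed → [1., 2., 3.]
```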

class experimentalis.data.Dataset(x, y, dx=None, dy=None)

Bases: object

Datasets are tracker objects for two-dimensional coupled measurements, i.e., some independent measurement x and its uncertainty dx alongside a dependent measurement y on x and its uncertainty dy.

For example, a Dataset can typically be used for any one-dimensional time series, like stock prices, temperature, or precipitation chances. For the case of stock prices, x would be the time, dx would be the uncertainty in each clock reading (so, say, the instrumental lag between a measurement and the intended time delta), y would be the stock prices, and dy would be the uncertainty in the stock reading (once again due to lag or other issues).

dx
dy
x
y