pyscamp API documentation

pyscamp: Python bindings for SCAMP

selfjoin(a, m, **kwargs)

Computes the matrix profile for time series A.

abjoin(a, b, m, **kwargs)

For each subsequence in time series A, finds the nearest neighbor in time series B.

selfjoin_sum(a, m, **kwargs)

Returns the sum of the correlations above the specified threshold (default 0) for each subsequence in a time series.

abjoin_sum(a, b, m, **kwargs)

For each subsequence in time series a, returns the sum of the correlations to subsequences in time series b above the specified threshold (default 0).

selfjoin_knn(a, m, k, **kwargs)

[GPU ONLY, EXPERIMENTAL] Returns the approximate k nearest neighbors for each subsequence in a time series.

abjoin_knn(a, b, m, k, **kwargs)

[GPU ONLY, EXPERIMENTAL] For each subsequence in time series A, returns its approximate K nearest neighbors in time series B.

selfjoin_matrix(a, m, **kwargs)

[EXPERIMENTAL] Returns a pooled version of the distance matrix of size mheight x mwidth (height x width). The pooling operation is max() for Pearson correlation and min() for Euclidean distance.

abjoin_matrix(a, b, m, **kwargs)

[EXPERIMENTAL] Returns a pooled version of the distance matrix of size mheight x mwidth (height x width). The pooling operation is max() for Pearson correlation and min() for Euclidean distance.

autotune([devices, cache_path])

Run the SCAMP GPU kernel autotuner for the selected device(s) and persist the chosen kernel configurations to disk.

gpu_supported()

Returns True if both (1) the module was compiled with GPU support and (2) GPUs are available.

pyscamp.abjoin(a, b, m, **kwargs)

For each subsequence in time series A, finds the nearest neighbor in time series B.

Parameters:
  • a (1D array) – Time series whose subsequences are used as queries against b.

  • b (1D array) – Time series in which to search for matches for subsequences in a.

  • m (int) – Subsequence length to use for computing the matrix profile.

Returns:

A tuple. First element: The nearest neighbor distance of subsequences in a to time series b. Second element: The index (in b) of each nearest neighbor.

Return type:

Tuple of np.ndarray[float32] and np.ndarray[int32]
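
To make the meaning of the two returned arrays concrete, the following is a brute-force NumPy sketch of an AB-join. It is not how SCAMP computes the result. It assumes the distance is the z-normalized Euclidean distance conventional for matrix profiles (this entry does not name the metric) and that no subsequence is constant. The names znorm_subsequences and abjoin_reference are hypothetical helpers, not part of pyscamp.

```python
import numpy as np

def znorm_subsequences(ts, m):
    """Return a (len(ts) - m + 1, m) array of z-normalized subsequences."""
    ts = np.asarray(ts, dtype=np.float64)
    windows = np.lib.stride_tricks.sliding_window_view(ts, m)
    mean = windows.mean(axis=1, keepdims=True)
    std = windows.std(axis=1, keepdims=True)  # assumed non-zero
    return (windows - mean) / std

def abjoin_reference(a, b, m):
    """Hypothetical O(n^2) reference: nearest neighbor in b for each subsequence of a."""
    za = znorm_subsequences(a, m)
    zb = znorm_subsequences(b, m)
    # Pearson correlation between every subsequence of a and every subsequence of b.
    corr = np.clip(za @ zb.T / m, -1.0, 1.0)
    # Standard identity: d = sqrt(2 * m * (1 - r)) for z-normalized subsequences.
    dist = np.sqrt(2.0 * m * (1.0 - corr))
    index = dist.argmin(axis=1)                # position of the best match in b
    profile = dist[np.arange(len(za)), index]  # distance to that match
    return profile.astype(np.float32), index.astype(np.int32)
```

Both outputs have len(a) - m + 1 entries, one per subsequence of a. If pyscamp is installed, the equivalent call would be pyscamp.abjoin(a, b, m), and results may differ slightly because of floating-point precision.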

pyscamp.abjoin_knn(a, b, m, k, **kwargs)

[GPU ONLY, EXPERIMENTAL] For each subsequence in time series A, returns its approximate K nearest neighbors in time series B.

Parameters:
  • a (1D array) – Time series to compute the KNN matrix profile for.

  • b (1D array) – Time series in which to search for matches.

  • m (int) – Subsequence length to use for computing the matrix profile.

  • k (int) – Number of neighbors to return for each subsequence.

  • threshold (float, optional) – Correlation threshold in [0, 1] (default 0). Matches with a correlation below the threshold are ignored.

Returns:

List of tuples (col, row, distance) containing up to K matches for each column of the distance matrix. col is the index in A, row is the index in B of the match, and distance is the distance between the two subsequences.

Return type:

List of tuple[int, int, float]
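
Because the result is a flat list of (col, row, distance) tuples, it is often convenient to regroup it per subsequence. The sketch below shows one way to do that in plain Python. The helper name group_matches_by_column is hypothetical, and sorting by ascending distance is a choice made here, not documented behavior.

```python
from collections import defaultdict

def group_matches_by_column(matches):
    """Group (col, row, distance) tuples into {col: [(row, distance), ...]}."""
    grouped = defaultdict(list)
    for col, row, distance in matches:
        grouped[col].append((row, distance))
    # Order each column's neighbors from closest to farthest.
    for neighbors in grouped.values():
        neighbors.sort(key=lambda pair: pair[1])
    return dict(grouped)

# Made-up sample tuples for illustration only (not real pyscamp output).
sample = [(0, 7, 1.5), (1, 3, 0.4), (0, 2, 0.9)]
by_column = group_matches_by_column(sample)
```

A column may end up with fewer than K entries, since matches below the correlation threshold are ignored.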

pyscamp.abjoin_matrix(a, b, m, **kwargs)

[EXPERIMENTAL] Returns a pooled version of the distance matrix of size mheight x mwidth (height x width). The pooling operation is max() for Pearson correlation and min() for Euclidean distance.

Parameters:
  • a (1D array) – Time series corresponding to the columns of the distance matrix.

  • b (1D array) – Time series corresponding to the rows of the distance matrix.

  • m (int) – Subsequence length to use for computing the matrix profile.

  • mheight (int, optional) – Height of the pooled distance matrix to output. Default 50.

  • mwidth (int, optional) – Width of the pooled distance matrix to output. Default 50.

  • threshold (float, optional) – Correlation threshold in [0, 1] (default 0). Matches with a correlation below the threshold are ignored.

Returns:

A 2D array of height mheight and width mwidth. This is a pooled version of the full distance matrix.

Return type:

2D array
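
The idea of pooling can be illustrated with a brute-force sketch that builds the full Pearson correlation matrix and reduces it to mheight x mwidth tiles with max(). This shows one plausible pooling scheme only: the tile boundaries used here (evenly spaced via np.linspace) are an assumption, the threshold parameter is not modeled, and pooled_correlation_matrix is a hypothetical helper rather than part of pyscamp.

```python
import numpy as np

def pooled_correlation_matrix(a, b, m, mheight=50, mwidth=50):
    """Hypothetical reference: max-pool the full correlation matrix into tiles."""
    def znorm(ts):
        w = np.lib.stride_tricks.sliding_window_view(np.asarray(ts, dtype=np.float64), m)
        return (w - w.mean(axis=1, keepdims=True)) / w.std(axis=1, keepdims=True)

    za, zb = znorm(a), znorm(b)
    # Rows correspond to subsequences of b, columns to subsequences of a.
    corr = zb @ za.T / m
    if corr.shape[0] < mheight or corr.shape[1] < mwidth:
        raise ValueError("pooled size exceeds the full matrix size")
    # Evenly spaced tile edges (an assumption about how tiles are laid out).
    row_edges = np.linspace(0, corr.shape[0], mheight + 1).astype(int)
    col_edges = np.linspace(0, corr.shape[1], mwidth + 1).astype(int)
    pooled = np.empty((mheight, mwidth))
    for i in range(mheight):
        for j in range(mwidth):
            tile = corr[row_edges[i]:row_edges[i + 1], col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = tile.max()  # max() pooling for Pearson correlation
    return pooled
```

For Euclidean distance output, the same structure would apply with min() in place of max().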

pyscamp.abjoin_sum(a, b, m, **kwargs)

For each subsequence in time series a, returns the sum of the correlations to subsequences in time series b above the specified threshold (default 0).

Parameters:
  • a (1D array) – Time series to compute matrix profile for.

  • b (1D array) – Time series to search for matches.

  • m (int) – Subsequence length to use for computing the matrix profile.

  • threshold (float, optional) – Correlation threshold in [0, 1] (default 0). Matches with a correlation below the threshold are ignored.

Returns:

For each subsequence in A, returns the sum of correlations above the specified threshold in B.

Return type:

np.ndarray[float64]
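
As a rough illustration of what the thresholded sum means, the sketch below computes it by brute force with NumPy. It assumes that correlations equal to the threshold are kept (the text only says that lower ones are ignored) and that no subsequence is constant. abjoin_sum_reference is a hypothetical helper, not a pyscamp function.

```python
import numpy as np

def abjoin_sum_reference(a, b, m, threshold=0.0):
    """Hypothetical O(n^2) reference for the thresholded sum of correlations."""
    def znorm(ts):
        w = np.lib.stride_tricks.sliding_window_view(np.asarray(ts, dtype=np.float64), m)
        return (w - w.mean(axis=1, keepdims=True)) / w.std(axis=1, keepdims=True)

    # corr[i, j] is the Pearson correlation of a[i:i+m] and b[j:j+m].
    corr = znorm(a) @ znorm(b).T / m
    # Keep only correlations at or above the threshold, then sum per row of a.
    return np.where(corr >= threshold, corr, 0.0).sum(axis=1)
```

A large value indicates that a subsequence of a resembles many subsequences of b, rather than just one nearest neighbor.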

pyscamp.autotune(devices=None, cache_path='')

Run the SCAMP GPU kernel autotuner for the selected device(s) and persist the chosen kernel configurations to disk. Future pyscamp calls on the same machine will read these configurations from the cache and use them when launching GPU kernels.

A full sweep takes a few minutes on a recent GPU. The output is verbose so you can follow progress; pass cache_path to redirect the write elsewhere (e.g. for a sandboxed run).

Parameters:
  • devices (list[int], optional) – List of CUDA device IDs to tune. If omitted or empty (default), only device 0 is tuned: a full sweep takes minutes, and most multi-GPU machines hold identical devices, so sweeping them all wastes wall time on identical configurations. Pass devices=[0, 1, ...] explicitly to tune more than one (e.g. if you have two different GPU models).

  • cache_path (str, optional) – Filesystem path to read/write the cache from. Empty (default) resolves in this order: $SCAMP_AUTOTUNE_CACHE (if set, used verbatim), then $XDG_CACHE_HOME/scamp/autotune.txt (if set), then a platform-specific user dir ($HOME/.cache/scamp/autotune.txt on Linux/macOS; %LOCALAPPDATA%\scamp\autotune.txt on Windows). Parent directories are created automatically.

Returns:

Number of devices that were tuned.

Return type:

int

Raises:
  • RuntimeError – If pyscamp was built without CUDA support.

  • ValueError – If no CUDA devices are available.
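
The lookup order described for cache_path can be sketched as a small pure-Python function. This mirrors the documented order only and is not pyscamp's actual implementation. resolve_autotune_cache and its env and windows parameters are hypothetical and exist to make the sketch testable. Treating an empty environment variable as unset is also an assumption.

```python
import ntpath
import os
import posixpath

def resolve_autotune_cache(cache_path="", env=None, windows=None):
    """Hypothetical helper mirroring the documented cache path lookup order."""
    env = os.environ if env is None else env
    windows = (os.name == "nt") if windows is None else windows
    join = ntpath.join if windows else posixpath.join

    if cache_path:                       # explicit argument wins
        return cache_path
    if env.get("SCAMP_AUTOTUNE_CACHE"):  # used verbatim when set
        return env["SCAMP_AUTOTUNE_CACHE"]
    if env.get("XDG_CACHE_HOME"):
        return join(env["XDG_CACHE_HOME"], "scamp", "autotune.txt")
    if windows:                          # platform-specific user directory
        return join(env["LOCALAPPDATA"], "scamp", "autotune.txt")
    return join(env["HOME"], ".cache", "scamp", "autotune.txt")
```

With pyscamp installed, a sandboxed run would pass the path directly, for example pyscamp.autotune(devices=[0], cache_path="/tmp/scamp-autotune.txt"), where the path is an arbitrary example.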

pyscamp.gpu_supported()

Returns True if both (1) the module was compiled with GPU support and (2) GPUs are available.
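
A typical use of this check is to guard the GPU-only functions. The sketch below takes the imported module as an argument (named scamp here) so that it runs without a GPU build. It assumes selfjoin works without a GPU, which the entries imply by marking only the KNN functions as GPU ONLY. neighbors_with_fallback is a hypothetical helper.

```python
def neighbors_with_fallback(scamp, a, m, k):
    """Use the GPU-only KNN join when possible, else fall back to the 1-NN self-join.

    `scamp` is expected to be the imported pyscamp module.
    Returns a (kind, result) pair so callers know which result shape they received.
    """
    if scamp.gpu_supported():
        return "knn", scamp.selfjoin_knn(a, m, k)
    return "1nn", scamp.selfjoin(a, m)
```

In real use this would be called as neighbors_with_fallback(pyscamp, a, m, k) after import pyscamp.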

pyscamp.selfjoin(a, m, **kwargs)

Computes the matrix profile for time series A.

Parameters:
  • a (1D array) – Time series to compute matrix profile for.

  • m (int) – Subsequence length to use for computing the matrix profile.

Returns:

A tuple containing the matrix profile as the first element and the indices as the second element.

Return type:

Tuple of np.ndarray[float32] and np.ndarray[int32]
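
Once the profile and index arrays are available, common follow-up steps are locating the best-matching pair (the top motif) and the most isolated subsequence (the top discord). The sketch below assumes the profile holds distances, so smaller means more similar. top_motif_and_discord is a hypothetical helper, and the arrays in the example are made up rather than real pyscamp output.

```python
import numpy as np

def top_motif_and_discord(profile, index):
    """Return ((motif_position, its_neighbor), discord_position) from a distance profile."""
    profile = np.asarray(profile, dtype=np.float64)
    finite = np.isfinite(profile)
    # Smallest finite distance: the subsequence with the closest match.
    motif = int(np.argmin(np.where(finite, profile, np.inf)))
    # Largest finite distance: the subsequence farthest from everything else.
    discord = int(np.argmax(np.where(finite, profile, -np.inf)))
    return (motif, int(index[motif])), discord

# Made-up arrays standing in for the output of pyscamp.selfjoin(a, m).
example_profile = np.array([3.0, 0.5, 2.0, 0.5, 4.0], dtype=np.float32)
example_index = np.array([2, 3, 0, 1, 0], dtype=np.int32)
motif_pair, discord = top_motif_and_discord(example_profile, example_index)
```

If the profile were expressed as Pearson correlation instead of distance, the roles of argmin and argmax would swap.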

pyscamp.selfjoin_knn(a, m, k, **kwargs)

[GPU ONLY, EXPERIMENTAL] Returns the approximate k nearest neighbors for each subsequence in a time series.

Parameters:
  • a (1D array) – Time series to compute the KNN matrix profile for.

  • m (int) – Subsequence length to use for computing the matrix profile.

  • k (int) – Number of neighbors to return for each subsequence.

  • threshold (float, optional) – Correlation threshold in [0, 1] (default 0). Matches with a correlation below the threshold are ignored.

Returns:

List of tuples (col, row, distance) containing up to K matches for each column of the distance matrix. col is the index of the subsequence, row is the index of its match, and distance is the distance between the two subsequences.

Return type:

List of tuple[int, int, float]

pyscamp.selfjoin_matrix(a, m, **kwargs)

[EXPERIMENTAL] Returns a pooled version of the distance matrix of size mheight x mwidth (height x width). The pooling operation is max() for Pearson correlation and min() for Euclidean distance.

Parameters:
  • a (1D array) – Time series to compute matrix profile for.

  • m (int) – Subsequence length to use for computing the matrix profile.

  • mheight (int, optional) – Height of the pooled distance matrix to output. Default 50.

  • mwidth (int, optional) – Width of the pooled distance matrix to output. Default 50.

  • threshold (float, optional) – Correlation threshold in [0, 1] (default 0). Matches with a correlation below the threshold are ignored.

Returns:

A 2D array of height mheight and width mwidth. This is a pooled version of the full distance matrix.

Return type:

2D array

pyscamp.selfjoin_sum(a, m, **kwargs)

Returns the sum of the correlations above the specified threshold (default 0) for each subsequence in a time series.

Parameters:
  • a (1D array) – Time series to compute matrix profile for.

  • m (int) – Subsequence length to use for computing the matrix profile.

  • threshold (float, optional) – Correlation threshold in [0, 1] (default 0). Matches with a correlation below the threshold are ignored.

Returns:

For each subsequence in A, returns the sum of correlations above the specified threshold to other subsequences in A.

Return type:

np.ndarray[float64]