================================ SEP 5 — SDyPy Unified timeseries ================================ :Authors: Wout Weijtjens , Janko Slavič , Domen Gorjup :Status: Draft :Type: Standards :Created: 2023-07-07 Abstract -------- To simplify the interface across packages a unified way to represent timeseries data is proposed. This proposal targets future input-output packages to return the data in this common style. Motivation and Scope -------------------- SDyPy is supporting individual user(s) driven packages. However, for ease of integration, to reduce the risk of errors, and long-term stability a unified way to represent timeseries data is enforced. The current proposal targets timeseries data in particular, whether it is for input and or output. A simple but generalistic structure is put forward to allow sufficient flexibility for various applications, yet guaranteeing consistent handling of the data. Future packages that provide import/export functionality for data from and to SDyPy will have to comply with this format. Reference implementation and access ................................... The compliance checker called for by this SEP now exists: the `sdypy-sep005 `_ package provides ``assert_sep005`` (and per-channel/field validators) that raise on a non-compliant timeseries object. It is surfaced on the SDyPy umbrella as ``sd.sep005`` for discoverability — note that ``sep005`` is a cross-cutting data-format standard, not a ``sdypy.*`` namespace sub-package, so it remains the standalone ``sdypy_sep005`` distribution that the backends themselves depend on. The same validator is re-exported by ``sdypy.FRF.assert_sep005``. .. code-block:: python import sdypy as sd sd.sep005.assert_sep005(timeseries) # raises ValueError/TypeError if non-compliant Detailed description -------------------- The current proposed structure of a timeseries is that of Python `dictionary`, representing a single timeseries, with possibly multiple rows of data ("channels") with common time-sampling parameters. .. code-block:: { "data": np.array([[1,2,3,4], [5,6,7,8]]), # Array of data shaped (n,) (or (m, n) for `m` channels) and `n` time-samples. "fs": float(125), # Sampling frequency in Hz. Optional if `time` specified. "name": "structure xyz", # Timeseries name "time": np.array([1,3,4,7]), # Optional: for equally spaced data: time-vector in seconds (shaped (n,)). "channel_name": ["1_x", "1_y"], # Optional: Name of the measured physical channel (or list of `m` channel names if `data` is of shape (m, n)). "quantity": "a", # Optional: "f", "a", "v", "d", "e", "s" for force, acceleration, velocity, displacement, strain, stress, respectively. "start_timestamp": "2023-06-29T15:00:00.000000", # Optional: timestamp in absolute time. Default format is ISO 8601. "start_timestamp_format": "%d/%m/%y %H:%M:%S.%f" # Optional: if custom format is used in `start_timestamp`, specify it here. "unit_str": "m/s²", # Optional: unit string (or list of `m` unit strings if `data` is of shape (m, n)). "unit_tex": "m/s$^2$", # Optional: LaTeX interpretation of `unit_str`. "...": '...' # Optional: additional data. } To represent multiple timeseries with potentially different time-sampling parameters or different quantities, a `list` of timeseries-object dictionaries should be used: .. code-block:: [ { # timeseries 1 "data": ..., "fs": ... }, { # timeseries 2 ... }, ] Compulsory fields ................. * ``data`` (np.array): 1D (single channel) or 2D (multiple channels with equal time sampling) array of timeseries data. * ``name`` (str): Unique name of the time-series. * ``unit_str`` (str): Physical measurement unit, can be left empty or None * Either ``fs`` (int or float) or ``time`` (np.array): For channels with an equidistant sampling, a sampling frequency in _Hz_ is required. Alternatively or a timevector ``time`` of equal length of the ``data`` is required. Optional fields ............... * ``channel_name`` (str or list of str): Physical measurement channel names. If ``data`` is of shape ``(m, n)``, representing ``m`` measured channels, ``channel_name`` must be of length ``m``. * ``unit_tex`` (str): LaTeX interpretation of `unit_str`. * ``quantity``: (str): "f", "a", "v", "d", "e", "s" for force, acceleration, velocity, displacement, strain, stress. * ``start_timestamp`` (str): Starting timestamp in absolute time and including timezone inf. The default format is ISO8601 which enables to use: ``datetime.isoformat()`` and ``datetime.fromisoformat()``. * ``start_timestamp_format`` (str): formatting for timestamp in case the default ISO8601 format is not used. Prohibited fields ................. The proposed timeseries format allows for any arbitrary field to be added to the channel information to serve the needs of any particular application. However, variations to the compulsory or optional fields listed above in case and/or the incorrect use of underscores (`_`) is prohibited to avoid confusion. Examples of prohibited field names such as ``Fs`` (erroneous case), ``starttimestamp`` (erroneous use of the underscore), are prohibited to avoid confusion. Following field names with names similar to the existing compulsory and/or optional fields are also **prohibited**, again to avoid confusion: ``unit``, ``timestamp``, ``signal``, ``sample_frequency``. Other requirements .................. Aside from the suggestions above following properties need to be verified * When ``time`` is defined, the vector should be of equal length as `data`. Discussion ---------- This proposal is open for discussion and suggestions. The discussion is here: - `Issue #15 `_ Copyright --------- This document has been placed in the public domain.