CACHET_CADB

class torch_ecg.databases.CACHET_CADB(db_dir: Optional[Union[str, pathlib.Path]] = None, working_dir: Optional[Union[str, pathlib.Path]] = None, verbose: int = 1, **kwargs: Any)

Bases: torch_ecg.databases.base._DataBase

CACHET-CADB: A Contextualized Ambulatory Electrocardiography Arrhythmia Dataset

ABOUT

The database has 259 days of contextualized ECG recordings from 24 patients and 1,602 manually annotated 10 s heart-rhythm samples.
The ECG records in CACHET-CADB range in length from 24 h to 3 weeks.
The patient’s ambulatory context information (activities, movement acceleration, body position, etc.) is extracted cumulatively for every 10 s interval.
Nearly 11% of the ECG data in the database was found to be noisy.
The full database can be downloaded from its webpage [1] and the short-format database from [2]; see also the GitHub repository [3].
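CACHET-CADB frames its rhythm annotations and context features around fixed 10 s intervals. As a rough illustration of that framing, the numpy sketch below cuts a long single-channel recording into non-overlapping 10 s windows and drops any incomplete tail. The helper name split_into_windows is hypothetical, and the 1024 Hz sampling frequency in the example is an assumption chosen for illustration, not a value read from the database; this is not the dataset authors' segmentation procedure.

```python
import numpy as np


def split_into_windows(sig: np.ndarray, fs: float, win_sec: float = 10.0) -> np.ndarray:
    """Split a 1D signal into non-overlapping windows of win_sec seconds.

    Trailing samples that do not fill a complete window are dropped.
    Returns an array of shape (n_windows, samples_per_window).
    """
    win_len = int(round(win_sec * fs))  # samples per window
    if win_len <= 0:
        raise ValueError("window must contain at least one sample")
    n_windows = len(sig) // win_len  # number of complete windows
    return sig[: n_windows * win_len].reshape(n_windows, win_len)


# Example: 95 s of a synthetic signal at an assumed fs of 1024 Hz.
fs = 1024
sig = np.zeros(95 * fs, dtype=np.float32)
windows = split_into_windows(sig, fs)
print(windows.shape)  # (9, 10240)
```

The last 5 s of the example signal do not fill a full window and are discarded; padding instead of dropping would be an equally valid design choice.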

Usage

ECG arrhythmia detection
Self-Supervised Learning

References

1: https://data.dtu.dk/articles/dataset/CACHET-CADB/14547264
2: https://data.dtu.dk/articles/dataset/CACHET-CADB_Short_Format/14547330
3: https://github.com/cph-cachet/cachet-ecg-db/

Citation

10.3389/fcvm.2022.893090 10.11583/DTU.14547264 10.11583/DTU.14547330

Parameters

db_dir (str or pathlib.Path, optional) – Storage path of the database. If not specified, a default storage path is used, and the data can be fetched from the DTU website via download().
working_dir (str or pathlib.Path, optional) – Working directory for storing intermediate files and log files.
verbose (int, default 1) – Level of logging verbosity.
kwargs (dict, optional) – Auxiliary keyword arguments.

property all_subjects: List[str]: List of all subject IDs.

property database_info: torch_ecg.databases.base.DataBaseInfo: The DataBaseInfo object of the database.

property df_metadata: pandas.core.frame.DataFrame: The table of metadata of the records.

download(files: Optional[Union[str, Sequence[str]]]) → None

Download the database from the DTU website.

Parameters: files (str or Sequence[str], optional) – Files to download; can be a subset of “CACHET-CADB.zip” and “cachet-cadb_short_format_without_context.hdf5.zip”. If None, all files are downloaded.
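The files argument accepts a single file name, a sequence of names, or None. A minimal sketch of how such an argument can be normalized and validated against the two file names listed above; normalize_files and AVAILABLE_FILES are hypothetical names, not part of torch_ecg, and the sketch performs no network access.

```python
from typing import List, Optional, Sequence, Union

# File names listed in the download() documentation.
AVAILABLE_FILES = [
    "CACHET-CADB.zip",
    "cachet-cadb_short_format_without_context.hdf5.zip",
]


def normalize_files(files: Optional[Union[str, Sequence[str]]]) -> List[str]:
    """Hypothetical helper: turn the files argument into a validated list."""
    if files is None:
        return list(AVAILABLE_FILES)  # None means "download everything"
    if isinstance(files, str):
        files = [files]  # a single name may be given as a plain string
    files = list(files)
    unknown = [f for f in files if f not in AVAILABLE_FILES]
    if unknown:
        raise ValueError(f"unknown file(s): {unknown}")
    return files


print(normalize_files(None))
print(normalize_files("CACHET-CADB.zip"))  # ['CACHET-CADB.zip']
```

Checking isinstance(files, str) first matters: a string is itself a sequence, and list("CACHET-CADB.zip") would split it into characters.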

get_absolute_path(rec: Union[str, int], extension: str = 'signal-ecg') → pathlib.Path

Get the absolute path of a file of the record, selected by extension (by default the ECG signal).

Parameters

rec (str or int) – Record name or index of the record in all_records.
extension (str, default "signal-ecg") – Extension of the file, can be one of “header”, “annotation”, “signal”, “annotation-context”, “signal-ecg”, “signal-acc”, “signal-angularrate”, “signal-hr_live”, “signal-hrvrmssd_live”, etc.

Returns

Absolute path of the file.

Return type

pathlib.Path

get_record_metadata(rec: Union[str, int]) → Dict[str, str]

Get metadata of the record.

Parameters: rec (str or int) – Record name or index of the record in all_records, or “short_format” (-1) to load data from the short format file.
Returns: metadata – Metadata of the record
Return type: dict
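Several methods accept rec as a record name or an index into all_records, and get_record_metadata and load_data additionally accept the special value “short_format” (-1). Because -1 here does not mean "the last record" as in ordinary Python indexing, the convention is easy to trip over. The sketch below shows one way to resolve the argument; the resolve_rec helper and the record names are hypothetical, and rejecting other negative indices is an assumption of this sketch.

```python
from typing import List, Union

SHORT_FORMAT = "short_format"


def resolve_rec(rec: Union[str, int], all_records: List[str]) -> str:
    """Map a rec argument to a record name, or to "short_format".

    An int indexes all_records; "short_format" or -1 selects the
    short-format file. Unlike ordinary Python indexing, -1 does not mean
    "last record" here, so other negative indices are rejected.
    """
    if rec == SHORT_FORMAT or rec == -1:
        return SHORT_FORMAT
    if isinstance(rec, int):
        if rec < 0:
            raise IndexError(f"negative index {rec} is not supported")
        return all_records[rec]  # raises IndexError if out of range
    if rec not in all_records:
        raise ValueError(f"unknown record: {rec!r}")
    return rec


# Hypothetical record names, for illustration only.
records = ["rec_a", "rec_b", "rec_c"]
print(resolve_rec(1, records))   # rec_b
print(resolve_rec(-1, records))  # short_format
```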

get_subject_id(rec: Union[str, int]) → str

Get the unique subject ID attached to the record.

Parameters: rec (str or int) – Record name or index of the record in all_records.
Returns: sid – Subject ID attached to the record.
Return type: str

get_subject_info(rec_or_sid: Union[str, int], items: Optional[List[str]] = None) → Dict[str, str]

Read auxiliary information of a subject (a record) stored in the header files.

Parameters

rec_or_sid (str or int) – Record name, index of the record in all_records, or subject ID.
items (List[str], optional) – Items of the subject’s information (e.g. sex, age, etc.).

Returns

subject_info – Information about the subject, including “age”, “gender”, “height”, “weight”.

Return type

dict

load_ann(rec: Union[str, int], ann_format: str = 'pd') → Union[pandas.core.frame.DataFrame, numpy.ndarray, Dict[Union[int, str], numpy.ndarray]]

Load annotation from the metadata file.

Parameters

rec (str or int) – Record name or index of the record in all_records.
ann_format (str, default "pd") – Format of the annotation, currently only “pd” is supported.

Returns

ann – The annotation of the record.

Return type

pandas.DataFrame or numpy.ndarray or dict

load_context_ann(rec: Union[str, int], sheet_name: Optional[str] = None) → Union[pandas.core.frame.DataFrame, Dict[str, pandas.core.frame.DataFrame]]

Load context annotation.

Parameters

rec (str or int) – Record name or index of the record in all_records.
sheet_name (str, optional) – Sheet name of the context annotation file, either “movisens DataAnalyzer Parameter” or “movisens DataAnalyzer Results”. If None, all sheets are loaded.

Returns

context_ann – Context annotations of the record.

Return type

pandas.DataFrame or dict

load_context_data(rec: Union[str, int], context_name: str, sampfrom: Optional[int] = None, sampto: Optional[int] = None, channels: Optional[Union[str, int, List[str], List[int]]] = None, units: Optional[str] = None, fs: Optional[numbers.Real] = None) → Union[numpy.ndarray, pandas.core.frame.DataFrame]

Load context data (e.g. accelerometer, heart rate, etc.).

Parameters

rec (str or int) – Record name or index of the record in all_records.
context_name (str) – Context name, can be one of “acc”, “angularrate”, “hr_live”, “hrvrmssd_live”, “movementacceleration_live”, “press”, “marker”.
sampfrom (int, optional) – Start index of the data to be loaded.
sampto (int, optional) – End index of the data to be loaded.
channels (str or int or List[str] or List[int], optional) – Channels (names or indices) to load. If None, all channels are loaded.
units (str, optional) – Units of the output signal, currently can only be “default”; None for digital data, without digital-to-physical conversion.
fs (numbers.Real, optional) – Sampling frequency of the output signal. If not None, the loaded data will be resampled to this frequency, otherwise, the original sampling frequency will be used.

Returns

context_data – Context data in the “channel_first” format.

Return type

numpy.ndarray or pandas.DataFrame

Note

If the record does not have the specified context data, an empty array or DataFrame is returned.
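To make the sampfrom, sampto and channels semantics concrete, here is a numpy sketch operating on an already-loaded channel-first array. It assumes sampfrom and sampto are sample indices at the original sampling frequency and that channels may mix names and indices. The helper slice_context and the channel names "x", "y", "z" are hypothetical, and only the ndarray return type is covered, not the DataFrame one. An empty input is passed through unchanged, in line with the Note above.

```python
from typing import List, Optional, Sequence, Union

import numpy as np


def slice_context(
    data: np.ndarray,
    channel_names: Sequence[str],
    sampfrom: Optional[int] = None,
    sampto: Optional[int] = None,
    channels: Optional[Union[str, int, List[str], List[int]]] = None,
) -> np.ndarray:
    """Hypothetical helper: select channels and samples from channel-first data.

    data has shape (n_channels, n_samples); channels may be a name, an
    index, or a list of either; None keeps all channels.
    """
    if data.size == 0:
        return data  # no context data recorded: pass the empty array through
    if channels is None:
        idx = list(range(len(channel_names)))
    else:
        if isinstance(channels, (str, int)):
            channels = [channels]
        # Names are looked up in channel_names; ints are used as-is.
        idx = [channel_names.index(c) if isinstance(c, str) else c for c in channels]
    return data[idx, sampfrom:sampto]


# Hypothetical 3-axis accelerometer data with 10 samples per axis.
acc = np.arange(30).reshape(3, 10)
print(slice_context(acc, ["x", "y", "z"], sampfrom=2, sampto=5, channels="y"))
# [[12 13 14]]
```

Passing a list of indices keeps the channel axis even when a single channel is selected, so the output stays channel-first.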

load_data(rec: Union[str, int], sampfrom: Optional[int] = None, sampto: Optional[int] = None, data_format: str = 'channel_first', units: Optional[str] = 'mV', fs: Optional[numbers.Real] = None, return_fs: bool = False) → Union[numpy.ndarray, Tuple[numpy.ndarray, numbers.Real]]

Load physical (converted from digital) ECG data, or load digital signal directly.

Parameters

rec (str or int) – Record name or index of the record in all_records, or “short_format” (-1) to load data from the short format file.
sampfrom (int, optional) – Start index of the data to be loaded.
sampto (int, optional) – End index of the data to be loaded.
data_format (str, default "channel_first") – Format of the ECG data, “channel_last” (alias “lead_last”), or “channel_first” (alias “lead_first”), or “flat” (alias “plain”).
units (str or None, default "mV") – Units of the output signal, can also be “μV” (aliases “uV”, “muV”); None for digital data, without digital-to-physical conversion.
fs (numbers.Real, optional) – Sampling frequency of the output signal. If not None, the loaded data will be resampled to this frequency, otherwise, the original sampling frequency will be used.
return_fs (bool, default False) – Whether to return the sampling frequency of the output signal.

Returns

data (numpy.ndarray) – The loaded ECG data.
data_fs (numbers.Real, optional) – Sampling frequency of the output signal. Returned if return_fs is True.
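The data_format, units, fs and return_fs options of load_data can be pictured as post-processing steps applied to a channel-first array. The sketch below is hypothetical (postprocess is not a torch_ecg function) and assumes the input is already physical ECG in mV; digital output (units=None) is deliberately not modelled because it requires the ADC gain. For resampling it swaps in plain linear interpolation with np.interp, which is simpler than, and not equivalent to, band-limited resampling.

```python
from typing import Optional, Tuple, Union

import numpy as np


def postprocess(
    data: np.ndarray,
    fs: float,
    data_format: str = "channel_first",
    units: Optional[str] = "mV",
    target_fs: Optional[float] = None,
    return_fs: bool = False,
) -> Union[np.ndarray, Tuple[np.ndarray, float]]:
    """Hypothetical post-processing mirroring the load_data options.

    data is assumed to be physical ECG in mV, shape (n_channels, n_samples).
    """
    if units is None:
        # Digital output needs the ADC gain and baseline, not modelled here.
        raise NotImplementedError("units=None (digital output) is not covered")
    if units.lower() in ("μv", "uv", "muv"):
        data = data * 1000.0  # 1 mV = 1000 μV
    elif units.lower() != "mv":
        raise ValueError(f"unsupported units: {units!r}")

    if target_fs is not None and target_fs != fs:
        # Linear interpolation onto a new time grid: a simple stand-in for
        # band-limited (e.g. polyphase) resampling, with no anti-aliasing filter.
        n_out = int(round(data.shape[-1] * target_fs / fs))
        t_in = np.arange(data.shape[-1]) / fs
        t_out = np.arange(n_out) / target_fs
        data = np.stack([np.interp(t_out, t_in, ch) for ch in data])
        fs = target_fs

    fmt = data_format.lower()
    if fmt in ("channel_last", "lead_last"):
        data = data.T  # (n_samples, n_channels)
    elif fmt in ("flat", "plain"):
        data = data.reshape(-1)  # meaningful for single-channel data
    elif fmt not in ("channel_first", "lead_first"):
        raise ValueError(f"unsupported data_format: {data_format!r}")
    return (data, fs) if return_fs else data


# One second of a constant 1 mV single-lead signal at an assumed fs of 1024 Hz.
x = np.ones((1, 1024))
out, out_fs = postprocess(x, 1024, data_format="channel_last", units="uV",
                          target_fs=256, return_fs=True)
print(out.shape, out_fs)  # (256, 1) 256
```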

plot(rec: Union[str, int], **kwargs: Any) → None: Not implemented.

property subject_records: Dict[str, List[str]]: Dict of subject IDs and their corresponding records.
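The subject_records property pairs each subject ID with its records, which is the inverse view of what get_subject_id returns record by record. A sketch of building such a mapping from any record-to-subject lookup; the record names and the split-on-underscore lookup are hypothetical and do not reflect the real CACHET-CADB naming scheme.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def group_by_subject(
    records: Iterable[str], get_subject_id: Callable[[str], str]
) -> Dict[str, List[str]]:
    """Build a {subject_id: [record, ...]} mapping from a subject-ID lookup."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for rec in records:
        groups[get_subject_id(rec)].append(rec)
    return dict(groups)  # plain dict; insertion order is preserved


# Hypothetical record names of the form "<subject>_<part>".
recs = ["s01_a", "s01_b", "s02_a"]
print(group_by_subject(recs, lambda r: r.split("_")[0]))
# {'s01': ['s01_a', 's01_b'], 's02': ['s02_a']}
```

A mapping like this is handy for subject-wise train/test splits, so that records of one patient do not end up on both sides of the split.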

property url: Dict[str, str]: URL(s) for downloading the database.