Last updated: 2022-07-19
Checks: 7 passed, 0 failed
Knit directory: false.alarm/docs/
This reproducible R Markdown analysis was created with workflowr (version 1.7.0). The Checks tab describes the reproducibility checks that were applied when the results were created. The Past versions tab lists the development history.
Great! Since the R Markdown file has been committed to the Git repository, you know the exact version of the code that produced these results.
Great job! The global environment was empty. Objects defined in the global environment can affect the analysis in your R Markdown file in unknown ways. For reproducibility it’s best to always run the code in an empty environment.
The command set.seed(20201020) was run prior to running the code in the R Markdown file. Setting a seed ensures that any results that rely on randomness, e.g. subsampling or permutations, are reproducible.
Great job! Recording the operating system, R version, and package versions is critical for reproducibility.
Nice! There were no cached chunks for this analysis, so you can be confident that you successfully produced the results during this run.
Great job! Using relative paths to the files within your workflowr project makes it easier to run your code on other machines.
Great! You are using Git for version control. Tracking code development and connecting the code version to the results is critical for reproducibility.
The results in this page were generated with repository version f86f334. See the Past versions tab to see a history of the changes made to the R Markdown and HTML files.
Note that you need to be careful to ensure that all relevant files for the analysis have been committed to Git prior to generating the results (you can use wflow_publish or wflow_git_commit). workflowr only checks the R Markdown file, but you know if there are other scripts or data files that it depends on. Below is the status of the Git repository when the results were generated:
Ignored files:
Ignored: .Renviron
Ignored: .Rhistory
Ignored: .Rproj.user/
Ignored: .devcontainer/exts/
Ignored: .docker/
Ignored: .github/ISSUE_TEMPLATE/
Ignored: .httr-oauth
Ignored: R/RcppExports.R
Ignored: _regime_change/meta/process
Ignored: _regime_change/meta/progress
Ignored: _regime_change/objects/
Ignored: _regime_change/scratch/
Ignored: _regime_change/user/
Ignored: _regime_optimize/meta/meta2
Ignored: _regime_optimize/meta/process
Ignored: _regime_optimize/meta/progress
Ignored: _regime_optimize/objects/
Ignored: _regime_optimize/user/
Ignored: _targets/meta/process
Ignored: _targets/meta/progress
Ignored: _targets/objects/
Ignored: _targets/user/
Ignored: analysis/shiny/rsconnect/
Ignored: analysis/shiny_land/rsconnect/
Ignored: dev/
Ignored: inst/extdata/
Ignored: papers/aime2021/aime2021.md
Ignored: papers/epia2022/epia2022.md
Ignored: presentations/MEDCIDS21/MEDCIDS21-10min_files/
Ignored: presentations/MEDCIDS21/MEDCIDS21_files/
Ignored: presentations/Report/Midterm-Report_cache/
Ignored: presentations/Report/Midterm-Report_files/
Ignored: protocol/SecondReport_cache/
Ignored: protocol/SecondReport_files/
Ignored: protocol/_files/
Ignored: renv/python/
Ignored: renv/staging/
Ignored: src/RcppExports.cpp
Ignored: src/RcppExports.o
Ignored: src/contrast.o
Ignored: src/false.alarm.so
Ignored: src/fft.o
Ignored: src/mass.o
Ignored: src/math.o
Ignored: src/mpx.o
Ignored: src/scrimp.o
Ignored: src/stamp.o
Ignored: src/stomp.o
Ignored: src/windowfunc.o
Ignored: thesis/Rplots.pdf
Ignored: thesis/_bookdown_files/
Ignored: tmp/
Untracked files:
Untracked: output/work_output_202110.rds
Note that any generated files, e.g. HTML, png, CSS, etc., are not included in this status report because it is ok for generated content to have uncommitted changes.
These are the previous versions of the repository in which changes were made to the R Markdown (analysis/blog-202204.Rmd) and HTML (docs/blog-202204.html) files. If you’ve configured a remote Git repository (see ?wflow_git_remote), click on the hyperlinks in the table below to view the files as they were in that past version.
File | Version | Author | Date | Message |
---|---|---|---|---|
Rmd | 03d1e68 | Francisco Bischoff | 2022-07-19 | Squashed commit of the following: |
html | 5927668 | Francisco Bischoff | 2022-04-17 | Build site. |
Rmd | ba0c9e1 | Francisco Bischoff | 2022-04-17 | refactor blog |
These last couple of months were dedicated to:
Version | Author | Date |
---|---|---|
0f2f487 | Francisco Bischoff | 2022-03-03 |
Back at the beginning of this thesis, we talked about using some kind of filtering during signal acquisition to remove artifacts and disconnected cables. This was described in the section “Preparing the data” in the report.
The initial approach to selecting the “complexity” formula1 (from now on, we will refer to it as “complex”) was based on a few experiments, and I felt it deserved a more scientific approach.
We will use the terms ‘noise’ and ‘artifact’ interchangeably for simplicity.
Several signal quality indexes (SQI) were used to assess the ECG signal’s noise in the literature. A brief list of them:
\[ complex = \sqrt{\sum{(\nabla data)^2}} \tag{4} \]
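As a concrete illustration of Equation (4) (a minimal Python sketch, not the thesis code; the function name is mine), the “complex” estimate is just the square root of the sum of squared first differences:

```python
import math

def complex_sqi(x):
    """Eq. (4): square root of the sum of squared first differences."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))

# A flat segment has zero "complex"; a jittery one scores higher.
print(complex_sqi([1.0, 1.0, 1.0]))       # 0.0
print(complex_sqi([0.0, 1.0, 0.0, 1.0]))  # sqrt(3) ~ 1.732
```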
Version | Author | Date |
---|---|---|
d9dc8ec | Francisco Bischoff | 2022-03-08 |
\[ Kurtosis = \frac{1}{n}\sum_{i=1}^{n}{\left(\frac{x_i - \bar x}{\sigma}\right)^4} - 3 \tag{5} \]
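For reference, a small Python sketch of Equation (5), using population moments and returning the “excess” kurtosis (the function name is mine):

```python
def excess_kurtosis(x):
    """Eq. (5): fourth standardized moment minus 3 (population sigma)."""
    n = len(x)
    mean = sum(x) / n
    sigma = (sum((v - mean) ** 2 for v in x) / n) ** 0.5
    return sum(((v - mean) / sigma) ** 4 for v in x) / n - 3

# A symmetric two-point signal attains the minimum excess kurtosis of -2.
print(excess_kurtosis([-1.0, 1.0, -1.0, 1.0]))  # -2.0
```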
This is not an exhaustive list. Also, we want a simple SQI, as we must use the smallest processor and memory footprint possible. For this reason, we will run the experiments with the following SQIs from that list: Activity, Mobility, Complexity, and complex. In addition, we will experiment with another simple index, the signal’s amplitude. For the baseline, we will use the “maximum” of the clean signal, which naïvely returns “true” if the signal gets above a certain reading.
Let’s take the first 12 files from PhysioNet’s Challenge dataset, in alphabetical order, from a103l to a165l. By manual inspection, the following files are negative for artifacts (clean): #4, #7, #8, #10, #11, #12; and the following are positive for artifacts: #1, #2, #3, #5, #6, #9.
Fig. 2 shows a sample of records, both containing artifacts and clean, along with examples of some SQIs. We need an index with a low value when the signal has little noise and a high value when the signal is noisy (and therefore will not be processed).
Version | Author | Date |
---|---|---|
ba0c9e1 | Francisco Bischoff | 2022-04-17 |
First, we need to specify how to evaluate the performance of the SQIs without hard-labeled annotations, i.e., knowing only that one series has artifacts (but not where) and another doesn’t. Another piece of information, gathered by quick inspection, is that all 12 time series are “clean” from data points 30500 to 32500. This gives us a hint of the starting threshold for what is “clean”.
As shown in Fig. 2, the SQI fluctuates where there is a clean QRS complex. The threshold must be above this fluctuation.
To make the definition of this threshold \(\theta\) more robust, instead of taking the maximum value, we will take the 0.9 quantile of those values and then multiply the largest of them by some constant \(\epsilon\) (by default \(\epsilon = 1.1\)):
\[ \theta = \max{(Q_{SQI}(0.9))} \cdot \epsilon, \quad \{\epsilon \in \mathbb{R}^+_*\} \tag{6} \]
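In code, Equation (6) could look like the sketch below (illustrative Python; `quantile` is a simple linear-interpolation quantile like R’s default type 7, and the function names are mine):

```python
def quantile(values, q):
    """Linear-interpolation quantile (R's default type 7)."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def sqi_threshold(sqi_per_record, epsilon=1.1):
    """Eq. (6): largest 0.9 quantile over the clean records, times epsilon."""
    return max(quantile(v, 0.9) for v in sqi_per_record) * epsilon
```

Each element of `sqi_per_record` would hold the SQI values computed along one artifact-free record.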
We expect no values above the threshold on the negative set and, on the positive set, values above the threshold wherever there is noise.
Another aspect we must consider is the normalization of the time series and the gain value. PhysioNet’s data generally includes the gain used to convert the signal from mV to the digital format. Afterward, when importing the dataset, their tool divides the digital values by the gain, converting them back to mV. This should, in theory, give us signals on a similar scale, but in some recordings the gain seems to introduce differences (maybe the patient had a weak signal, or the wrong gain was used?). This is only a problem if we do not normalize the signal (to mean zero and standard deviation one) before computing the SQI.
It is essential to say that the normalization step is already performed when computing the Matrix Profile. Thus, we don’t want to add yet another normalization. This means we must use the window size of the Matrix Profile, while the filter does not necessarily need to use that same window size. This has implications for comparing Mobility, Complexity2, and Kurtosis because they are invariant to whether or not the gain is applied to the data (since all have the data in both numerator and denominator). They would also be invariant to normalization only if both the normalization and the SQI used the same window size.
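The gain invariance claimed above can be checked numerically. Below is a short Python sketch of Hjorth Mobility (an assumption-laden toy example, not the thesis code): since Mobility is a ratio of variances, any gain applied to the data cancels out.

```python
def var(x):
    """Population variance."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def mobility(x):
    """Hjorth Mobility: sqrt(Var(dx) / Var(x))."""
    dx = [b - a for a, b in zip(x, x[1:])]
    return (var(dx) / var(x)) ** 0.5

x = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0]
g = 470.7  # an arbitrary gain
scaled = [v / g for v in x]
# Both variances scale by 1/g^2, so the ratio (and Mobility) is unchanged.
assert abs(mobility(x) - mobility(scaled)) < 1e-9
```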
Furthermore, as we can see below, for the remaining SQIs it is equivalent to apply the gain \(g\) directly to the data or to the SQI itself.
The Activity (variance):
\[ \begin{eqnarray} Activity(\vec{X}) &=& Var\left(\frac{(x_i, x_{i+1}, \ldots, x_n)}{g}\right) \\ &\equiv& \frac{Var(x_i, x_{i+1}, \ldots, x_n)}{g^2} \end{eqnarray} \quad, \{ \vec{X} \in \mathbb{R}, g \in \mathbb{R} \mid g \gt 0 \} \tag{7} \]
The complex, from Batista et al.1:
\[ \begin{eqnarray} complex(\vec{X}) &=& \sqrt{\sum_{i=1}^n{\left[\nabla \frac{(x_i, x_{i+1}, \ldots, x_n)}{g}\right]^2}} \\ &\equiv& \frac{\sqrt{\sum_{i=1}^n{\left[\nabla (x_i, x_{i+1}, \ldots, x_n)\right]^2}}}{g} \end{eqnarray} \quad, \{ \vec{X} \in \mathbb{R}, g \in \mathbb{R} \mid g \gt 0 \} \tag{8} \]
The amplitude of the segments:
\[ \begin{eqnarray} A(\vec{X}) &=& \max{\left(\frac{(x_i, x_{i+1}, \ldots, x_n)}{g}\right)} - \min{\left(\frac{(x_i, x_{i+1}, \ldots, x_n)}{g}\right)} \\ &\equiv& \frac{\max{(x_i, x_{i+1}, \ldots, x_n)} - \min{(x_i, x_{i+1}, \ldots, x_n)}}{g} \end{eqnarray} \ , \{ \vec{X} \in \mathbb{R}, g \in \mathbb{R}\} \tag{9} \]
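A quick numeric check of the scaling identities in Equations (7)–(9) (a self-contained Python sketch; function names are mine): Activity scales by \(1/g^2\), while complex and amplitude scale by \(1/g\).

```python
def activity(x):
    """Eq. (7): population variance."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

def complex_sqi(x):
    """Eq. (8): sqrt of the sum of squared first differences."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:])) ** 0.5

def amplitude(x):
    """Eq. (9): peak-to-peak range."""
    return max(x) - min(x)

x = [0.0, 1.5, 0.5, 2.0]
g = 4.0
xg = [v / g for v in x]

assert abs(activity(xg) - activity(x) / g**2) < 1e-12      # scales by 1/g^2
assert abs(complex_sqi(xg) - complex_sqi(x) / g) < 1e-12   # scales by 1/g
assert abs(amplitude(xg) - amplitude(x) / g) < 1e-12       # scales by 1/g
```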
Thus, assuming these characteristics of each SQI, the combinations needed for comparison are:
Applying the gain before or after normalization doesn’t create another distinct set of results to compare.
Using the Equation (6), we set a threshold \(\theta\) using the subsets we know have no artifacts.
Now we will evaluate the first aspect of each algorithm: the variability of the values that could (in theory) serve as our negative threshold. Assuming a perfect scenario, we may suggest that the lower this variability, the better (and more stable) the algorithm is across the dataset.
The SQI doesn’t have a defined unit. Thus, to compare different SQIs, we must define a standard range for normalization. Except for Kurtosis (which actually returns the “excess” kurtosis), all studied algorithms have a minimum of zero. The maximum value will be taken empirically, using a value that multiplies the computed threshold into an optimized threshold that minimizes the false positives. These “multipliers” are listed in Table 1.
type | activity | ampl | complex | complexity | mobility | kurtosis | maximum |
---|---|---|---|---|---|---|---|
raw | 3.0 | 1.1 | 1.1 | 2.00000 | 1.500 | 2.00000 | 1.1 |
norm | 1.1 | 1.1 | 1.1 | 2.00000 | 1.562 | 2.00000 | 1.1 |
raw_gain | 3.0 | 1.1 | 1.1 | 3.86198 | 1.100 | 2.17999 | 1.1 |
norm_gain | 1.1 | 1.1 | 1.1 | 3.14400 | 1.100 | 2.13179 | 1.1 |
Fig. 3 shows a boxplot of the threshold candidates obtained for each algorithm.
From these boxplots, we may state some observations that may or may not be relevant to the final choice:
The need for tweaking the threshold: “activity” (norm and norm_gain), “amplitude” (all), “complex” (all), and “mobility” (norm_gain and raw_gain) didn’t need to be adjusted. “activity” (raw and raw_gain) needed an adjustment of ×3 (rounded). The remaining ones needed fine tweaking. We can infer that SQIs that don’t need tweaks (or that have a fixed value, like “activity”) are more robust.
The range of values: “activity_norm” seems to have the most compact distribution, while “kurtosis” (norm and raw) has a wide range of candidate values for the threshold. We can infer that SQIs with lower variability are more robust.
Skewness: The use of “gain” seems to have an effect on skewness. Two different patterns are present: “Complexity”, “Kurtosis”, and “Mobility” seem to be more affected by the gain, while the other SQIs follow a different distribution pattern. We could infer that a less skewed distribution is better than a skewed one.
Now let’s analyze the values considered “artifacts” by the thresholds we’ve set.
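The per-window labeling described above can be sketched as follows (illustrative Python using the “complex” SQI of Eq. (4), redefined here for self-containment; the signal, window size, and threshold are made up):

```python
import math

def complex_sqi(x):
    """Eq. (4): sqrt of the sum of squared first differences."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(x, x[1:])))

def flag_artifacts(signal, window, sqi, theta):
    """Return the start indices of windows whose SQI exceeds theta."""
    return [i for i in range(len(signal) - window + 1)
            if sqi(signal[i:i + window]) > theta]

# A quiet signal with a noisy burst in the middle (samples 20-23).
signal = [0.0] * 20 + [5.0, -5.0, 5.0, -5.0] + [0.0] * 20
flags = flag_artifacts(signal, window=4, sqi=complex_sqi, theta=1.0)
print(flags)  # every window touching the burst: [17, 18, ..., 23]
```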
Fig. 4 shows the histograms of all ‘negative’ records for each SQI. The threshold divides the histogram into ‘artifact’ and ‘clean’ classes. We assume that the artifacts detected here are, in fact, false positives as we are looking in the ‘negative’ records.
Fig. 5 shows each record individually. Each line is a different SQI, and each column one of the six records.
Fig. 6 shows the histograms of all ‘positive’ records for each SQI. The best scenario, in this case, would be an SQI that produces two distinct distributions with a clear boundary between them. Clearly, this is not the case, but we may still probe some characteristics: 1) does the threshold leave some gap between the artifact and clean distributions? 2) does the distribution have some “inflection point” that may suggest a threshold point? 3) do the positive results contain all the true artifacts? This may be better analyzed by looking at the next figure (Fig. 7).
Fig. 7 shows each record individually. As we can see, some SQIs are not able to detect the ‘artifact’ class in all records: “Activity”, “Amplitude,” and “complex” (norm_gain); “Complexity” (all); “Kurtosis” and “Mobility” (gain).
Finally, since we set the threshold to give us a low false-positive rate, we also want good recall.
First, let’s show how the false positives are distributed in the “negative” records. The following plots show the false positives of each SQI. The upper plot of Fig. 8 is a concatenation of the “negative” records with a margin of 100 samples before and after any detection. The figure is interactive and may be zoomed in for inspection.
Finally, Fig. 9 displays the detection in the “positive” records for each SQI. The figure is interactive and may be zoomed in for inspection. As before, the first plot is a concatenation of the records with a boundary of 1000 samples before and after any detection.
Fig. 10 shows the best SQIs only.
From Fig. 12, we can see that four algorithms worked similarly across the new positive set of records: activity_raw, activity_raw_gain, ampl_raw, and complex_raw.
It is worth mentioning that these algorithms are susceptible to the signal scale (not just the gain). The implication is that the threshold should be set according to the expected range of values reported by the hardware. For example, if all records are divided by the same factor, the threshold will change. Unfortunately, the algorithms invariant to the signal scale didn’t perform well on this task.
The baseline algorithm (maximum) shows a pattern similar to the other algorithms, but on closer inspection we see that the detected values are not consistent, and a shift in the global mean is enough to trigger several false positives.
During this exercise, it was found that the dataset used in PhysioNet’s 2015 challenge is not as well normalized as other datasets also available on PhysioNet. For example, the Paroxysmal Atrial Fibrillation Challenge3 dataset has its raw data well constrained in the whole 16-bit range. One record in PhysioNet’s 2015 challenge also stood out for having an uncommon value for gain (a non-integer value) of 470.7mV. Using such a value, the range of physical values reaches about ±56mV, while the usually seen range is less than ±5mV. Was this a “typo”?
It is understandable that the dataset used on PhysioNet’s 2015 challenge is not fully normalized because it contains several wandering patterns and some artifacts that blow out the steady ECG signal range. Nevertheless, it is a factor that must be considered when implementing on a device.
At this point, we can’t tell for sure which algorithm is best for filtering artifacts, as we don’t have precise labels of the artifacts’ positions to accurately calculate their performances.
Further work remains to be done to attempt to leverage the gain information and baseline shifts in order to improve the performance of these algorithms.
─ Session info ───────────────────────────────────────────────────────────────
setting value
version R version 4.2.1 (2022-06-23)
os Ubuntu 20.04.4 LTS
system x86_64, linux-gnu
ui X11
language (EN)
collate en_US.UTF-8
ctype en_US.UTF-8
tz Europe/Lisbon
date 2022-07-19
pandoc 2.17.0.1 @ /usr/bin/ (via rmarkdown)
─ Packages ───────────────────────────────────────────────────────────────────
package * version date (UTC) lib source
askpass 1.1 2019-01-13 [1] CRAN (R 4.2.0)
assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.2.0)
backports 1.4.1 2021-12-13 [1] CRAN (R 4.2.0)
base64url 1.4 2018-05-14 [1] CRAN (R 4.2.0)
bookdown 0.27.3 2022-07-06 [1] Github (rstudio/bookdown@900f921)
bslib 0.3.1 2021-10-06 [1] CRAN (R 4.2.0)
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.2.0)
callr 3.7.1 2022-07-13 [1] CRAN (R 4.2.1)
cli 3.3.0 2022-04-25 [1] CRAN (R 4.2.0)
codetools 0.2-18 2020-11-04 [2] CRAN (R 4.2.0)
colorspace 2.0-3 2022-02-21 [1] CRAN (R 4.2.0)
conflicted 1.1.0 2021-11-26 [1] CRAN (R 4.2.0)
crayon 1.5.1 2022-03-26 [1] CRAN (R 4.2.0)
credentials 1.3.2 2021-11-29 [1] CRAN (R 4.2.0)
crosstalk 1.2.0 2021-11-04 [1] CRAN (R 4.2.0)
data.table 1.14.2 2021-09-27 [1] CRAN (R 4.2.0)
DBI 1.1.3 2022-06-18 [1] CRAN (R 4.2.0)
debugme 1.1.0 2017-10-22 [1] CRAN (R 4.2.0)
devtools 2.4.3 2021-11-30 [1] CRAN (R 4.2.0)
digest 0.6.29 2021-12-01 [1] CRAN (R 4.2.0)
dplyr * 1.0.9 2022-04-28 [1] CRAN (R 4.2.0)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.2.0)
evaluate 0.15 2022-02-18 [1] CRAN (R 4.2.0)
fansi 1.0.3 2022-03-24 [1] CRAN (R 4.2.0)
farver 2.1.1 2022-07-06 [1] CRAN (R 4.2.0)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.2.0)
fs 1.5.2 2021-12-08 [1] CRAN (R 4.2.0)
generics 0.1.3 2022-07-05 [1] CRAN (R 4.2.0)
gert 1.6.0 2022-03-29 [1] CRAN (R 4.2.0)
getPass 0.2-2 2017-07-21 [1] CRAN (R 4.2.0)
ggplot2 * 3.3.6 2022-05-03 [1] CRAN (R 4.2.0)
git2r 0.30.1.9000 2022-04-29 [1] Github (ropensci/git2r@80ba185)
gittargets * 0.0.3.9000 2022-04-29 [1] Github (wlandau/gittargets@13a9cd8)
glue * 1.6.2 2022-02-24 [1] CRAN (R 4.2.0)
gridExtra 2.3 2017-09-09 [1] CRAN (R 4.2.0)
gtable 0.3.0 2019-03-25 [1] CRAN (R 4.2.0)
here * 1.0.1 2020-12-13 [1] CRAN (R 4.2.0)
highr 0.9 2021-04-16 [1] CRAN (R 4.2.0)
htmltools 0.5.2 2021-08-25 [1] CRAN (R 4.2.0)
htmlwidgets 1.5.4 2021-09-08 [1] CRAN (R 4.2.0)
httpuv 1.6.5 2022-01-05 [1] CRAN (R 4.2.0)
httr 1.4.3 2022-05-04 [1] CRAN (R 4.2.0)
igraph 1.3.3 2022-07-15 [1] CRAN (R 4.2.1)
jquerylib 0.1.4 2021-04-26 [1] CRAN (R 4.2.0)
jsonlite 1.8.0 2022-02-22 [1] CRAN (R 4.2.0)
kableExtra * 1.3.4 2021-02-20 [1] CRAN (R 4.2.0)
knitr 1.39 2022-04-26 [1] CRAN (R 4.2.0)
labeling 0.4.2 2020-10-20 [1] CRAN (R 4.2.0)
later 1.3.0 2021-08-18 [1] CRAN (R 4.2.0)
lazyeval 0.2.2 2019-03-15 [1] CRAN (R 4.2.0)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.2.0)
magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.2.0)
memoise 2.0.1 2021-11-26 [1] CRAN (R 4.2.0)
munsell 0.5.0 2018-06-12 [1] CRAN (R 4.2.0)
openssl 2.0.2 2022-05-24 [1] CRAN (R 4.2.0)
pillar 1.7.0 2022-02-01 [1] CRAN (R 4.2.0)
pkgbuild 1.3.1 2021-12-20 [1] CRAN (R 4.2.0)
pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.2.0)
pkgload 1.3.0 2022-06-27 [1] CRAN (R 4.2.0)
plotly * 4.10.0 2021-10-09 [1] CRAN (R 4.2.0)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.2.0)
processx 3.7.0 2022-07-07 [1] CRAN (R 4.2.1)
promises 1.2.0.1 2021-02-11 [1] CRAN (R 4.2.0)
ps 1.7.1 2022-06-18 [1] CRAN (R 4.2.0)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.2.0)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.2.0)
Rcpp 1.0.9 2022-07-08 [1] CRAN (R 4.2.1)
remotes 2.4.2 2021-11-30 [1] CRAN (R 4.2.0)
renv 0.15.5 2022-05-26 [1] CRAN (R 4.2.0)
rlang 1.0.4 2022-07-12 [1] CRAN (R 4.2.1)
rmarkdown 2.14.3 2022-06-23 [1] Github (rstudio/rmarkdown@d23e479)
rprojroot 2.0.3 2022-04-02 [1] CRAN (R 4.2.0)
rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.2.0)
rvest 1.0.2 2021-10-16 [1] CRAN (R 4.2.0)
sass 0.4.1 2022-03-23 [1] CRAN (R 4.2.0)
scales 1.2.0 2022-04-13 [1] CRAN (R 4.2.0)
sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.2.0)
stringi 1.7.8 2022-07-11 [1] CRAN (R 4.2.1)
stringr 1.4.0 2019-02-10 [1] CRAN (R 4.2.0)
svglite 2.1.0.9000 2022-04-29 [1] Github (r-lib/svglite@d673908)
sys 3.4 2020-07-23 [1] CRAN (R 4.2.0)
systemfonts 1.0.4 2022-02-11 [1] CRAN (R 4.2.0)
tarchetypes * 0.6.0 2022-04-19 [1] CRAN (R 4.2.0)
targets * 0.12.1 2022-06-03 [1] CRAN (R 4.2.0)
tibble * 3.1.7 2022-05-03 [1] CRAN (R 4.2.0)
tidyr 1.2.0 2022-02-01 [1] CRAN (R 4.2.0)
tidyselect 1.1.2 2022-02-21 [1] CRAN (R 4.2.0)
usethis 2.1.6 2022-05-25 [1] CRAN (R 4.2.0)
utf8 1.2.2 2021-07-24 [1] CRAN (R 4.2.0)
uuid 1.1-0 2022-04-19 [1] CRAN (R 4.2.0)
vctrs 0.4.1 2022-04-13 [1] CRAN (R 4.2.0)
viridisLite 0.4.0 2021-04-13 [1] CRAN (R 4.2.0)
webshot 0.5.3 2022-04-14 [1] CRAN (R 4.2.0)
whisker 0.4 2019-08-28 [1] CRAN (R 4.2.0)
withr 2.5.0 2022-03-03 [1] CRAN (R 4.2.0)
workflowr * 1.7.0 2021-12-21 [1] CRAN (R 4.2.0)
xfun 0.31 2022-05-10 [1] CRAN (R 4.2.0)
xml2 1.3.3 2021-11-30 [1] CRAN (R 4.2.0)
yaml 2.3.5 2022-02-21 [1] CRAN (R 4.2.0)
[1] /workspace/.cache/R/renv/proj_libs/false.alarm-d6f1a0d1/R-4.2/x86_64-pc-linux-gnu
[2] /usr/lib/R/library
──────────────────────────────────────────────────────────────────────────────