5 Scientific contributions
5.1 Matrix Profile
Since the first paper presenting this new concept22, lots of investigations were made to speed up its computation. It is notable how all computations are not dependent on the rolling window size as previous works not using Matrix Profile. Aside from this, we can see that the first STAMP22 algorithm has the time complexity of \(O(n^2log{n})\) while STOMP66 \(O(n^2)\) (a significant improvement), but STOMP lacks the “any-time” property. Later SCRIMP67 solves this problem keeping the same time complexity of \(O(n^2)\). Here we are in the “exact” algorithms domain and we will not extend the scope for conciseness.
The main issue with the algorithms above is the dependency on a fast Fourier transform (FFT) library. FFT has been extensively optimized and architecture/CPU bounded to exploit the most of speed. Also, padding data to some power of 2 happens to increase the efficiency of the algorithm. We can argue that time complexity doesn’t mean “faster” when we can exploit low-level instructions. In our case, using FFT in a low-power device is overkilling. For example, a quick search over the internet gives us a hint that computing FFT on a 4096 data in an ESP32 takes about 21ms (~47 computations in 1 second). This means ~79 seconds for computing all FFT’s (~3797) required for STAMP using a window of 300. Currently, we can compute a full matrix of 5k data in about 9 seconds in an ESP32 MCU (Fig. 4.3), and keep updating it as fast as 1 min of data (at 250hz) in just 6 seconds.
Recent works using exact algorithms are using an unpublished algorithm called MPX, which computes the Matrix Profile using cross-correlation methods ending up faster and is easily portable.
On computing the Matrix Profile: the contribution of this work on this area is adding the Online capability to MPX, which means we can update the Matrix Profile as new data comes in.
On extending the Matrix Profile: the contribution of this work on this area is the use of an unexplored constraint that we could apply on building the Matrix Profile we are calling Similarity Threshold (ST). The original work outputs the similarity values in Euclidean Distance (ED) values, while MPX naturally outputs the values in Pearson’s correlation coefficients (CC). Both ED and CC are interchangeable using the equation (5.1). However, we may argue that it is easier to compare values that do not depend on the window size during an exploratory phase. MPX happens to naturally return values in CC, saving a few more computation time. The ST is an interesting factor that we can use, especially when detecting pattern changes during time. The FLOSS algorithm relies on counting references between indexes in the time series. ST can help remove “noise” from these references since only similar patterns above a certain threshold are referenced, and changes have more impact on these counts. The best ST threshold is still to be determined.
\[\begin{equation}
CC = 1 - \frac{ED}{(2 \times WindowSize)} \tag{5.1}
\end{equation}\]
5.2 Regime change detection
In the original paper, in chapter 3.5, the authors of FLOSS wisely introduce the temporal constraint, which improves the sensitivity of regime change detection on situations where a regime may alternate in short periods of time.
Nevertheless, the authors declare the correction curve typically used on the algorithm as “simply a uniform distribution”, but this is not an accurate statement. the Arc Counts of newly incoming data are truncated by the same amount of temporal constraint. This prevents completely the detection of a regime change in the last 10 seconds as this thesis requires.
The main contribution of this work on this area is overcoming this issue by computing the theoretical distribution using the temporal constraint parameters beforehand. as shown in Fig. 5.1. That gives us enough data to evaluate a regime change accurately utilizing a minimum of \(2 \times WindowSize\) datapoints.