Shapelet Discovery for Time Series Classification (Byron Choi et al.)

Time series classification (TSC) has attracted considerable attention from both academia and industry. The classical approach to TSC operates on whole series. Recently, time series shapelets (or simply, shapelets), which are discriminative subsequences, have been found effective for TSC. Importantly, shapelets are themselves time series subsequences, which makes them useful for explaining the classification model to non-technical users (see Figure 1 for an illustration). The quality of the shapelets is crucial to both the accuracy and the efficiency of the shapelet-based approach. Various methods can be investigated to discover shapelets, including efficient pruning of poor-quality or redundant subsequences, efficient heuristics, and deep learning. Our research can be applied to our homegrown human motion time series data, patients’ time series data, and energy consumption data, in addition to some well-known benchmark datasets.



Figure 1 (from left to right): Two shapelet examples, four time series of four different classes of human activities, the transformed representation of time series with respect to the two shapelets.
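
The transformed representation in Figure 1 can be illustrated with a minimal sketch. It assumes the common definition of the distance between a series and a shapelet as the smallest Euclidean distance over all equal-length windows of the series; the function names `subsequence_distance` and `shapelet_transform` are hypothetical, and normalization (often applied in practice) is omitted.

```python
import math
from typing import List, Sequence


def subsequence_distance(series: Sequence[float], shapelet: Sequence[float]) -> float:
    """Smallest Euclidean distance between the shapelet and any
    equal-length window of the series (assumed definition)."""
    m = len(shapelet)
    if m == 0 or m > len(series):
        raise ValueError("shapelet must be non-empty and no longer than the series")
    best_squared = math.inf
    for start in range(len(series) - m + 1):
        squared = sum((series[start + i] - shapelet[i]) ** 2 for i in range(m))
        best_squared = min(best_squared, squared)
    return math.sqrt(best_squared)


def shapelet_transform(
    dataset: Sequence[Sequence[float]],
    shapelets: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Map each series to one distance per shapelet."""
    return [[subsequence_distance(s, sh) for sh in shapelets] for s in dataset]
```

With two shapelets, each series becomes a point in a two-dimensional space, which any standard classifier can then separate.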


Objectives:


Findings So Far:

Time series shapelets are discriminative subsequences that have recently been found effective for time series classification (TSC). The quality of the shapelets is crucial to the accuracy of TSC. However, the methods that existing studies use to determine high-quality shapelet candidates for model building are surprisingly simple. For example, some studies enumerate subsequences of some fixed lengths, or even randomly select subsequences as shapelet candidates. The bulk of the computation then goes into building the model from the candidates, which is costly. Hence, we have proposed a novel and efficient shapelet discovery method, called BSPCOVER [2], to discover a set of high-quality shapelet candidates for model building. BSPCOVER includes, among other techniques, filtering of identical or similar candidates and a heuristic for computing candidates. Our experimental results show that BSPCOVER speeds up the state-of-the-art methods by more than 70 times, and that its accuracy is often comparable to or higher than that of existing works.
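
The simple candidate generation described above, together with the idea of filtering identical or similar candidates, can be sketched as follows. This is not BSPCOVER itself: it is a naive illustration, and the function names, the Euclidean similarity test, and the `tolerance` parameter are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple

Candidate = Tuple[float, ...]


def enumerate_candidates(
    dataset: Sequence[Sequence[float]], lengths: Sequence[int]
) -> List[Candidate]:
    """Enumerate every subsequence of the given fixed lengths."""
    candidates: List[Candidate] = []
    for series in dataset:
        for m in lengths:
            for start in range(len(series) - m + 1):
                candidates.append(tuple(series[start:start + m]))
    return candidates


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filter_similar(candidates: Sequence[Candidate], tolerance: float) -> List[Candidate]:
    """Keep a candidate only if no already-kept candidate of the same
    length lies within `tolerance` of it (hypothetical similarity rule).
    A tolerance of 0.0 removes exact duplicates only."""
    kept: List[Candidate] = []
    for cand in candidates:
        is_redundant = any(
            len(k) == len(cand) and euclidean(k, cand) <= tolerance for k in kept
        )
        if not is_redundant:
            kept.append(cand)
    return kept
```

This pairwise filter is quadratic in the number of candidates, so it only conveys the idea; an efficient method would need a cheaper way to detect redundancy.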

For multivariate time series classification, we investigated a deep learning approach. Specifically, we propose a novel method called ShapeNet [3], which embeds shapelet candidates of different lengths into a unified space for shapelet selection. A neural network is trained using our cluster-wise triplet loss, which considers the distance between an anchor and multiple positive (negative) samples as well as the distances among the positive (negative) samples. We then compute representative and diversified final shapelets rather than using all the embeddings directly for model building, which avoids spending a large fraction of the computation on non-discriminative shapelet candidates. A classical classifier (e.g., SVM) is then adopted. Our experimental results show that ShapeNet achieves the best accuracy of all the methods compared.
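
The ingredients of the cluster-wise triplet loss named above can be sketched on plain embedding vectors. The exact way the terms are combined here (a hinge on mean anchor distances plus a weighted spread penalty) is an assumption for illustration and may differ from the published loss; `margin` and `spread_weight` are hypothetical parameters.

```python
import math
from itertools import combinations
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_anchor_distance(anchor: Vector, samples: Sequence[Vector]) -> float:
    """Average distance from the anchor to a group of samples."""
    return sum(euclidean(anchor, s) for s in samples) / len(samples)


def max_pairwise_distance(samples: Sequence[Vector]) -> float:
    """Largest distance between any two samples in a group (0.0 for one sample)."""
    return max((euclidean(a, b) for a, b in combinations(samples, 2)), default=0.0)


def cluster_triplet_loss(
    anchor: Vector,
    positives: Sequence[Vector],
    negatives: Sequence[Vector],
    margin: float = 1.0,
    spread_weight: float = 0.1,
) -> float:
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")
    # Anchor should be closer to the positives than to the negatives.
    d_ap = mean_anchor_distance(anchor, positives)
    d_an = mean_anchor_distance(anchor, negatives)
    ranking = max(0.0, d_ap - d_an + margin)
    # Penalize spread inside each group (assumed form of the intra-group term).
    spread = max_pairwise_distance(positives) + max_pairwise_distance(negatives)
    return ranking + spread_weight * spread
```

In an actual training loop these quantities would be computed on network outputs inside an automatic-differentiation framework; the pure-Python version only shows which distances enter the loss.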


Selected Publications:

[1] Guozhong Li, Byron Choi, Jianliang Xu, Sourav S. Bhowmick, Daphne N.Y. Mah, and Grace L.H. Wong. IPS: Instance Profile for Shapelet Discovery for Time Series Classification. In Proceedings of the 38th IEEE International Conference on Data Engineering (ICDE), 2022. (to appear)

[2] Guozhong Li, Byron Choi, Jianliang Xu, Sourav S. Bhowmick, Kwok-Pan Chun, and Grace L.H. Wong. Efficient Shapelet Discovery for Time Series Classification. IEEE Transactions on Knowledge and Data Engineering (TKDE), 34(3):1149-1163, 2022.

[3] Guozhong Li, Byron Choi, Jianliang Xu, Sourav S. Bhowmick, Kwok-Pan Chun, and Grace L.H. Wong. ShapeNet: A Shapelet-Neural Network Approach for Multivariate Time Series Classification. In Proceedings of the AAAI Conference on Artificial Intelligence (AAAI), 35(9), 2021.


For further information on this research topic, please contact Dr. Byron Choi.