 Software
 Open Access
EZ Entropy: a software application for the entropy analysis of physiological timeseries
BioMedical Engineering OnLine, volume 18, Article number: 30 (2019)
Abstract
Background
Entropy analysis has attracted increasing attention over the recent two to three decades. It assesses the complexity, or irregularity, of a time series, which numerous studies have shown to be extraordinarily relevant to physiology and disease. However, this complexity can hardly be captured by traditional methods, including time-domain, frequency-domain, and time-frequency analysis, which are the common built-in options in commercial measurement and statistical software. To facilitate the entropy analysis of physiological time series, a new software application, namely EZ Entropy, was developed and is introduced in this article.
Results
EZ Entropy was developed in the MATLAB^{®} environment. It was programmed in an object-oriented style and was constructed with a graphical user interface. EZ Entropy is easy to operate through its compact graphical interface, thus allowing researchers without programming knowledge, such as clinicians and physiologists, to perform this kind of analysis. In addition, it offers various settings to meet different analysis needs, including (1) processing a single data recording, (2) batch processing multiple data files, (3) sliding-window calculations, (4) recall, (5) displaying intermediate data and final results, (6) adjusting input parameters, and (7) exporting calculation results after the run or in real time during the analysis. The analysis results can be exported, either manually or automatically, to comma-separated ASCII files, and are thus compatible with and easily imported into common statistical analysis software. Codewise, EZ Entropy is object-oriented and therefore easy to maintain and extend.
Conclusions
EZ Entropy is a user-friendly software application for performing the entropy analysis of time series, and it simplifies and speeds up this useful analysis.
Background
It is a contemporary challenge to identify characteristics of physiological signals or time series that are relevant to aging or disease progression. Efforts have been made to mine the data using moments (both lower- and higher-order), frequency-domain analysis, and time-frequency analysis. But results based on these traditional approaches have so far not been satisfactory. One possible reason is that the features these traditional analyses can capture are usually also visually identifiable, which is not the case for physiological data: what is relevant to physiology or disease may be hidden deep within the fluctuations of the signal. In the most recent two to three decades, researchers from interdisciplinary fields have proposed the concept of nonlinear dynamical analysis, and since then lines of evidence have demonstrated the unique power of various nonlinear dynamical characteristics in this regard [1,2,3,4].
Among the vast number of nonlinear dynamical methods, entropy analysis has gained broad attention and has demonstrated wide suitability for time series of limited, short, or even extremely short length [5,6,7,8,9,10,11,12,13,14,15,16,17]. However, a lack of programming knowledge has been hindering most clinicians, physiologists, and others from performing this kind of analysis. Although many researchers have published or shared open-source code, either in formal publications [14] or in various online repositories [18,19,20,21,22,23,24], none of it is truly user-friendly, as applying it still requires at least some basic coding training.
The aim of the current work was to introduce a graphical-interface-based software application, namely EZ Entropy, developed specifically for calculating the entropy of physiological time series (and certainly other types of physical time series). EZ Entropy is easy to operate: only a few clicks are needed to calculate several common entropy measures. It also comes with a number of calculation options for adjusting input parameters, batch processing multiple signal recordings, performing the calculations through sliding windows, displaying intermediate results, and so on. In the section below, I start with a short review of different entropy measures; in the "Implementation" section, the software is described in detail.
Entropy measures: a brief historical review
Entropy originated in thermodynamics, a branch of physics. It was initially proposed as a state function of a thermodynamic system that depends only on the current state and is independent of how that state was reached. Later, this macroscopic concept was found to correspond microscopically to uncertainty or disorder, measuring the number of possible microscopic states in which the system could be arranged.
Directly analogous to the microscopic thermodynamic definition, information entropy was defined to quantify the average information content expected from an event or, in other words, the uncertainty or unpredictability of the state of an event. In the field of time series analysis, this concept triggered the idea of assessing the unpredictability of the evolution of dynamical systems, specifically the Kolmogorov entropy of a time series (or Kolmogorov–Sinai entropy, a specific case of Kolmogorov entropy with the time delay factor equal to unity) [25].
The calculation of Kolmogorov–Sinai entropy is highly noise-sensitive and requires evaluating limits, making it infeasible for real-world applications. In 1991, Pincus proposed an approximation algorithm, approximate entropy (ApEn) [5], which showed reasonable robustness against noise and was relatively stable for medium-length time series [6, 26]. Since then, ApEn has been successfully applied to many kinds of physiological data and has helped gain valuable additional insights into physiological controls [27,28,29,30,31,32]. It has also been introduced to mechanical and many other physical systems [33, 34]. In the meantime, investigators have become aware of its deficiencies, namely its strong dependence on input parameters and its unreliable performance on short-length data, and have proposed a number of solutions to improve its performance. In the subsections below, I briefly summarize the algorithms of ApEn and two common refined ApEn metrics, namely sample entropy (SampEn) and fuzzy entropy (FuzzyEn). I also introduce several other entropy-like metrics that were proposed in similar contexts.
ApEn and refined ApEn algorithms
Entropy metrics, in general, quantify the similarity of motifs (called vectors in the state space representation) as a proxy for the unpredictability or irregularity of a time series. For a time series of N points \(\mathbf{u}=\{u(i), 1 \le i \le N\}\), its m-dimensional state space representation is

$$\mathbf{X}_m(i) = \{u(i), u(i+\tau), \ldots, u(i+(m-1)\tau)\},$$

where \(1 \le i \le N-m\tau\) and \(\tau\) is the time delay parameter, which, together with the dimension parameter m, determines how well the state space of the dynamical system is reconstructed. To quantify whether two vectors \(\mathbf{X}_m(i)\) and \(\mathbf{X}_m(j)\) are similar, the Chebyshev distance between them is calculated as follows:

$$d_{i,j} = \max_{0 \le k \le m-1} |u(i+k\tau) - u(j+k\tau)|.$$
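To make the notation concrete, the state space reconstruction and the Chebyshev distance can be sketched in Python (an illustrative sketch only, not EZ Entropy's MATLAB code; the helper names `embed` and `chebyshev` are my own):

```python
import numpy as np

def embed(u, m, tau=1):
    """Reconstruct the m-dimensional state space of a series u with delay tau.

    Returns an array with N - m*tau rows; row i is
    X_m(i) = [u(i), u(i+tau), ..., u(i+(m-1)*tau)].
    """
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m * tau                      # number of vectors
    return np.array([u[i:i + m * tau:tau] for i in range(n_vec)])

def chebyshev(x, y):
    """Chebyshev (maximum coordinate difference) distance between two vectors."""
    return float(np.max(np.abs(np.asarray(x, float) - np.asarray(y, float))))
```

With `m = 2` and `tau = 1`, for example, a series of 10 points yields 8 two-element vectors.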
The difference between ApEn and its refined algorithms lies in the means used to assess the overall similarity of each pair of vectors in the state space, which, in turn, leads to different performance.
ApEn
In ApEn, the percentage of the vectors \(\mathbf{X}_m(j)\) that are within r of \(\mathbf{X}_m(i)\) is calculated by \(C_{i}^{(m)}(r) = \frac{N_i^{(m)}(r)}{N-m\tau}\), where \(N_i^{(m)}(r)\) is the number of j's that satisfy \(d_{i,j} \le r\), with \(1 \le j \le N-m\tau\). The average of the logarithm of this percentage over \(1 \le i \le N-m\tau\) is then defined by \(\Phi^{(m)}(r)=\frac{1}{N-m\tau}\sum_{i=1}^{N-m\tau}\ln C_i^{(m)}(r)\). In a similar way, \(\Phi^{(m+1)}(r)\) is defined after increasing the dimension to \(m+1\). The ApEn value of the time series \(\mathbf{u}\) can then be calculated by [5]:

$$\mathrm{ApEn}(m, r) = \Phi^{(m)}(r) - \Phi^{(m+1)}(r).$$
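A minimal Python sketch of this ApEn definition may look as follows (illustrative only; it follows the text's indexing, using the same vector count \(N-m\tau\) for both dimensions, and includes self-matches so every \(C_i^{(m)}(r)\) is positive):

```python
import numpy as np

def apen(u, m=2, r=0.2, tau=1):
    """Approximate entropy: self-matches are included, so every
    C_i^(m)(r) > 0 and the logarithm is always defined."""
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m * tau                      # vectors indexed 1..N-m*tau

    def phi(dim):
        X = np.array([u[i:i + dim * tau:tau] for i in range(n_vec)])
        # pairwise Chebyshev distances between all vectors
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        C = np.mean(d <= r, axis=1)               # fraction of j's within r of i
        return np.mean(np.log(C))

    return phi(m) - phi(m + 1)
```

A perfectly regular (constant) series gives ApEn of zero, and the value is non-negative because matching in dimension \(m+1\) is never easier than in dimension m.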
SampEn
In SampEn, self-matches are excluded when calculating the percentage of the vectors \(\mathbf{X}_m(j)\) that are within r of \(\mathbf{X}_m(i)\), i.e., \(A_{i}^{(m)}(r) = \frac{N_i^{(m)}(r)}{N-m\tau-1}\), where \(N_i^{(m)}(r)\) is the number of j's that satisfy \(d_{i,j} \le r\), with \(1 \le j \le N-m\tau, j\ne i\). The average of the percentage \(A_{i}^{(m)}(r)\) over \(1 \le i \le N-m\tau\) is then defined by \(\Psi^{(m)}(r)=\frac{1}{N-m\tau}\sum_{i=1}^{N-m\tau}A_i^{(m)}(r)\). In a similar way, \(\Psi^{(m+1)}(r)\) is defined after increasing the dimension to \(m+1\). The SampEn value of the time series \(\mathbf{u}\) can then be calculated by [8]:

$$\mathrm{SampEn}(m, r) = \ln \Psi^{(m)}(r) - \ln \Psi^{(m+1)}(r) = -\ln \frac{\Psi^{(m+1)}(r)}{\Psi^{(m)}(r)}.$$
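The corresponding SampEn sketch in Python (again illustrative, not EZ Entropy's code) differs from the ApEn sketch only in excluding self-matches and in taking a single logarithm of the ratio of the two averages:

```python
import numpy as np

def sampen(u, m=2, r=0.2, tau=1):
    """Sample entropy: self-matches excluded; one logarithm of the
    ratio of average match fractions in dimensions m and m+1."""
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m * tau

    def psi(dim):
        X = np.array([u[i:i + dim * tau:tau] for i in range(n_vec)])
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        np.fill_diagonal(d, np.inf)               # exclude self-matches (j != i)
        A = np.sum(d <= r, axis=1) / (n_vec - 1)
        return np.mean(A)

    return np.log(psi(m)) - np.log(psi(m + 1))
```

Note that, unlike ApEn, SampEn is undefined when no matches are found at all in either dimension, which is one motivation for the fuzzy refinement described next.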
FuzzyEn
FuzzyEn is methodologically the same as SampEn except that it replaces the percentage of vectors \(\mathbf{X}_m(j)\) that are within r of \(\mathbf{X}_m(i)\) with the average degree of membership, which improves reliability especially for short-length data. Specifically, for the fuzzy membership function \(e^{-\ln(2)(x/y)^2}\), \(A_i^{(m)}=\frac{\sum_{j=1,j\ne i}^{N-m\tau}e^{-\ln(2)(d_{i,j}/r)^2}}{N-m\tau-1}\) is applied in FuzzyEn [11].
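A FuzzyEn sketch then only swaps the hard threshold for the soft membership degree (illustrative only; the baseline removal used in some published FuzzyEn variants is deliberately omitted here, since the text does not describe it):

```python
import numpy as np

def fuzzyen(u, m=2, r=0.2, tau=1):
    """Fuzzy entropy: SampEn's hard threshold d <= r is replaced by the
    membership degree exp(-ln(2) * (d/r)^2), which equals 0.5 when d == r."""
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m * tau

    def avg_membership(dim):
        X = np.array([u[i:i + dim * tau:tau] for i in range(n_vec)])
        d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
        mu = np.exp(-np.log(2.0) * (d / r) ** 2)  # degree of membership
        np.fill_diagonal(mu, 0.0)                 # exclude self-matches
        return np.mean(np.sum(mu, axis=1) / (n_vec - 1))

    return np.log(avg_membership(m)) - np.log(avg_membership(m + 1))
```

Because every pair contributes a strictly positive membership, the logarithms are always defined, which is the source of FuzzyEn's robustness on short series.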
Other entropylike metrics
Conditional entropy (CE)
CE evaluates the information carried by a new sampling point given the previous samples, by estimating the Shannon entropy of vectors of length m and of vectors with the new sampling point added (i.e., of length \(m+1\)) [7]. CE first coarse-grains the time series \(\mathbf{u}=\{u(i), 1 \le i \le N\}\) with a quantification level of \(\xi\) (i.e., the coarse-grain resolution is \(\frac{\max(\mathbf{u})-\min(\mathbf{u})}{\xi}\)). Instead of the original time series, it reconstructs the coarse-grained time series into the state space. After the reconstruction, the signal motifs of length m and those of length \(m+1\) are codified in decimal format (i.e., the first element in a motif of length m has a weight of \(\xi^{(m-1)}\), and so on), rendering the sequence of motifs a series of integers. The frequency of each possible integer value can then be calculated. CE is defined as the difference between the Shannon entropy of the motifs of length \(m+1\) and that of the motifs of length m. In [16], Shi et al. provide a detailed summary of the CE algorithm with mathematical formulas.
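The coarse-graining and motif-codification steps can be sketched as follows (an illustrative Python sketch with \(\tau = 1\) assumed; it omits the correction term used in the corrected conditional entropy of [7]):

```python
import numpy as np
from collections import Counter

def conditional_entropy(u, m=2, xi=6):
    """Conditional entropy sketch (tau = 1): coarse-grain the series into
    xi levels, codify motifs of lengths m and m+1 as base-xi integers,
    and take the difference of the two Shannon entropies."""
    u = np.asarray(u, dtype=float)
    span = u.max() - u.min()
    q = np.zeros(len(u), dtype=int) if span == 0 else \
        np.minimum((((u - u.min()) / span) * xi).astype(int), xi - 1)

    def shannon(length):
        # codify each motif as an integer: first element weighted xi^(length-1)
        codes = [sum(int(q[i + k]) * xi ** (length - 1 - k) for k in range(length))
                 for i in range(len(q) - length + 1)]
        p = np.array(list(Counter(codes).values()), dtype=float)
        p /= p.sum()
        return -np.sum(p * np.log(p))

    return shannon(m + 1) - shannon(m)
```

For a fully deterministic series, such as a strict alternation of two values, the new sample carries essentially no new information and the estimate is close to zero.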
Permutation entropy (PermEn)
PermEn assesses the diversity of ordinal patterns within a time series. For each signal motif of length m, it defines a permutation vector \({\varvec{\pi }}\) by indexing the motif's elements in ascending order. Then, the frequency of each permutation pattern \(\pi _{j}\,(1 \le j \le m!)\) can be calculated. The PermEn of the original time series is defined as the Shannon entropy of the permutation patterns [10, 16].
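A compact Python sketch of PermEn (illustrative only) uses `argsort` to obtain each motif's ascending-order index pattern and then takes the Shannon entropy of the pattern frequencies:

```python
import numpy as np
from collections import Counter

def permen(u, m=3, tau=1):
    """Permutation entropy sketch: Shannon entropy of the distribution of
    ordinal (permutation) patterns over all motifs of length m."""
    u = np.asarray(u, dtype=float)
    patterns = [tuple(np.argsort(u[i:i + m * tau:tau]))
                for i in range(len(u) - (m - 1) * tau)]
    p = np.array(list(Counter(patterns).values()), dtype=float)
    p /= p.sum()
    return -np.sum(p * np.log(p))   # at most m! distinct patterns
```

A normalized variant divides the result by \(\ln(m!)\); a monotonic series contains a single ordinal pattern and therefore has PermEn of zero.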
Distribution entropy (DistEn)
Instead of only calculating the probability of similar vectors (i.e., the percentage of the vectors \(\mathbf{X}_m(j)\) that are within r of \(\mathbf{X}_m(i)\)), DistEn takes full advantage of the matrix \(d_{i,j}, 1 \le i,j \le N-m\tau\) defined in the SampEn algorithm by estimating the Shannon entropy of all distances. Specifically, the empirical probability density function of the distance matrix \(d_{i,j}\), excluding the main diagonal (i.e., \(i\ne j\)), is first estimated by a histogram approach with a fixed bin number B. Denoting the probability of each bin by \(\{p_t, t=1,2,\ldots,B\}\), DistEn is then calculated by the formula for Shannon entropy [13]:

$$\mathrm{DistEn}(m, B) = -\frac{1}{\log_2 B}\sum _{t=1}^{B} p_t \log_2 p_t.$$
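The DistEn computation described above can be sketched in Python as follows (illustrative only; EZ Entropy's own implementation is in MATLAB):

```python
import numpy as np

def disten(u, m=2, B=64, tau=1):
    """Distribution entropy sketch: Shannon entropy, normalized by log2(B),
    of the B-bin histogram of all off-diagonal Chebyshev distances d_{i,j}."""
    u = np.asarray(u, dtype=float)
    n_vec = len(u) - m * tau
    X = np.array([u[i:i + m * tau:tau] for i in range(n_vec)])
    d = np.max(np.abs(X[:, None, :] - X[None, :, :]), axis=2)
    off_diag = d[~np.eye(n_vec, dtype=bool)]      # keep i != j only
    counts, _ = np.histogram(off_diag, bins=B)
    p = counts / counts.sum()
    p = p[p > 0]                                  # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p)) / np.log2(B)   # normalized to [0, 1]
```

The \(\log_2 B\) normalization bounds the result to \([0, 1]\), which makes values comparable across different bin numbers.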
Parameters for entropy metrics
Common parameters shared by all these entropy metrics are the embedding dimension (or motif length) m and the time delay \(\tau\). ApEn, SampEn, and FuzzyEn also share a threshold parameter r. A quantification level \(\xi\) is needed for CE, and a bin number B is required for DistEn.
Implementation
The main idea behind the EZ Entropy software is to enable investigators outside the engineering field, e.g., physiologists and clinicians, to perform this kind of analysis, and thereby to help promote the concept of nonlinear dynamical analysis in the medical field. In this section, I introduce the software from several different aspects, including system requirements, graphical user interface, functionality, settings, data import, results display, and results export.
System requirements
EZ Entropy was developed using App Designer in the MATLAB^{®} environment. With the standalone installation, however, the MATLAB^{®} environment is not necessary for using EZ Entropy. If a MATLAB^{®}-based installation is preferred, MATLAB^{®} R2018a or a later release is recommended. Installing EZ Entropy on older versions of MATLAB^{®} may cause unknown compilation problems, since some App Designer features required to run this software have only been available since early 2018. EZ Entropy will also be compiled as a web-based app and hosted on a MATLAB^{®} Web App Server, so that users can easily run it in a web browser.
The graphical user interface
Figure 1 shows the main user interface through which users interact with the software. There are seven main regions, as illustrated in Fig. 1: (1) menu; (2) file info; (3) setting; (4) status; (5) data display; (6) display of intermediate results; and (7) results table. Regions (1) and (3) accept user inputs, while regions (2) and (4)–(7) are for outputs. Button (A) executes the analysis after data import and the necessary settings. Button (B) is for recall purposes. Note that there are three tab controls [one in region (3) and two in region (6)], while in Fig. 1 only one tab of each is shown to give an overall view of the software interface. The other tabs are shown in the subsections below.
Functionality
Single recording analysis
EZ Entropy can analyze a single signal recording (saved in a file of a supported format) and calculate the entropy results based on the user's settings with just a few clicks. Figure 2 shows how to activate this function. After importing data, users define the input parameters and running options in the Setting region. To perform the analysis on the imported data with these parameters and options, users simply click the Apply button. The software performs the analysis with default parameters and options should users skip the settings and click Apply directly.
Batch processing of multiple recordings
Batch processing can be activated by clicking the second item of the "File" menu, as shown in Fig. 2. The Data import wizard will pop up (Fig. 3). Clicking the "..." button lets users choose a list file containing the names of all data recordings to be analyzed. These names are then shown in the list box on the left-hand side of the wizard. The directory where the data recordings are located is specified by clicking the "Data folder" button. Two tab controls, "segments" and "gaps", provide two optional analysis configurations, which are detailed in "Consideration of data quality". Finally, users click the "Import" button to close the wizard and migrate these configurations to the main interface.
Data import
Data format
Only ASCII files with one column specifying the signal recording are accepted in the current release of EZ Entropy.
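Such a single-column ASCII recording is straightforward to read programmatically; for instance, in a Python sketch (the helper name is my own, for illustration only):

```python
import numpy as np

def load_recording(path):
    """Read a single-column ASCII recording into a 1-D array.

    ndmin=1 ensures that even a one-sample file comes back as an array
    rather than a scalar.
    """
    return np.loadtxt(path, ndmin=1)
```

Because the format is plain ASCII with one value per line, the same file is equally easy to produce from, or consume in, MATLAB, R, or spreadsheet software.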
Import of single recording
As specified in “Single recording analysis”, a single ASCII file can be loaded by clicking on the menu item “Load (a single column ASCII file)” under “File” menu (see Fig. 2).
Import of multiple recordings
Import of multiple recordings can be done with the help of the “Data import wizard” as specified in “Batch processing of multiple recordings” and Fig. 3.
Setting
Figure 4 shows the two tabs, namely "Standard" and "Advanced", in region (3) of Fig. 1. The "Standard" tab is for defining parameters (for details, see "Parameters for entropy metrics"), and the "Advanced" tab is for further settings that affect the calculation of the entropy metrics.
Consideration of data quality
This is the first item in the "Advanced" tab that can be further configured. The software accepts this input only in the context of single recording analysis. Users can select either "z-score" or "min–max" data normalization. In addition, users can check or uncheck the "Enable Gap" checkbox. When "Enable Gap" is checked, the software automatically searches a subfolder named "Gap", under the folder where the data recording is located, for a file with the same name as the data recording but with the extension ".gap".
When users batch process multiple recordings, the consideration of data gaps is defined when importing data with the "Data import wizard".
The file defining data gaps is also in ASCII format and has two columns. The first column defines the starting points of the data gaps, while the second column specifies their end points; the number of rows thus equals the number of gaps. The software silently ignores this configuration if the gap file cannot be located. If the gap file is successfully loaded, the gaps are highlighted with a yellow background in the data display region [region (5) in Fig. 1; an example is shown in Fig. 2] and are skipped (i.e., redefined as not-a-number) when calculating the entropy metrics.
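The gap-masking behavior described above can be sketched in Python (illustrative only; the 1-based, inclusive indexing of the gap columns is an assumption made for this sketch, not a documented detail of the .gap format):

```python
import numpy as np

def apply_gaps(signal, gap_path):
    """Mask gap segments as NaN following the two-column convention above
    (column 1 = gap start, column 2 = gap end, one row per gap)."""
    x = np.asarray(signal, dtype=float).copy()
    try:
        gaps = np.loadtxt(gap_path, ndmin=2)
    except OSError:
        return x                                  # no gap file: ignore silently
    for start, end in gaps.astype(int):
        x[start - 1:end] = np.nan                 # assume 1-based, inclusive
    return x
```

Downstream entropy routines can then simply drop or skip the NaN samples, mirroring the software's handling of gap segments.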
Consideration of running on a workstation or server
The software stops calculating, or asks for further configuration of sliding windows, when the data recording is too long (e.g., \(>10{,}000\) points), as a self-protective measure against memory exhaustion on a personal computer. However, if users are working on a workstation or server, or are confident in their hardware resources, they can check the "Try Workstation" checkbox to force the software to do the calculation. On the other hand, if users are concerned about memory exhaustion, especially when batch processing multiple recordings, they can either uncheck the "Show Supporting Images" checkbox or check the "Write While Analyze" checkbox. The first option stops the software from displaying intermediate results as images, and the second avoids accumulating too many lines in the results table by writing results to the hard disk.
Sliding window analysis
The moving-window option is very helpful for analyzing longer recordings. EZ Entropy offers sliding-window analysis as a further setting under the "Advanced" tab. By checking the "Moving Window" checkbox, users can define the window length (in points) and the overlap (in percent) between windows.
This option is available both for single recording analysis and for batch processing of multiple recordings. In the context of single recording analysis, an extra bonus offered by the software is that checking the "Moving Window" checkbox enables the recall button [button (B) in Fig. 1], meaning that users can replay the intermediate results for each window by clicking this button. Figure 1 shows an example with the recall button enabled. After the recall button is clicked, the corresponding window is boxed and the corresponding result line is highlighted in purple. All intermediate results for this window are displayed in region (6) of Fig. 1.
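For clarity, the relationship between window length, overlap percentage, and the resulting window positions can be sketched as follows (an illustrative Python sketch; the flooring of the step size to at least one point is my own assumption):

```python
def window_starts(n, win_len, overlap_pct):
    """0-based start indices of sliding windows over a series of length n,
    given the window length (points) and overlap between windows (percent).
    The step is win_len * (1 - overlap/100), floored to at least one point."""
    step = max(1, int(win_len * (1 - overlap_pct / 100.0)))
    return list(range(0, n - win_len + 1, step))
```

For example, a 100-point series with 50-point windows and 50% overlap yields windows starting at points 0, 25, and 50.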
Results display
Intermediate results
EZ Entropy uniquely offers the option of displaying intermediate results in image format, including the distance map [13], the empirical probability density function (PDF) of distances [13], the cumulative distribution function (CDF) of distances [13], and the probability of signal motifs. The distance map, PDF, and CDF are available for both embedding dimensions \(m\) and \(m+1\), as shown in Fig. 5. These extra outputs may trigger new thoughts and ideas on the development or refinement of algorithms. For example, one idea I am exploring right now is to treat the distance map as an image and use established artificial neural network algorithms (e.g., deep convolutional networks) to characterize its patterns corresponding to healthy and diseased conditions.
Results table for entropy metrics
The calculation results are displayed immediately in the UI table control in region (7) of Fig. 1. By default, the results for all files (in batch analysis) and all windows (in moving-window analysis) are shown. However, if "Write While Analyze" is checked (see "Consideration of running on a workstation or server"), the results table only displays the results for one file and is refreshed and cleared, after the results are written to the hard disk, to show the results of the next file.
Exporting results
Exporting results manually
All results shown in the results table can be written to a file by clicking the third menu item, "Write results to file", of the "File" menu, as shown in Fig. 2. The standard "Save As ..." dialog box pops up to let users define the file name and directory. Results are saved as ASCII files with a header line specifying the column names and with multiple columns separated by commas, as shown in Fig. 6.
Exporting results automatically—export while analyze
This option is enabled when the "Write While Analyze" checkbox is checked. The standard "Save As ..." dialog box pops up when the checkbox is checked, letting users determine the file name and directory. Note that this option must be set before performing the calculation (i.e., before clicking the "Apply" button).
Help
There are two items under the "Help" menu: "Documentation" and "About EZ Entropy". Clicking "Documentation" opens the help document (the user manual in .pdf format) outside of the software using a PDF reader, if one is installed. The "About" dialog pops up after clicking "About EZ Entropy".
Results and discussion
Sample run
Two example runs, one analyzing a single data file and one batch processing multiple files, have been recorded and are shown in Additional file 1: Movie S1. Each sample run also touches on the previously mentioned settings.
Availability of the software
Upon publication, requests for the EZ Entropy software can be addressed to me (email to pli@sdu.edu.cn), and the installation package will be sent free of charge for non-commercial use to researchers, clinicians, and others. The web-based EZ Entropy app will also be hosted on a MATLAB Web App Server upon publication.
Discussion
Entropy analysis of physiological time series has been a hot topic in biomedical science and engineering, being capable of capturing unique, valuable, and additional characteristics hidden in time series that are not visually identifiable [5,6,7,8,9,10,11,12,13,14,15,16,17]. Performing such an analysis usually requires at least some basic programming knowledge, as no commercial software to date has these functions built in. This poses an obvious barrier to most clinicians and physiologists who are not trained in programming but who have plenty of data that could lead to new, useful observations should this novel analysis be performed.
To facilitate this, a software application, EZ Entropy, for the entropy analysis of physiological time series (and certainly other physical time series) was introduced. It is an easy-to-operate application requiring only a few clicks to perform the calculations. In addition, it offers the different analysis options that are commonly applied in such calculations.
Highlights
There are a number of features, operation-wise and code-wise, that should be highlighted:

- Built specifically in the context of entropy analysis, and thus extraordinarily focused
- Highly interactive
- Options for both single file analysis and batch analysis
- Settings that are straightforward and easy to enable/disable
- Display of intermediate results as images, in order to trigger new methodological thoughts and ideas
- Export of results in ASCII format that can be easily opened by almost all statistical software
- Display of status messages that keep users notified in real time about the progress
- Programmed in a completely object-oriented manner, and thus easy to manage, maintain, and extend.
Features to be added
This is the first release of the EZ Entropy software, with a few common options/settings. There are plenty of other functions to be added to make the software more professional. Below I list some features that will be enabled in future versions:

- Multiscale analysis [37]
- Tolerance of other input data formats
- Ability to incorporate user-defined entropy metrics.
The ability to incorporate user-defined entropy algorithms, as listed above, should be feasible even though it is only a prospect at the time of this publication. Such extensibility is considered quite important, even for commercial software. In the field of entropy analysis, many novel algorithms are being developed by biomedical engineers [14, 18,19,20,21,22,23,24]. A software application with such extensibility would significantly help promote the application of new entropy algorithms in physiology and clinical medicine.
In addition, error handling is also quite important for software stability and compatibility. EZ Entropy takes full advantage of MATLAB's error handling logic, such that error or warning messages are prompted in the MATLAB command window should there be any invalid inputs. The messages are straightforward enough for non-technicians to follow: they show details about why an error occurred and where. It is worth noting that errors are actually rare, since all input parameters have predefined default values and the other calculation settings are selected by clicking rather than typed, which reduces the possibility of accidentally introducing invalid configurations. For completeness, the function of the "Status" bar (see Fig. 1) will be expanded in the future to also prompt about possible invalid operations.
It is worth noting that MATLAB provides a ".fig" file format for exporting both the user interface and the data together to the local hard disk. This is roughly equivalent to saving the workspace as a project that can be reopened and worked on later. It would be possible to add a menu item to achieve this. However, it might make more sense to save the workspace during the batch processing of multiple data files; for example, users may want to occasionally interrupt the workflow and continue later. To achieve this, the software would need to listen for an event (i.e., the interrupt) and take appropriate actions to handle it. This is unfortunately not supported, since MATLAB is not a multithreaded programming language: it cannot handle a new event until the current one (e.g., a loop) is finished.
Conclusions
With the rapid development of new data mining theories and technologies, many new properties hidden in seemingly simple physiological signals may yet be detected. These new properties may emerge as valuable features for the evaluation of health status or may complement previously identified features. Either way, researchers in the related fields should be encouraged to stay open-minded toward these fresh ideas. Specialized software, for example, the heart rate variability analysis software published previously [38] and the EZ Entropy software introduced here, will certainly fill this need and potentially play an increasingly important role in the scientific community.
Availability and requirements
Project name: EZ Entropy software implementation
Project home page: N/A
Operating system(s): Platform independent
Programming language: MATLAB
Other requirements: MATLAB R2018a or a later release
License: GNU GPL
Any restrictions to use by non-academics: licence needed.
References
 1.
Lipsitz LA, Goldberger AL, et al. Loss of complexity and aging. JAMA. 1992;267(13):1806–9.
 2.
Glass L, Kaplan D. Time series analysis of complex dynamics in physiology and medicine. Med Prog Through Technol. 1993;19:115.
 3.
Peng CK, Buldyrev SV, Havlin S, Simons M, Stanley HE, Goldberger AL. Mosaic organization of DNA nucleotides. Phys Rev E. 1994;49(2):1685.
 4.
Goldberger AL, Amaral LA, Hausdorff JM, Ivanov PC, Peng CK, Stanley HE. Fractal dynamics in physiology: alterations with disease and aging. Proc Natl Acad Sci. 2002;99(suppl 1):2466–72.
 5.
Pincus SM. Approximate entropy as a measure of system complexity. Proc Natl Acad Sci. 1991;88(6):2297–301.
 6.
Pincus SM, Huang WM. Approximate entropy: statistical properties and applications. Commun Stat Theory Methods. 1992;21(11):3061–77.
 7.
Porta A, Baselli G, Liberati D, Montano N, Cogliati C, GnecchiRuscone T, Malliani A, Cerutti S. Measuring regularity by means of a corrected conditional entropy in sympathetic outflow. Biol Cybern. 1998;78(1):71–8.
 8.
Richman JS, Moorman JR. Physiological time-series analysis using approximate entropy and sample entropy. Am J Physiol Heart Circ Physiol. 2000;278(6):H2039–49.
 9.
Akay M. Approximate entropy and its application in biosignal analysis. Piscataway: IEEE Press; 2001.
 10.
Bandt C, Pompe B. Permutation entropy: a natural complexity measure for time series. Phys Rev Lett. 2002;88(17):174102.
 11.
Chen W, Zhuang J, Yu W, Wang Z. Measuring complexity using FuzzyEn, ApEn, and SampEn. Med Eng Phys. 2009;31(1):61–8.
 12.
Yentes JM, Hunt N, Schmid KK, Kaipust JP, McGrath D, Stergiou N. The appropriate use of approximate entropy and sample entropy with short data sets. Ann Biomed Eng. 2013;41(2):349–65.
 13.
Li P, Liu C, Li K, Zheng D, Liu C, Hou Y. Assessing the complexity of shortterm heartbeat interval series by distribution entropy. Med Biol Eng Comput. 2015;53(1):77–87.
 14.
Li P, Karmakar C, Yan C, Palaniswami M, Liu C. Classification of 5s epileptic eeg recordings using distribution entropy and sample entropy. Front Physiol. 2016;7:136.
 15.
Karmakar C, Udhayakumar RK, Li P, Venkatesh S, Palaniswami M. Stability, consistency and performance of distribution entropy in analysing short length heart rate variability (HRV) signal. Front Physiol. 2017;8:720.
 16.
Shi B, Zhang Y, Yuan C, Wang S, Li P. Entropy analysis of shortterm heartbeat interval time series during regular walking. Entropy. 2017;19(10):568.
 17.
Li P, Karmakar C, Yearwood J, Venkatesh S, Palaniswami M, Liu C. Detection of epileptic seizure based on entropy analysis of short-term EEG. PLoS ONE. 2018;13(3):e0193691.
 18.
Kaplan D, Staffin P. Software for heart rate variability. https://www.macalester.edu/~kaplan/hrv/doc/. Accessed 1 Jan 2019.
 19.
Lee K. Fast approximate entropy. https://www.mathworks.com/matlabcentral/fileexchange/32427-fast-approximate-entropy. Accessed 1 Jan 2019.
 20.
Lee K. Sample entropy. https://www.mathworks.com/matlabcentral/fileexchange/35784-sample-entropy. Accessed 1 Jan 2019.
 21.
Sample entropy. https://www.physionet.org/physiotools/sampen/matlab/1.11/sampen.m. Accessed 1 Jan 2019.
 22.
Muller A. PETROPY—permutation entropy. http://tocsy.pik-potsdam.de/petropy.php. Accessed 1 Jan 2019.
 23.
Unakafova V. Permutation entropy (fast algorithm). https://www.mathworks.com/matlabcentral/fileexchange/44161-permutation-entropy-fast-algorithm. Accessed 1 Jan 2019.
 24.
Li P. Detection of epileptic seizure based on entropy analysis of short-term EEG. https://scholar.harvard.edu/pli/detection-epileptic-seizure-based-entropy-analysis-short-term-eeg. Accessed 1 Jan 2019.
 25.
Eckmann JP, Ruelle D. Ergodic theory of chaos and strange attractors. The theory of chaotic attractors. Berlin: Springer; 1985. p. 273–312.
 26.
Pincus S, Singer BH. Randomness and degrees of irregularity. Proc Natl Acad Sci. 1996;93(5):2083–8.
 27.
Fleisher LA, Pincus SM, Rosenbaum SH. Approximate entropy of heart rate as a correlate of postoperative ventricular dysfunction. Anesthesiology. 1993;78(4):683–92.
 28.
Yeragani VK, Pohl R, Mallavarapu M, Balon R. Approximate entropy of symptoms of mood: an effective technique to quantify regularity of mood. Bipolar Disord. 2003;5(4):279–86.
 29.
Abásolo D, Hornero R, Espino P, Poza J, Sánchez CI, de la Rosa R. Analysis of regularity in the EEG background activity of alzheimer’s disease patients with approximate entropy. Clin Neurophysiol. 2005;116(8):1826–34.
 30.
Pincus SM. Approximate entropy as a measure of irregularity for psychiatric serial metrics. Bipolar Disord. 2006;8((5p1)):430–40.
 31.
Papaioannou V, Giannakou M, Maglaveras N, Sofianos E, Giala M. Investigation of heart rate and blood pressure variability, baroreflex sensitivity, and approximate entropy in acute brain injury patients. J Crit Care. 2008;23(3):380–6.
 32.
Gao L, Smielewski P, Czosnyka M, Ercole A. Cerebrovascular signal complexity six hours after intensive care unit admission correlates with outcome after severe traumatic brain injury. J Neurotrauma. 2016;33(22):2011–8.
 33.
Yan R, Gao RX. Approximate entropy as a diagnostic tool for machine health monitoring. Mech Syst Sign Process. 2007;21(2):824–39.
 34.
Fu L, He ZY, Mai RK, Qian QQ. Application of approximate entropy to fault signal analysis in electric power system. Proc CSEE. 2008;28(28):68–73.
 35.
Ahmed MU, Mandic DP. Multivariate multiscale entropy analysis. IEEE Signal Process Lett. 2012;19(2):91–4.
 36.
Li P, Li K, Liu C, Zheng D, Li ZM, Liu C. Detection of coupling in short physiological series by a joint distribution entropy method. IEEE Trans Biomed Eng. 2016;63(11):2231–42.
 37.
Costa M, Goldberger AL, Peng CK. Multiscale entropy analysis of complex physiologic time series. Phys Rev Lett. 2002;89(6):068102.
 38.
Tarvainen MP, Niskanen JP, Lipponen JA, RantaAho PO, Karjalainen PA. Kubios HRVheart rate variability analysis software. Comput Methods Prog Biomed. 2014;113(1):210–20.
Authors’ contributions
PL is the sole author of this work and contributed to all aspects from conceptualization of the project to programming and debugging the software, and to drafting and revising the manuscript. The author read and approved the final manuscript.
Acknowledgements
I would like to especially thank my wife, Xinning Liu, who has been devoting most of her time to the care of our toddler boy and baby girl. Forever and for always, I love you. A copyright registration of the EZ Entropy software has been issued by the Copyright Protection Center of China under No. 2018SR795320.
Competing interests
The author declares that he has no competing interests.
Availability of data and materials
Not applicable.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
This work was supported by the National Natural Science Foundation of China (No. 61601263). The funding source had no involvement in the study design, study implementation, experiments, writing of this report, or the decision to submit the article for publication.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file
12938_2019_650_MOESM1_ESM.mp4
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Keywords
 Entropy
 Software
 Program
 MATLAB