To extend HOSVD to PET image denoising and present our HOSVD-based 3D PET image denoising method, we first give a brief introduction to HOSVD in image denoising. Then, we introduce the Anscombe variance-stabilizing transformation and its exact unbiased inverse. Finally, we describe our method and its implementation.

### Non-local HOSVD denoising for 2D images

Like most denoising algorithms, the HOSVD denoising method was originally designed for additive zero-mean Gaussian noise. The main assumption in the denoising process is that the observed noisy image *Y* is generated from a noise-free image *X* by additive Gaussian noise *N*, i.e., \(Y = X+N\), where *N* has zero mean and a known variance \({\sigma }^2\).

To find a good estimate \(\tilde{X}\) of *X* from *Y*, the HOSVD denoising method first groups, for each reference image patch, the similar patches into a 3D stack. Then, the HOSVD transformation of the stack is performed to obtain its representation in terms of the HOSVD bases and coefficients. Because the noise remains uncorrelated in the HOSVD transform domain, the coefficients are truncated by a hard-thresholding operator, and a denoised estimate of the stack is obtained by the inverse HOSVD transform. Repeating this operation for each reference patch yields multiple estimates at each pixel, and the final denoised image is obtained by averaging these estimates. The details of the standard HOSVD denoising process are described in the following.

Given a \(p\,{\times }\,p\) reference patch \(P_{ref}\) in the original noisy 2D image, *K* similar patches are found including \(P_{ref}\) and stacked into a 3D matrix \(Z\,{\in}\,R^{p\,{\times }\,p{\times }\,K}\). Then, the HOSVD of *Z* is performed and can be formulated as [38]

$$\begin{aligned} Z = S\,{\times }\,_{1}U^{(1)}\,{\times }\,_{2}U^{(2)}\,{\times }\,_{3}U^{(3)}, \end{aligned}$$

(1)

where \(U^{(1)},U^{(2)}\,{\in}\,R^{p\,{\times }\,p}\) and \(U^{(3)}\,{\in}\,R^{K\,{\times }\,K}\) are orthogonal matrices. They are computed from the SVDs of the unfolding matrices of *Z* along the three modes, respectively [38]. \({\times }_n\) denotes the *n*-mode tensor product, and *S* is the 3D coefficient matrix of size \(p\,{\times}\,p\,{\times}\,K\). Eq. (1) shows that the HOSVD can exploit signal sparsity along each dimension.
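To make the decomposition concrete, the following sketch computes the HOSVD of a patch stack with NumPy, obtaining the bases from the SVDs of the three mode unfoldings as in Eq. (1) and the coefficients via Eq. (2); the function names `unfold`, `mode_product`, and `hosvd` are our own illustrative choices, not from the original implementation.

```python
import numpy as np

def unfold(T, mode):
    # Mode-n unfolding: mode `mode` becomes the rows, all other modes
    # are flattened into the columns.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, M, mode):
    # n-mode tensor product T x_n M: multiply M along axis `mode` of T.
    return np.moveaxis(np.tensordot(M, T, axes=(1, mode)), 0, mode)

def hosvd(Z):
    # Bases U^(n) from the SVDs of the unfoldings (Eq. 1), then the
    # coefficient tensor S = Z x_1 U1^T x_2 U2^T x_3 U3^T (Eq. 2).
    U = [np.linalg.svd(unfold(Z, n), full_matrices=False)[0]
         for n in range(Z.ndim)]
    S = Z
    for n, Un in enumerate(U):
        S = mode_product(S, Un.T, n)
    return S, U
```

Reconstructing with `mode_product(S, Un, n)` over all three modes recovers *Z* exactly, which mirrors Eq. (1).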

The coefficient matrix *S* can be computed as:

$$\begin{aligned} S = Z\,{\times }\,_{3}{U^{(3)}}^{T}\,{\times }\,_{2}{U^{(2)}}^{T}\,{\times }\,_{1}{U^{(1)}}^{T}, \end{aligned}$$

(2)

where *T* represents the transpose of a matrix. Then, the stack *Z* can be filtered by hard-thresholding the coefficients in *S*, with the threshold \(\tau\) determined as follows [39]:

$$\begin{aligned} \tau =\sigma \sqrt{2\log (p^2K)}. \end{aligned}$$

(3)

Thus, the truncated coefficient matrix \(\tilde{S}\) with entries \(\tilde{s}_i\) can be determined as follows:

$$\begin{aligned} \tilde{s}_i = \left\{ \begin{array}{ll} s_i &{}\quad \mathrm{for}\ \left| s_i \right| \ge \tau \\ 0 &{}\quad \mathrm{for}\ \left| s_i \right| < \tau \end{array} \right. \end{aligned}$$

(4)

where the threshold \(\tau\) is optimal from a statistical risk viewpoint for any orthonormal basis and for \(N{\sim }\mathcal {N}(0,\sigma ^2)\). This truncation of the HOSVD coefficients relies on the assumption that the underlying noise-free image can be sparsely represented in the HOSVD bases. Using (1), an estimate \(\tilde{Z}\) of *Z* is obtained as follows:

$$\begin{aligned} \tilde{Z} = \tilde{S}\,{\times }\,_1{U^{(1)}}\,{\times }\,_2{U^{(2)}}\,{\times }\,_3{U^{(3)}}. \end{aligned}$$

(5)

Repeating the above process for each reference patch in a sliding-window fashion yields multiple estimates at each pixel. The final filtered image is produced by averaging these estimates at each pixel.
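A minimal sketch of the hard-thresholding step of Eqs. (3)-(4), assuming the coefficient tensor *S* has already been computed via Eq. (2); the helper name is our own:

```python
import numpy as np

def hosvd_hard_threshold(S, sigma, p, K):
    # Eq. (3): universal threshold for a p x p x K stack with noise
    # standard deviation sigma.
    tau = sigma * np.sqrt(2.0 * np.log(p * p * K))
    # Eq. (4): keep coefficients at or above tau in magnitude, zero the rest.
    return np.where(np.abs(S) >= tau, S, 0.0)
```

Applying the inverse transform of Eq. (5) to the thresholded coefficients then gives the denoised stack \(\tilde{Z}\).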

To further improve the denoising performance, an additional Wiener filtering step can be performed after the standard HOSVD denoising process. Let \(\tilde{Z}\) be a stack of similar patches from the standard HOSVD-filtered image that corresponds to the stack *Z* from the original noisy image, and denote the coefficients of \(\tilde{Z}\) and *Z* in the HOSVD bases of \(\tilde{Z}\) as \(\tilde{s}\) and *s*, respectively. The filtered coefficients of *Z*, denoted \(\bar{s}\), are computed as follows:

$$\begin{aligned} \bar{s} = \frac{s{\tilde{s}}^2}{{{\tilde{s}}^2}{+}{{\sigma }^2}}. \end{aligned}$$

(6)

In this paper, the standard HOSVD and its Wiener filter-augmented version are referred to as HOSVD-S and HOSVD-W respectively.
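The Wiener shrinkage of Eq. (6) is a simple elementwise operation on the coefficients; a minimal sketch (the function name is our own):

```python
import numpy as np

def wiener_shrink(s, s_pilot, sigma):
    # Eq. (6): shrink the noisy coefficients s using the pilot
    # coefficients s_pilot from the HOSVD-S filtered image. Coefficients
    # with a strong pilot pass nearly unchanged; those whose pilot is
    # near zero are attenuated toward zero.
    return s * s_pilot ** 2 / (s_pilot ** 2 + sigma ** 2)
```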

### HOSVD-based denoising for 2D PET images

The above-mentioned HOSVD denoising for 2D images cannot be directly extended to denoise PET images. First, even though raw PET data follow the Poisson distribution, the exact noise model in reconstructed PET images is usually unknown for nonlinear reconstruction algorithms. Second, the intensity value at each voxel of a PET image represents the activity concentration (Bq/ml) in the corresponding region of space, which is typically low, so the assumption of a signal-independent Gaussian noise model does not hold.

In fact, for images with low photon counts, such as PET or SPECT images, the pixel or voxel intensities can be modeled approximately by the Poisson distribution, which is consistent with the observation that the noise in PET images is multiplicative, asymmetric, and spatially varying. In this paper, we approximate the noise in reconstructed PET images with the Poisson model. Then, with the Anscombe root transformation [40] and its inverse, the HOSVD-based denoising algorithms can be applied to PET images. Here, we employ the closed-form approximation of the exact unbiased inverse of the Anscombe variance-stabilizing transformation [41] to alleviate the biased estimates caused by the nonlinearity of the Anscombe transformation. The Anscombe transformation and its inverse, denoted \(AVST\) and \(IAVST\) respectively, are formulated as follows:

$$\begin{aligned} AVST (I)=2\sqrt{I{+}3/8} \end{aligned}$$

(7)

and

$$\begin{aligned} IAVST (Z) = \frac{1}{4}Z^2+\frac{1}{4}\sqrt{\frac{3}{2}}Z^{-1}-\frac{11}{8}Z^{-2}+\frac{5}{8}\sqrt{\frac{3}{2}}Z^{-3}-\frac{1}{8}, \end{aligned}$$

(8)

where *I* represents the noisy PET image, and *Z* is the denoised data in the HOSVD transform domain.
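Eqs. (7) and (8) translate directly into code; the sketch below assumes NumPy arrays and uses \(\sqrt{3/2}\) in both odd-power terms, following the closed-form approximation of Makitalo and Foi [41]:

```python
import numpy as np

def avst(I):
    # Eq. (7): Anscombe variance-stabilizing transform of the noisy image.
    return 2.0 * np.sqrt(I + 3.0 / 8.0)

def iavst(Z):
    # Eq. (8): closed-form approximation of the exact unbiased inverse
    # (Makitalo and Foi [41]); intended for Z bounded away from zero.
    return (0.25 * Z ** 2
            + 0.25 * np.sqrt(3.0 / 2.0) * Z ** -1
            - (11.0 / 8.0) * Z ** -2
            + (5.0 / 8.0) * np.sqrt(3.0 / 2.0) * Z ** -3
            - 1.0 / 8.0)
```

Note that `iavst` is not the algebraic inverse of `avst`: it compensates the Poisson bias of the forward transform, so `iavst(avst(x))` differs slightly from `x`.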

The HOSVD-based denoising algorithms can then be extended to the denoising application of PET images by using the following formula:

$$\begin{aligned} \tilde{I}= IAVST ({Den}_{\scriptscriptstyle {HOSVD}}( AVST (I))), \end{aligned}$$

(9)

where \({Den}_{\scriptscriptstyle {HOSVD}}(\cdot )\) represents the HOSVD denoising operation, and *I* and \(\tilde{I}\) are the noisy and denoised PET images, respectively.

### Weighted HOSVD denoising scheme for 3D PET images

The hard-threshold operation used in HOSVD-S may perform poorly when the selected similar patches differ considerably in structure from the reference patch; this is especially true when the exact structure of the underlying clean image is unavailable, as in PET image denoising. In such applications, direct use of HOSVD-S may produce stair-step artifacts in the denoised image, as shown in Fig. 1. Although HOSVD-W can mitigate this drawback, it also increases the time consumption dramatically, to about twice that of HOSVD-S.

To improve the denoising performance of the HOSVD while avoiding a dramatic increase in time consumption, we assign different weights to the selected similar patches depending on their level of similarity to the reference patch in both structure and statistics. This differs from the HOSVD-S method, which assigns an identical weight to each selected similar patch. The similarity level in statistics is measured by the *p*-value of the Kolmogorov-Smirnov (K-S) test between the reference patch and the selected similar patch.

For the calculation of weights, we first consider the patch similarity measure used in HOSVD-S. The distance between the three-dimensional (3D) reference patch \(P_{ref}\) and its *i*th similar patch \(P_i\) can be formulated as:

$$\begin{aligned} d(P_{ref},P_i) = \sqrt{{\Vert P_{ref}-P_i\Vert }^2}. \end{aligned}$$

(10)

Note that the reference patch \(P_{ref}\) and a patch *P* within the search window are considered similar only if the Euclidean distance \(d(P_{ref},P)\) between them satisfies the following rule [36]:

$$\begin{aligned} d^2(P_{ref},P)<3{\sigma }^2p^3. \end{aligned}$$

(11)

Based on this, we calculate the similarity weight between \(P_i\) and \(P_{ref}\) as follows:

$$\begin{aligned} s(P_{ref},P_i) = \exp \left(-\frac{1}{2}d^2(P_{ref},P_i)\right), \end{aligned}$$

(12)

which means that patches with high similarity to the reference patch will be assigned a larger weight.

To alleviate the drawback of the original similarity measure, we use a hypothesis test (the one-sided Kolmogorov-Smirnov test) to check how well the values in \(P_{ref}-P_i\) conform to \(\mathcal {N}(0,2\sigma ^2)\), and the *p*-value output by the K-S test is used to modulate the similarity weight multiplicatively. Denoting the *p*-value of the K-S test as \(p_{KS}(P_{ref},P_i)\), the weight of a patch similar to the reference patch \(P_{ref}\) can be formulated as follows:

$$\begin{aligned} c(P_{ref},P_i) = s(P_{ref},P_i){\cdot }p_{KS}(P_{ref},P_i). \end{aligned}$$

(13)
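The combined weight of Eqs. (10), (12), and (13) can be sketched as follows. We implement the one-sided K-S test directly, using the standard asymptotic p-value \(\exp (-2nD_+^2)\) for the \(D_+\) statistic rather than a library routine; all function names are our own:

```python
import math
import numpy as np

def normal_cdf(x, mu, sd):
    # CDF of N(mu, sd^2), evaluated elementwise over the array x.
    return np.array([0.5 * (1.0 + math.erf((v - mu) / (sd * math.sqrt(2.0))))
                     for v in x])

def ks_pvalue(residual, sigma):
    # One-sided K-S test of the residual against N(0, 2 sigma^2):
    # D+ = max over i of ECDF_i - CDF(x_i), with asymptotic p-value
    # exp(-2 n D+^2).
    x = np.sort(residual.ravel())
    n = x.size
    d_plus = np.max(np.arange(1, n + 1) / n
                    - normal_cdf(x, 0.0, math.sqrt(2.0) * sigma))
    return math.exp(-2.0 * n * d_plus ** 2)

def patch_weight(P_ref, P_i, sigma):
    # Eqs. (10), (12), (13): Gaussian distance weight times the K-S p-value.
    d2 = float(np.sum((P_ref - P_i) ** 2))
    return math.exp(-0.5 * d2) * ks_pvalue(P_ref - P_i, sigma)
```

Note that Eq. (12) as written applies the exponential to the raw squared distance, so for large patches the structural weight can become very small; we keep the formula exactly as stated.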

Thus, the *i*th 3D patch that is similar to \(P_{ref}\) can be formulated as follows:

$$\begin{aligned} \hat{P}_{i} = c(P_{ref},P_i)P_{i}. \end{aligned}$$

(14)

Then, the clustering 4D stack can be formulated as:

$$\begin{aligned} C(P_{ref}) = (P_{ref},\hat{P}_{1},\hat{P}_{2},...). \end{aligned}$$

(15)

Note that \(C(P_{ref})\) is a matrix of order four. The denoised 4D stack is formulated as:

$$\begin{aligned} \tilde{C}(P_{ref}) = Den_{\scriptscriptstyle {HOSVD}}(C(P_{ref})) = (\tilde{P}_{ref},\tilde{P}_{1},\tilde{P}_{2},...). \end{aligned}$$

(16)

The estimation \(\breve{P}_i\) of \(P_i\) can then be obtained as follows:

$$\begin{aligned} \breve{P}_i = \frac{1}{c(P_{ref},P_i)}\tilde{P}_i. \end{aligned}$$

(17)
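The weighting, joint denoising, and unweighting of Eqs. (14)-(17) can be sketched as follows; the stack axis ordering and the pluggable `denoise_stack` argument (standing in for \(Den_{HOSVD}(\cdot )\)) are our own conventions:

```python
import numpy as np

def weighted_stack_denoise(P_ref, patches, weights, denoise_stack):
    # Eqs. (14)-(15): scale each similar patch by its weight c(P_ref, P_i)
    # and stack it behind the (unweighted) reference patch.
    C = np.stack([P_ref] + [c * P for c, P in zip(weights, patches)])
    # Eq. (16): denoise the whole stack jointly with any stack filter.
    C_t = denoise_stack(C)
    # Eq. (17): divide the weights back out to recover per-patch estimates.
    return [C_t[0]] + [C_t[k + 1] / c for k, c in enumerate(weights)]
```

With an identity filter in place of the denoiser, each patch is recovered unchanged, which confirms that the weighting and unweighting steps cancel.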

Repeating the above process for each reference patch and averaging the estimates at each voxel, we can obtain the denoised image in the Anscombe transform domain. The final filtered PET image can then be produced by performing the \(IAVST (\cdot )\) operation. For clarity, the proposed weighted HOSVD denoising algorithm is referred to as HOSVD-KSW, and the flow diagram is shown in Fig. 2.

HOSVD-KSW not only preserves the advantages of the HOSVD-S method but also suppresses the stair-step artifacts caused by HOSVD-S, with a time consumption similar to that of HOSVD-S.

### Practical implementation of the HOSVD-based denoising algorithm

In this work, three HOSVD-based methods are applied to denoise 3D PET images. The most time-consuming parts of the HOSVD-based algorithms are 3D patch grouping and the HOSVD transform. In the HOSVD-S and HOSVD-W implementations, the K nearest neighbors of the reference patch are found based on (11) and then grouped into a stack. For HOSVD-KSW, among the patches satisfying (11), the K patches with the largest weights (including the reference patch itself) are selected as the K nearest neighbors of the reference patch.

To improve the computational efficiency while preserving the denoising efficacy of the algorithms, we restrict the size of the search space for similar patches and the sliding step of the reference patch. On the one hand, larger search windows do not always lead to better results, and further enlarging the search window increases the time consumption, especially in 3D PET image denoising, where structures similar to a specific reference patch are usually concentrated in a restricted area close to it. On the other hand, the algorithm execution time is linearly correlated with the number of reference patches selected in the implementation.

In our implementation of the HOSVD-based algorithms, the size of the search window is set to 20 voxels and the sliding step is set to \(p-1\) in each image dimension, where *p* is the size of the reference patch in each dimension.
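One way to generate the reference-patch start positions along a single image dimension with step \(p-1\) is sketched below; the border handling (always including the last valid start so the image edge is covered) is an implementation assumption, not specified in the text:

```python
def reference_positions(dim, p, step):
    # Start indices of reference patches along one dimension of length
    # `dim`, for patches of size `p`, stepping by `step` (the text uses
    # p - 1); the last valid start is appended so the border is covered.
    pos = list(range(0, dim - p + 1, step))
    if pos[-1] != dim - p:
        pos.append(dim - p)
    return pos
```

The full 3D sweep is the Cartesian product of these position lists over the three image dimensions.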