Combining dual-tree complex wavelets and multiresolution in iterative CT reconstruction with application to metal artifact reduction

Background This paper investigates the benefits of data filtering via complex dual wavelet transform for metal artifact reduction (MAR). The advantage of using complex dual wavelet basis for MAR was studied on simulated dental computed tomography (CT) data for its efficiency in terms of noise suppression and removal of secondary artifacts. Dual-tree complex wavelet transform (DT-CWT) was selected due to its enhanced directional analysis of image details compared to the ordinary wavelet transform. DT-CWT was used for multiresolution decomposition within a modified total variation (TV) regularized inversion algorithm. Methods In this study, we have tested the multiresolution TV (MRTV) approach with DT-CWT on a 2D polychromatic jaw phantom model with Gaussian and Poisson noise. High noise and sparse measurement settings were used to assess the performance of DT-CWT. The results were compared to the outcome of the single-resolution reconstruction and filtered back-projection (FBP) techniques as well as reconstructions with Haar wavelet basis. Results The results indicate that filtering of wavelet coefficients with DT-CWT effectively removes the noise without introducing new artifacts after inpainting. Furthermore, adoption of multiple resolution levels yield to a more robust algorithm compared to varying the regularization strength. Conclusions The multiresolution reconstruction with DT-CWT is also more robust when reconstructing the data with sparse projections compared to the single-resolution approach and Haar wavelets.

acceptance, affordability and accessibility of metallic restorations in forms of dental implants, fillings, crowns, screws, nails, prosthesis and plates in dentistry, and the increasing popularity of CBCT in image-guided therapy, dental CT specific metal artifact reduction (MAR) algorithms became a field of its own in the scientific research [3]. The attenuation of high density objects such as stainless steel, gold alloys, silver amalgam, platinum, lead, tin and aluminum, can corrupt the images of the underlying anatomical structures in dental CT, allowing fewer photons to reach detectors. This photon starvation corrupts the projection data, leading to streak artifacts over the surrounding tissue upon back-projection. These artifacts can reduce the applicability of dental CT by hindering the underlying anatomical structures [4]. For recent applications of MAR in the field of CT ranging from its use in positron emission tomography scans to spinal deformity correction in surgeries, see [5,6]. The latest comparison of the available MAR algorithms from the largest vendors have also been tested with a customized phantom by Chou et al. [7]. For the effectiveness of MAR with various metals in CT, readers can refer to [8].
The aim of MAR methods is to remove artifacts caused by the presence of metallic objects in the reconstructed images. MAR methods can be generally divided into two main categories: (1) interpolation/completion of projection data and (2) iterative reconstruction methods. The former approach is not sufficient in complicated cases such as multiple metals [9]. The combination of these two categories is also possible and it can further improve the reconstruction results. An overview of these methods is provided in [10].
Inpainting is one of the most commonly used projection completion methods due to its high computational efficiency [9]. It is an interpolation based method for filling the missing information in an image by interpolating the information surrounding it. Inpainting was introduced in signal processing by [11] and it has been widely used in MAR in projection domain [9,12] and wavelet domain [13]. In practice, inpainting replaces the gaps in the data with NaNs and then fill them by interpolating the intensity values surrounding the NaNs. The inpainting methods in this work was implemented via the code of John D'Errico [14]. 1 As the following multiresolution reconstruction method already is an iterative method, inpainting was chosen here instead of iterative approaches to optimize the efficiency of the algorithm. Although inpainting fills the gaps in an image efficiently, it can lead to secondary artifacts during analytic reconstruction due to discontinuities at the boundary pixels, e.g., at the metal-tissue boundary. In order to prevent such artifacts, we propose filtering the projection data in dual complex wavelet basis within a multiresolution framework, which combines inpainting [14] with iterative total variation (TV) reconstruction. This combination is motivated as complementary with respect to correcting the primary and secondary effects of the metals, that is, the missing data intensity profile and details, respectively. The multiresolution iterative total variation (MRTV) is an extension of the classical single-resolution TV iteration [15][16][17]. It utilizes a coarse-to-fine approach, in which the coarse image details are reconstructed before the finer ones to enhance the regularity, suppress the noise, and 1 https ://se.mathw orks.com/matla bcent ral/filee xchan ge/4551-inpai nt-nans. avoid the secondary artifacts after inpainting [18][19][20]. Namely, under missing data, only coarse level details might be distinguishable and methods not taking this into account might have a poor performance or numerical instability with respect to these details. The multiresolution decomposition in MRTV have been successfully applied in MAR to resolve some of such issues related to the existing methods [4,12,20]. In [20], a wavelet-based filtering for MAR was applied with CT data acquired for a hip joint prosthesis, and it was found to be effective in reducing the artifacts from beam hardening and photon starvation. Following a similar reasoning, we chose to use wavelet coefficients to distinguish different frequency components and filter the high-frequency artifacts caused by metals and noise without disturbing the edges of the object. For achieving the best possible performance, we applied the dual-tree complex wavelet transform (DT-CWT) [21][22][23]. The DT-CWT is based on two real discrete wavelet transforms (DWTs), which give the real and imaginary parts of the DT-CWT separately. As a directionally accurate transform, 2D DT-CWT can recognize the orientation of the image fluctuations, making it considerably less sensitive to the artifacts related to alteration or compression of the coefficients as compared to the classical wavelets, e.g., Daubechies or biorthogonal wavelets used in [20]. The complex wavelet transform (CWT) achieves perfect reconstruction and the dual-tree approach ensures this when decomposition level is greater than one [24]. In contrast to the ordinary 2D wavelet transform, which includes vertical, horizontal and diagonal direction modes, DT-CWT oversamples the target image with a doubled directional selectivity. Consequently, it distinguishes both ascending and descending curves in the image, whereas DWT does not. This is essential for preserving the reconstruction quality as good as possible. The advantages of DT-CWT was utilized within the multiresolution framework in order to achieve good noise filtering without filtering out the details in the image. In this study, our goal is to find out, how MRTV approach performs compared to the ordinary single-resolution TV (SRTV) regularization and also to the classical filtered back-projection (FBP) technique, which is used as the reference method to evaluate the performances of other methods presented here.
In the numerical experiments, the MRTV approach was found to stabilize the reconstructions compared to SRTV. Differences between the methods investigated were observed, especially, in regions of interest (ROIs) containing metals and their near surroundings. The influence of angular density on the reconstructions was studied by using different number of projections. The results with sparse projections would be relevant with respect to lowering the total radiation dose [25,26]. Additionally, the stability of the algorithm against the total number of projections could make it applicable for various CBCTs available on the market. For instance, in 2013, the number of projections acquired ranged from 180 to 1024. The Kodak CS 9300C CBCT device utilizes 180 projections for a total rotation angle of 180 degrees, while most devices deliver 360 projections per full angle rotation [27].

Results
The resulting images from the reconstructions are presented in Fig. 1. The secondary artifacts in FBP around ROI 2 are slightly less pronounced with the DT-CWT filtering step. These artifacts are almost completely vanished once multiresolution approach is combined with DT-CWT. The images reconstructed with Haar wavelets are so pixelized that it is not possible to evaluate the secondary artifacts. When images with the tooth within ROI 3 are visually assessed, the same observations for the ROI 2 still apply. Additionally, in SRTV, artifacts caused by single-resolution filtering are visible, but these artifacts are decreased by the increased penalty weight in SRTV-H. The contrast difference between the tooth and the inpainted metal is pronounced in the single resolution images and the FBP, whereas this difference is significantly less with MRTV and MRTV-H.
The quantitative evaluation of the results, using RMSE, PSNR and SSIM, is depicted in Table 1. For Configurations I (noisy) and II (noisy and sparse), the multiresolution approach with DT-CWT fared better compared to single-resolution approaches. In general, the filtering of wavelet coefficients in MRTV-F improved the RMSE and PSNR values for Configuration II. In Configuration I, however, the filtering deteriorated the PSNR and RMSE despite the marginal improvement in SSIM. Increasing the penalty weight in SRTV improved all quantitative parameters for Configurations I and II. Due to the pixelization in the reconstruction with Haar wavelets, its RMSE was higher than other methods even in the noiseless measurements. In the case of Configuration III (noiseless data), all the methods with DT-CWT yield to similar results due to the preliminary stage optimization of the reconstruction parameters. For dense projection data in Configuration I, the multiresolution with wavelets (both Haar and DT-CWT) performed better than single-resolution approaches in ROI 1. For the sparse projections in Configuration II, MRTV with DT-CWT outperformed the Haar wavelets.
The line profiles in Fig. 2 were calculated along the red line in Fig. 3. Based on these line profiles, it can be seen that the MRTV with wavelet filtering suppresses the noise better than SRTV with a high penalty (SRTV-H). The pixelization of the Haar wavelet reconstruction

Discussion
This study focused on enhancing the reconstruction quality of iterative regularization via the dual-tree complex wavelet transform (DT-CWT) [21][22][23] in dental CT, combined with multiresolution. Although FBP resulted in comparable values of RMSE and SSIM with complete data and low noise scenarios, the difference of the proposed method became apparent with sparse data. The central finding of this study was that the DT-CWT equipped MRTV inversion technique was more robust in terms of reduction of noise and artifacts for sparse data. This observation was supported by the numeric evaluations and visual comparisons. Although part of this robustness of the reconstruction compared to FBP can be attributed to TV penalization, the difference in error and similarity measures of Haar and DT-CWT point at the importance in selection of the coefficients to be filtered. Based on our results, DT-CWT provided virtually an artifact-free multiresolution basis, which can be observed based on the nearly identical outcome of MRTV and SRTV in the case of the noiseless data (Configuration III). The conventional wavelets used in the preliminary tests, in particular, the Haar basis [28], led to pixelization of the final reconstruction. That is, the correction steps for the finer resolutions did not match accurately enough with the coarse level estimate. Hence, DT-CWT was found to be vital for the appropriate function of MRTV. Some ringing effects were observed for the individual resolution levels, but, the final estimate did not suffer from ringing. Other potential multiresolution bases for MRTV are provided by ridgelets and curvelets [29][30][31] which similarly to DT-CWT cover an extended set of orientations compared to the classical wavelets.
Sinogram denoising with a 80 % hard threshold (MRTV-F) improved the RMSE values with sparse projections (Configuration II). However, the RMSE results of the dense projections with filtering were inferior to the outcome obtained with MRTV despite the improvement in SSIM, suggesting that some details were lost in the thresholding process along with some noise reduction. This suggests that additional denoising in single resolution is a not as effective technique recovery of the intensity values as employing a multiresolution decomposition in iterative reconstruction. We emphasize that present hard threshold filter in MRTV-F can be improved, e.g., via a soft threshold and regional adaptivity, especially, regarding the metal implants.
Using multiple resolution levels was also found to be preferable compared to controlling the regularization strength. With sparse projection data used in Configuration II, the SRTV-H performed equally well compared to MRTV in terms of RMSE, possibly due to the strong penalization of the noise. With SRTV-H, the overall image quality could be improved with respect to the artifacts by increasing the level of the regularization, but, with the cost of decreased image sharpness. The line profiles, however, showed a high positive bias for the tooth around the metal and lower intensity values for the metallic implant. In contrast, MRTV achieved an enhanced accuracy for the coarse details while maintaining the sharpness at the level of SRTV. Another important observation was that MRTV successfully reconstructed both 256 and 128 projection angles utilized in Configuration I and II, respectively. In general, the coarse-to-fine reconstruction approach seems to be advantageous regarding MAR, where reconstructing the implanted teeth accurately can be difficult due to the inpainted sinogram regions and, thereby, the incompleteness of the data. As suggested by the present study, recovering the coarse level fluctuations before the finer ones can result in a more accurate tooth boundaries than, if the whole image is reconstructed at once. This can be understood, since for the present inverse problem the numerical null space S − ε [19,32] is non-trivial and there is infinitely many candidate solutions which fit the incomplete data. Hence, in addition to TV, a multiresolution setting akin to the present one might work also with other reconstruction approaches. Note that it is possible to change the multiresolution levels depending on the spatial resolution of the image. For instance, for a 256 × 256 image, the resolution level would be 3, while 5 levels could be chosen for a 1024 × 1024 image.
An important direction for future work is to validate the present DT-CWT based MRTV approach in 3-dimensional clinical dental CT data. For that purpose, the current implementation of MRTV needs to be sped up. The matrix-based MRTV implementation of this study utilized only a single computing thread and was, thereby, far from optimal with respect to a multi-thread CPU performance. Consequently, it required several minutes of CPU time, whereas the FBP reconstruction could be obtained in a fraction of a second. A parallelized matrix-free implementation would obviously accelerate the MRTV. Another potential solution would be to employ a graphics processing unit (GPU) for the inverse computations instead of a CPU, which might enable a 10-100 times faster performance based on the general performance difference between GPUs and CPUs. An analogous computationally intensive future direction would be to finding optimized ways to grow the imaging resolution per se without remarkably extending the computing time. The denoising technique used in MRTV-F can also be improved in order to achieve optimal imaging results. In addition to the sinogram, also the reconstruction can be filtered using DT-CWT. This approach was omitted in this study, as it did not enhance the RMSE compared to MRTV in the preliminary tests. To fully understand the effects of the noise, for example, with respect to the instrument-specific factors, such as the interplay between the detector response and the beam hardening effects, it will be essential to use real experimental or clinical measurement data in the future studies.

Conclusion
In this work, we showed how DT-CWT can be applied in the tomographic reconstruction process via a multiresolution (coarse-to-fine) version of a classical TV regularization algorithm. The numerical experiments were aimed at minimizing the reconstruction errors due to the inpainting of metallic regions in the projection data. The multiresolution technique (MRTV) was compared to the single-resolution TV approach, for which a lower and higher regularization strength (SRTV and SRTV-H) was used. The results were also compared with reconstructions using Haar wavelet basis. Qualitative and quantitative results showed that data filtering with DT-CWT combined with multiresolution reconstruction is beneficial for recovering the details of images while reducing the noise with filtering at each resolution level. The robustness of the reconstruction with sparse projections using DT-CWT indicates the feasibility of these wavelets especially for sparse measurements. This could potentially help decreasing the radiation dose by reconstructing high quality images from sparse projection angles.

Dataset preparation
As the simulation dataset (Table 2), we used the density map (unit g/cm 3 ) of a twodimensional 1024 × 1024 pixel jaw phantom. This dataset was based on the FORBILD jaw phantom. 2 Metal (golden crown), teeth, jaw bone (cortical), soft tissue (modeled as water) and air gap inside the mouth were modeled with density values of 19.32, 2.99 (enamel), 1.92, 1.00 and 0 g/cm 3 , respectively. The locations for metallic implants in the image and projection domains can be seen in Fig. 3 as well as regions of interest (ROIs). In order to avoid committing "inverse crime" during the reconstruction, the sinogram was constructed on a fine grid of 1024 pixels, then reconstructed on a 512-pixel grid, similar to the approach of Nuyts et al. [33]. The projection data consisted of 768 radial bins and 256 angular views, covering 180 degrees. For a reference, industrial datasets might have a resolution of 600-pixels [25]. For modeling of the beam hardening, a polychromatic beam model was used. The beam hardening in this context refers to the "hardening" of the beam as it passes 2 http://www.imp.uni-erlan gen.de/forbi ld/engli sh/resul ts/index .htm. through the object being scanned, meaning that the lower energy rays are attenuated more than the higher energy ones. The hardening of the beam at the detector end is not modeled, as the algorithms of the manufacturers often account and correct for this effect already on the raw projection data. The energy dependent mass attenuation coefficients (with coherent scattering) of gold, bone, hard tissue and soft tissue were obtained from the National Institute of Standards and Technology (NIST) database. 3 The mass attenuation coefficient for the tooth was approximated using the material composition of enamel from [34] and NIST database. 4 The 80 kVp spectrum (half-value layer (Al) of about 5.5 mm) was used with 1 mm Al filtration from Fessler's IRT toolbox [35]. As the cone beam itself creates additional artifacts due to the shape of the beam, the parallel beam approach was selected for the construction of the system matrix. This allows one to evaluate the effectiveness of the MAR methods specifically on the artifacts created by the metals without the influence of the cone beam. The possible geometrical artifacts due to parallel beams were omitted here as the emphasis was on the effect of noise. Both Poisson and Gaussian noise were modeled in the sinogram construction, following the description of [36], which was also used in TIGRE Toolbox. 5 For Poisson noise, the total emitted photon count per pixel ( I 0 ) was taken as 10 5 and a zero mean additive Gaussian noise was used with standard deviation of 10. In order to maintain the generality of the model, the instrument-specific details such as the detector response were omitted in this study. Three different measurement settings were used to evaluate the algorithm's performance against noise and sparsity of measurements. In the first one (Configuration I), the number of projections was 256 with Poisson and Gaussian noise. In Configuration II, the noise model was the same, while a sparse pattern of 128 projections was applied to investigate the effects of the projection count which in some of the clinical scanners is fewer than in I [27]. In Configuration III, the projection pattern of I was used without the Gaussian noise to assess the performance of the single and multiresolution methods under more ideal conditions without changing the count statistics.
The metals were extracted by global thresholding from the projection data. For the sake of simplicity in evaluating the performance of the suggested methods, perfect segmentation of the metals was assumed. The gaps left on the sinogram after metal extraction were filled via inpainting.

Dual-tree complex wavelet transform
The ordinary real (orthogonal) DWT [28,37] is based on a low-and high-pass filter function φ : R → R and ψ : R → R which together enable decomposing a given signal f(t) as given by with α k and β k,ℓ denoting the so-called approximation and detail coefficients, respectively. The filter functions are orthogonal and normalized to one, i.e., the product between two different filter functions integrated over the real line is zero and Consequently, the coefficients α k and β k,ℓ can be obtained via the following integrals: Furthermore, the DWT conserves signal energy, meaning that the Parseval's identity holds: Together the coefficients can be organized into a tree-structured hierarchy of multiple resolution levels: each level has two branches, one for low-and one for high-pass filter coefficients.
The two-dimensional filter functions can be obtained as separable products between their one-dimensional counterparts, i.e., φ(x, y) = φ(x)φ(y) , ψ H (x, y) = φ(x)ψ(y) , ψ V (x, y) = ψ(x)φ(y) , and ψ D (x, y) = ψ(x)ψ(y) . The high-pass filters ψ H (x, y) , ψ V (x, y) , and ψ D (x, y) correspond to a horizontal, vertical and diagonal directional mode, respectively. Characteristic to the 2D DWT is that, due to their symmetry in the Fourier domain, these modes do not distinguish between upward and downward slopes in the image [23]. Consequently, DWT easily produces checkerboard-like dense and non-directional artifacts around edges, if the coefficients are altered or compressed. The lowest-order case of the DWT is constituted by the piecewise constant Haar wavelets which have been previously used together with TV in reconstruction [13,38]. Therefore, it was also used here for comparison.
In DT-CWT, the low-and high-pass filter function is assumed to be of the form where φ h (t), φ g (t), ψ h (t) , and ψ g (t) are real functions. The dual-tree structure follows as each of the pairs φ h (t), ψ h (t) and φ g (t), ψ g (t) forms a real-valued and orthogonal wavelet-tree. The two-dimensional high-pass filters of the DT-CWT have altogether six directional modes [23], corresponding to the real part of the separable products φ(x)ψ(y) , φ(x)ψ(y) , ψ(x)φ(y) , ψ(x)φ(y) , ψ(x)ψ(y) , and ψ(x)ψ(y) and the angular orientations of − 63, 63, − 27, 27, − 45, and 45 degrees with respect to the x-axis, respectively. Of these, the first two are nearly horizontal, 3rd and 4th one nearly vertical and the last two are diagonal.

Total variation regularization
The goal of any image reconstruction in a linear system is to invert the equation where x is the image to be reconstructed, the vector y contains the measurement (projection) data, the matrix L is a discretized Radon transform (Radon matrix). This system is an idealized expression for the signal attenuation and measurement process. It is introduced and used here for deriving the further mathematical equations. In fact, the entries of the Radon matrix contain some uncertainty, as the X-ray photon emission is a Poisson process, and n is a measurement noise term. A regularized solution of (6) can be obtained through the following: where Ŵ ℓ is a weighting matrix that satisfies Ŵ 0 = I and Ŵ ℓ = diag(|Dx ℓ | + γ I) −1 for ℓ ≥ 1 with a suitably chosen regularization parameter γ ≥ 0 . D is the regularization matrix given by with P i and P j denoting the boundary of the ith and jth pixel, respectively. Their intersection coincides with the edges shared by these pixels. The governing regularization parameter α determines the strength of the TV regularization. The roles of β and γ are mainly to ensure the invertibility of the matrices D and Ŵ ℓ so that the TV iteration does not diverge. The first term of D i,j in (8) penalizes the jumps over the pixel edges and the second one corresponds to the norm of x . In this work, β was fixed at 10 −8 . The conjugate gradient method was applied for matrix inversion with the number of steps fixed to 100. If this iteration converges, it minimizes the regularized objective function F (x) = �Lx − y� 2 2 + 2�Dx� 1 in which the l1 norm of Dx is the total variation of x , if β = 0 [39]. Consequently, the reconstructed image is likely to have large connected subsets close to constant, which helps to reduce noise, while preserving the edges. In this study, we call (7) the single-resolution TV (SRTV) approach. The SRTV-H refers to the stronger penalization of TV with a larger α value.

Multiresolution TV regularization
We propose approaching MAR via a multiresolution TV (MRTV) technique, that is, a coarse-to-fine extension (see Appendix) of the algorithm in (7). To explain this idea, we introduce the following definition of the numerical null-space [19,32]: Here ε denotes the floating-point accuracy, which is mainly concentrated on the fine image fluctuations. We assume that the target spaces of the wavelet low-and high-pass filter pair provide approximations of the space of strongly suppressed image details S − ε and that of the well-detectable details S + ε = {0} ∪ {x | �Lx� > ε�x�} , respectively. These spaces decompose the candidate solution space as given by R n = S + ε ⊕ S − ε . The aim of the coarse-to-fine approach is to separate S + ε and S − ε in the reconstruction process in order to maximize the distinguishability of the details belonging to S − ε . Processing the coarse details before the finer ones can approximately separate the strongly suppressed fluctuations of S − ε from the well-detectable ones belonging to the space S + ε = {0} ∪ {x | �Lx� > ε�x�} . The low-and high-pass wavelet filters can be obtained via a wavelet decomposition by zeroing all the high-pass and low-pass coefficients, respectively. In other words, the reconstruction of each wavelet level helps in separating the fine image details from the undesired components of the image such as noise and artifacts.

Numerical experiments
The present reconstruction approach was validated with numerical experiments using the jaw phantom described earlier. The reconstruction procedure included the following four stages: 1. Detecting the metals in the sinogram via global thresholding, 2. Laplacian smoothed inpainting of the metals using the algorithm in [14], 3. DT-CWT denoising with a given hard threshold percent (0% or 80%), 4. Inversion of the data via the MRTV, MRTV-F, SRTV, SRTV-H, or FBP technique.
The hard threshold refers to the percentage of the smallest wavelet coefficients which are set to zero. It aims to further reduce the noise in the sinogram before reconstruction. In MRTV-F, with 80% threshold, only the largest 20% of the wavelet coefficients were used in the reconstruction. The DT-CWT was used in the inversion stage (4) to obtain the multiresolution decomposition for MRTV.
The regularization parameter values were chosen empirically. MRTV, MRTV-F and SRTV were optimized for Configuration III. The minimal level of regularization sufficient to suppress any staircase patterns was sought for SRTV. The regularization strength applied in the case of MRTV was matched roughly with that of SRTV. In SRTV-H, slightly higher value of α was used for an enhanced noise tolerance. For SRTV and SRTV-H, it was necessary to choose γ > 0 , and it was set to γ = 10 −2 . For MRTV, the optimal performance was obtained with γ = 0 . The number of MRTV and SRTV iteration steps taken in computing a single reconstruction was set to be three.
The number of nested resolution levels used in MRTV computations and denoising was set to four. The multiresolution inverse estimates computed without and with DT-CWT denoising are referred to as MRTV and MRTV-F, respectively. The regularization parameter α was chosen empirically as 4. MRTV results were compared with FBP and x (ψ i )