# GPU based parallel acceleration for fast C-arm cone-beam CT reconstruction

Ken Chen^{1, 2}, Cheng Wang^{1}, Jing Xiong^{1} and Yaoqin Xie^{1} (corresponding author)

**Received:** 8 March 2018

**Accepted:** 23 May 2018

**Published:** 5 June 2018

## Abstract

### Background

With the introduction of flat panel detector technology, cone-beam CT (CBCT) has become a novel imaging modality that is widely applied in clinical practice. C-arm mounted CBCT is particularly well suited to image-guided interventional surgery. In practice, acquiring high-resolution, high-quality 3D images within the real-time requirements of clinical applications remains challenging.

### Methods

In this paper, we propose a GPU-based accelerated method for fast C-arm CBCT 3D image reconstruction. A filtered back projection method is optimized and implemented with GPU parallel acceleration. A distributed system is designed so that the time consumed by image acquisition hides the reconstruction delay, further improving system performance.

### Results

With acceleration in both the algorithm and the system design, our method significantly increases system efficiency. The optimized GPU-accelerated FDK algorithm improves reconstruction efficiency. With the proposed system design, system performance is further improved by 26% and the reconstruction delay is reduced by a factor of 2.1 when 90 projection frames are used. When the number of frames increases to 120, the figures are 39% and 3.3 times. We also show that as the projection acquisition time increases, the reconstruction acceleration ratio increases significantly.


## Background

With the introduction of flat panel detector (FPD) technology, cone-beam computed tomography (CBCT) has become a novel imaging technology. The FPD provides several theoretical advantages, such as high spatial resolution, wide dynamic range, a square field of view (FOV) and real-time imaging capability with no geometric distortion [1]. These features enable CBCT to generate an entire volumetric data set in a single gantry rotation [2] and allow verification of the delivered dose distribution [3]. The radiation dose is also reported to decrease [1, 4]. CBCT has therefore been widely applied in image-guided surgery and interventional radiology, for example in CBCT guidance of brachytherapy and of spinal, orthopedic, thoracic and abdominal surgery [5–10]. Some groups have reported that CBCT achieves good performance in fenestrated/branched aortic endografting and in percutaneous transthoracic needle biopsy of small lung nodules. Another group showed that CBCT depicts considerably more small aneurysms and important anatomic details, and can be used as a new gold standard in the detection of intracranial aneurysms [11, 12].

In addition, C-arm mounted CT is especially well suited to image-guided interventions. The system is compact, so the patient can remain stationary during image acquisition. Volumetric tomographic images can be combined and co-displayed with conventional 2D angiographic images, which makes pre-operative surgical planning, tracking and navigation of surgical devices, assessment of the final result and margin verification achievable [13, 14].

To acquire 3D volumetric images, several categories of algorithms have been explored. One major category is iterative algorithms, such as ART combined with compressed sensing theory, which uses a total variation (TV) norm to regularize the cost function, as described in [15, 16]. The main challenge of such algorithms is their computational cost: the time-consuming process and high hardware requirements may limit their use in clinical applications. The FDK algorithm therefore still seems to be a better choice for practical applications. Filtered back projection algorithms can be further accelerated using GPU parallel techniques, and several groups have made progress in accelerating the FDK algorithm with GPUs [17–19]. The authors of [18] reviewed how GPUs can be applied to almost every kind of image reconstruction algorithm. The authors of [19] compared implementations of the FDK method on different platforms and showed a significant performance improvement. Moreover, with a carefully designed distributed system, the algorithm can run on high-performance devices built specifically for parallel acceleration, and the system delay can be further reduced with latency hiding techniques.

In this paper, we propose a distributed system for C-arm mounted CBCT imaging and a GPU-based acceleration method for fast CBCT reconstruction. As stated above, filtered back projection methods are more suitable for real-time clinical 3D image acquisition than iterative optimization methods, and GPU parallel acceleration can be applied to them. Although GPU parallel acceleration is not new, the acceleration scheme can be further optimized by exploiting geometric symmetry and a proper system design. We therefore propose to optimize the FDK algorithm based on geometric symmetry and to implement it with GPU parallel acceleration. We also propose a delay hiding scheme based on a distributed system layout connected via the TCP/IP protocol, which uses the time consumed by projection acquisition to hide the reconstruction delay. The rest of the paper is organized as follows: in the “Methods” section we explain the details of the system design, the GPU-accelerated FDK algorithm implementation and the latency hiding scheme. In the “Experiment results and discussion” section, we show the reconstruction results and evaluate the system performance.

## Methods

### System design

To achieve better acceleration, a high-performance GPU designed specifically for computing tasks may be required, which may make an ordinary hardware system unsuitable. In addition, although the reconstruction process is the main constraint on system efficiency, it is a relatively independent part of the image chain, which implies that later changes or updates will not interfere with other parts of the system. A pluggable computing unit within a distributed system architecture is therefore preferred. We briefly describe our system design as follows.

### GPU accelerated FDK algorithm

The FDK algorithm is by nature especially suitable for parallel acceleration. The main idea of GPU parallel acceleration is that the GPU provides far more arithmetic units than a general-purpose processor, together with a stream processing scheme for highly efficient parallel computing. For each element of an input data stream, a kernel is defined that carries out arbitrary calculations to produce an output data stream. GPU acceleration is therefore especially suitable for pixel-wise operations, turning iterative loops of similar operations into parallel execution.

For the FDK algorithm, the most time-consuming steps are the calculation that determines where each volume voxel projects onto the flat panel detector plane, and the calculation of the weighting factors \({W_1}\) and \({W_2}\) for each volume voxel in the back projection procedure. However, these calculations are very similar for every volume voxel, and there is no dependency between voxels, so the voxel-wise loop can be parallelized by assigning a kernel to each volume voxel. We also observe that \({W_2}\) depends only on the projection coordinate on the detector plane, so the calculation of \({W_2}\) can be separated from the voxel-wise calculation and treated as a filtering step before back projection. The stream processing scheme of the reconstruction for an arbitrary frame is briefly described in Fig. 3.
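The separation described above can be sketched in plain Python. This is only an illustration under stated assumptions, not the authors' GPU implementation: the forms of \({W_1}\) (a squared distance weight) and \({W_2}\) (a cosine weight) are the ones commonly used with FDK and are assumed here, as is the circular-orbit geometry with the source on the negative central-ray axis. The function names `cosine_weight`, `preweight_projection` and `voxel_kernel` are hypothetical.

```python
import math


def cosine_weight(u, v, dsd):
    """Assumed form of W2: depends only on the detector coordinate (u, v)."""
    return dsd / math.sqrt(dsd * dsd + u * u + v * v)


def preweight_projection(projection, pixel_size, dsd):
    """Apply the assumed W2 once per detector pixel, before back projection."""
    rows, cols = len(projection), len(projection[0])
    weighted = []
    for r in range(rows):
        v = (r - (rows - 1) / 2.0) * pixel_size  # detector row coordinate
        row = []
        for c in range(cols):
            u = (c - (cols - 1) / 2.0) * pixel_size  # detector column coordinate
            row.append(projection[r][c] * cosine_weight(u, v, dsd))
        weighted.append(row)
    return weighted


def voxel_kernel(x, y, z, angle, sad, dsd):
    """Per-voxel work: detector position (u, v) and the assumed weight W1.

    Each call is independent of every other voxel, which is what allows
    one GPU thread to be assigned per voxel.
    """
    s = x * math.cos(angle) + y * math.sin(angle)    # along the central ray
    t = -x * math.sin(angle) + y * math.cos(angle)   # across the central ray
    depth = sad + s          # source-to-voxel distance measured along the central ray
    mag = dsd / depth        # cone-beam magnification at this depth
    return t * mag, z * mag, (sad / depth) ** 2
```

With the geometry reported for this system (SAD 500 mm, DSD 1000 mm), a call such as `voxel_kernel(0.0, 10.0, 20.0, 0.0, 500.0, 1000.0)` maps a voxel in the isocenter plane to detector coordinates magnified by a factor of two, with a distance weight of one.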

### Latency hiding implementation

With the distributed system design, the efficiency of the reconstruction process can be further boosted with a latency hiding technique. As stated in the last section, reconstruction with an arbitrary projection frame depends only on the acquisition of that frame, while projection acquisition does not depend on the reconstruction result. Efficiency can therefore be further improved at the system level by designing a parallel control time sequence for the image chain that makes full use of the time the system spends on C-arm rotation, image acquisition and processing, data transmission and so on, which we define as the projection acquisition consumption. The control is designed as follows:
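To illustrate why overlapping acquisition and reconstruction helps, the following is a simplified timing model. It is not the control sequence used in the paper: it assumes a constant per-frame acquisition time and a constant per-frame reconstruction time, and the function names are hypothetical.

```python
def linear_delay(n_frames, t_acq, t_recon):
    """Total delay when reconstruction starts only after all frames are acquired."""
    return n_frames * t_acq + n_frames * t_recon


def pipelined_delay(n_frames, t_acq, t_recon):
    """Total delay when each frame is reconstructed as soon as it is available."""
    recon_done = 0.0
    for i in range(n_frames):
        frame_ready = (i + 1) * t_acq  # time at which frame i has been acquired
        # Reconstruction of frame i waits for the frame and for the previous frame's work.
        recon_done = max(frame_ready, recon_done) + t_recon
    return recon_done
```

In this model, when acquisition is the slower stage, only the reconstruction of the last frame remains visible after acquisition ends. When reconstruction is the slower stage, the saving is limited to the acquisition time that overlaps with it.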

### Experiment design

We test our proposed method from two aspects. First, we show the reconstruction results of our method. We use a Shepp–Logan numeric phantom for quantitative evaluation of reconstruction accuracy. We also show reconstruction results for phantoms of a blood vessel, a head and a foot. Second, we discuss the efficiency of our proposed method. We first consider the reconstruction process alone, comparing our proposed method with methods that use a different methodology or a different acceleration technique. We then evaluate the system performance enhancement by introducing two acceleration ratios. The first, the system performance ratio \({\beta _{sys}}\), represents the system performance boost by comparing the overall system delay of our proposed system with that of a linear image chain system, yielding \({\beta _{sys}} = 1 - {T_{prop}} / (T_{recon} + T_{acq})\), where \({{T_{prop}}}\) is the average measured system delay of our proposed system, and \({{T_{acq}}}\) and \({{T_{recon}}}\) are the average time consumed by projection acquisition and by the reconstruction process respectively. The second, the reconstruction acceleration ratio \({\beta _{recon}}\), evaluates the reconstruction efficiency improvement provided by our proposed system, yielding \(\beta _{recon} = T_{recon} / (T_{prop} - T_{acq})\). The averages are taken over a test data set of 10 gantry rotations of our C-arm mounted CBCT.
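As a worked check of the two ratios, the short sketch below evaluates them for the 90-frame case, using the timings reported in the result tables of this paper (\(T_{prop} = 21.49\) s, \(T_{acq} = 14.72\) s, \(T_{recon} = 14.28\) s). The function names are hypothetical.

```python
def system_performance_ratio(t_prop, t_acq, t_recon):
    """beta_sys = 1 - T_prop / (T_recon + T_acq)."""
    return 1.0 - t_prop / (t_recon + t_acq)


def reconstruction_acceleration_ratio(t_prop, t_acq, t_recon):
    """beta_recon = T_recon / (T_prop - T_acq)."""
    return t_recon / (t_prop - t_acq)


# 90-frame timings in seconds, taken from the result tables.
t_prop, t_acq, t_recon = 21.49, 14.72, 14.28
beta_sys = system_performance_ratio(t_prop, t_acq, t_recon)
beta_recon = reconstruction_acceleration_ratio(t_prop, t_acq, t_recon)
print(f"beta_sys = {beta_sys:.1%}, beta_recon = {beta_recon:.2f}")
```

The values agree with the reported 26% system performance ratio and 2.1 times reconstruction acceleration ratio for 90 frames.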

### System and environment setup

We test our method on our own C-arm imaging system. The C-arm DSD is 1000 mm and the SAD is 500 mm. The X-ray source is an imd X-RAY TUBE HEAD E-40R, operated at 65 kV and 2 mA with an exposure time of 15 ms. The projection image has a dimension of \(1560 \times 1440\) pixels with a resolution of \(0.18 \times 0.18\) mm, and the acquisition angle covers a range of 210° on average. A Quadro 6000 is used for GPU acceleration, with 256 threads in parallel.

## Experiment results and discussion

### 3D reconstruction evaluation

We first discuss the 3D reconstruction results from our system, to show that our method does not compromise reconstruction accuracy. We test our method on a numeric phantom for quantitative analysis, evaluating the reconstruction error against the ground truth. We also show reconstruction results for a blood vessel phantom, a head phantom and a foot phantom, to show that our proposed method can correctly reconstruct the structures of interest from actual projection data acquired with a clinically practical C-arm CBCT.
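The text does not state which error measure is used for the numeric phantom. As one possible choice, assumed here purely for illustration, the root-mean-square error and the maximum absolute error between a reconstructed volume and the ground truth can be computed as follows, with both volumes given as flattened lists of voxel values. The function names are hypothetical.

```python
import math


def rmse(recon, truth):
    """Root-mean-square error between two equally sized, flattened volumes."""
    if len(recon) != len(truth) or not truth:
        raise ValueError("volumes must be non-empty and of equal size")
    squared = sum((r - t) ** 2 for r, t in zip(recon, truth))
    return math.sqrt(squared / len(truth))


def max_abs_error(recon, truth):
    """Largest voxel-wise absolute difference between the two volumes."""
    if len(recon) != len(truth) or not truth:
        raise ValueError("volumes must be non-empty and of equal size")
    return max(abs(r - t) for r, t in zip(recon, truth))
```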

### Efficiency analysis

Comparison of the time consumption \({{T_{recon}}}\) of the non-accelerated, IPP accelerated and GPU accelerated FDK algorithms and the GPU accelerated TV-ART algorithm

Algorithm | Non-accelerated FDK | IPP FDK | GPU FDK | GPU FDK | GPU ART |
---|---|---|---|---|---|
Frames | 90 | 90 | 90 | 120 | 120 |
Time | 964 s | 53.40 s | 14.28 s | 25.76 s | 32 min |

Summary of system delay \({{T_{prop}}}\) and projection acquisition cost \({{T_{acq}}}\)

Frames | System delay (s) | Projection acquisition cost (s) |
---|---|---|
90 | 21.49 | 14.72 |
120 case 1 | 28.17 | 20.53 |
120 case 2 | 50.53 | 50.14 |

Summary of linear system delay \({{T_{linear}}}\), proposed reconstruction delay \({{T_{recon\_prop}}}\), system performance ratio \({\beta _{sys}}\) and reconstruction acceleration ratio \({\beta _{recon}}\)

Configuration | Linear system delay (s) | Proposed reconstruction delay (s) | System performance ratio (%) | Reconstruction acceleration ratio |
---|---|---|---|---|
90 frames | 29.00 | 6.78 | 26 | 2.1 |
120 frames LAN | 46.29 | 7.64 | 39 | 3.3 |
120 frames WAN | 75.90 | 0.39 | 33 | 66 |

## Conclusion

In this paper, we propose a fast CBCT 3D reconstruction method based on GPU parallel acceleration. We describe how the FDK algorithm is parallelized, and a control time sequence designed to further improve efficiency by hiding system latency. Our proposed method significantly improves system performance: GPU parallel acceleration significantly speeds up the FDK reconstruction process, and our latency hiding scheme further improves the system performance. When 90 projection frames are used for reconstruction, our proposed method improves the system delay by 26% and the reconstruction delay by 2.1 times. When 120 frames are used, the figures are 39% and 3.3 times. We also show that when the projection acquisition delay is dominant in the image chain, the reconstruction process can be almost fully hidden, yielding a significant improvement in reconstruction delay, which is 66 times in our case.

Although the quality of the reconstructed volume may suffer from the approximate nature of filtered back projection algorithms compared with iterative algorithms such as ART, we show that the features of interest are acceptably preserved. A typical ART-type algorithm as described in [16] may take more than 40 min for reconstruction, while our proposed method takes only 20+ seconds under the same circumstances. As a trade-off between image quality and real-time requirements, our proposed method is more suitable for clinical practice. A distributed system design based on the TCP/IP protocol makes the system pluggable and adaptive. With this design, algorithm and hardware updates of the reconstruction technique can be carried out more easily, and the system is also prepared for further expansion, such as multi-task support and remote network medical applications.

## Declarations

### Authors’ contributions

KC developed the algorithm, implemented the system and wrote the manuscript. CW participated in the system design and implementation, and helped with the experiments. JX and YX oversaw the project. All authors read and approved the final manuscript.

### Acknowledgements

This work is supported partly by grants of National Key Research Program of China (Grant No. 2016YFC0105102), National Natural Science Foundation of China (No. 61403368), Union of Production, Study and Research Project of Guangdong Province (Grant No. 2015B090901039), Science Foundation of Guangdong (2017B020229002, 2014A030312006), Leading Talent of Special Support Project in Guangdong (2016TX03R139), Technological Breakthrough Project of Shenzhen City (Grant No. JSGG20160229203812944), Shenzhen High-level Oversea Talent Program Grant (KQJSCX20160301144248), Shenzhen Fundamental Research Project, Shenzhen Key Technical Research Project (JSGG20160229203812944) and Beijing Center for Mathematics and Information Interdisciplinary Sciences.

### Competing interests

The authors declare that they have no competing interests.

### Ethics approval and consent to participate

Not applicable.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


## References

1. Hatakeyama Y, Kakeda S, Korogi Y, Ohnari N, Moriya J, Oda N, Nishino K, Miyamoto W. Intracranial 2D and 3D DSA with flat panel detector of the direct conversion type: initial experience. Eur Radiol. 2006;16:2594–602.
2. Jaffray D, Siewerdsen J. Cone-beam computed tomography with a flat-panel imager: initial performance characterization. Med Phys. 2000;27(6):1311–23.
3. Xing L, Thorndyke B, Schreibmann E, Yang Y, Li TF, Kim GY, Luxton G, Koong A. Overview of image-guided radiation therapy. Med Dosim. 2006;31(2):91–112.
4. Ishikura R, Ando K, Nagami Y, et al. Evaluation of vascular supply with cone-beam computed tomography during intraarterial chemotherapy for a skull base tumor. Radiat Med. 2006;24(5):384–7.
5. Siewerdsen JH, Jaffray DA, Edmundson GK, Sanders WP, Wong JW, Martinez A. Flat-panel cone-beam CT: a novel imaging technology for image-guided procedures. Proc SPIE. 2001;4319:435–44.
6. Jaffray DA, Siewerdsen JH, Edmundson GK, Wong JW, Martinez A. Flat-panel cone-beam CT on a mobile isocentric C-arm for image-guided brachytherapy. Proc SPIE. 2002;4682:209–17.
7. Siewerdsen JH, Moseley DJ, Burch S, Bisland SK, Bogaards A, Wilson BC, Jaffray DA. Volume CT with a flat-panel detector on a mobile, isocentric C-arm: pre-clinical investigation in guidance of minimally invasive surgery. Med Phys. 2005;32(1):241–54.
8. Khoury A, Whyne CM, Daly MJ, Moseley DJ, Bootsma G, Skrinskas T, Siewerdsen JH, Jaffray DA. Intraoperative cone-beam CT for correction of periaxial malrotation of the femoral shaft: a surface-matching approach. Med Phys. 2007;34(4):1380–7.
9. Siewerdsen JH, Chan Y, Rafferty MA, Moseley DJ, Jaffray DA, Irish JC. Cone-beam CT with a flat-panel detector on a mobile C-arm: pre-clinical investigation in image-guided surgery of the head and neck. In: Galloway RL, Cleary KR, editors. Medical imaging. Proceedings of SPIE, SPIE, Bellingham, vol. 5744; 2005. pp. 789–797.
10. Chan Y, Siewerdsen JH, Rafferty MA, Moseley DJ, Jaffray DA, Irish JC. Cone-beam CT on a mobile C-arm: a novel intraoperative imaging technology for guidance of head and neck surgery. Proc SPIE. 2001;4319:435–44.
11. Karamessini MT, Kagadis GC, Petsas T, Karnabatidis D, Konstantinou D, Sakellaropoulos GC, Nikiforidis GC, Siablis D. CT angiography with three-dimensional techniques for the early diagnosis of intracranial aneurysms. Comparison with intra-arterial DSA and the surgical findings. Eur J Radiol. 2004;49(3):212–23.
12. van Rooij W, Sprengers M, de Gast A, Peluso J, Sluzewski M. 3D rotational angiography: the new gold standard in the detection of additional intracranial aneurysms. Am J Neuroradiol. 2008;29(5):976–9.
13. Orth RC, Wallace MJ, Kuo MD. C-arm cone-beam CT: general principles and technical considerations for use in interventional radiology. J Vasc Interv Radiol. 2008;19(6):814–20.
14. Floridi C, Radaelli A, Abi-Jaoudeh N, Grass M, Lin MD, Chiaradia M, Geschwind J-F, Kobeiter H, Squillaci E, Maleux G, Giovagnoni A, Brunese L, Wood B, Carrafiello G, Rotondo A. C-arm cone-beam computed tomography in interventional oncology: technical aspects and clinical applications. Radiol Med. 2014;119(7):521–32.
15. Niu T, Ye X, Fruhauf Q, Petrongolo M, Zhu L. Accelerated barrier optimization compressed sensing (ABOCS) for CT reconstruction with improved convergence. Phys Med Biol. 2017;59(7):1801–14.
16. Park JC, Song B, Kim JS, Park SH, Kim HK, Liu Z, Suh TS, Song WY. Fast compressed sensing-based CBCT reconstruction using Barzilai–Borwein formulation for application to on-line IGRT. Med Phys. 2012;39(3):1207–17.
17. Sharp G, Kandasamy N, Singh H, Folkert M. GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration. Phys Med Biol. 2007;52(19):5771–83.
18. Despres P, Jia X. A review of GPU-based medical image reconstruction. Phys Medica. 2017;42:76–92.
19. Leeser M, Mukherjee S, Brock J. Fast reconstruction of 3D volumes from 2D CT projection data with GPUs. BMC Res Notes. 2014;7:582.
20. Feldkamp L, Davis L, Kress J. Practical cone-beam algorithm. J Opt Soc Am A. 1984;1(6):612–9.