TY - JOUR
T1 - GPU-based streaming architectures for fast cone-beam CT image reconstruction and demons deformable registration
AU - Sharp, G. C.
AU - Kandasamy, N.
AU - Singh, H.
AU - Folkert, M.
PY - 2007/9/21
Y1 - 2007/9/21
N2 - This paper shows how to significantly accelerate cone-beam CT reconstruction and 3D deformable image registration using the stream-processing model. We describe data-parallel designs for the Feldkamp, Davis and Kress (FDK) reconstruction algorithm, and the demons deformable registration algorithm, suitable for use on a commodity graphics processing unit. The streaming versions of these algorithms are implemented using the Brook programming environment and executed on an NVidia 8800 GPU. Performance results using CT data of a preserved swine lung indicate that the GPU-based implementations of the FDK and demons algorithms achieve a substantial speedup - up to 80 times for FDK and 70 times for demons when compared to an optimized reference implementation on a 2.8 GHz Intel processor. In addition, the accuracy of the GPU-based implementations was found to be excellent. Compared with CPU-based implementations, the RMS differences were less than 0.1 Hounsfield unit for reconstruction and less than 0.1 mm for deformable registration.
AB - This paper shows how to significantly accelerate cone-beam CT reconstruction and 3D deformable image registration using the stream-processing model. We describe data-parallel designs for the Feldkamp, Davis and Kress (FDK) reconstruction algorithm, and the demons deformable registration algorithm, suitable for use on a commodity graphics processing unit. The streaming versions of these algorithms are implemented using the Brook programming environment and executed on an NVidia 8800 GPU. Performance results using CT data of a preserved swine lung indicate that the GPU-based implementations of the FDK and demons algorithms achieve a substantial speedup - up to 80 times for FDK and 70 times for demons when compared to an optimized reference implementation on a 2.8 GHz Intel processor. In addition, the accuracy of the GPU-based implementations was found to be excellent. Compared with CPU-based implementations, the RMS differences were less than 0.1 Hounsfield unit for reconstruction and less than 0.1 mm for deformable registration.
UR - http://www.scopus.com/inward/record.url?scp=34748901307&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=34748901307&partnerID=8YFLogxK
U2 - 10.1088/0031-9155/52/19/003
DO - 10.1088/0031-9155/52/19/003
M3 - Article
C2 - 17881799
AN - SCOPUS:34748901307
SN - 0031-9155
VL - 52
SP - 5771
EP - 5783
JO - Physics in Medicine and Biology
JF - Physics in Medicine and Biology
IS - 19
M1 - 003
ER -