Efficient Fusion of Medical Images Based on CNN

Document Type: Original Article

Authors

1 Department of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt

2 Department of Electronics and Electrical Communications, Faculty of Electronic Engineering, Menoufia University, 32952, Menouf, Egypt

3 Department of Electronics and Electrical Communication Engineering, Menoufia University, Menouf, Menoufia, Egypt

Abstract

Fusion of images has an important role in medical image diagnosis. Medical imaging systems have some limitations due to degradations and noise. Hence, it is necessary to apply techniques such as image fusion to integrate information from different image modalities into a single fusion result. With this strategy, the image features become better defined, which in turn helps in diagnosis. This paper presents an image fusion technique for different kinds of images to enhance the quality of brain tumor images. This technique depends on a convolutional neural network (CNN) with multiple layers to obtain high-quality images for better diagnosis. The learning process of the CNN helps to measure the activity level and the fusion ratio. The obtained fusion results are compared with those of traditional fusion methods. Different quality evaluation metrics are used in the assessment of the proposed technique.

Highlights

This paper introduces a complete optimized fusion system that achieves image fusion by learning a CNN. The obtained results are good compared with those of other techniques. In addition, the processing time is low. Hence, the proposed technique is a recommended solution for the fusion of images of different kinds, such as medical images of different modalities.

1. Introduction
Medical imaging has become an important trend in recent decades, helping specialists achieve better diagnosis. Magnetic Resonance (MR) and Computed Tomography (CT) images have different characteristics and limitations. For example, MR images contribute much more detail about soft tissues, while CT images give information about bone structures [1]. The main limitation is that neither modality alone provides information about both soft tissues and dense structures. That is why the fusion of registered MR and CT images is necessary for information integration.
We use image fusion as a tool to get useful information from medical images. Generally, MR images are rich in information about soft tissues, while CT images are rich in information about dense tissues. Hence, the fusion of both types of images is necessary to obtain complementary information from both of them in a single image [2]. Table (1) illustrates a comparison between MR and CT images.
Different techniques have been presented for MR and CT image fusion. The wavelet transform has been used for multi-modality medical image fusion. It depends on the decomposition of both images at different scales. Fusion is performed using a certain fusion rule, such as the maximum or average rule. Wavelet fusion with the maximum rule reduces any possible amount of blurring in each observation.
Unfortunately, wavelet fusion cannot represent curved edges correctly. Hence, other transforms such as the curvelet transform can be used for medical image fusion.
The algorithm of curvelet fusion of MR and CT images [3] divides the images into tiles and applies the Additive Wavelet Transform (AWT) for decomposition. The maximum fusion rule has been adopted in this work. Interpolation is utilized after the fusion process to obtain High-Resolution (HR) images. Different interpolation techniques, including LMMSE, maximum-entropy, and regularized interpolation, have been used for this purpose.
Wavelet and curvelet fusion are traditional fusion techniques, which do not consider the local image activity levels in the fusion process. Hence, there is a need for sophisticated image fusion tools, such as deep neural networks, that account for the local activity levels of the images.

Fusion of medical images helps in disease diagnosis, so the fusion should be performed carefully. The resolution and details of medical images should be high [4]. The most important characteristics of the organ of interest should be represented in the fused images, and the goal is therefore to select the fusion technique that best preserves them.
2. Preliminaries
2.1 Convolutional Neural Network
A CNN is a trainable multi-stage feed-forward neural network. A neuron is a unit in a feature map. Convolution, activation, and spatial-domain pooling are used to create the feature maps. Shared weights, sub-sampling, and local receptive fields are the basic building blocks of CNNs [5]. The input neurons are basically the pixel intensities of the input image. Each neuron is connected to a local receptive field (a region in the input image) [6]. Spatially-invariant feature maps are generated from the convolutional kernels [7]. Mathematically, let $x_i$ be the $i$-th input feature map and $y_j$ the $j$-th output feature map of the convolutional layer. The ReLU activation function used in the CNN is a non-linear function expressed as follows [8]:

$$y_j = \max\left(0,\; b_j + \sum_i k_{ij} * x_i\right)$$

where $b_j$ is the bias and $k_{ij}$ is the convolutional kernel between $x_i$ and $y_j$. A convolution kernel is a small matrix used for edge detection, sharpening, blurring, and embossing; these effects are obtained by convolving the image with the kernel. The symbol $*$ denotes the convolution operation. Another component of feature extraction is pooling, which reduces the size of a feature map by combining neighboring pixels into a single value. Average- and max-pooling are the pooling operations performed in CNNs.
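To make these operations concrete, the following minimal sketch (PyTorch assumed; channel counts and image size are illustrative, not the paper's settings) applies a convolutional layer, the ReLU non-linearity, and 2 × 2 max-pooling to a batch of feature maps:

```python
# Minimal sketch of y_j = max(0, b_j + sum_i k_ij * x_i) followed by max-pooling.
# PyTorch is assumed; channel counts and image size are illustrative only.
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)                      # one image with 3 input feature maps x_i

conv = nn.Conv2d(3, 8, kernel_size=3, padding=1)   # kernels k_ij and biases b_j
relu = nn.ReLU()                                   # element-wise max(0, .)
pool = nn.MaxPool2d(kernel_size=2, stride=2)       # combines 2x2 neighborhoods into one value

y = relu(conv(x))                                  # 8 output feature maps y_j
p = pool(y)                                        # spatial size halved: 32x32 -> 16x16
print(y.shape, p.shape)                            # [1, 8, 32, 32], [1, 8, 16, 16]
```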
2.2 CNNs for Image Fusion
In the fusion of images, focus map generation can be viewed as a classification task [9]; fusion with a CNN is thus similar to classification. Spatial-domain fusion methods measure activity levels through hand-designed local filters that extract high-frequency details. In a CNN model, the basic operation is the convolution [10]. CNN-based fusion surpasses existing fusion methods, as it overcomes the difficulty of designing filters that extract effective features. The CNN is trained to perform both activity-level measurement and fusion. Consequently, the CNN-based fusion method gives higher-quality results than those of traditional fusion methods.
3. Proposed CNN-Based Medical Image Fusion Method
The CNN-based medical image fusion method is illustrated in Fig. 2. We use two input images (A and B). The proposed technique goes through four stages: focus detection, segmentation, consistency verification, and fusion.
In the first step, we feed the two images (A and B) to a previously-trained CNN to get a score map. The score map includes the focus information of the two input images. Averaging the overlapping patch scores in the score map gives a focus map with the same size as the two input images.
In the second step, we transform the focus map into a binary map with a threshold level of 0.5.
In the third step, we remove the small regions through a filtering strategy to refine the binary segmented map.

In the fourth step, consistency verification is implemented through pixel-wise weighted averaging to get the fused image from the final decision map.
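The four stages can be summarized in the following sketch (Python with NumPy and scikit-image assumed; `score_map` is a hypothetical callable standing in for the trained CNN of Section 3.1, and the bilinear resize is a simple stand-in for the overlap-averaging step):

```python
# Sketch of the four-stage pipeline for two registered gray-scale images A and B.
# `score_map` is a hypothetical callable wrapping the trained CNN.
import numpy as np
from skimage.transform import resize
from skimage.morphology import remove_small_objects

def fuse_pair(A, B, score_map):
    S = score_map(A, B)                               # stage 1: dense score map in [0, 1]
    M = resize(S, A.shape, order=1)                   # focus map at full image size
    T = M > 0.5                                       # stage 2: binary map (threshold 0.5)
    min_area = int(0.01 * A.shape[0] * A.shape[1])    # area threshold of Section 3.2
    T = remove_small_objects(T, min_size=min_area)    # stage 3: drop small regions
    T = ~remove_small_objects(~T, min_size=min_area)  # and small holes
    D = T.astype(float)                               # stage 4: final decision map
    return D * A + (1.0 - D) * B                      # pixel-wise weighted averaging
```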
3.1 Design of the Network
The input images have variable spatial sizes. Hence, the images are divided into overlapping patches with a sliding-window technique.

The fully-connected layers expect input and output data of fixed dimensions. To solve this problem, the fully-connected layers are converted into convolutional layers, with their parameters reshaped accordingly. The resulting network contains only max-pooling and convolutional layers, and can therefore treat input images of different sizes to get dense predictions [11]. The output score map shows the focus property for the patch pairs of the input images. Following [12], we use a Siamese model to compare the similarity of patches, for its advantages over other models, including the 2-channel and pseudo-Siamese models. The Siamese model is easier to train for image fusion, leading to easy convergence.
The patch size is a vital design choice in any classification problem. Large patches of size 32 × 32 are appropriate for achieving high classification performance. Unfortunately, such patches may include both focused and unfocused regions of the images, and they make it difficult to determine the number of max-pooling layers.
On the other hand, small patch sizes are not appropriate either, as many details are missed. The solution that we adopt in this paper is a patch size of 16 × 16. Max-pooling is implemented to reduce the number of features used in the classification process. Fig. 3 shows the proposed CNN model. The network consists of a single max-pooling layer and three convolutional layers. The convolutional layer kernel size and stride are set to 3 × 3 and 1, respectively. The max-pooling layer kernel size and stride are set to 2 × 2 and 2, respectively. After converting the fully-connected layers into convolutional ones in the test/fusion process, the network can be fed with two input images of arbitrary sizes to generate a dense score map. When the input images are of size H × W, the output score map is of size $\lceil (H-14)/2 \rceil \times \lceil (W-14)/2 \rceil$, i.e., one score per 16 × 16 patch pair with an effective stride of 2, where $\lceil \cdot \rceil$ is the ceiling operation. The obtained score map is also illustrated in Fig. 3. Each output score in the score map corresponds to a 16 × 16 patch of the input images propagated through the different layers of the network.
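A minimal sketch of such a network is given below (PyTorch assumed). The three 3 × 3 stride-1 convolutions, the single 2 × 2 stride-2 max-pooling layer, and the fully-connected layers rewritten as convolutions follow the description above; the channel widths, the layer ordering, and the two-branch weight sharing are our assumptions:

```python
# Sketch of a Siamese fusion CNN: shared branch, 3x3 stride-1 convolutions,
# one 2x2 stride-2 max-pooling, and former FC layers written as convolutions.
# All channel widths are illustrative assumptions.
import torch
import torch.nn as nn

class Branch(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 64, 3, stride=1), nn.ReLU(),     # 16 -> 14
            nn.Conv2d(64, 128, 3, stride=1), nn.ReLU(),   # 14 -> 12
            nn.MaxPool2d(2, stride=2),                    # 12 -> 6
            nn.Conv2d(128, 256, 3, stride=1), nn.ReLU(),  # 6 -> 4
        )

    def forward(self, x):
        return self.features(x)

class FusionCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.branch = Branch()              # Siamese: one branch applied to both inputs
        self.head = nn.Sequential(          # fully-connected layers as convolutions
            nn.Conv2d(512, 256, 4), nn.ReLU(),
            nn.Conv2d(256, 2, 1),
        )

    def forward(self, a, b):
        f = torch.cat([self.branch(a), self.branch(b)], dim=1)
        return torch.softmax(self.head(f), dim=1)[:, 0]  # focus score in [0, 1]

net = FusionCNN()
s = net(torch.randn(1, 1, 16, 16), torch.randn(1, 1, 16, 16))
print(s.shape)  # one score per 16x16 patch pair; larger inputs yield a dense map
```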

3.2 Fusion Model
We have two input images, A and B, in gray-scale format. Feeding the input images to the trained CNN model yields a score map S, whose values range from 0 to 1. A value close to 1 or 0 indicates that the corresponding patch is more focused in input image A or B, respectively. The focus map M has the same size as the input images; it is generated by assigning the score-map coefficients of S to the corresponding patches in M and averaging the overlapping pixels. Regions with more details are close to 1 (white) or 0 (black), while regions with few details are close to 0.5 (gray).
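A sketch of this overlap-averaging step follows (NumPy assumed; the 16 × 16 patch size and stride of 2 come from Section 3.1):

```python
# Build the focus map M by spreading each patch score over its 16x16 patch
# and averaging wherever patches overlap.
import numpy as np

def focus_map(S, shape, patch=16, stride=2):
    M = np.zeros(shape)
    count = np.zeros(shape)
    for i in range(S.shape[0]):
        for j in range(S.shape[1]):
            y, x = i * stride, j * stride
            M[y:y + patch, x:x + patch] += S[i, j]
            count[y:y + patch, x:x + patch] += 1
    return M / np.maximum(count, 1)   # average of overlapping patch scores
```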
The “choose-max” strategy is used to process the focus map M, keeping useful information according to a fixed threshold level of 0.5, to get a binary map T. Good performance of the CNN is obtained when it is applied to classify the gray levels in the focus map.

In this paper, the area threshold used to remove small regions is set to 0.01 × W × H, where W and H are the width and height of each input image, respectively.
4. Simulation Results and Comparisons
Several simulation experiments are presented for three different cases, each comprising an MR and a CT image for fusion.
The proposed fusion technique is implemented to fuse a pair of MR and CT images in each experiment. Different evaluation metrics, including average gradient (avg. G), local contrast (local C), standard deviation (STD), and entropy (E), have been considered in the assessment of results. In addition, the fused images are presented for visual assessment. Simulation results of these experiments are given in Tables (2, 3, 4).
It is clear that the processing time is acceptable, and the entropy values are high and close to 8, which indicates a large amount of information in the fusion results. In addition, the fusion results are rich in detail, as proved by the STD and variance values.
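For reference, these metrics can be computed as in the following sketch (NumPy assumed; the exact formulas, e.g., the forward-difference form of the average gradient, are common conventions rather than the paper's stated definitions):

```python
# Hedged sketch of three of the evaluation metrics for an 8-bit fused image F.
import numpy as np

def entropy(F):
    hist, _ = np.histogram(F, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))     # close to 8 => near-uniform gray-level usage

def std_dev(F):
    return np.std(F.astype(float))     # spread of gray levels around the mean

def average_gradient(F):
    gy, gx = np.gradient(F.astype(float))
    return np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2))  # mean local gradient magnitude
```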

6. Comparison with Other Techniques
6.1 Principal Component Analysis (PCA)
PCA is a spatial-domain fusion method used to improve the resolution of the image and reduce the redundancy [13]. The spatial domain is the image plane, where intensity values are specified at the different positions (x, y) [14].
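A minimal sketch of classical PCA-weighted fusion of two registered images follows (NumPy assumed; this is our rendering of the standard method, not necessarily the exact implementation of [13]):

```python
# PCA fusion: weight each source by the principal eigenvector of the
# 2x2 covariance matrix of the two images.
import numpy as np

def pca_fuse(A, B):
    X = np.stack([A.ravel(), B.ravel()]).astype(float)
    C = np.cov(X)                    # 2x2 covariance of the two sources
    _, V = np.linalg.eigh(C)         # eigenvalues in ascending order
    v = np.abs(V[:, -1])             # principal eigenvector
    w1, w2 = v / v.sum()             # normalized fusion weights
    return w1 * A + w2 * B
```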
6.2 Discrete Wavelet Transform (DWT)
The Discrete Wavelet Transform (DWT) is frequently used in medical image fusion. It represents the image at multiple scales and orientations, giving information about the image from several perspectives.
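A single-level DWT fusion sketch is shown below (PyWavelets assumed; the average rule for the approximation band and the maximum rule for the detail bands follow the rules discussed in the introduction, while the `db2` wavelet is an arbitrary choice):

```python
# DWT fusion: average rule for the approximation band, maximum-magnitude
# rule for the horizontal, vertical, and diagonal detail bands.
import numpy as np
import pywt

def dwt_fuse(A, B, wavelet="db2"):
    cA1, d1 = pywt.dwt2(A.astype(float), wavelet)
    cA2, d2 = pywt.dwt2(B.astype(float), wavelet)
    cA = 0.5 * (cA1 + cA2)                                  # keeps overall brightness
    details = tuple(np.where(np.abs(x) >= np.abs(y), x, y)  # keeps the sharper detail
                    for x, y in zip(d1, d2))
    return pywt.idwt2((cA, details), wavelet)
```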
6.3 Dual-Tree Complex Wavelet Transform (DT-CWT)
Unlike the DWT, which combines positive and negative frequencies, the DT-CWT treats positive and negative frequencies separately at each level.
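An analogous sketch with the DT-CWT is given below (the third-party Python `dtcwt` package is assumed; the fusion rules mirror the DWT sketch, operating on the complex high-pass coefficients that carry the separated positive and negative frequencies):

```python
# DT-CWT fusion: average the real low-pass band, take the larger-magnitude
# complex coefficient in each high-pass band.
import numpy as np
import dtcwt

def dtcwt_fuse(A, B, levels=4):
    t = dtcwt.Transform2d()
    pa = t.forward(A.astype(float), nlevels=levels)
    pb = t.forward(B.astype(float), nlevels=levels)
    low = 0.5 * (pa.lowpass + pb.lowpass)
    highs = tuple(np.where(np.abs(ha) >= np.abs(hb), ha, hb)
                  for ha, hb in zip(pa.highpasses, pb.highpasses))
    return t.inverse(dtcwt.Pyramid(low, highs))
```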
6.4 Proposed Technique
The CNN fusion technique is compared with the PCA [13], DWT, and DT-CWT traditional fusion techniques.

7. Quantitative Evaluations
A comparison is presented between the proposed technique and the PCA, DWT, and DT-CWT techniques.

 

References

[1] H. M. El-Hoseny, S. M. El-Rabaie, W. A. El-Rahman, F. E. Abd El-Samie, “Medical image fusion techniques based on combined discrete transform domains”, National Radio Science Conference (NRSC), IEEE, pp. 471–480, 2017.
[2] K. Parmar, R. Kher, “A Comparative Analysis of Multimodality Medical Image Fusion Methods”, Sixth Asia Modelling Symposium, IEEE, pp. 93–97, 2012.
[3] F. E. Ali, I. M. El-Dokany, A. A. Saad, F. E. Abd El-Samie, “Curvelet fusion of MR and CT images”, Progress In Electromagnetics Research C, vol. 3, pp. 215–224, 2008.
[4] Y. L. Ping, L. B. Sheng, Z. D. Hua, “Novel image fusion algorithm with novel performance evaluation method”, Systems Engineering and Electronics, pp. 509–513, 2007.
[5] Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, “Gradient-based learning applied to document recognition”, Proceedings of the IEEE, pp. 2278–2324, 1998.
[6] https://www.quora.com/What-is-a-receptive-field-in-a-convolutional-neural-network, accessed 20-12-2019.
[7] Y. Liu, X. Chen, H. Peng, Z. Wang, “Multi-focus image fusion with a deep convolutional neural network”, Information Fusion, vol. 36, pp. 191–207, 2017.
[8] V. Nair, G. Hinton, “Rectified linear units improve restricted Boltzmann machines”, 27th International Conference on Machine Learning, pp. 807–814, 2010.
[9] S. Li, J. Kwok, Y. Wang, “Multi-focus image fusion using artificial neural networks”, Pattern Recognition Letters, vol. 23, no. 8, pp. 985–997, 2002.
[10] J. Long, E. Shelhamer, T. Darrell, “Fully convolutional networks for semantic segmentation”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440, 2015.
[11] P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun, “OverFeat: integrated recognition, localization and detection using convolutional networks”, arXiv preprint, 2014.
[12] S. Zagoruyko, N. Komodakis, “Learning to compare image patches via convolutional neural networks”, IEEE Conference on Computer Vision and Pattern Recognition, pp. 4353–4361, 2015.
[13] S. S. Bedi, “Contrast enhancement for PCA fusion of medical images”, Journal of Global Research in Computer Science, pp. 25–29, 2013.
[14] https://www.quora.com/What-is-spatial-domain-in-image-processing, accessed 20-1-2020.