Spectral Features Based on Bidirectional Long Short-Term Memory for DNA Classification

Document Type : Original Article

Authors

1 Dept. of Electronics and Electrical Communications Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt

2 Dept. of Electronics and Communication, Faculty of Engineering, Zagazig University, Zagazig, Egypt; Faculty of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia

3 Dept. of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt

4 Dept. of Computer Science and Engineering, Faculty of Electronic Engineering, Menoufia University, Menouf, Egypt

Abstract

Analyzing the hidden features of DNA sequences is a main challenge in bioinformatics, since learning from DNA sequences with analytical approaches that identify hidden patterns plays a vital role in various genomic applications. In classification tasks, a recurrent neural network with Bidirectional Long Short-Term Memory (BLSTM) is commonly used for sequential data, and it depends strongly on the feature extraction stage. Recently, digital signal processing (DSP) techniques such as spectral transformations have been applied to genomic data to extract hidden features and periodicities within DNA fragments. The objective of this paper is to compare different spectral transformations of DNA sequences, namely the Discrete Cosine Transform (DCT), the Discrete Fourier Transform (DFT) and the Discrete Wavelet Transform (DWT), combined with a BLSTM, in order to achieve high performance in the taxonomic classification of bacteria. These transformations are chosen because they are widely and effectively used for feature extraction, decorrelation, ordering and dimensionality reduction in speech and image processing. Evaluation metrics such as F1 score and accuracy show that DWT features outperform the other features.
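
As a rough illustration of the spectral feature extraction summarized in the abstract, the sketch below maps a short DNA fragment to a numeric signal and computes DFT magnitudes, an unnormalized DCT-II, and a single-level Haar DWT. The integer nucleotide encoding, the choice of the Haar wavelet, and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation. The BLSTM classifier is omitted.

```python
import cmath
import math

# Hypothetical integer encoding of nucleotides (an assumption; the
# abstract does not state which numeric mapping is used).
ENCODING = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def encode(seq):
    """Map a DNA string to a list of floats using ENCODING."""
    return [ENCODING[base] for base in seq.upper()]


def dft_magnitude(x):
    """Magnitudes of the discrete Fourier transform of x (direct O(n^2) sum)."""
    n = len(x)
    return [
        abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]


def dct2(x):
    """Unnormalized DCT-II of x."""
    n = len(x)
    return [
        sum(x[t] * math.cos(math.pi * (t + 0.5) * k / n) for t in range(n))
        for k in range(n)
    ]


def haar_dwt(x):
    """Single-level Haar DWT; returns (approximation, detail) coefficients."""
    if len(x) % 2 != 0:
        raise ValueError("input length must be even")
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    return approx, detail


# A fragment with period 4, so the DFT energy sits at k = 0, 2, 4, 6.
signal = encode("ACGTACGT")
dft_feats = dft_magnitude(signal)
dct_feats = dct2(signal)
approx, detail = haar_dwt(signal)
```

In a pipeline of the kind the abstract describes, each of these coefficient vectors would serve as an alternative input feature sequence for the classifier, so that the three transforms can be compared under the same model.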

Keywords


Volume 28, ICEEM2019-Special Issue
ICEEM2019-Special Issue: 1st International Conference on Electronic Eng., Faculty of Electronic Eng., Menouf, Egypt, 7-8 Dec. 2019
Pages 183-188