A MODIFIED REINFORCEMENT LEARNING APPROACH FOR CONTROLLING THE MULTIVARIABLE MEAN ARTERIAL BLOOD PRESSURE

Document Type : Original Article

Author

Faculty of Electronic Engineering, Menoufia University, Menoufia, Egypt

Abstract

The original reinforcement learning scheme comprises two networks: one acts as a controller and the other as an evaluator. Based on temporal difference prediction techniques, the evaluative network predicts an external reinforcement signal and estimates a more informative internal signal to adapt a set of parameters of the controller. This paper introduces a modified reinforcement learning scheme that simplifies the original scheme. The main theme of the proposed scheme is that it replaces the fuzzy neural network (FNN) evaluator with a simple evaluator that depends directly on the environment of the process being controlled. Based on a performance index, the proposed evaluator outputs a reinforcement signal in the range [-1, 1] using two methods: one is fuzzy and the other is a discrete uniform of reinforcement signals. Compared with the original reinforcement scheme, the computational demand of the proposed scheme is relatively insignificant, because no parameter or structure learning is needed for the proposed evaluator. This makes the proposed reinforcement scheme suitable for controlling real-time intensive processes. The main features of the proposed scheme are reflected in our simulation results on controlling the multivariable mean arterial blood pressure.
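As a rough illustration of an evaluator that maps a performance index directly to a reinforcement signal in [-1, 1], the following minimal Python sketch shows one discrete and one fuzzy variant. Everything in it is an assumption made for illustration and is not taken from the paper: the performance index is taken to be the absolute tracking error, and the names `error_max` and `levels`, the uniformly spaced signal levels, and the triangular membership functions are all hypothetical choices.

```python
def discrete_reinforcement(error, error_max=1.0, levels=5):
    """Map a tracking error to one of `levels` uniformly spaced signals in [-1, 1].

    Assumed (hypothetical) performance index: |error| / error_max, clipped to [0, 1].
    """
    index = min(abs(error) / error_max, 1.0)
    # Quantise the index into bins 0 .. levels-1 (bin 0 = best performance).
    bin_id = min(int(index * levels), levels - 1)
    # Bin 0 maps to +1 (reward), the last bin maps to -1 (penalty).
    return 1.0 - 2.0 * bin_id / (levels - 1)


def fuzzy_reinforcement(error, error_max=1.0):
    """Map a tracking error to a signal in [-1, 1] with three fuzzy sets.

    Assumed (hypothetical) triangular memberships for small, medium and
    large error, with singleton outputs +1, 0 and -1, combined by a
    weighted average.
    """
    x = min(abs(error) / error_max, 1.0)
    small = max(0.0, 1.0 - 2.0 * x)               # peak at x = 0
    medium = max(0.0, 1.0 - abs(2.0 * x - 1.0))   # peak at x = 0.5
    large = max(0.0, 2.0 * x - 1.0)               # peak at x = 1
    total = small + medium + large                # equals 1 for these shapes
    return (small * 1.0 + medium * 0.0 + large * -1.0) / total


if __name__ == "__main__":
    for e in (0.0, 0.25, 0.5, 2.0):
        print(e, discrete_reinforcement(e), fuzzy_reinforcement(e))
```

Neither function has parameters that need to be learned, which is the property the abstract attributes to the proposed evaluator. The actual performance index and signal levels used by the author may differ.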
