Discounted Semi-Markov Decision Processes with Nonnegative Costs (非负费用折扣半马氏决策过程)

doi:10.12386/A2010sxxb0056

Acta Mathematica Sinica, Chinese Series ›› 2010, Vol. 53 ›› Issue (3): 503-514. DOI: 10.12386/A2010sxxb0056

Article

黄永辉, 郭先平

Discounted Semi-Markov Decision Processes with Nonnegative Costs

Yong Hui HUANG, Xian Ping GUO

Abstract

This paper studies discounted semi-Markov decision processes with a countable state space and nonnegative costs. We first construct a continuous-time semi-Markov decision process for a given semi-Markov decision kernel and each policy. Using a minimum nonnegative solution approach, we then prove that the value function satisfies the optimality equation and that an ε-optimal stationary policy exists. We further give conditions for the existence of optimal policies, together with some properties of such policies. Finally, we develop a value iteration algorithm for computing the value function and present a numerical example.
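The abstract mentions a value iteration algorithm but this page does not reproduce it. The following is only a minimal sketch of generic value iteration for a discounted semi-Markov decision model, not the authors' algorithm. It assumes a finite state space (the paper treats countable states), exponentially distributed sojourn times, and costs accrued at a constant nonnegative rate during each sojourn. All names (`exponential_smdp`, `value_iteration`, `rate`, `p`, `cost_rate`) and all numbers are hypothetical. Iteration starts from zero, so with nonnegative costs the iterates increase monotonically, in the spirit of a minimum nonnegative solution.

```python
def exponential_smdp(alpha, rate, p, cost_rate):
    """Build one-stage quantities for a toy model (all inputs are assumptions).

    alpha           : discount rate, alpha > 0
    rate[i][a]      : rate of the exponential sojourn time in state i under action a
    p[i][a][j]      : probability of jumping from i to j under action a
    cost_rate[i][a] : nonnegative cost accrued per unit time during the sojourn
    """
    n = len(rate)
    # Expected discounted cost over one sojourn: r * E[(1 - e^{-alpha*tau}) / alpha]
    # which equals r / (lambda + alpha) for an exponential sojourn time tau.
    cost = [[cost_rate[i][a] / (rate[i][a] + alpha)
             for a in range(len(rate[i]))] for i in range(n)]
    # Discounted transition weight: p(j | i, a) * E[e^{-alpha*tau}]
    # with E[e^{-alpha*tau}] = lambda / (lambda + alpha).
    beta = [[[p[i][a][j] * rate[i][a] / (rate[i][a] + alpha) for j in range(n)]
             for a in range(len(rate[i]))] for i in range(n)]
    return cost, beta


def value_iteration(cost, beta, tol=1e-10, max_iter=10_000):
    """Iterate V <- min_a { cost(i,a) + sum_j beta(i,a,j) V(j) } starting from V = 0."""
    n = len(cost)

    def q(i, a, V):
        # One-step lookahead value of taking action a in state i.
        return cost[i][a] + sum(beta[i][a][j] * V[j] for j in range(n))

    V = [0.0] * n  # zero start: iterates are nondecreasing when costs are nonnegative
    for _ in range(max_iter):
        V_new = [min(q(i, a, V) for a in range(len(cost[i]))) for i in range(n)]
        converged = max(abs(x - y) for x, y in zip(V_new, V)) < tol
        V = V_new
        if converged:
            break
    # Greedy stationary policy with respect to the final iterate.
    policy = [min(range(len(cost[i])), key=lambda a: q(i, a, V)) for i in range(n)]
    return V, policy


# Hypothetical two-state, two-action example.
alpha = 0.1
rate = [[1.0, 2.0], [0.5, 1.5]]
p = [[[0.2, 0.8], [0.6, 0.4]],
     [[0.7, 0.3], [0.1, 0.9]]]
cost_rate = [[1.0, 3.0], [2.0, 0.5]]

cost, beta = exponential_smdp(alpha, rate, p, cost_rate)
V, policy = value_iteration(cost, beta)
print("approximate value function:", V)
print("greedy stationary policy:", policy)
```

In this toy setting each discounted transition weight sums to less than one over the next states, so the update is a contraction and the loop stops after finitely many sweeps. In the countable-state, unbounded-cost setting of the paper, convergence would rest on the paper's own conditions instead.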

Cite this article

Yong Hui HUANG, Xian Ping GUO. Discounted Semi-Markov Decision Processes with Nonnegative Costs. Acta Mathematica Sinica, Chinese Series, 2010, 53(3): 503-514 https://doi.org/10.12386/A2010sxxb0056

Funding

Supported by the National Natural Science Foundation of China (Grant Nos. 60874004, 10925107)

Received	Revised	Published
2008-11-27	2009-10-27	2010-05-15
Released online
2010-05-30
