Consider a single-server system where there can be at most 2
customers in the system (including the one being served). In each
hour, a new customer enters the system with probability 1/2
unless there are already 2 customers in the system. Assume that a new
arrival occurs at the end of each hour. At the beginning of each
hour, the server chooses a configuration (fast or slow) if there is
at least one customer in the system. If the configuration is fast,
one customer is served and leaves the system during that hour with
probability 0.8. If the configuration is slow, this
probability decreases to 0.6. A revenue of 50 TL is obtained for
each customer whose service is completed. The costs of slow and
fast configurations are 5 and 9 TL per hour, respectively. The
hourly discount factor is β = 0.9. We would like to maximize total
expected discounted profit over an infinite horizon.
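
The model data described above can be encoded as follows. This is only a minimal sketch, not the intended solution. It assumes that the state is the number of customers at the beginning of the hour, that the service outcome is realized before the end-of-hour arrival, and that the arrival is blocked only if 2 customers remain after service. It also assumes that no cost is incurred when the system is empty. The names transition_row, expected_reward and the action label "idle" are illustrative and do not come from the problem statement.

```python
import numpy as np

ARRIVAL = 0.5                                      # arrival probability per hour
SERVICE = {"idle": 0.0, "slow": 0.6, "fast": 0.8}  # service-completion probability
COST = {"idle": 0.0, "slow": 5.0, "fast": 9.0}     # configuration cost per hour (TL)
REVENUE = 50.0                                     # revenue per completed service (TL)
CAPACITY = 2                                       # at most 2 customers in the system
# "idle" is a hypothetical label for the empty system, where no choice is made.
ACTIONS = {0: ["idle"], 1: ["slow", "fast"], 2: ["slow", "fast"]}


def transition_row(state, action):
    """Return P(next state | state, action) over the states 0, 1, 2.

    Assumed timing: service outcome first, then the end-of-hour arrival,
    which is blocked only if 2 customers remain after service.
    """
    row = np.zeros(CAPACITY + 1)
    p = SERVICE[action]
    for served, prob in ((1, p), (0, 1.0 - p)):
        if prob == 0.0:
            continue
        after = state - served
        if after < CAPACITY:
            row[after + 1] += prob * ARRIVAL
            row[after] += prob * (1.0 - ARRIVAL)
        else:
            row[after] += prob
    return row


def expected_reward(state, action):
    """Expected one-hour profit: revenue times completion probability minus cost."""
    return REVENUE * SERVICE[action] - COST[action]


for s, acts in ACTIONS.items():
    for a in acts:
        print(s, a, transition_row(s, a), expected_reward(s, a))
```

A different reading of the blocking rule (for example, no arrival whenever the hour starts with 2 customers) would change only the rows for state 2.
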
a) Formulate the problem as an MDP model by defining states,
decision sets, transition probabilities and expected rewards
clearly.
b) Find the optimal policy using policy iteration, where the
initial policy is to use the slow configuration whenever there is at
least one customer in the system.
c) Write (but do not solve) the linear program whose
solution can be used to find the optimal policy. What are the
optimal values of your variables? Which constraints are binding in
this program?
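
Parts b) and c) can be explored numerically with a short policy iteration sketch. It is given under assumptions, not as a definitive solution. The rewards and transition rows in MODEL assume that the state is the number of customers at the start of the hour, that service is resolved before the end-of-hour arrival, and that the arrival is blocked only if 2 customers remain after service. The names MODEL, evaluate, policy_iteration and the "idle" action are illustrative. For part c), the sketch uses the standard linear program for discounted MDPs (minimize v(0) + v(1) + v(2) subject to v(s) >= r(s,a) + β Σ_j p(j|s,a) v(j) for every state-action pair) and reports the slack of each constraint at the computed values.

```python
import numpy as np

BETA = 0.9  # hourly discount factor
# (state, action) -> (expected reward, transition row over states 0, 1, 2).
# Assumption: the arrival is blocked only if 2 customers remain after service.
MODEL = {
    (0, "idle"): (0.0, [0.5, 0.5, 0.0]),
    (1, "slow"): (25.0, [0.3, 0.5, 0.2]),
    (1, "fast"): (31.0, [0.4, 0.5, 0.1]),
    (2, "slow"): (25.0, [0.0, 0.3, 0.7]),
    (2, "fast"): (31.0, [0.0, 0.4, 0.6]),
}
STATES = [0, 1, 2]
ACTIONS = {0: ["idle"], 1: ["slow", "fast"], 2: ["slow", "fast"]}


def q_value(state, action, v):
    """One-step reward plus discounted expected value of the next state."""
    reward, row = MODEL[(state, action)]
    return reward + BETA * float(np.dot(row, v))


def evaluate(policy):
    """Policy evaluation: solve (I - beta * P_pi) v = r_pi."""
    P = np.array([MODEL[(s, policy[s])][1] for s in STATES])
    r = np.array([MODEL[(s, policy[s])][0] for s in STATES])
    return np.linalg.solve(np.eye(len(STATES)) - BETA * P, r)


def policy_iteration(policy):
    """Alternate evaluation and greedy improvement until the policy is stable."""
    while True:
        v = evaluate(policy)
        improved = {}
        for s in STATES:
            best = max(ACTIONS[s], key=lambda a: q_value(s, a, v))
            # Keep the current action on ties so the loop terminates.
            keep = q_value(s, policy[s], v) >= q_value(s, best, v) - 1e-9
            improved[s] = policy[s] if keep else best
        if improved == policy:
            return policy, v
        policy = improved


initial = {0: "idle", 1: "slow", 2: "slow"}  # slow whenever a customer is present
policy, v = policy_iteration(initial)

# Slack of each LP constraint at v; a slack of (numerically) zero is binding.
slack = {sa: float(v[sa[0]] - q_value(sa[0], sa[1], v)) for sa in MODEL}

print(policy)
print(np.round(v, 2))
print({sa: round(x, 3) for sa, x in slack.items()})
```

Under these modeling assumptions the sketch ends with the fast configuration in both states 1 and 2, and the constraints of the actions chosen by that policy are the ones with zero slack. A different interpretation of the arrival timing could change both the numbers and the policy.
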