{ Analytica Model Taxi_MDP, encoding="UTF-8" }
SoftwareVersion 5.4.6

{ System Variables with non-default values: }
Time := 0..20
SampleSize := 1000
TypeChecking := 1
Checking := 1
SaveOptions := 2
SaveValues := 0
{!-50299|DiagramColor Model: 65535,65535,65535}
{!-50299|DiagramColor Module: 65535,65535,65535}
{!-50299|DiagramColor LinkModule: 65535,65535,65535}
{!-50299|DiagramColor Library: 65535,65535,65535}
{!-50299|DiagramColor LinkLibrary: 65535,65535,65535}
{!-50299|DiagramColor Form: 65535,65535,65535}
NodeInfo FormNode: 1,0,0,,0,0,,,,0,,,0
{!-50299|NodeColor Text: 62258,62258,62258}

Model Taxi_MDP
Description: An example of a Markov Decision Problem. A taxi driver is in one of 3 towns. In each town, he has three possible actions. The optimal policy determines which action is best in each town.~
~
Two methods for finding a policy are given. Value iteration is demonstrated for a finite horizon problem (i.e., you go home at the end). Policy iteration is demonstrated for an infinite horizon problem. (Note: The discount rate must be >0 for the infinite horizon case.)~
~
Example borrowed from slides by Dennis Bricker at the University of Iowa.
Author: Dale Rice~
Lumina Decision Systems
Date: Wed, Sep 8, 2021 3:21 PM
DiagState: 2,0,0,756,352,17
WindState: 2,145,525,720,350
FontStyle: Arial,15
FileInfo: 0,Model Taxi_MDP,2,2,0,0,W:\TestModels\Taxi MDP.ana

Index Town
Title: Town
Definition: ['A','B','C']
IndexVals: ['item 1']
NodeLocation: 136,80,1
NodeSize: 48,24
Att_PrevIndexValue: ['A','B','C']

Index Next_town
Title: Next town
Definition: CopyIndex(Town)
NodeLocation: 136,144,1
NodeSize: 48,24
Att_PrevIndexValue: ['A','B','C']

Constant Transition_Prob
Title: Transition Prob
Definition: Table(Action,Town,Next_town)(~
1/2,1/4,1/4,~
1/2,0,1/2,~
1/4,1/4,1/2,~
1/16,3/4,3/16,~
1/16,7/8,1/16,~
1/8,3/4,1/8,~
1/4,1/8,5/8,~
0,1,0,~
3/4,1/16,3/16)
NodeLocation: 264,144,1
NodeSize: 48,24
DefnState: 2,418,232,899,426,0,DFNM
ValueState: 2,279,583,416,303,,MIDM
Aliases: FormNode Fo643386947
ReformDef: [Next_town,Town]
ReformVal: [Next_town,Town]
Att__TotalsIndex: [Next_town]
Att_EditSliceState: [Action,3,Town,1,Next_town,1]

Index Action
Title: Action
Definition: ['Cruising streets','Waiting at taxi stand','Waiting for dispatch']
NodeLocation: 136,208,1
NodeSize: 48,24
ValueState: 2,44,424,416,303,,MIDM
Att_PrevIndexValue: ['Cruising streets','Waiting at taxi stand','Waiting for dispatch']

Constant Payoff
Title: Payoff
Definition: Table(Town,Next_town,Action)(~
10,8,4,~
4,2,6,~
8,4,8,~
14,8,0,~
0,16,0,~
18,8,0,~
10,6,4,~
2,4,0,~
8,2,8)
NodeLocation: 264,208,1
NodeSize: 48,24
DefnState: 2,125,304,416,303,0,DFNM
ValueState: 2,389,640,416,303,,MIDM
Aliases: FormNode Fo1717128771
ReformDef: [Next_town,Town]
Att_EditSliceState: [Action,3,Town,1,Next_town,1]

Constant Discount_rate
Title: Discount rate
Definition: 50%
NodeLocation: 264,280,1
NodeSize: 48,24
Aliases: Alias Al1730235971, FormNode Fo374951491
NumberFormat: 2,%,4,2,0,0,4,0,$,0,"ABBREV",0,,,0,0,15
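{ A sketch of the standard finite-horizon Bellman recursion that the "Finite horizon
discounted value iteration" module below implements. The symbols s, s', a, t, and g are
illustrative MDP notation, not model identifiers: g corresponds to (1 - Discount_rate),
P to Transition_Prob, R to Payoff, and t to Remaining_cycles. With t rides remaining:

  V_0(s)   = -Cost_to_go_home(s)
  Q_t(s,a) = sum over s' of P(s'|s,a) * ( R(s,s',a) + g * V_(t-1)(s') )
  V_t(s)   = max over a of Q_t(s,a)

Best_action at each t and s is the action a that attains the maximum. }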
Module Finite_horizon_disco
Title: Finite horizon discounted value iteration
Description: Finite horizon means that the driver's day eventually ends -- time is counted in the number of rides remaining. At the end he has to go home, so a terminal Cost_to_go_home is incurred.
Author: Dale Rice~
Lumina Decision Systems
NodeLocation: 432,72,1
NodeSize: 56,56
NodeInfo: 1,0
DiagState: 2,988,435,636,378,17
WindState: 2,186,427,720,350

Index Remaining_cycles
Title: Remaining cycles
Definition: 0..10
NodeLocation: 104,48,1
NodeSize: 48,24

Constant Cost_to_go_home
Title: Cost to go home
Definition: Table(Town)(0,1,2)
NodeLocation: 224,48,1
NodeSize: 48,24
DefnState: 2,664,318,416,303,0,DFNM
Aliases: FormNode Fo1312378435

Variable Q
Title: Q
Definition: Dynamic[Remaining_cycles](~
-Cost_to_go_home,~
Sum(Transition_Prob * (Payoff + (1-Discount_rate) * V[Remaining_cycles-1][Town=Next_town]), Next_town)~
)
NodeLocation: 96,152,1
NodeSize: 48,24
ValueState: 2,487,112,698,545,1,MIDM
ReformVal: [Remaining_cycles,Action]
Att_ResultSliceState: [Town,3,Action,2,Remaining_cycles,1]

Variable V
Title: V
Definition: Max(Q, Action)
NodeLocation: 232,216,1
NodeSize: 48,24
ValueState: 2,356,315,583,391,1,MIDM

Decision Best_action
Title: Best action
Definition: If Remaining_cycles=0 Then Null Else~
ArgMax(Q, Action)
NodeLocation: 96,224,1
NodeSize: 48,24
ValueState: 2,70,516,836,239,,MIDM
Aliases: FormNode Fo2002669123

FormNode Fo2002669123
Title: Best action
Definition: 1
NodeLocation: 440,224,1
NodeSize: 88,16
Original: Best_action

FormNode Fo1312378435
Title: Cost to go home
Definition: 0
NodeLocation: 444,88,1
NodeSize: 108,16
Original: Cost_to_go_home

Close Finite_horizon_disco

Module Infinite_horizon_dis
Title: Infinite horizon discounted policy iteration
Description: Policy iteration to compute the optimal policy on an infinite horizon. The discount rate can't be zero.
Author: Dale Rice~
Lumina Decision Systems
NodeLocation: 576,72,1
NodeSize: 56,56
DiagState: 2,655,25,732,405,17
WindState: 2,248,503,720,350

Decision Policy
Title: Policy
Definition: Iterate( Starting_policy, Better_action, until: Policy = Better_action )
NodeLocation: 128,128,1
NodeSize: 48,24
DefnState: 2,817,596,416,303,0,DFNM
ValueState: 2,1282,510,416,303,,MIDM

Variable Trans_prob_for_Polic
Title: Trans prob for Policy
Definition: Transition_Prob[Action=Policy]
NodeLocation: 256,128,1
NodeSize: 48,24
ValueState: 2,49,632,416,303,,MIDM
ReformVal: [Next_town,Town]

Variable Payoff_for_policy
Title: Payoff for policy
Definition: Payoff[Action=Policy]
NodeLocation: 256,184,1
NodeSize: 48,24
ValueState: 2,288,389,416,303,,MIDM
ReformVal: [Next_town,Town]

Alias Al1730235971
Title: Discount rate
Definition: 1
NodeLocation: 256,272,1
NodeSize: 48,24
Original: Discount_rate

Variable One_step_payoff
Title: One step payoff
Definition: Sum(Trans_prob_for_Polic * Payoff_for_policy, Next_town)
NodeLocation: 392,128,1
NodeSize: 48,24
ValueState: 2,895,640,416,303,,MIDM

Variable Linkage
Title: Linkage
Definition: Invert((Town=Next_town) - (1-Discount_rate) * Trans_prob_for_Polic,~
Town, Next_town)
NodeLocation: 392,192,1
NodeSize: 48,24
ValueState: 2,213,590,416,303,,MIDM

Variable V_s_for_policy
Title: V_s for policy
Definition: MatrixMultiply(Linkage, Town, Next_town, One_step_payoff, Town)
NodeLocation: 520,160,1
NodeSize: 48,24
ValueState: 2,709,604,416,240,,MIDM

Variable Q_s_for_policy
Title: Q_s for policy
Definition: Sum(Transition_Prob * (Payoff + (1-Discount_rate) * V_s_for_policy[Town=Next_town]), Next_town)
NodeLocation: 520,232,1
NodeSize: 48,24
ValueState: 2,159,164,416,303,,MIDM

Variable Better_action
Title: Better action
Definition: ArgMax(Q_s_for_policy, Action)
NodeLocation: 640,232,1
NodeSize: 48,24
ValueState: 2,1426,91,416,303,,MIDM

Constant Starting_policy
Title: Starting policy
Definition: Random(ChanceDist(1, Action, over: Town))
NodeLocation: 128,48,1
NodeSize: 48,24
ValueState: 2,1287,484,416,303,,MIDM

Close Infinite_horizon_dis
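{ A sketch of the two policy-iteration steps that the "Infinite horizon discounted
policy iteration" module above implements, in the same illustrative notation as the
comment before the finite-horizon module (g = 1 - Discount_rate; pi is the policy).
Policy evaluation solves the linear system

  V(s) = R_pi(s) + g * sum over s' of P_pi(s'|s) * V(s')
  i.e. V = Invert(I - g * P_pi) * R_pi        (Linkage and V_s_for_policy)

and policy improvement then takes

  pi'(s) = argmax over a of sum over s' of P(s'|s,a) * ( R(s,s',a) + g * V(s') )

(Better_action). Iterate alternates the two steps until pi' = pi. Discount_rate > 0
ensures g < 1, so that (I - g * P_pi) is invertible and the iteration terminates. }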
FormNode Fo643386947
Title: Transition Prob
Definition: 0
NodeLocation: 592,184,1
NodeSize: 104,16
Original: Transition_Prob

FormNode Fo1717128771
Title: Payoff
Definition: 0
NodeLocation: 592,216,1
NodeSize: 104,16
Original: Payoff

FormNode Fo374951491
Title: Discount rate
Definition: 0
NodeLocation: 592,248,1
NodeSize: 104,16
Original: Discount_rate

Close Taxi_MDP