Draft, December 2013
CONTINUOUS TIME GAME THEORY: AN INFINITESIMAL
APPROACH
MAXWELL B. STINCHCOMBE
Contents
1. Introduction and Overview
1.1. Equilibrium Refinement via Infinitesimals
1.2. Compact and Continuous Normal Form Games
1.2.1. Limit games and limit equilibria
1.2.2. The need for exhaustiveness
1.2.3. Respecting weak dominance
1.3. Extensive Form Games with Infinite Action Sets
1.4. Control Problems and Differential Games
1.4.1. An Easy Control Problem
1.4.2. A Less Easy Control Problem
1.4.3. Differential Games
1.5. Continuous Time Stochastic Processes and Diffuse Monitoring
1.5.1. Brownian monitoring
1.5.2. Poisson monitoring
1.5.3. Lévy processes and monitoring
1.6. Continuous Time Game Theory with Sharp Monitoring
1.7. Large Population Games
2. Constructing Infinitesimals and the Tools to Use Them
2.1. A Purely Finitely Additive Point Mass
2.2. The Equivalence Classes
2.3. Normal Form Equilibrium Refinement
2.3.1. Notation
2.3.2. Perfection
2.3.3. Properness
2.3.4. p-Stability
2.4. The Basic Results
2.5. Compact and Continuous Games
3. Extensive Form Games
3.1. Decision Theory with Full Support Probabilities
3.2. Dynamic Decisions, the Basic Model
3.3. Bridge Crossing
3.4. Hierarchies of Beliefs
3.5. Extensive Form Equilibrium Refinement
3.5.1. Perfect, proper, and stable equilibrium outcomes
3.5.2. Iterative dominance arguments
4. Continuous Time Control and Games
4.1. Control Theory
4.1.1. When it Works Wonderfully
4.1.2. When it Works Less Well
4.2. Games with Instantaneous Monitoring
4.2.1. (R, ≤) is Totally Ordered but Not Well-Ordered
4.2.2. Implications
4.3. Games Played on a Near Interval
4.3.1. Actions, Histories, Strategies, and Outcomes
4.3.2. Safety in Continuously Repeated Games
4.3.3. Some Examples
4.3.4. Revisiting Cournot vs. Bertrand
4.3.5. Preemption Games
4.3.6. Wars of Attrition
4.4. Brownian Monitoring
4.5. Poisson Monitoring
4.6. Continuous Time Martingales
4.7. Itô's Lemma
5. Standard and Nonstandard Superstructures
5.1. Purely Finitely Additive Point Masses
5.2. The equivalence relation ∼µ and ∗X
5.3. Superstructures
5.4. Defining V(∗S) inductively
5.5. Internal Sets for Stochastic Processes
5.6. Some External Sets
5.7. Statements and the Transfer Principle
6. Some Real Analysis
6.1. Closed Sets and Closure
6.1.1. The standard part mapping
6.1.2. Closedness of Refined Sets of Equilibria
6.2. Continuity and Uniform Continuity
6.2.1. C(X; R), X compact
6.2.2. Near continuity
6.2.3. The Riemann-Stieltjes Integral
6.2.4. Some near interval control theory
6.3. Theorem of the Maximum
6.3.1. In Control Theory
6.3.2. Single person problems
6.3.3. Limit games and limit equilibria
6.4. Compactness
6.4.1. Existence of optima
6.4.2. Existence of equilibria
6.4.3. Existence of extended equilibrium outcomes
6.4.4. Compact sets of probabilities on R
6.5. Probabilities on Metric Spaces
6.5.1. Loeb Measures
6.5.2. Riesz representation theorem
6.5.3. Denseness of finitely supported probabilities
6.5.4. Tightness and compactness
6.6. Derivatives
6.6.1. Basics
6.6.2. The implicit function theorem
6.6.3. Lebesgue's density theorem
6.7. Completions
6.8. A Duality Approach to Patience
6.8.1. Preliminaries
6.8.2. Infinite Patience
6.8.3. Continuous Linear Functionals
6.8.4. Star-finitely Supported Probabilities
6.8.5. Preferences with a lim inf Representation
6.8.6. Concave F-Patient Preferences
6.8.7. Preferences with a lim inf-Average Representation
6.9. HERE LIE DRAGONS
6.10. Some Hints
6.11. Related Readings
7. Moderately Elementary Stochastic Process Theory
7.1. Collections of Random Variables
7.1.1. Some Examples
7.1.2. Time Paths
7.2. Monitoring and Likelihood Ratios
7.3. Monitoring Volatilities
7.4. A Brief Detour Through Queueing Theory
7.5. Expectations, Norms, Inequalities
7.5.1. Expectations of some classic functions
7.5.2. Some norms
7.5.3. The triangle inequality for norms
7.5.4. The Markov/Chebyshev Inequality
7.6. Vector Algebras of Random Variables
7.6.1. Algebras and Information
7.7. Adapted Stochastic Processes
8. Some Convergence Results in Probability Theory
8.1. The Weak Law of Large Numbers (WLLN)
8.1.1. The easiest WLLN
8.1.2. Triangular arrays
8.2. Almost Everywhere
8.3. Converging Almost Everywhere
8.4. Weak Laws Versus Strong Laws
8.4.1. The Maximum of a Sum
8.4.2. Sums with Random Signs
8.4.3. A First Strong Law
8.5. The Borel-Cantelli Lemmas
8.6. Limited Fluctuation
8.7. Versions of Regularity for Time Paths
8.7.1. Time Path Properties
8.7.2. Asymptotic similarities
9. Time Paths, Near Continuity, Integrals, and Control
9.1. Near Convergence
9.2. Near Continuity
9.3. Paths and Integrals I
9.4. A First Control Problem
9.4.1. A Near Interval Formulation
9.4.2. Solving the Near Interval Formulation
9.4.3. Checking that We've Solved the Original Problem
9.5. A Control Problem Without a Solution
9.5.1. Nonexistence
9.5.2. A Near Interval Formulation
9.5.3. Trying to Take the Near Interval Solution Back to V(S)
9.5.4. Another Representation
9.6. The Euler-Lagrange Necessary Conditions
9.6.1. The Near Interval Formulations
9.6.2. Necessary Conditions
9.7. Some Examples of Using the Euler-Lagrange Conditions
9.7.1. A Squared Law of Resistance
9.8. Control Problems with Bang-Bang Solutions
9.9. Savings/Investment Problems
10. The Basics of Brownian Motion
10.1. Two Versions of the Time Lines
10.2. Showing Brownian-ness
10.3. Derivatives and Bounded Variation
10.4. Itô's Lemma
References
1. Introduction and Overview
At an intuitive level, the infinitesimals are the df , dx, and dt’s from calculus, smaller
in absolute value than any “real” number, but non-zero, written df ≃ 0, dx ≃ 0, dt ≃ 0.
Using infinitesimals allows us to define a function f : R → R to be continuous at x if f(x + dx) − f(x) is infinitesimal for any infinitesimal dx. It also allows us to define f to have derivative r at a point x if r − (f(x + dx) − f(x))/dx is infinitesimal for every non-zero infinitesimal dx. Interest here centers on their uses in game theory models. The main themes include equilibrium refinement, control theory, differential games, continuous time games with and without diffuse monitoring, and large population games.
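A finite stand-in gives the flavor of this definition of the derivative: the difference quotient (f(x + dx) − f(x))/dx approaches f′(x) as dx shrinks. In the sketch below, the choice f = sin and the particular dx values are illustrative, not from the text.

```python
import math

# Finite stand-in for the infinitesimal definition of the derivative:
# for f(x) = sin(x), the difference quotient (f(x + dx) - f(x))/dx
# should be close to r = cos(x) once dx is small.  The function f and
# the dx values are illustrative assumptions.
f, fprime = math.sin, math.cos
x = 1.0
for dx in (1e-2, 1e-4, 1e-6):
    q = (f(x + dx) - f(x)) / dx
    print(dx, abs(q - fprime(x)))  # the error shrinks roughly in proportion to dx
```

With an actual infinitesimal dx, the error term would itself be infinitesimal rather than merely small.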
For this introduction, suspend disbelief for just a bit, and suppose that there exist
infinitesimals, non-zero numbers smaller, in absolute value, than all of the real numbers
you are used to. The next section, §2, will “construct” infinitesimals and develop the
first set of tools to manipulate and use them. It is enough of a formal development
that we can go a good ways further without violating any proprieties. The first set
of tools is not enough to sensibly do all that we will want to do, and Section §5 will
develop everything else we need.
1.1. Equilibrium Refinement via Infinitesimals. A game Γ describes a strategic
interaction, and Eq(Γ) denotes the set of equilibria for Γ. If Eq(Γ) contains many
‘kinds’ of equilibria, then changes in the description of the strategic situation, Γ, can
have many ‘kinds’ of effects. As much of economic analysis is based on the dependence
of equilibria on aspects of the description of a strategic situation, this can be an
impediment to modeling. When the strategic situation has dynamics and differential
information, this problem is widespread.
Equilibrium refinement refers to the elimination of some equilibria or sets of equilibria as being “unreasonable.” This chapter covers some of the major themes in
equilibrium refinement from the perturbation point of view: an equilibrium, σ ∗ , or set
of equilibria, S, is reasonable if there are equilibria of small perturbations to the game
close to σ ∗ or close to S. The main tool is the concept of an infinitesimal perturbation,
which is an idealization of a sequence of perturbations that are converging to 0.
A game Γ is given by a set I of players, usually people, involved in a strategic situation. Each i ∈ I has a set of possible actions, Ai. The set of all possible choices of actions is A := ×i∈I Ai. Preferences for each i ∈ I are given by a von Neumann-Morgenstern utility function, ui : A → R. Combining, Γ = (Ai, ui)i∈I. When there are two players and each has two actions, we call the game a 2 × 2 game. The starting point for perturbation-based equilibrium refinement is the following 2 × 2 game, due to Selten.
            left      right
up         (1, 1)    (0, 0)
down       (0, 0)    (0, 0)
Some conventions: here I = {1, 2}, A1 = {up, down} = {u, d}, and A2 = {left, right} = {l, r}. Player 1, listed first, chooses which row occurs by picking either the action "up" or the action "down," and player 2 chooses which column by picking either the action "left" or the action "right." Each entry in the matrix is uniquely identified by the actions a1 and a2 of the two players, and each entry has two numbers, (x, y): x is the utility of player 1 and y the utility of player 2 when the vector a = (a1, a2) is chosen; equivalently, (x, y) = (u1(a1, a2), u2(a1, a2)) when (x, y) is the entry in the a1 row and the a2 column.
In an equilibrium, player 1 picks action up with probability σ1 ∈ [0, 1] while player 2 picks left with probability σ2 ∈ [0, 1]. (σ1∗, σ2∗) = (1, 1) is the obvious equilibrium; (σ1∗, σ2∗) = (0, 0) is also an equilibrium. There are many ways to talk about how unreasonable the second equilibrium is. One of the most convincing comes from the observation that 1's payoffs to up are 1 or 0 depending on 2's choice of left or right, while 1's corresponding payoffs to down are 0 and 0. So, no matter what 2 does, 1 is at least as well off playing up, and may be strictly better off. Given the symmetry of the game, the same is true for player 2.
One way to perturb player 1's strategies is to restrict 1 to play σ1 ∈ [ǫ1,u, 1 − ǫ1,d], ǫ1,u > 0 and ǫ1,d > 0. That is, player 1 must put mass at least ǫ1,u on up, and at least mass ǫ1,d on down. The perturbation is interior because we require that the ǫ's be strictly positive; it is infinitesimal if ǫ1,u ≃ 0 and ǫ1,d ≃ 0. The infinitesimal perturbation makes a huge difference to the set of equilibria: if 1 is playing any strategy in the perturbed set, 2's payoff to playing left is at least ǫ1,u > 0, while 2's payoff to playing right is 0; and the analysis is directly parallel for 1's payoffs if 2 is playing any strategy in the perturbed set. Thus, in the perturbed game, the unique equilibrium is (σ1∗, σ2∗) = (1 − ǫ1,d, 1 − ǫ2,r), and (1 − ǫ1,d, 1 − ǫ2,r) ≃ (1, 1).
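This uniqueness argument can be checked mechanically. In the minimal sketch below, the payoffs are those of Selten's game above; the particular numerical ǫ values are illustrative small positive perturbations standing in for infinitesimals.

```python
# Best responses in the perturbed version of Selten's game.  Each
# player's probability on "up" (resp. "left") is confined to
# [eps_up, 1 - eps_down]; the numerical eps values below are
# illustrative stand-ins for infinitesimals.
eps_1u, eps_1d = 1e-6, 1e-6   # player 1's minimum mass on up, down
eps_2l, eps_2r = 1e-6, 1e-6   # player 2's minimum mass on left, right

def br1(sigma2):
    # u1(up) = sigma2 and u1(down) = 0, so whenever sigma2 > 0 player 1
    # puts as much mass on "up" as the perturbation allows.
    return 1 - eps_1d if sigma2 > 0 else eps_1u

def br2(sigma1):
    # By the symmetry of the game, player 2's problem is identical.
    return 1 - eps_2r if sigma1 > 0 else eps_2l

# Every profile in the perturbed set has sigma1 >= eps_1u > 0 and
# sigma2 >= eps_2l > 0, so one round of best responses lands on the
# unique equilibrium of the perturbed game.
s1, s2 = br1(eps_2l), br2(eps_1u)
print(s1, s2)   # (1 - eps_1d, 1 - eps_2r), infinitesimally close to (1, 1)
```

Starting the best responses from any other profile in the perturbed set gives the same fixed point.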
In general, ∆i = ∆(Ai) denotes the set of distributions over Ai and ∆ := ×i∈I ∆i is the set of vectors of choices, with the explicit understanding that any vector (σi)i∈I corresponds to a product distribution. We will define a vector of strategies, σ∗ ∈ ∆, to be a perfect equilibrium if there is an interior, infinitesimal perturbation, ∆i(ǫi), to the strategy sets ∆i and an equilibrium σǫ ∈ ×i∈I ∆i(ǫi) with σǫ ≃ σ∗. This directly parallels the sequence definition: σ∗ is a perfect equilibrium if there is a sequence of interior perturbations to the strategy sets, ∆i(ǫni), with ǫni → 0, and a sequence of equilibria, σn ∈ ×i∈I ∆i(ǫni), with σn → σ∗. We replace the sequence of perturbations with infinitesimals, and we replace taking limits with finding a strategy in ×i∈I ∆i that is infinitesimally close.
The theory begins to have real bite and deep implications when we apply it to extensive form games. Here, we will work through the implications of stability requirements in terms of "reasonable" conjectures and/or reasonable "beliefs," especially in their iterative formulations. Stability requirements are that all equilibria, or sets of equilibria, that are not refined away be close to the equilibrium set for all versions of the game in a large class of perturbed games.
1.2. Compact and Continuous Normal Form Games. If Γ = (Ai , ui )i∈I is a
game, each Ai is a compact metric space, and each ui is jointly continuous, we have
the simplest class of infinite action set games. Here we can replace each set Ai with
a finite set that is infinitesimally close and examine equilibrium refinement from this
point of view.
1.2.1. Limit games and limit equilibria. As upper/lower hemicontinuity, cover results
from [Fudenberg and Levine, 1986].
1.2.2. The need for exhaustiveness. Examples from [Simon and Stinchcombe, 1995].
1.2.3. Respecting weak dominance. Entry example here from [Simon and Stinchcombe, 1995],
dominance arguments to the fore again.
1.3. Extensive Form Games with Infinite Action Sets. If we have an extensive
form game with infinite sets of actions, the situation is much more complicated. Briefly
cover problems in [Simon and Zame, 1990], then the fixes in [Stinchcombe, 2005].
Note: Manelli and cheap talk signaling games [Manelli, 1996]; failures of upper and
lower hemicontinuity.
1.4. Control Problems and Differential Games. If we have an infinitesimal dt, then we have an infinite number 1/dt. Using 1/dt steps of size dt, we can approximate a time interval, e.g. [0, 1], by what we call a near interval, T = {0, dt, 2dt, · · · , N · dt}, where N · dt ≃ 1. This turns continuous time control problems into ∗-finite problems, and these can often be solved using moderately elementary techniques.
1.4.1. An Easy Control Problem. For example, consider the following simple control problem,

(1)   min ∫₀¹ [c1 ẋ²(t) + c2 x(t)] dt   s.t.   x(0) = 0, x(1) ≥ B, ẋ(t) ≥ 0.
The idea is that, starting with none of a good on hand, we need to produce a total amount B of the good by time 1. There is a storage cost, c2, per unit stored for a unit of time, and producing at a rate r, i.e. having x′(t) = dx/dt = ẋ = r, incurs costs at a rate c1(x′(t))². The tradeoff between fast production rates and storage costs leads us to believe that the solution must involve starting production at a low level at some point in the interval and increasing the rate at which we produce as we near the end of the interval.
Let us turn to the near interval formulation of the problem. We replace [0, 1] by a near interval T with increments dt, and to make life simpler we suppose that the increments have equal size, dt ≃ 1/N for some infinite N; that is, we pick dt to be a special kind of infinitesimal. Now x′(t) is the action, at, chosen at t, at = (x(t + dt) − x(t))/dt, that is, the discrete slope of the amount on stock over the interval of time between t and t + dt. This means that if we choose actions a0, a1, . . . , aN−1, then for any t ∈ T, x(t) = Σs<t as ds. The problem is replaced by

(2)   min_{a0,a1,...,aN−1} Σt [ c1 at² + c2 Σs<t as ds ] dt   s.t.   Σs<1 as ds = B,  at ≥ 0.
This problem can be solved by consulting the Kuhn-Tucker conditions.
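Those conditions can be sketched numerically. Stationarity in at gives 2c1·at + c2(1 − t − dt) = λ wherever at > 0, with λ the multiplier on the terminal-stock constraint, so the optimal plan is a linear ramp truncated at zero; a bisection on λ makes the constraint bind. In the sketch below, the values of c1, c2, B and the finite grid size N are illustrative assumptions, not values from the text.

```python
# Illustrative solver for the discretized problem (2) via its
# Kuhn-Tucker conditions.  The values of c1, c2, B and the finite grid
# size N are assumptions made for the example.
c1, c2, B = 1.0, 4.0, 0.25
N = 1000
dt = 1.0 / N
times = [k * dt for k in range(N)]   # decision times 0, dt, ..., (N-1)dt

def plan(lam):
    # Stationarity in a_t: 2*c1*a_t + c2*(1 - t - dt) = lam wherever
    # a_t > 0, so the optimal plan is a linear ramp truncated at 0.
    return [max(0.0, (lam - c2 * (1.0 - t - dt)) / (2.0 * c1)) for t in times]

def total(lam):
    # Terminal stock produced by the plan; continuous and increasing in lam.
    return sum(plan(lam)) * dt

# Bisect on the multiplier until the constraint sum_t a_t*dt = B binds.
lo, hi = 0.0, 2.0 * c1 * B / dt + c2
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if total(mid) < B else (lo, mid)
a = plan(0.5 * (lo + hi))

t_start = next(t for t, x in zip(times, a) if x > 0)
print(round(sum(a) * dt, 6))   # 0.25: the terminal constraint binds
print(t_start)                 # production starts near 1 - sqrt(4*c1*B/c2) = 0.5
```

The computed plan shows the anticipated shape: zero production early, then a rate that rises linearly toward time 1.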
In general, the existence and characterization of solutions to control problems is often easier when one uses infinitesimals in the form of a near interval: when the near interval solution is infinitesimally close to a time path of actions in [0, 1], that time path is a solution to the original problem; when a control problem using [0, 1] as the time set does not have a solution, the near interval solutions have interpretations as Young measures. The following example shows what is involved when things go wrong.
1.4.2. A Less Easy Control Problem. Consider the problem in

(3)   max ∫₀¹ [ẋ²(t) − x²(t)] dt   s.t.   −1 ≤ ẋ(t) ≤ +1,
where the maximum is taken over piecewise continuous functions t 7→ ẋ(t). The first
term in the integrand tells us that we want to be moving as fast as possible, that is, we
want ẋ2 (t) large. The second term tells us that we want to minimize our displacement,
that is, we want x2 (t) small. These have a rather contradictory feel to them. Let us
examine just how contradictory.
Divide the interval [0, 1] into N equally sized sub-intervals, N ∈ N, and consider the path that, over each interval [k/N, (k+1)/N], has ẋ = +1 for the first half and ẋ = −1 for the second half. This means that x(t) goes up at slope +1 over the first half of each interval and down with slope −1 over the second half of each interval. As N ↑, the value of this path in (3) converges up to 1. However, the value 1 cannot be achieved by any path: that would require that ẋ(t) always be either ±1 and x(t) always be 0, contradictory requirements.
Replace [0, 1] with a near interval T = {0, 1/N, . . . , (N−1)/N, 1} with N unlimited and even, and reformulate (3) as

(4)   max_{a0,a1,...,aN−1} Σt [ at² − (Σs<t as ds)² ] dt   s.t.   −1 ≤ at ≤ +1.
Notice that there is a pair of multipliers for each t ∈ T , one for the constraint −1 ≤ at
and one for the constraint at ≤ +1. Only one of each of these constraints can be binding
at any point in time. Often the time path of the multipliers is very informative about
when and where constraints are most strongly pinching the solution. Here, there is so
much symmetry that the pattern of the multipliers looks like the pattern we will see
in the solutions and has no further information.
One of the two solutions to this is a∗k/N = +1 for the even k and a∗k/N = −1 for the odd k (the other solution reverses the signs). This gives a utility of 1 − ½dt² ≃ 1. We see a continuation of the pattern of approaching the supremum value of 1: since dt = 1/N, larger N yields smaller dt, yielding a higher value. Thus, the near interval formulation has a solution, and it gives a value ≃ 1.
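The value of the alternating plan can be checked directly. In the sketch below, the finite N is an illustrative stand-in for the unlimited N of the near interval.

```python
# Value of the alternating plan a_t = +1, -1, +1, ... in problem (4).
# The finite N is an illustrative stand-in for an unlimited N.
N = 10_000
dt = 1.0 / N

value, x = 0.0, 0.0
for k in range(N):
    a = 1.0 if k % 2 == 0 else -1.0
    value += (a * a - x * x) * dt   # integrand of (4) at t = k*dt
    x += a * dt                     # x(t + dt) = x(t) + a_t * dt

print(value > 1 - dt)   # True: the value is 1 - dt**2/2, within an
                        # "infinitesimal" of the unattainable supremum 1
```

The state x only ever takes the values 0 and dt, which is why the loss relative to the supremum is of order dt².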
The optimal path t ↦ a∗t is not infinitesimally close to anything continuous, as it moves up or down by 2 between each t and t + dt. This is a phenomenon known as chattering. Not only is there no continuous function that behaves like this, there is no measurable function either.
To see why, look up Lebesgue's density theorem: it tells us that for any measurable A ⊂ [0, 1], there is an A′ ⊂ A such that Unif(A \ A′) = 0 and, for each x ∈ A′,

(5)   lim_{ǫ↓0} Unif(A ∩ (x − ǫ, x + ǫ)) / (2ǫ) = 1.
If A = [a, b], then A′ = (a, b); we just get rid of the end-points. The amazing part of Lebesgue's result is that this simple intuition of getting rid of end-points works for all measurable sets. In particular, this means that for each x ∈ A′, the derivative of the function H(x) = Unif(A ∩ [0, x]) is equal to 1. Applying Lebesgue's density theorem to B := Ac, for x ∈ B′, the derivative of H(x) is equal to 0. Since Unif(A′) + Unif(B′) = 1, this means that the derivative of H is, for Lebesgue almost every x ∈ [0, 1], either equal to 0 or equal to 1.
Now, if there is a measurable function representing t ↦ a∗t, then we can partition [0, 1] into a set A on which ẋ(t) = +1 and B := Ac on which ẋ(t) = −1. However, for every non-infinitesimal x ∈ [0, 1], the proportion of the t ∈ T with t < x and a∗t = +1 is, up to an infinitesimal, equal to ½. This means that for our measurable function we would have to have Unif(A ∩ [0, x]) = Unif(B ∩ [0, x]) = ½x for each x ∈ (0, 1]. Lebesgue's density theorem tells us that this cannot happen: we must have the derivative equal to 1 or 0 almost everywhere, rather than equal to ½ everywhere.
1.4.3. Differential Games. Control theory is single person optimization. Equilibrium
in game theory involves many people simultaneously optimizing given the others’ optimizing behavior. When two or more players are simultaneously picking actions at , and
this affects the state of the system through differential equations, we have a differential
game. Differential games have a long history, but when the actions affect the state of
the system through stochastic differential equations, we are in a relatively new class of
games. The entry to such games passes through continuous time stochastic processes.
1.5. Continuous Time Stochastic Processes and Diffuse Monitoring. We cover
a Brownian monitoring model and a Poisson monitoring model. Both of the continuous
time processes involved are Levy processes, and studying these with near intervals is a
tremendous simplification that leaves us, often enough, with the same diffuse model
of monitoring others' actions.
1.5.1. Brownian monitoring. Again let T = {0, 1/N, 2/N, . . . , (N−1)/N, 1} with N an infinite
integer. Consider the probability space Ω = {−1, +1}^T and define P so that the
canonical projection mappings proj_t(ω) := ω_t are an i.i.d. collection with P(ω_t =
−1) = P(ω_t = +1) = 1/2. Equivalently, let P be the uniform distribution on Ω. From
this, define X(t, ω) as follows: X(0, ω) ≡ 0, X(1/N, ω) = (1/√N)ω_1, X(2/N, ω) = (1/√N)(ω_1 + ω_2),
. . ., X(k/N, ω) = (1/√N) Σ_{i=1}^{k} ω_i. This is a random walk model that moves through time in
step sizes dt := 1/N, and moves up and down by ±√(1/N).
If r ∈ (0, 1] and k/N ≃ r, then X(k/N, ·) is the sum of infinitely many i.i.d. random
variables that have been scaled so that Var(X(k/N, ·)) ≃ r. The oldest (de Moivre)
arguments for the central limit theorem tell us that X(k/N, ·) is infinitely close to having
a Gaussian distribution. Further, for k < k′ < k′′, the random increments (X(k′/N, ·) −
X(k/N, ·)) and (X(k′′/N, ·) − X(k′/N, ·)) are independent. This is infinitesimally close to
a Brownian motion. By changing the probabilities of ±√(1/N) by the appropriate
infinitesimal, the Brownian motion gains a drift. If the drift depends on the action of
one player, the process provides a diffuse and noisy signal of that action, one in which
evidence becomes stronger and stronger over time.
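A finite-N sketch of this scaled random walk; here N and the number of sample paths are large standard integers, illustrative stand-ins for the infinite quantities in the text.

```python
import random
import statistics

# Finite-N sketch of X(k/N) = (1/sqrt(N)) * (w_1 + ... + w_k), with the w_i
# i.i.d. fair +/-1 coin flips. With r = k/N, the de Moivre CLT argument says
# X(k/N, .) is close to Gaussian with mean 0 and variance r.
# N and num_paths are illustrative choices, not from the text.
random.seed(0)
N, r = 400, 0.5
k = int(r * N)
num_paths = 20_000

def X_at(k):
    """One sample of the scaled random walk at time k/N."""
    return sum(random.choice((-1, +1)) for _ in range(k)) / N ** 0.5

samples = [X_at(k) for _ in range(num_paths)]
print(round(statistics.mean(samples), 3))      # near 0
print(round(statistics.variance(samples), 3))  # near r = 0.5
```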
1.5.2. Poisson monitoring. In a similar vein, continuing with T = {0, 1/N, 2/N, . . . , (N−1)/N, 1},
let Ω′ = {0, 1}^T and define Q so that the canonical projection mappings proj_t(ω′) :=
ω′_t are an i.i.d. collection with Q(ω′_t = 1) = λ dt, where dt := 1/N is the infinitesimal size
of the incremental steps in the time set and λ is limited and strictly positive. Define
Y(0, ω′) ≡ 0, Y(1/N, ω′) = ω′_1, Y(k/N, ω′) = Σ_{i≤k} ω′_i.
If r ∈ (0, 1] and k/N ≃ r, Y(k/N, ·) is infinitely close to having a Poisson(λr) distribution
(by the binomial approximation to Poisson distributions). Further, for k < k′ < k′′, the
random increments (Y(k′/N, ·) − Y(k/N, ·)) and (Y(k′′/N, ·) − Y(k′/N, ·)) are independent. This
is infinitesimally close to a Poisson process. By changing the infinitesimal probability
λ dt to λ′ dt, we change the arrival rate of the Poisson process. If the arrival rate
depends on the action of one player, the process provides a diffuse and noisy signal of
that action, one in which evidence becomes stronger and stronger over time.
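The binomial approximation invoked here can be checked directly: with dt = 1/N, Y(k/N, ·) is Binomial(k, λ dt), and its pmf is uniformly close to the Poisson(λr) pmf. The values of N, λ, and r below are illustrative choices.

```python
from math import comb, exp, factorial

# Y(k/N) is a sum of k = r*N i.i.d. Bernoulli(lam*dt) indicators, dt = 1/N,
# so it is Binomial(k, lam*dt); the binomial-to-Poisson approximation says
# this pmf is uniformly close to the Poisson(lam*r) pmf when N is large.
# N, lam, and r are illustrative choices, not from the text.
N, lam, r = 10_000, 3.0, 1.0
k, dt = int(r * N), 1.0 / N
p = lam * dt

def binom_pmf(j):
    return comb(k, j) * p ** j * (1 - p) ** (k - j)

def poisson_pmf(j):
    return exp(-lam * r) * (lam * r) ** j / factorial(j)

gap = max(abs(binom_pmf(j) - poisson_pmf(j)) for j in range(20))
print(gap)  # small, shrinking as N grows
```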
Exercises: the minimum of two negative exponentials is a negative exponential with
half the mean, et cetera; distributions of Erlangs can be had from this; hypoexponentials and Coxian distributions from intensity matrices; use in queuing theory.
1.5.3. Levy processes and monitoring.
1.6. Continuous Time Game Theory with Sharp Monitoring. If the monitoring of others' actions is not diffuse but instantaneous, we have a very different class
of games. Consider a game in which players change between different actions at any
time t ∈ [0, 1], and receive utility ∫_0^1 ui(t, a(t)) dt where a(t) is the vector of actions
picked at time t. There are difficulties defining strategies if the players can respond
instantly to other players' changes of action. These definitional difficulties disappear
if we replace [0, 1] with a near interval T = {0, 1/N, 2/N, . . . , (N−1)/N, 1}. With this formality
in hand, we can fruitfully study continuous time folk theorems, adoption games, and price
competition.
The situation is much as we will see in the analysis of control problems: when
the near interval equilibrium path is infinitesimally close to a time path of actions in
[0, 1], that time path is an equilibrium to the original problem; when a game using
[0, 1] as the time set does not have a solution, the near interval equilibrium will have
interpretations as Young measures.
1.7. Large Population Games. The unit interval, [0, 1], has served not only as a
time set, but as a model of an infinite population of players. The attraction of this
model is that each player is atomistic, infinitely small, and their actions are easily
modeled as not having any effect on anyone but themselves. However, and this is the
basis for how useful the model is, any positive fraction of the population does have an
effect on others.
Such models are often motivated as idealizations of very large finite sets of players. However, there are several types of equilibrium results using [0, 1] as the set of
players that are not limits of equilibria in models with larger and larger finite sets of players. Replacing [0, 1] by {0, 1/N, 2/N, . . . , (N−1)/N, 1}
means that the limit results hold automatically, providing a better basis for understanding the large population models.
2. Constructing Infinitesimals and the Tools to Use Them
Consider the following four sequences, three of them with the same limit:

r = (r, r, r, r, . . .)   (6)
ǫ = (1, 1/2, 1/3, 1/4, 1/5, . . .)   (7)
ǫ² = (1, 1/4, 1/9, 1/16, 1/25, . . .)   (8)
0 = (0, 0, 0, 0, 0, . . .)   (9)
For any r ∈ R, r > 0, for all but a small set of integers n, we have

0 < 1/n² < 1/n < r.   (10)
One development of infinitesimals adds new numbers to R, denoting the new enlarged set as ∗ R, by identifying numbers as equivalence classes of sequences in such a
fashion that none of the four sequences are equivalent — they are unequal for all but
finitely many indices, therefore they are not equal/equivalent.
Defining "∗<" the same way, we say that 0 ∗< ǫ² ∗< ǫ ∗< r because {n ∈ N : 0 <
1/n² < 1/n < r} has at most a finite complement.¹ These inequalities are an indication
that we are keeping track of what is happening on the way to the limit. The small,
non-zero sequences are the non-zero "infinitesimals," written ǫ ≃ 0 and ǫ² ≃ 0, because
their absolute value is smaller than any real number r > 0, yet they are strictly non-zero, in this case strictly positive.
If Fn is a sequence of finite sets, e.g. F₁ = {0, 1/2, 1}, F₂ = {0, 1/4, 2/4, 3/4, 1}, . . .,
Fn = {k/2ⁿ : 0 ≤ k ≤ 2ⁿ}, we get a ∗-finite set by considering the equivalence class of
this sequence. This is one example of the near intervals we mentioned above. What
is crucial here is that we keep track of how we get to the limit from inside the set. In
a bit more detail, if xn ≠ x′n ∈ Fn, xn → x, and x′n → x, then x belongs to the limit
set for the sequence of sets, and we got to that limit two different ways. What ∗-finite
sets do is keep track of the ways in which you converge to points in the limit set.
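The "for all but finitely many n" comparisons behind these orderings can be exhibited computationally; the particular r below is an illustrative choice.

```python
from fractions import Fraction

# The ordering on equivalence classes of sequences: eps_n = 1/n, eps2_n = 1/n^2,
# r constant. "0 < eps2 < eps < r" holds because the set of indices n where the
# pointwise inequality fails is finite. We exhibit that finite exceptional set
# for the illustrative choice r = 1/100.
r = Fraction(1, 100)

def exceptional_set(max_n=10_000):
    """Indices n in 1..max_n where 0 < 1/n^2 < 1/n < r fails."""
    bad = []
    for n in range(1, max_n + 1):
        e, e2 = Fraction(1, n), Fraction(1, n * n)
        if not (0 < e2 < e < r):
            bad.append(n)
    return bad

bad = exceptional_set()
print(len(bad), max(bad))  # finitely many exceptions, all with n <= 1/r
```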
2.1. A Purely Finitely Additive Point Mass. The basic device for us is the set of µ
equivalence classes of sequences where µ is a purely finitely additive “point mass.” We
will later show that there exists a “probability” on the integers, µ, with the following
properties:
1. for all A ⊂ N, µ(A) = 0 or µ(A) = 1;
2. µ(A ∪ B) = µ(A) + µ(B) for all disjoint A, B ⊂ N;
3. µ(N) = 1; and
4. µ(A) = 0 if A ⊂ N is finite.
Some useful and pretty obvious consequences of these properties:
1. If E1, . . . , EK is a partition of N, then µ(Ek) = 1 for exactly one of the partition
elements. To give a formal argument, start from the observation that this is true if
K = 2 (from the second property above), and if it is true for K, then it is true for
K + 1.
¹For notational simplicity, we will drop the "∗" in front of "<."
2. If µ(A) = µ(B) = 1, then µ(A ∩ B) = 1. Since µ(A ∪ B) = 1 because A ⊂ (A ∪ B),
this consequence follows from the observation that µ(Ac ) = µ(B c ) = 0 so that
A \ B = A ∩ B c is a subset of a 0 set, hence has mass 0, and, by the same reasoning,
B \ A is a null set. Finally, A ∪ B is the disjoint union of the sets (A \ B), (B \ A),
and (A ∩ B).
2.2. The Equivalence Classes. For any set X, X N denotes the set of sequences in
X. We define two sequences x = (x1 , x2 , . . .) and y = (y1 , y2 , . . .) to be equivalent,
x ∼µ y, if µ({n ∈ N : xn = yn }) = 1. By the second of the consequences just given,
this is an equivalence relation. For any x ∈ X N , hx1 , x2 , x3 , . . .i denotes the equivalence
class of x.
We define “star X,” written ∗ X to be the set of all equivalence classes, ∗ X =
(X N )/ ∼µ . This gives us new objects to use. The pattern is to “put ∗ ’s on everything,” where by ‘everything’ we mean relations, functions, sets, classes of sets,
correspondences, etc.
Example 2.2.1. ∗[0, 1] contains the equivalence class dt := ⟨1, 1/2, 1/3, . . .⟩, called a nonstandard number, as well as all of the equivalence classes r := ⟨r, r, r, . . .⟩, and these
are called the standard numbers. Since µ({n ∈ N : 1/n < r}) = 1 if r ∈ (0, 1] and
µ({n ∈ N : 0 < 1/n}) = 1, we write 0 ∗< dt ∗< r. This means that our new number,
dt, is strictly greater than 0 and strictly less than all of the usual, standard strictly
positive numbers. We write this as dt ≃ 0 and say that dt is infinitesimal. The
only standard number in ∗[0, 1] that is infinitesimal is 0.
Example 2.2.2. Another infinitesimal is dx = ⟨1, 1/4, 1/9, . . .⟩; indeed, dx = (dt)² and
0 < dx < dt < r (where we have not put ∗'s on the less-than signs). Yet another
infinitesimal is dy = ⟨1/10, 1/10², 1/10³, . . .⟩. Now 0 < dy < dx < dt < r, and s := dx/dy =
⟨(1/n²)/(1/10ⁿ)⟩ = ⟨10, 10²/2², 10³/3², . . .⟩ has the property that for any R ∈ R, R < s. We either say
that s is an unlimited number or we say that it is an infinite number.
Example 2.2.3. For x, y ∈ ∗R, we define x − y = ⟨x₁ − y₁, x₂ − y₂, x₃ − y₃, . . .⟩, x + y =
⟨x₁ + y₁, x₂ + y₂, x₃ + y₃, . . .⟩, x · y = ⟨x₁ · y₁, x₂ · y₂, x₃ · y₃, . . .⟩, |x| = ⟨|x₁|, |x₂|, |x₃|, . . .⟩,
and so on. We write x ≃ y if |x − y| ≃ 0, and say that x and y are infinitely close
to each other, or that they are at an infinitesimal distance from each other.
Example 2.2.4. A function f : [0, 1] → R is continuous iff for all a ∈ [0, 1], [xn →
a] ⇒ [f(xn) → f(a)]. For any x ∈ ∗[0, 1], we define ∗f(x) = ⟨f(x₁), f(x₂), f(x₃), . . .⟩.
From this, you can see that the function f is continuous at a iff [x ≃ a] ⇒ [∗f(x) ≃
f(a)]. An infinitesimal move in the domain of the function leads to an infinitesimal
move in the range.
2.3. Normal Form Equilibrium Refinement. Note: expand this section with GTNotes, varieties of dominance, iterated procedures.
2.3.1. Notation. A finite game is Γ = (Ai, ui)_{i∈I} where A := ×_{i∈I} Ai is finite and
ui ∈ R^A. Mixed strategies for i ∈ I are ∆(Ai) := {µi ∈ R₊^{Ai} : Σ_{ai∈Ai} µi(ai) = 1}.
Utilities are extended to ×_{i∈I} ∆(Ai) by ui(µ) = Σ_{a∈A} ui(a) Π_{i∈I} µi(ai). The (relative)
interior of ∆(Ai) is denoted ∆°i and defined by ∆°i = {µi ∈ ∆(Ai) : µi ≫ 0}. We will
use the notation µ \ νi for the vector (µ1, . . . , µ_{i−1}, νi, µ_{i+1}, . . . , µ_I), and we will pass
back and forth between point mass on ai, i.e. δ_{ai}, and ai as convenient.
For µ ∈ ×i∈I ∆(Ai ) and j ∈ I, Brj (µ) := argmaxai ∈Ai ui (µ \ ai ). With this notation
we have the starting point for non-cooperative game theory.
Definition 2.3.1. µ∗ is a Nash equilibrium if (∀i ∈ I)[µi (Bri (µ)) = 1]. The set of
Nash equilibria for a game is denoted Eq(Γ).
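As a sanity check of the definition, a brute-force search for pure strategy Nash equilibria; the payoffs are those of the Prisoners' Dilemma stage game that appears later in these notes.

```python
from itertools import product

# Brute-force check of Definition 2.3.1 for pure strategy profiles in a
# two-player finite game. The payoffs are the Prisoners' Dilemma used in
# Section 4.3.3: actions Sil(ent) and Sq(ueal).
A = ["Sil", "Sq"]
u = {  # u[(a1, a2)] = (payoff to player 1, payoff to player 2)
    ("Sil", "Sil"): (3, 3), ("Sil", "Sq"): (-1, 4),
    ("Sq", "Sil"): (4, -1), ("Sq", "Sq"): (0, 0),
}

def is_pure_nash(a1, a2):
    best1 = all(u[(a1, a2)][0] >= u[(b1, a2)][0] for b1 in A)
    best2 = all(u[(a1, a2)][1] >= u[(a1, b2)][1] for b2 in A)
    return best1 and best2

eq = [prof for prof in product(A, A) if is_pure_nash(*prof)]
print(eq)  # [('Sq', 'Sq')]
```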
2.3.2. Perfection. Especially when the strategies ∆(Ai) are the agent normal form
strategies for an extensive form game, there are many Nash equilibria. One way to get
rid of some of them is to ask that they be robust to infinitesimal perturbations of the games.
Here are three perturbation-based equilibrium refinement concepts, in increasing
order of strength. After these three we have a version of a set-valued solution concept.
Definition 2.3.2. For ǫ ∈ ∗R₊₊, µ ∈ ×_{i∈I} ∗∆°i is ǫ-perfect if

(∀i ∈ I)(∀bi ∈ Ai)[ [max_{ai∈Ai} ui(µ \ ai) > ui(µ \ bi)] ⇒ [µi(bi) < ǫ] ].   (11)

µ∗ ∈ ×_{i∈I} ∆(Ai) is a perfect equilibrium if µ∗ = °µ for some ǫ-perfect µ with ǫ ≃ 0.
The set of perfect equilibria for a game is denoted Per(Γ).
Definition 2.3.3. µ∗ ∈ ×_{i∈I} ∆(Ai) is strictly perfect if for all µ ∈ ×_{i∈I} ∗∆°i,

[ [µ ≃ µ∗] ⇒ (∀i ∈ I)[µi(Bri(µ)) ≃ 1] ].   (12)

The set of strictly perfect equilibria for a game is denoted Str(Γ).
2.3.3. Properness.
Definition 2.3.4. For ǫ ∈ ∗R₊₊, µ ∈ ×_{i∈I} ∗∆°i is ǫ-proper if

(∀i ∈ I)(∀ai, bi ∈ Ai)[ [ui(µ \ ai) > ui(µ \ bi)] ⇒ [µi(bi) < ǫ · µi(ai)] ].   (13)

µ∗ ∈ ×_{i∈I} ∆(Ai) is a proper equilibrium if µ∗ = °µ for an ǫ-proper µ with ǫ ≃ 0.
The set of proper equilibria for a game is denoted Pro(Γ).
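A finite-ε sketch of how the perfection condition screens out fragile Nash equilibria; the 2×2 game, the tremble sizes, and the use of a small standard ε in place of an infinitesimal are all illustrative choices.

```python
# Finite-epsilon check of epsilon-perfection in the 2x2 game with
# u(T, L) = (1, 1) and all other payoffs (0, 0). Both (T, L) and (B, R) are
# Nash equilibria, but only full-support trembles near (T, L) pass the test:
# near (B, R), the inferior reply B carries mass ~1, not < epsilon.
# The game, eps, and the tremble size tr are illustrative choices.
A1, A2 = ["T", "B"], ["L", "R"]
u1 = {("T", "L"): 1.0}  # unlisted payoffs are 0
u2 = {("T", "L"): 1.0}

def U1(a1, mu2):
    return sum(mu2[a2] * u1.get((a1, a2), 0.0) for a2 in A2)

def U2(a2, mu1):
    return sum(mu1[a1] * u2.get((a1, a2), 0.0) for a1 in A1)

def eps_perfect(mu1, mu2, eps):
    ok1 = all(mu1[b] < eps
              for b in A1 if max(U1(a, mu2) for a in A1) > U1(b, mu2))
    ok2 = all(mu2[b] < eps
              for b in A2 if max(U2(a, mu1) for a in A2) > U2(b, mu1))
    return ok1 and ok2

eps, tr = 1e-3, 1e-4   # tolerance and tremble size
near_TL = ({"T": 1 - tr, "B": tr}, {"L": 1 - tr, "R": tr})
near_BR = ({"T": tr, "B": 1 - tr}, {"L": tr, "R": 1 - tr})
print(eps_perfect(*near_TL, eps))  # True
print(eps_perfect(*near_BR, eps))  # False
```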
2.3.4. p-Stability.
Definition 2.3.5. A closed connected set S ⊂ Eq(Γ) is robust to perturbations if
(∀µ ∈ ×i∈I ∗ ∆◦i )[ [dH (µ, ∗ S) ≃ 0] ⇒ (∀i ∈ I)[µi (Bri (µ)) ≃ 1] ].
(14)
A closed and connected S ⊂ Eq(Γ) is p-stable if it is robust to perturbations and no
closed, connected strict subset of S is robust to perturbations.
2.4. The Basic Results.
A. We will prove the following inclusion results.
1. Every perfect equilibrium is a Nash equilibrium, P er(Γ) ⊂ Eq(Γ).
2. Every proper equilibrium is a perfect equilibrium, P ro(Γ) ⊂ P er(Γ).
3. P ro(Γ) 6= ∅.
4. Every strictly perfect equilibrium is a perfect equilibrium, Str(Γ) ⊂ P er(Γ).
5. Every strictly perfect equilibrium is a proper equilibrium, Str(Γ) ⊂ P ro(Γ).
6. If S is a p-stable set, then S ⊂ P er(Γ).
7. If S is a p-stable set, then S ∩ P ro(Γ) 6= ∅.
B. If µ∗ ∈ Per(Γ), then there exists µ ∈ ∗×_{i∈I} ∆°i such that (∀i ∈ I)[µi(Bri(µ∗)) ≃ 1],
but the reverse is not true. [This captures the difference between sequential and
trembling hand perfect equilibria.]
Example 2.4.1. The set of perfect equilibria for the following game strictly contains the set
of proper equilibria.

         L           R           A2
T     (1, 1)      (0, 0)     (−1, −2)
B     (0, 0)      (0, 0)     (0, −2)
A2   (−2, −1)    (−2, 0)     (−2, −2)
Example 2.4.2. The following game has no strictly perfect equilibrium, but its p-stable
set of equilibria is nice.

       L        M        R
T   (1, 2)   (1, 0)   (0, 0)
B   (1, 2)   (0, 0)   (1, 0)
2.5. Compact and Continuous Games.
3. Extensive Form Games
Agent normal form, implications.
3.1. Decision Theory with Full Support Probabilities. Readings for this section
are [Blume et al., 1991a] and [Blume et al., 1991b].
Looking at strategies in ∗∆°i made equilibrium refinement work pretty well, essentially because at all points in a game tree the players had to pay attention to all
possibilities, but could assign relatively small probability to non-best responses. Another aspect of strictly positive probabilities is that one never has to condition on a
null set; all of the conditional probabilities are well-defined. This will save us a great
deal of hassle once we get to stochastic process theory.
3.2. Dynamic Decisions, the Basic Model. We assume that we have a probability
space (Ω, F, P) where Ω is a ∗-finite set, F = P(Ω), and P, the prior distribution, is
strictly positive. Utility depends on a random state, ω ∈ Ω, and the choice of action,
a ∈ A, through u(a, ω). When we work with games, ω will be the choices of other players.
Let E1, . . . , EK be a ∗-finite partition of Ω, representing what the decision maker will
know before their decision. That is, before making a decision, one learns which Ek,
k ∈ {1, . . . , K}, contains ω. In extensive form games, this corresponds to learning which
information set we are at. Because P is strictly positive, we never divide by 0 in the
following observation:

(∀A ∈ F)[ P(A) = Σ_k P(Ek) · P(A ∩ Ek)/P(Ek) = Σ_k P(Ek) P(A|Ek) ].   (15)
Another way to put this is that one's posterior beliefs, that is, beliefs after having
observed one's information, about an event A are P(A|Ek). This equation tells us
that the average posterior belief is the prior belief. As it holds for all A, we could write
P(·) = Σ_k P(·|Ek) Pk in ∆(Ω), where Pk = P(Ek).
3.3. Bridge Crossing. The decision problem is

P :  max_{a1,...,aK ∈ A}  ∫ Σ_k 1_{Ek}(ω) u(ak, ω) dP(ω).   (16)
This kind of decision theory leads us to Bayes' law updating, and the K problems

Pk :  max_{a ∈ A}  ∫ u(a, ω) dP(ω|Ek).   (17)
Recall the saying, "I'll cross that bridge when I get to it." It is usually understood
to mean that I'll figure out what I need to do once I know more about the decision
problem. Here, what you will know is which one of the Ek occurred.
Lemma 3.3.1 (Bridge Crossing). (a∗1 , . . . , a∗K ) solves the decision problem P if and
only if each a∗k solves problem Pk .
The Bridge-Crossing Lemma tells us that solving each Pk and putting it back together is the same as solving P , and vice versa. Defining P (·|Ek ) when P (Ek ) = 0 is
not a straightforward business.
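A small finite check of the Bridge-Crossing Lemma; Ω, the partition, u, and P below are illustrative choices. Since P(Ek) > 0, maximizing against P(·|Ek) has the same argmax as maximizing the unconditional sum over Ek, which is what the code uses.

```python
from itertools import product

# Finite check of the Bridge-Crossing Lemma: the jointly optimal plan
# (a_1, a_2), with a_k used on E_k, coincides with solving each conditional
# problem P_k separately. All primitives here are illustrative choices.
Omega = [0, 1, 2, 3]
P = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}
E = [[0, 1], [2, 3]]                    # the partition E_1, E_2
A = ["a", "b"]
u = {("a", 0): 1, ("a", 1): 0, ("a", 2): 2, ("a", 3): 0,
     ("b", 0): 0, ("b", 1): 2, ("b", 2): 0, ("b", 3): 1}

def joint_value(plan):
    return sum(P[w] * u[(plan[k], w)] for k in range(2) for w in E[k])

best_joint = max(product(A, A), key=joint_value)

def conditional_best(k):
    # same argmax as using P(.|E_k), since P(E_k) > 0
    return max(A, key=lambda a: sum(P[w] * u[(a, w)] for w in E[k]))

best_pieces = (conditional_best(0), conditional_best(1))
print(best_joint, best_pieces)  # the same plan both ways
```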
3.4. Hierarchies of Beliefs. For the rest of the semester, we will almost exclusively
be looking at the case when P ∈ ∗∆°(Ω). In this case, P(Ek) > 0 for all k, which is
nice. The difference between the infinitesimal and non-infinitesimal P(ω|Ek) gives rise
to hierarchies of beliefs as follows:
1. For P ∈ ∗ ∆◦ (Ω), let Q1 = ◦ P , and let E1 = {ω : Q1 (ω) > 0}.
2. If E1c 6= ∅, define Q2 = ◦ P (·|E1c ), and let E2 = {ω : Q2 (ω) > 0}.
3. If (E1 ∪ E2 )c 6= ∅, define Q3 = ◦ P (·|(E1 ∪ E2 )c ), and let E3 = {ω : Q3 (ω) > 0}.
4. And so on and so forth until some QK is reached (the process must end because Ω
is finite).
5. The hierarchy associated with P is (Q1, . . . , QK).
What is at work is the “order” of the infinitesimals.
Example 3.4.1. Let Ω = {ω1, ω2, . . . , ω7} and, for a non-zero infinitesimal ǫ, let
P = (1/2, 1/2 − (ǫ + ǫ²), (1/3)ǫ, (1/2)ǫ, (1/6)ǫ, (3/4)ǫ², (1/4)ǫ²), so that K = 3 and

Q1 = (1/2, 1/2, 0, 0, 0, 0, 0)
Q2 = (0, 0, 1/3, 1/2, 1/6, 0, 0)
Q3 = (0, 0, 0, 0, 0, 3/4, 1/4).
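The peeling-off construction can be mimicked numerically with a small standard ǫ standing in for the infinitesimal; the threshold used to approximate the "standard part" is an illustrative choice.

```python
from fractions import Fraction as F

# Computing the hierarchy of beliefs of Example 3.4.1 with a small standard
# eps in place of the infinitesimal. The standard part is approximated by
# thresholding: weights of strictly smaller order than the current level are
# treated as 0. The eps value and the threshold are illustrative choices.
eps = F(1, 10**6)
P = [F(1, 2), F(1, 2) - (eps + eps**2), F(1, 3) * eps, F(1, 2) * eps,
     F(1, 6) * eps, F(3, 4) * eps**2, F(1, 4) * eps**2]
assert sum(P) == 1

def hierarchy(P, eps):
    """Peel off successive 'orders' of eps, mimicking Q1, Q2, Q3, ..."""
    Qs, live = [], list(range(len(P)))
    while live:
        mass = sum(P[i] for i in live)
        cond = {i: P[i] / mass for i in live}
        supp = [i for i in live if cond[i] > 100 * eps]  # approx. standard part
        Qs.append({i: cond[i] for i in supp})
        live = [i for i in live if i not in supp]
    return Qs

Qs = hierarchy(P, eps)
print(len(Qs))  # 3 levels, with supports {0,1}, {2,3,4}, {5,6}
```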
The two papers for this section work out some of the implications and properties
of a decision theory based on hierarchies like this. For game theory, what is at work
is perturbations in the beliefs of agents in an agent normal form, and the perturbations must
arise from other players playing strictly positive strategies.
3.5. Extensive Form Equilibrium Refinement.
3.5.1. Perfect, proper, and stable equilibrium outcomes.
3.5.2. Iterative dominance arguments.
4. Continuous Time Control and Games
4.1. Control Theory. The need for a theory of integration.
4.1.1. When it Works Wonderfully.
4.1.2. When it Works Less Well. On the importance of existence theorems for the
interpretation of necessary conditions, we have Perron’s paradox:
“If N is the largest positive integer, then N = 1. To see why, suppose that N is
the largest positive integer but N > 1. Then N 2 > N , implying that N was not the
largest positive integer.”
To put it another way, a necessary condition for N being the largest positive integer
is that N = 1. However, since there is no largest positive integer, the necessary
condition is nonsense.
Formally, conditioning on N being the largest positive integer, i.e. conditioning on
having an element of the null set, we can derive all kinds of things.
Filippov's Theorem: For the basic version and examples of this, see p. 119 of Liberzon's "Calculus of Variations and Optimal Control Theory" (which is on the iPad as "cvoc"), [Liberzon, 2012].
A grown-up version is in the following and its prequel, [McShane and Warfield, 1969].
4.2. Games with Instantaneous Monitoring.
4.2.1. (R, ≤) is Totally Ordered but Not Well-Ordered.
Definition 4.2.1. A set X is totally ordered by a relation ≼ ⊂ X × X if for every
x, y, z ∈ X,
(i) [x ≼ y] ∧ [y ≼ x] ⇒ [x = y],
(ii) [x ≼ y] ∧ [y ≼ z] ⇒ [x ≼ z], and
(iii) [x ≼ y] ∨ [y ≼ x].
A chain of sets is totally ordered by ⊆. Any time set X ⊂ R is totally ordered by
≤. For the purposes of game theory, we need to go from what the agents choose to do
now to what happens next.
Definition 4.2.2. A totally ordered set (X, ≼) is well-ordered if every non-empty
S ⊂ X contains a least element.
A chain of subsets of the integers is totally ordered by ⊆. Any countable time set
X ⊂ [0, ∞) that has no accumulation points from the right is well-ordered by ≤.²
Lemma 4.2.3. If there exists a t ∈ X ⊂ R and a sequence xn in X with xn ↓ t, then
there is no least element in X that is greater than t.
Proof. If there exists t′ ∈ X, t < t′ , but X ∩ (t, t′ ) = ∅, then we have a contradiction
to xn ↓ t.
We now turn to some of the problems this causes for specifying strategies in continuous time games.
4.2.2. Implications. In discrete time, and in any near interval, for each time t, players
look at what they know of the history so far and choose what they will do at the “next
time” they have a chance to move. There is no such “next time” in continuous time
modeled as [0, ∞) or [0, 1] or [0, 1). This already causes problems for specifying how
strategies map to outcomes in single-agent ‘games,’ and causes even worse problems
for games with two or more agents.
A strategy should specify, for each t and each history on the interval [0, t), what
action should be chosen at t. A sensible definition of the outcome associated with a
choice of strategies is one that agrees, at each point in time, t, with what the strategy
calls for at t.
Example 4.2.1. Suppose that I = {1}, the action set is A1 = [0, 1], and at every
t ∈ [0, ∞), player 1 plays the strategy σ1(h, 0) = 0 and σ1(h, t) = sup{h(s) : 0 ≤ s < t}
for t > 0. For any τ > 0, any history of the form hτ(t) = 0 for t ≤ τ, hτ(t) = r for
t > τ with r ≠ 0 has the property that for all t ∈ [0, ∞), σ1(hτ, t) = hτ(t).
Intuitively, the strategy calls for the player to start by playing 0 and to continue
playing 0 so long as 0 is all that has been played in the past. It is a well-formed
strategy in the sense that it maps all histories and all times t into a definite choice
of action. The problem is that there is no first time past τ at which the strategy is
“mistaken.”
²t ∈ X is an accumulation point from the right for X if (∃xn in X)[xn ↓ t].
There are even worse problems when there are two or more players. Suppose that
at any t ∈ [0, ∞), players i ∈ I = {1, 2} are taking an action in Ai = {0, 1}. This
means that a history is a function h : [0, ∞) → A, A = ×i∈I Ai . We can represent
each history h as a pair of histories giving the actions of the two players, h = (h1 , h2 )
where t 7→ h1 (t) ∈ A1 and t 7→ h2 (t) ∈ A2 . For any history h = (h1 , h2 ) and t > 0, let
h|t− : [0, t) → A denote the truncation of h before t, that is, h|t− (s) := h(s) for s < t.
A strategy for i ∈ I specifies σi(h, t) ∈ Ai with the restriction that [h′|t− = h|t−] ⇒
[σi(h, t) = σi(h′, t)]. A set X ⊂ (0, ∞) is dense-in-itself from the left if for each
x ∈ X and ǫ > 0, X ∩ (x − ǫ, x) ≠ ∅: Q ∩ (0, ∞) is dense-in-itself from the left; if
X ∩ (0, ∞) has full Lebesgue measure, then it is dense-in-itself from the left. The
following is from [Stinchcombe, 1992].
Example 4.2.2. The following strategies capture the following matching and mis-matching ideas: if
player 2 has been playing a2 = 0 recently, player 1 wants to mis-match, that is, to play a1 = 1;
otherwise they mis-match by playing a1 = 0. If player 1 has been playing a1 = 1
recently, player 2 wants to match by playing a2 = 1; otherwise they match by
playing a2 = 0. More precisely, let σi(h, 0) = 0 for i ∈ I, and for t > 0, let

σ1(h, t) = { 1 if lim sup_{s↑t} h2(s) = 0; 0 else },   σ2(h, t) = { 1 if lim inf_{s↑t} h1(s) = 1; 0 else }.   (18)

Being explicit about the "else" cases is useful: ¬[lim sup_{s↑t} h2(s) = 0] iff lim sup_{s↑t} h2(s) =
1; ¬[lim inf_{s↑t} h1(s) = 1] iff lim inf_{s↑t} h1(s) = 0.
It is one thing to say that there is no function agreeing with σ(h, t) at all t, or at
almost all t. The next result says much more: for any possible history h, σ(h, t)
disagrees with h(t) at all strictly positive points in the time set.
Claim: If h : X → A and X is dense-in-itself from the left, then for all t ∈ X ∩ (0, ∞),
h(t) ≠ σ(h, t).
Proof. At any t ∈ X ∩ (0, ∞), either σ1(h, t) = 1, i.e. lim sup_{s↑t} h2(s) = 0, or σ1(h, t) =
0, i.e. lim sup_{s↑t} h2(s) = 1.
Case 0, lim sup_{s↑t} h2(s) = 0: This can only happen if (∃ǫ > 0)(∀s ∈ X ∩ (t −
ǫ, t))[h2(s) = 0]. To be consistent with σ1, this means that (∀τ ∈ X ∩ (t − ǫ, t))[h1(τ) =
1]. To be consistent with σ2, this in turn means that (∀τ ∈ X ∩ (t − ǫ, t))[h2(τ) = 1],
which contradicts lim sup_{s↑t} h2(s) = 0.
Case 1, lim sup_{s↑t} h2(s) = 1: This can only happen if (∀ǫ > 0)(∃τ ∈ X ∩ (t −
ǫ, t))[h2(τ) = 1]. For h2(τ) = 1 to be consistent with σ2, there must exist δ > 0 such
that h1(s) = 1 for all s ∈ X ∩ (τ − δ, τ). Consistency with σ1 then implies that h2(s) = 0
for s ∈ X ∩ (τ − δ, τ), which contradicts h1(s) = 1 on this interval.
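By contrast, on a finite grid (a stand-in for the near intervals of the next section), the lim sup and lim inf conditions collapse to "the opponent's action at the previous grid point," and play unfolds step by step with no consistency problem; the grid size is an illustrative choice.

```python
# Discrete-grid version of the matching/mis-matching strategies of
# Example 4.2.2: "as s increases to t" becomes "at the previous grid point."
# Play is perfectly well-defined (and chatters with period 4).
# The grid size N is an illustrative choice.
N = 20
h1, h2 = [0], [0]                     # sigma_i(h, 0) = 0
for t in range(1, N + 1):
    a1 = 1 if h2[t - 1] == 0 else 0   # player 1 mis-matches 2's last action
    a2 = 1 if h1[t - 1] == 1 else 0   # player 2 matches 1's last action
    h1.append(a1)
    h2.append(a2)

print(h1[:9])  # [0, 1, 1, 0, 0, 1, 1, 0, 0] -- well-defined, chattering play
print(h2[:9])  # [0, 0, 1, 1, 0, 0, 1, 1, 0]
```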
4.3. Games Played on a Near Interval. The paper [Simon and Stinchcombe, 1989]
gave an expanded history space, one that allows for infinitely fast reactions, and conditions on strategies guaranteeing that playing the strategies on any near interval gave
rise to histories with two properties: the standard parts belong to the expanded history space; and the outcomes are at infinitesimal distance from each other. The paper
[Stinchcombe, 1992] used a more general version of the history space and gave the
minimal conditions on vectors of strategies guaranteeing playability on standard time
intervals. Here, we will study games with flow payoffs on near intervals, examine the
set of equilibria, and determine when we do and do not have nearstandard equilibrium outcomes
in the expanded history space.
4.3.1. Actions, Histories, Strategies, and Outcomes. Each i ∈ I has a non-empty action set Ai, and A denotes ×_{i∈I} Ai. Time is T = {tk = k/N : k = 0, . . . , N}, N = m! for
m ≃ ∞. Note that t0 = 0, tN = 1, dt = 1/N ≃ 0, and T is a near interval. A complete
history is a point in {γ} × A^T, but we will mostly notationally suppress the γ. For
t ∈ T, t > 0, t− denotes the largest tk ∈ T with tk < t, and h|t− denotes the restriction
of h to the set {0, . . . , t−}, i.e. h|t− ∈ A^{0,...,t−} and h|t− = proj_{0,...,t−}(h). We define
h|0− = γ.
A pure strategy for i ∈ I specifies what ai ∈ Ai player i chooses at t = 0, and
what they choose in response to each h|t− . A behavioral strategy for i ∈ I specifies
what σi ∈ ∆(Ai ) they pick. With the γ convention above, the set of decision nodes is
∪t∈T {h|t− : h ∈ AT }, and behavioral strategies map decision nodes to mixtures over
the Ai ’s.
An outcome is a distribution over A^T. For each decision node, h|t−, and vector of
behavioral strategies, σ, there exists a unique outcome O(h|t−; σ) defined in the usual
inductive way, though now the induction is over the near interval.
For each h ∈ A^T, Ui(h) := Σ_{t∈T} ui(t, at) dt. We extend each Ui to strategies starting
at decision nodes in the usual fashion, Ui(h|t−; σ) = Σ_H Ui(h) O(h|t−; σ)(h); that is,
Ui(h|t−; σ) is the expected utility if σ is played starting at h|t−.
A game on a near interval T is given by Γ = (H, (ui(t; ·))_{t∈T})_{i∈I} where H ⊂ A^T.
The strategies must be restricted so that only outcomes in H arise. By choosing H
judiciously, we can cover a number of different types of games: if H contains only
histories in which each i ∈ I starts at ai = 0 and changes action only once, then
strategies can only give the option to change if no change has happened before; if H
contains only histories in which i's action is constant on some sub-interval of T, then
i's strategies must be restricted so that they cannot move during that sub-interval; if H
contains only histories in which i chooses in a set A′i ⊂ Ai if j has previously chosen
in a set A′j ⊂ Aj, then j's choices determine i's options; if at each h|t−, each i ∈ I can
only choose from some non-empty Ai(h|t−) ⊂ Ai, then H becomes the set of histories
with h(t) ∈ ×_{i∈I} Ai(h|t−); more subtle restrictions on H allow for a choice at tk to
affect possible choice sets of some or all agents at tk+m.
We will be, at first, mostly interested in the case that t ↦ ui(t, ·) is constant, or at
least near-continuous. Then, to handle timing and pre-emption games, we will restrict
to subsets H ⊂ A^T where the agents move at most once, corresponding to the
choice to enter a market. We call Γ a continuously repeated game if H = A^T.
Definition 4.3.1. σ ∗ is an equilibrium if for all i ∈ I and all strategies σi for i,
Ui (h|0− ; σ ∗ ) ≥ Ui (h|0− ; σ ∗ \σi ); it is an infinitesimal equilibrium if for all i ∈ I and
all strategies σi for i, ◦ Ui (h|0− ; σ ∗ ) ≥ ◦ Ui (h|0− ; σ ∗ \σi ).
Infinitesimal equilibria are hybrid objects: the strategy spaces belong to V(∗S); the
utility function is external, it does not belong to V(∗S). A Loeb space (Ω, L(F), L(P))
is a hybrid object in much the same way: the set Ω is internal, the class of sets F is
internal, the function P is internal; for E ∈ F, L(P)(E) = °P(E), and we expand the
domain of L(P) to the completion of the sigma-field generated by F.
Calculations in Loeb spaces are often simplified by being able to show that some
probability is infinitesimal, hence can be ignored. We have a similar situation for
infinitesimal equilibria: σ ∗ is an infinitesimal equilibrium iff for some infinitesimal ǫ,
Ui (h|0− ; σ ∗ ) + ǫ ≥ Ui (h|0− ; σ ∗ \σi ).
Definition 4.3.2. σ∗ is a subgame perfect equilibrium (sgpe) if for all i ∈ I,
all decision nodes h|t−, and all strategies σi for i, Ui(h|t−; σ∗) ≥ Ui(h|t−; σ∗\σi);
it is an infinitesimal subgame perfect equilibrium (isgpe) if for all i ∈ I, all
decision nodes h|t−, and all strategies σi for i, °Ui(h|t−; σ∗) ≥ °Ui(h|t−; σ∗\σi).
To check that a strategy is a sgpe, we need only check that there are no profitable
one-period deviations — a complicated deviation cannot gain utility if at every step
it is losing, or not gaining, utility. To check that a strategy is an isgpe, it seems that
we may need to do more — summing an infinite number of infinitesimal utility gains
may give a non-infinitesimal utility gain. However, there is a sufficient finite deviation
condition: an internal deviation from an isgpe yields an internal sequence of utility
gains, (rt)_{t∈T}; to be an isgpe, we must have Σ_{t∈T} rt dt ≃ 0; if, for example, rt is
infinitesimal except for a subset T′ ⊂ T with |T′|/|T| ≃ 0, then Σ_{t∈T} rt dt ≃ 0; in
particular, finiteness of T′ is sufficient, and as we will see, this is often enough for our
purposes.
4.3.2. Safety in Continuously Repeated Games. For a continuously repeated game,
there are three "safe" utility levels that one might imagine i being able to guarantee
itself at time t, and their values integrated over the course of the game,

v_i,t^pure = min_{a−i ∈ ×_{j≠i} Aj} [max_{bi ∈ Ai} ui(t; bi, a−i)],   Vi^pure := Σ_{t∈T} v_i,t^pure dt,   (19)

v_i,t^mixed = min_{σ−i ∈ ×_{j≠i} ∆(Aj)} [max_{bi ∈ Ai} ui(t; bi, σ−i)],   Vi^mixed := Σ_{t∈T} v_i,t^mixed dt, and   (20)

v_i,t^corr = min_{µ−i ∈ ∆(×_{j≠i} Aj)} [max_{bi ∈ Ai} ui(t; bi, µ−i)],   Vi^corr := Σ_{t∈T} v_i,t^corr dt.   (21)

Since ×_{j≠i} Aj ⊂ ×_{j≠i} ∆(Aj) ⊂ ∆(×_{j≠i} Aj), v_i,t^pure ≥ v_i,t^mixed ≥ v_i,t^corr.
Once we understand what these safety levels are about, it is easy to give games
where the inequalities are strict: the first safety level corresponds to the worst that
dolts who do not understand randomization can do to i; the second corresponds to the
worst that enemies who do understand independent randomization can do to i; the
third corresponds to the worst that fiends who completely understand randomization
can do to i. The three v_i's are called "safety levels." Here is one of the reasons.
Lemma 4.3.3. If σ is an equilibrium of a continuously repeated game, then Ui(σ) ≥
Vi^mixed; if it is an infinitesimal equilibrium, then °Ui(σ) ≥ °Vi^mixed.
This lemma is ridiculously easy to prove once you see how.
Proof. For any strategy σ−i for the other players, consider the strategy that at time t
myopically best responds to σ−i(h|t−). For each t ∈ T, the associated utility is at least
v_i,t^mixed.
4.3.3. Some Examples. An easy one to analyze is the Prisoners' Dilemma: I = {1, 2},
A1 = A2 = {Sil, Sq}, u(t, ·) independent of t and given by

        Sil        Sq
Sil   (3, 3)    (−1, 4)
Sq    (4, −1)   (0, 0)

Here V^pure = V^mixed = V^corr = (0, 0). If σ is an equilibrium, then U(σ) = (0, 0).
If ǫ ≃ 0 is larger than dt, then the Nash reversion strategies are an ǫ-sgpe with
U(σ) = (3, 3). Further, st{U(σ) : σ is an ǫ-sgpe} = {v ∈ con(u(A)) : v ≥ (0, 0)}.
Rational Pigs is approximately as easy to analyze: I = {1, 2}, A_1 = A_2 = {P, W}, u(t, ·) independent of t and given by

          P          W
P      (−1, 4)    (−1, 5)
W      (2, 2)     (0, 0)

Again, V^{pure} = V^{mixed} = V^{corr} = (0, 0) and st {U(σ) : σ is an ǫ-sgpe} = {v ∈ con(u(A)) : v ≥ (0, 0)}.
Matching Pennies: u(t; ·) is given by

           H            T
H      (+1, −1)    (−1, +1)
T      (−1, +1)    (+1, −1)

Here there is a unique equilibrium outcome, involving (1/2, 1/2) randomization in each period. The associated equilibrium outcome has no standard part.
An interpretational issue: For Matching Pennies, all ǫ-sgpe, ǫ ≃ 0, involve chattering, as do many of the equilibria discussed in the previous two examples. Chattering paths on a near interval have no standard counterpart. There are a number of ways to force the outcomes to have standard paths: the easiest, following [Simon and Stinchcombe, 1989], is to restrict to H ⊂ A^T with only finitely many changes of action for each player; in a similar vein, one can make changes of action costly enough that it will never be optimal to engage in more than finitely many; more subtly, one can follow [Bergin and Macleod, 1993] or [Perry and Reny, 1993] and require that players have some form of inertia in their choices.
The general result for continuously repeated games is that if t ↦ u(t; ·) is near continuous, then st {U(σ) : σ is an ǫ-sgpe} is the intersection of the convex hull of U(A^T) and {v : v ≥ V^{mixed}}. The proof uses strategies a bit more complicated than those in the examples so far, and essentially the same proof applies if t ↦ u(t; ·) is a lifting of a measurable function.
4.3.4. Revisiting Cournot vs. Bertrand. Simple two-firm Cournot quantity competition models have equilibrium prices above marginal cost but below the monopoly price. Simple Bertrand competition models with identical firms have a unique equilibrium with p∗ = C′(q(p∗)/2), where C(·) is the cost function for the firms and p ↦ q(p) is the demand curve. Bertrand used this argument to assail Cournot’s model of competition. In continuous time, both models have, as an equilibrium, the two firms splitting monopoly profits.
To make things really stark, let us suppose that the technology is such that both
firms can supply the whole market without disadvantaging themselves, C(q) = c · q.
Consider the near interval game with u_i(t; p_i, p_j) given as follows,

u_i(t; p_i, p_j) = { (p_i − c) · q(p_i)        if p_i < p_j,
                   { (1/2)(p_i − c) · q(p_i)   if p_i = p_j,    (22)
                   { 0                         if p_i > p_j.
Here A_i = ∗[0, p], p being near standard, or A_i is a near interval of prices. In either case, let p_{Mon} be the monopoly price and π_{Mon} the associated industry profits.
Claim: Both firms charging the monopoly price for each t ∈ T is an ǫ-sgpe for ǫ ≃ 0.
Proof. The grim-trigger strategies “start by playing p_{Mon}; continue to play p_{Mon} as long as that is all that has been played in the past; else play p = c” are an ǫ-sgpe provided ǫ > π_{Mon} dt.
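The ǫ > π_{Mon} dt bound in the proof can be checked numerically in a discretized standard-world version. The demand curve q(p) = 1 − p and cost c = 0 below are assumptions chosen only for this sketch; the code verifies that the best one-shot deviation from the grim-trigger path gains strictly less than π_{Mon} dt.

```python
# Sketch with assumed primitives (not from the text): linear demand
# q(p) = 1 - p, constant marginal cost c = 0, the time interval split
# into N periods of length dt. Grim trigger: split pi_Mon while nobody
# has deviated; price at cost forever after a deviation.
c = 0.0
def profit(p):                      # one firm serving the whole demand
    return (p - c) * (1.0 - p)

p_mon, N = 0.5, 10_000              # argmax of p(1-p); number of periods
pi_mon = profit(p_mon)              # industry monopoly profit = 0.25
dt = 1.0 / N

def on_path_value(k):               # conforming from period k onward
    return 0.5 * pi_mon * dt * (N - k)

def deviation_value(k):             # undercut once, then punished to 0
    return profit(p_mon - 1e-9) * dt

best_gain = max(deviation_value(k) - on_path_value(k) for k in range(N))
print(best_gain, pi_mon * dt)       # the gain stays below pi_Mon * dt
```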
Problem 4.3.1. An ǫ-sgpe of the continuously repeated Cournot model, ǫ ≃ 0, yields
the same “split monopoly profits” payoffs.
4.3.5. Preemption Games. Here we examine [Fudenberg and Tirole, 1985].
Let I = {1, 2}, A = {Out, In}, and let H ⊂ AT be the set of time paths where
players change their action at most once, and if there is a change, it is from Out to In.
F&T interpret this switch of actions as the adoption of a new technology. Entering early and being alone is good for a firm because it enjoys monopoly profits; however, entering early incurs a larger cost than entering late.
Payoffs have two parts, here divided into a flow part and a lump part; one can usually convert a lump cost/benefit into its value-equivalent flow.
Flow payoffs : πO (0) is the net cash flow of firm i when 0 firms have entered and i is
still out; πO (1) is the net cash flow of firm i when the other firm has entered and i is
still out; π_I(1) is the net cash flow if i has entered and is the only firm in the industry; and π_I(2) is the net cash flow if both have entered.
Lump costs : The cost of entering at t is c(t), and it falls over time as the technology
becomes more mature.
V_1(t_1, t_2) will denote 1’s flow profits if 1 adopts at t_1 and 2 at t_2.
i. If both firms adopt at t, then V_1(t, t) = V_2(t, t) = ∫_0^t π_O(0) e^{−rs} ds + ∫_t^∞ π_I(2) e^{−rs} ds.
Let L(t) denote the payoffs to being the Lead firm to enter if entry happens at t and the other firm best responds, F(t) the payoffs to being the Follower firm if the other firm enters at t, and M(t) the payoffs if the firms enter siMultaneously at t. These functions are continuous, and F&T make assumptions on the flow payoffs that imply that no firm will adopt at t = 0, and that there exist 0 < T_1 < T_1∗ < T_2∗ < T̂_2 < ∞ such that
• T_1∗ is a firm’s favorite time to adopt if the other firm follows and plays a best response;
• T_2∗ is the follower’s best response adoption time to any t < T_2∗;
• L > F > M on (T_1∗, T_2∗);
• L < F on (0, T_1∗);
• T_1∗ maximizes L(t); and
• T̂_2 maximizes M(t).
The definition of (T_1∗, T_2∗) makes them the unique equilibrium of a precommitment game, that is, a game in which the two firms can commit to an entry time.
There are four results about the dynamic game.
1. If L(T_1∗) > M(T̂_2), the unique equilibrium outcome has, with probability 1/2, firm one adopting at T_1 and firm two adopting at T_2∗, and with probability 1/2 the reverse.
2. If L(T_1∗) < M(T̂_2), then, defining S as the solution to M(S) = L(T_1∗), there are two classes of equilibria: the previous (T_1, T_2∗) equilibria; and pure strategy equilibria with joint adoption at every t ∈ [S, T̂_2].
3. If the pure strategy equilibria exist, they Pareto dominate the precommitment equilibrium, and the precommitment equilibria dominate the (T_1, T_2∗) equilibrium.
4. The Pareto ranking of the pure strategy equilibria adopting at t ∈ [S, T̂_2] is given by ≤.
The second to last result has a “firms hate competition” flavor to it. Among the pure strategy equilibria, one can pick using a weak dominance argument: play the strategy “respond instantly to adoption at t ∈ [S, T̂_2); adopt at T̂_2 whether or not the other has adopted” (and fill in subgames past T̂_2 sensibly).
Let us consider a t_k in the interval (T_1, T_2∗] and a subgame h|t_k− at which no one has yet adopted. Pure strategy equilibria are hard to find here; instead, consider symmetric mixed equilibria, where U(t_k) is the value of going on to t_{k+1} with no one having adopted. The stage game has the following form,
           In                    Out
In     (M(t_k), M(t_k))     (L(t_k), F(t_k))
Out    (F(t_k), L(t_k))     (U(t_k), U(t_k))

If (γ_k, 1 − γ_k) on (In, Out) is an equilibrium, then it has payoffs γ_k M(t_k) + (1 − γ_k)L(t_k). This leads to γ_k = (L(t_k) − F(t_k))/(L(t_k) − M(t_k)), up to an infinitesimal ratio.
A useful second step is the following lemma, especially if you think of r as (d/dt)(L(t) − F(t))|_{t=T_1}.

Lemma 4.3.4. If T is a near interval and the X_k are a collection of independent random variables with P(X_k = 1) = r·t_k and P(X_k = 0) = 1 − r·t_k for a non-infinitesimal r, then for some infinitesimal t_K ∈ T, P(∑_{k≤K} X_k = 0) ≃ 0.
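A standard-world sketch of what the lemma asserts: with firing probabilities r·t_k on the grid t_k = k·dt, take t_K = dt^{1/3}, which shrinks to 0 with dt while t_K²/dt blows up; the probability that no X_k has fired by t_K then vanishes. (The choice t_K = dt^{1/3} is one convenient example, not from the text.)

```python
import math

# Sketch: P(X_k = 1) = r * t_k on the grid t_k = k * dt. With
# t_K = dt**(1/3), t_K -> 0 as dt -> 0 while t_K**2/dt -> infinity,
# and P(no X_k fires by t_K) = prod_{k<=K} (1 - r*k*dt) collapses.
r = 1.0
for dt in (1e-2, 1e-4, 1e-6):
    t_K = dt ** (1.0 / 3.0)
    K = int(t_K / dt)
    log_p = sum(math.log(1.0 - r * k * dt) for k in range(1, K + 1))
    print(dt, round(t_K, 4), math.exp(log_p))   # P(sum_{k<=K} X_k = 0)
```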
4.3.6. Wars of Attrition. Two competitors face each other in a battle of wills, s/he
who hangs on the longest wins. This can be interpreted in a variety of ways: as a
contest between firms in an industry only large enough to support one of them; as a
contest between animals for a valuable prize, be it a mating opportunity, a prime spot
for a nest, a food source; as a republican budget strategy.
We suppose that the value to competitor i is si ≥ 0, that si and sj are iid with
cdf F . The choice of action is ai ≥ 0, the amount of time to spend hanging on. The
utilities are given by

u_i(s_i, s_j, a_i, a_j) = { s_i − a_j   if a_i > a_j,    (23)
                          { −a_i        else.
As usual, we replace [0, ∞) with T = {k/N : k = 0, 1, . . . , N²}, and set t_k = k/N and dt = 1/N.
A. There are always the two asymmetric equilibria, a_i = t_{N²} and a_j = 0: fighting against someone infinitely stubborn, the best response is to quit now.
B. Now suppose that P (si = sj = v) = 1 for some v > 0, i.e. F (x) = 1[v,∞) (x).
(1) Since the game is symmetric, there is a symmetric equilibrium, and it cannot be a pure strategy equilibrium because the best response to the other dropping out is to stay.
(2) For a symmetric mixed equilibrium, i cannot put non-infinitesimal mass on
quitting at any tk : this makes j’s payoff jump a non-infinitesimal amount by
waiting until tk plus some infinitesimal to quit; so j is not indifferent between
tk and other points, violating the symmetry of the equilibrium.
(3) At any h|tk − where neither has quit, the players are in a 2×2 game with payoffs
           Quit                Stay
Quit   (−t_k, −t_k)       (−t_k, v − t_k)
Stay   (v − t_k, −t_k)    (c_k, c_k)

where c_k is the continuation payoff.
Letting γk ∈ (0, 1) denote the probability of quitting at tk , we use indifference
between Quit and Stay right now for one condition, and the observation that
ck can be had by the indifference condition between Quit and Stay at tk+1 to
give us
γ_k(v − t_k) + (1 − γ_k)c_k = −t_k, and c_k = −t_{k+1} = −(t_k + dt).    (24)

Solving yields γ_k = dt/(v + dt).
Comments: γ_k is independent of k, and we should expect this because the only difference between the game at t_k and t_{k+1} is that we subtract dt from all of the payoffs in the 2 × 2 matrix; the waiting time until i quits is negative exponential with mean v, which means that the waiting time until the first player quits is negative exponential with mean v/2, so that equilibrium payoffs are (0, 0); and we should expect this since the players must be indifferent between any of the times t_k, and 0 is one of the times.
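The approximately exponential waiting times can be checked by simulation; the sketch below (with assumed parameters v = 2 and dt = 10⁻³) draws geometric quitting times with per-period probability γ = dt/(v + dt) and confirms that the first quit has mean near v/2.

```python
import math, random

# Sketch: per-period quit probability gamma = dt/(v+dt) makes each
# player's quit time geometric, hence nearly exponential with mean v
# for small dt; the first quit is then nearly exponential with mean v/2.
random.seed(0)
v, dt = 2.0, 1e-3
gamma = dt / (v + dt)

def quit_time():                       # geometric via inverse transform
    u = 1.0 - random.random()          # u in (0, 1]
    periods = int(math.log(u) / math.log(1.0 - gamma)) + 1
    return periods * dt

n = 200_000
mean_first = sum(min(quit_time(), quit_time()) for _ in range(n)) / n
print(mean_first)   # close to v/2 = 1.0
```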
C. Now suppose that F is continuous and has a density f. Verify that the following strategy vector is a symmetric Nash equilibrium: compete until b(s_i), where

b(s_i) = ∫_0^{s_i} t f(t)/(1 − F(t)) dt.    (25)
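For the special case of F uniform on [0, 1] (an assumed example, not from the text), eqn. (25) has the closed form b(s) = ∫_0^s t/(1 − t) dt = −log(1 − s) − s, which a direct midpoint quadrature of the general formula reproduces:

```python
import math

# Sketch for the assumed special case F = uniform on [0,1]: then
# f(t)/(1-F(t)) = 1/(1-t), so b(s) = -log(1-s) - s in closed form, and
# a midpoint-rule quadrature of (25) reproduces it.
def b_quad(s, n=20_000):
    h = s / n
    return sum(((k + 0.5) * h) / (1.0 - (k + 0.5) * h) * h for k in range(n))

for s in (0.1, 0.5, 0.9):
    print(s, round(b_quad(s), 6), round(-math.log(1.0 - s) - s, 6))
```

Note that b is increasing: higher-value types hang on longer, as the equilibrium requires.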
D. Let G be the cdf of the quitting times in the previous equilibrium, that is,

G(a) = P(b(s_i) ≤ a) = P(∫_0^{s_i} t f(t)/(1 − F(t)) dt ≤ a),    (26)

and suppose that for some ǫ > 0, F(v − ǫ) = 0 and F(v + ǫ) = 1, where 0 < v − ǫ. The following steps lead to the conclusion that G(a) ≥ 1 − e^{−a/(v+ǫ)}.
a. Show that G(a) ≥ P((v + ǫ) ∫_0^{s_i} f(t)/(1 − F(t)) dt ≤ a).
b. Show that the random variable F(s_i) has the uniform distribution.
c. Show that ∫_0^s f(t)/(1 − F(t)) dt = − log(1 − F(s)).
d. Show that P((v + ǫ) ∫_0^{s_i} f(t)/(1 − F(t)) dt ≤ a) = P(F(s_i) ≤ 1 − e^{−a/(v+ǫ)}). Combining, conclude that

G(a) ≥ 1 − e^{−a/(v+ǫ)}.    (27)
E. Show that G(a) ≤ 1 − e^{−a/(v−ǫ)}.    (28)
F. Let F_n be a sequence of continuous cdfs with densities f_n, and suppose that for all ǫ > 0, F_n(v − ǫ) → 0 and F_n(v + ǫ) → 1. Show that the corresponding cdfs of quitting times, G_n, converge³ to the symmetric equilibrium quitting time cdf H for the case that F(x) = 1_{[v,∞)}(x).
A more general analysis of a war of attrition can be had as follows. Two competitors face each other in a battle of wills; s/he who hangs on the longest wins. We suppose that the competitors’ types, t_i, are iid U[0, 1], and that the value to type t_i is given by v(t_i) = F^{−1}(t_i) ≥ 0, where F is a continuous cdf with density f. Denote pure strategies for i as σ_i : [0, 1] → [0, ∞], where σ_i(t_i) is the time at which type t_i stops fighting. The action a_i = ∞ is interpreted as “never stop fighting.” The utilities are
given by

u_i(t_i, t_j, σ_i, σ_j) = { v(t_i) − σ_j(t_j)   if σ_i(t_i) > σ_j(t_j),    (29)
                          { −σ_i(t_i)           else.
1. Show that if i uses a non-decreasing strategy, then j has a best response in the non-decreasing strategies.
2. Show that the non-decreasing functions from [0, 1] to [0, ∞] are a complete lattice.
3. Show that the set of equilibria in non-decreasing strategies has a lattice structure.
³In the usual sense of convergence of cdf’s, G_n(a) → H(a) for all continuity points of H(·).
It is easy to verify that σ_i(t_i) ≡ 0 and σ_j(t_j) ≡ ∞ is an asymmetric equilibrium. The rest of the problem focuses on the properties of symmetric equilibria in non-decreasing strategies, (σ_i, σ_j) = (σ, σ). For non-decreasing strategies, σ_i^{−1} : [0, ∞] → [0, 1] gives the cdf of i’s quitting time, and if σ^{−1} is differentiable, then at time a it has hazard rate h_σ(a) = (dσ^{−1}(a)/da)/(1 − σ^{−1}(a)).
1. If t_i competes until time a ∈ [0, ∞], show that their payoff is

U_i(t_i, a; σ) = ∫_0^a (v(t_i) − s) dσ^{−1}(s) − a(1 − σ^{−1}(a)).    (30)

From this, derive the first order condition for a∗(t_i): 1/v(t_i) = h_σ(a).
2. Show that the equilibrium hazard rate of the duration of the conflict must be non-decreasing. [Note that a∗(t_i) = σ(t_i), so that v(t_i) = v(σ^{−1}(a)).]
3. Argue that σ(0) = 0 must hold in equilibrium and deduce that σ(t_i) = ∫_0^{t_i} v(s)/(1 − s) ds. From this, conclude that the optimal strategy increases in v(·). Examine the possibility of deriving this from supermodularity in eqn. (30).
4.4. Brownian Monitoring. standard parts of graphs.
4.5. Poisson Monitoring. standard parts of graphs.
4.6. Continuous Time Martingales. Urn models.
4.7. Itô’s Lemma. Trading models: Kreps; Harrison-Kreps; Harrison-Pliska.
5. Standard and Nonstandard Superstructures
We are going to need ways of talking about strong laws, central limit theorems, and properties of time paths, and we are going to want these to be internal, or near enough to internal that we can assign probabilities to them. At this point, it makes sense to go back and be a bit clearer about what we meant by “putting ∗’s on everything.”
The basic device for us is the set of µ-equivalence classes of sequences, where µ is a purely finitely additive “point mass.” This material is based on Ch. 11.5 in [Corbae et al., 2009]. After this we turn to superstructures, then to putting ∗’s on superstructures.
5.1. Purely Finitely Additive Point Masses. We are interested in a purely finitely
additive probability µ : P(N) → {0, 1}. Probabilities taking on only the values 0 or
1 are best thought of as point masses, and we will return to the question “Point
mass on what?” at some point later. These probabilities can also be understood as
µ(A) = 1_F(A), where F ⊂ P(N) is a free ultrafilter on the integers, a description that contains a bunch of as-yet-undefined terms.
F ⊂ P(N) is a filter if it is closed under finite intersections (A, B ∈ F imply A ∩ B ∈ F) and under supersets (A ⊂ B and A ∈ F imply B ∈ F).
Examples: F(n) = {A ∈ P(N) : n ∈ A}; the Fréchet filter (aka the cofinite filter), F^{cof} = {A ∈ P(N) : A^c is finite}; the trivial filter, F = {N}; the largest filter, F = P(N).
A filter is proper if it is a proper subset of P(N), so no proper filter can contain ∅. We will only work with proper filters from here onward.
Note that ∩{A : A ∈ F(n)} = {n} ≠ ∅, while ∩{A : A ∈ F^{cof}} = ∅. A filter F is free if ∩{A : A ∈ F} = ∅.
A (proper) filter is maximal if it is not contained in any other filter. A (proper)
filter is an ultrafilter if for all A ∈ P(N), A ∈ F or Ac ∈ F.
F(n) is an ultrafilter, and cannot be a strict subset of any other (proper) filter.
Lemma 5.1.1. A (proper) filter is maximal iff it is an ultrafilter.
Proof. A little bit of arguing.
Since F cof is a proper, free filter, the following implies that free ultrafilters exist, at
least if you accept Zorn’s Lemma, which is equivalent to the Axiom of Choice.
Theorem 5.1.2. Every proper filter is contained in an ultrafilter.
Proof. Zorn’s lemma plus the previous result.
Relevant properties of µ(A) := 1_F(A) when F is a free ultrafilter: µ(A) = 0 for all finite A; µ(A ∪ B) = µ(A) + µ(B) if A ∩ B = ∅; µ(N) = 1; [µ(A) = µ(B) = 1] ⇒ [µ(A ∩ B) = 1]; µ(A) = 1 and A ⊂ B imply µ(B) = 1; if A_1, . . . , A_K is a partition of N, then µ(A_k) = 1 for exactly one k ∈ {1, . . . , K}.
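Free ultrafilters cannot be exhibited explicitly, but on a finite stand-in for N the principal (hence non-free) ultrafilter F(n) already displays the filter and ultrafilter axioms and the 0–1 “point mass” behavior of µ. The following sketch checks them by brute force; the six-element ground set is an assumption of the sketch.

```python
from itertools import chain, combinations

# Finite stand-in (assumed ground set {0,...,5} in place of N): the
# principal ultrafilter F(n) = {A : n in A} satisfies the filter and
# ultrafilter axioms, and mu = 1_F behaves like a 0-1 point mass.
ground = frozenset(range(6))
n = 3

def powerset(s):
    s = list(s)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(s, k) for k in range(len(s) + 1))]

all_sets = powerset(ground)
F = [A for A in all_sets if n in A]
mu = lambda A: 1 if n in A else 0

# ultrafilter dichotomy: exactly one of A, A^c lies in F
assert all((A in F) != ((ground - A) in F) for A in all_sets)
# closure under finite intersections and supersets
assert all((A & B) in F for A in F for B in F)
assert all(B in F for A in F for B in all_sets if A <= B)
# finite additivity on disjoint sets, and mu(ground) = 1
assert all(mu(A | B) == mu(A) + mu(B)
           for A in all_sets for B in all_sets if not (A & B))
assert mu(ground) == 1
print("filter, ultrafilter, and point-mass properties all hold")
```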
5.2. The equivalence relation ∼_µ and ∗X. For any set X, X^N denotes the class of X-valued sequences. For x, y ∈ X^N, x ∼_µ y if µ({n ∈ N : x_n = y_n}) = 1. We define star-X by ∗X := X^N/∼_µ.
We will spend the rest of the semester working out what we have defined, and what it is good for. Special cases of interest take X = R, or X = P_F(A), the class of finite subsets of a set A. To do all of this in a consistent fashion, we work with superstructures.
5.3. Superstructures. Readings: Ch. 2.13 and 11.2 in [Corbae et al., 2009].
We start with a set S containing R and any other points we think we may need
later (which will not be very much).
Definition 5.3.1. Define V_0(S) = S and V_{n+1}(S) = V_n(S) ∪ P(V_n(S)). The superstructure over S is V(S) = ∪_{n=0}^∞ V_n(S). For any x ∈ V(S), the rank of x is the smallest n such that x ∈ V_n(S). S is a set, anything in V(S) with rank 1 or higher is a set, and nothing else is a set.
In particular, every set has finite rank, which avoids Russell’s paradox. A statement A(x) is the indicator function of a set, where we interpret A(x) = 1 as “the statement A is true for x.”
Examples: ordered pairs; functions from R to R; the set of sequences in R; the set of Cauchy sequences in R; R^ℓ_+; rational preference relations on R^ℓ_+; rational preferences on R^ℓ_+ that can be represented by C^∞ utility functions; the Hilbert cube [0, 1]^N with the metric d(x, y) = ∑_n |x_n − y_n|/2^n; the collection of G_δ’s in the Hilbert cube; the collection of Polish spaces; the collection of compact metric space games with I players.
5.4. Defining V (∗ S) inductively. Now would be a good time to recall the properties
of our {0, 1}-valued, purely finitely additive µ.
1. Let G_n be a sequence in V_0(S), define (G_1, G_2, . . .) ∼ (H_1, H_2, . . .) if µ({n ∈ N : G_n = H_n}) = 1, and for any sequence, let ⟨G_1, G_2, . . .⟩ denote its equivalence class. V_0(∗S) is defined as the set of these equivalence classes. If G = ⟨G, G, G, . . .⟩, then G is a standard point; otherwise it is a nonstandard point.
a. 0 = ⟨0, 0, 0, . . .⟩ and, more generally, r = ⟨r, r, r, . . .⟩, r ∈ R, are typical standard points.
b. ⟨1, 1/2, 1/3, . . .⟩ ≃ 0, ⟨r + 1, r + 1/2, r + 1/3, . . .⟩, and ⟨1, 4, 9, 16, 25, . . .⟩ are nonstandard points: an infinitesimal, a near-standard (aka limited) point, and an infinite (aka unlimited) point, respectively.
2. Let G_n be a sequence in V_1(S) that is not a sequence in V_0(S). V_1(∗S) is defined as the union of V_0(∗S) and the set of µ-equivalence classes of such sequences. An element x = ⟨x_n⟩ of V_0(∗S) belongs to G = ⟨G_n⟩ if µ({n ∈ N : x_n ∈ G_n}) = 1, written x ∗∈ G or x ∈ G. If G = ⟨G, G, G, . . .⟩, then G is standard; otherwise it is internal.
a. ⟨[0, 1], [0, 1], [0, 1], . . .⟩ is the standard set we denote ∗[0, 1], and ∗R_+ = ⟨R_+, R_+, R_+, . . .⟩. ∗[0, 1] contains the standard point ⟨r, r, r, . . .⟩ as long as 0 ≤ r ≤ 1; ∗R_+ contains unlimited points such as the factorials ⟨n!⟩. ∗[0, 1] also contains the infinitesimal ⟨1, 1/2, 1/3, . . .⟩, and the nearstandard point ⟨r + 1, r + 1/2, r + 1/3, . . .⟩ as long as 0 ≤ r < 1.
b. F = ⟨{0, 1}, {0, 1/2, 1}, {0, 1/4, 2/4, 3/4, 1}, . . .⟩ is an internal set satisfying d_H(F, ∗[0, 1]) = ⟨1/2, 1/4, 1/8, . . .⟩ ≃ 0. The function d_H does not belong to V_1(∗S), and we should be able to figure out when it does appear.
3. Let Gn be a sequence in Vn+1 (S) that is not a sequence in Vn (S). And so forth and
so on . . . .
a. The Hausdorff metric for R is a function from pairs of compact subsets of R to R_+. Every compact subset of R belongs to V_1(S). The class of compact sets belongs to V_2(S). Every ordered triple of the form (K_1, K_2, r), r ∈ R, belongs to V_3(S). d_H is a particular subset of such triples, hence belongs to V_4(S). Letting K_R denote the compact subsets of R, for every pair K_a = ⟨K_{a,1}, K_{a,2}, . . .⟩ and K_b = ⟨K_{b,1}, K_{b,2}, . . .⟩ in ∗K_R, we have set things up so that d_H(K_a, K_b) = ⟨d_H(K_{a,1}, K_{b,1}), d_H(K_{a,2}, K_{b,2}), . . .⟩.
b. If (Ω, F, P) is a finite probability space with F = P(Ω), then R^Ω is the set of random variables on Ω. If (Ω, F, P) = ⟨(Ω_1, F_1, P_1), (Ω_2, F_2, P_2), (Ω_3, F_3, P_3), . . .⟩, then ∗R^Ω = ⟨R^{Ω_1}, R^{Ω_2}, R^{Ω_3}, . . .⟩ is the set of ∗-random variables on the internal set Ω = ⟨Ω_1, Ω_2, Ω_3, . . .⟩.
Some more examples.
Example 5.4.1. ∗N is standard, while F = ⟨{k/2^n : k = 0, . . . , n · 2^n}⟩ is an internal subset of ∗R_+ with the property that for all limited r ∈ ∗R, d(r, F) ≃ 0.
Example 5.4.2. ∗C([0, 1]) is standard, while Poly = ⟨span({x^k : k = 0, . . . , n})⟩ is an internal subset of ∗C([0, 1]), and the Stone-Weierstrass theorem tells us that for every standard f ∈ ∗C([0, 1]), d(f, Poly) ≃ 0.
5.5. Internal Sets for Stochastic Processes. To recognize when we have an internal set, it is useful to know when we don’t.
5.6. Some External Sets.
Theorem 5.6.1. The following sets are external.
a. A1 = {n ∈ ∗ N : n is standard }.
b. A2 = {n ∈ ∗ N : n is nonstandard }.
c. A3 = {r ∈ ∗ R : r is limited }.
d. A4 = {r ∈ ∗ R : r is unlimited }.
e. A5 = {r ∈ ∗ R : r is infinitesimal }.
Proof. If A_1 were internal, A_1 = ⟨A_{1,1}, A_{1,2}, A_{1,3}, . . .⟩, set a_n = max A_{1,n} and let a = ⟨a_1, a_2, . . .⟩. For any limited n ∈ N, a > n, yet a ∈ A_1, a contradiction, since a ∈ A_1 would make a a standard integer larger than every standard integer. If A_2 were internal, then so would be A_2^c, but A_2^c = A_1.
If A_4 were internal, |A_4| = ⟨|A_{4,1}|, |A_{4,2}|, |A_{4,3}|, . . .⟩, set a_n = inf |A_{4,n}| and let a = ⟨a_1, a_2, . . .⟩. It cannot be that a is limited, because this would mean that |A_4| ∩ (a, a + 1) ≠ ∅, so that |A_4| contains a limited number. But if a is unlimited, then so is a/2, and a/2 < a, implying that A_4 is missing some unlimited numbers. A_3 is the complement of A_4, so it cannot be internal either. If A_5 were internal, then so would be {1/|x| : x ∈ A_5, x ≠ 0}, but this is the set of all unlimited positive numbers, which cannot be internal by the same arguments.
Corollary 5.6.1.1. If an internal subset of ∗ R+ or ∗ N contains arbitrarily small unlimited numbers, then it contains a limited number. If an internal subset of ∗ R+ contains
arbitrarily large infinitesimals, then it contains a limited non-zero number.
Here is an implication that will be useful many times.
Lemma 5.6.2 (Robinson). If n 7→ xn is an internal function (i.e. its graph is an
internal set) and xn ≃ 0 for all limited n, then there exists an unlimited m such that
xn ≃ 0 for all n ≤ m.
Proof. Consider the set S := {m ∈ ∗N : (∀n ≤ m)[|x_n| < 1/m]}. This is an internal set (which you should figure out how to check) that contains arbitrarily large limited integers, hence contains an unlimited integer, m.
Lemma 5.6.3. If n is limited and for each i ≤ n, x_i ≃ y_i, then ∑_{i=1}^n x_i ≃ ∑_{i=1}^n y_i.

Proof. |∑_{i=1}^n x_i − ∑_{i=1}^n y_i| ≤ ∑_{i=1}^n |x_i − y_i| ≤ n · max{|x_i − y_i| : i ≤ n} ≃ 0.
5.7. Statements and the Transfer Principle. We are going to be interested in
Theorems/Lemmas/Propositions (TLPs) that have statements of the form (∀x ∈
X)[A(x) ⇒ B(x)] and (∃x ∈ X)[A(x)]. The set X will belong either to V (S) or
to V (∗ S), and statements A(·) can be identified with sets A = {x ∈ X : A(x)}, this
being a set in V (S) or V (∗ S). This means that the first kind of TLP is the statement
A ⊂ B, and the second kind of TLP is the statement X ∩ A 6= ∅.
The transfer principle has a deceptively simple formulation: A ⊂ B in V(S) iff ∗A ⊂ ∗B in V(∗S); and X ∩ A ≠ ∅ in V(S) iff ∗X ∩ ∗A ≠ ∅ in V(∗S).
Example 5.7.1. Let us examine the statement that subsets of R that are bounded above have a supremum. Let B(A) be the statement “A is bounded above” and S(A) the statement “A has a supremum.” The following statement is true in V(S):

(∀A ∈ P(R))[B(A) ⇒ S(A)].    (31)

If B ⊂ P(R) is the class of bounded sets and S ⊂ P(R) is the class of sets having a supremum, this is B ⊂ S. The statement in V(∗S) is

(∀A ∈ ∗P(R))[∗B(A) ⇒ ∗S(A)],    (32)

or ∗B ⊂ ∗S. In more detail, B(A) is the statement that

(∃B ∈ R)(∀a ∈ A)[a ≤ B].    (33)

The ∗’d version is

(∃B ∈ ∗R)(∀a ∈ A)[a ≤ B],    (34)

where the “∈” and the “≤” maybe ought to have ∗’s as well.
Note that the part “(∀A ∈ ∗P(R))” means that A needs to be an internal set. This yields another proof that the set of infinitesimals is not internal: the class of infinitesimals is certainly bounded above, e.g. by 1; if s ∈ ∗R is its supremum, then either s is infinitesimal, in which case 2s is also infinitesimal and s was not the supremum, or s is not infinitesimal, in which case s/2 is not infinitesimal but is an upper bound for the set of infinitesimals.
6. Some Real Analysis
Continuity, compactness, uniform continuity, the standard part map, the Theorem
of the Maximum, limit games and limit equilibria, the Riesz representation theorem,
Glicksberg-Fan fixed point theorems, more equilibrium refinement for compact and
continuous games, infinite signaling games and cheap talk theorems, infinite extensive
form games.
6.1. Closed Sets and Closure. Internal sets have a form of the finite intersection
property. Implications of this are many.
6.1.1. The standard part mapping.
6.1.2. Closedness of Refined Sets of Equilibria.
6.2. Continuity and Uniform Continuity.
6.2.1. C(X; R), X compact. f (x) = x2 , x ∈ M versus x ∈ ∗ M .
6.2.2. Near continuity.
6.2.3. The Riemann-Stieltjes Integral.
6.2.4. Some near interval control theory. Include sufficient conditions for near interval
solution to have a continuous standard part [Fleming and Rishel, 1975, Ch. 1] plus
stuff from [Clarke, 2013]. Basically, concavity of U (t, x, ẋ) in its third argument.
6.3. Theorem of the Maximum.
6.3.1. In Control Theory. Re-prove closedness of eq’m sets, continuity of t 7→ U (t, x∗ (t), ẋ∗ (t))
in control theory.
6.3.2. Single person problems.
6.3.3. Limit games and limit equilibria.
6.4. Compactness. Robinson’s theorem.
6.4.1. Existence of optima.
6.4.2. Existence of equilibria.
6.4.3. Existence of extended equilibrium outcomes. Material from finitistic games work goes here.
6.4.4. Compact sets of probabilities on R. dH (Fn , F ) → 0 iff weak convergence. The
robustnik interpretation.
Ulam’s theorem.
Tightness = compactness.
The Gaussian CLT.
The problem of moments and distributions infinitesimally close to the standard
normal.
6.5. Probabilities on Metric Spaces.
6.5.1. Loeb Measures.
6.5.2. Riesz representation theorem.
6.5.3. Denseness of finitely supported probabilities.
6.5.4. Tightness and compactness.
6.6. Derivatives.
6.6.1. Basics.
A. Some exercises with derivatives and related. Throughout, dx ≠ 0.
1. For r ∈ R, we define e^r = ∑_{n=0}^∞ r^n/n!. Show that if m, m′ ∈ ∗N \ N and r ∈ ∗R is finite, then ∑_{n=0}^m r^n/n! ≃ ∑_{n=0}^{m′} r^n/n!.
2. Show that if dx ≃ 0, then e^{dx} ≃ 1 and (e^{dx} − 1)/dx ≃ 1. From this show that for any x ∈ R, (e^{x+dx} − e^x)/dx ≃ e^x.
3. Show that for x ∈ R and dx ≃ 0, ((x + dx)^n − x^n)/dx ≃ n x^{n−1}.
4. If f and g are continuously differentiable at 0, g′(0) ≠ 0, and f(0) = g(0) = 0, then lim_{x→0} f(x)/g(x) ≃ f(dx)/g(dx) ≃ f′(0)/g′(0).
B. Show that if h ∈ ∗R_+ is unlimited, then (√(h + 1) − √h) ≃ 0. From this conclude that lim_{x→∞}(√(x + 1) − √x) = 0.
C. For every r ∈ R, there exists q ∈ ∗Q such that ◦q = r. In particular, {◦q : q ∈ ∗Q, q finite} is much larger than Q, while {◦r : r ∈ ∗R, r finite} = R.
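A numeric stand-in for exercises A.2–A.4: with a tiny but standard dx, the difference quotients land within a small error of the claimed standard parts. The choice f(x) = sin x, g(x) = e^x − 1 in the last line is an assumed example satisfying the hypotheses of A.4.

```python
import math

# Numeric stand-in for the infinitesimal computations: with a tiny but
# standard dx, each difference quotient is close to the claimed
# standard part.
dx = 1e-8
print((math.exp(dx) - 1.0) / dx)                     # ~ 1      (exercise 2)
x, n = 1.5, 3
print(((x + dx)**n - x**n) / dx, n * x**(n - 1))     # ~ 6.75   (exercise 3)
# exercise 4 with assumed f(x) = sin(x), g(x) = exp(x) - 1: f'(0)/g'(0) = 1
print(math.sin(dx) / (math.exp(dx) - 1.0))           # ~ 1
```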
6.6.2. The implicit function theorem. Uses, proof.
6.6.3. Lebesgue’s density theorem. Use st −1 (A) characterization of measurable sets ...
6.7. Completions. The next set of problems asks you to push yourself further through
the patterns of “putting ∗ ’s on everything.”
A. n ↦ s_n is a Cauchy sequence in R iff ∗s_n ≃ ∗s_m for all n, m ∈ ∗N \ N.
B. The continuous functions on [0, 1] are denoted C([0, 1]); the metric we use on them is d_∞(f, g) = max_{t∈[0,1]} |f(t) − g(t)|.
1. A function f : [0, 1] → R belongs to C([0, 1]) iff for all t_1 ≃ t_2 ∈ ∗[0, 1], ∗f(t_1) ≃ ∗f(t_2).
2. If T ∈ ∗P_F([0, 1]), ∗d_H(T, ∗[0, 1]) ≃ 0, and t ∈ T solves max_{t∈T} ∗f(t) for f ∈ C([0, 1]), then ◦t solves max_{t∈[0,1]} f(t).
3. Suppose that f ∈ C([0, 1]) and that f(0) > 0 > f(1). Using a set T as in the previous problem, show that f(c) = 0 for some c ∈ (0, 1).
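Problems 2 and 3 have a direct finite sketch: a fine grid T plays the role of an internal set with d_H(T, ∗[0, 1]) ≃ 0, maximization over T nearly maximizes over [0, 1], and a sign change between consecutive grid points locates a zero. The function f(x) = cos(3x) − x is an assumed example with f(0) > 0 > f(1).

```python
import math

# A fine finite grid T stands in for an internal set with
# dH(T, *[0,1]) ~ 0: maximize over T, and bracket a zero of the assumed
# example f(x) = cos(3x) - x, which satisfies f(0) = 1 > 0 > f(1).
f = lambda x: math.cos(3.0 * x) - x
N = 100_000
T = [k / N for k in range(N + 1)]

t_max = max(T, key=f)      # f is strictly decreasing on [0,1], so t_max = 0
c = next(t for t in T if f(t) >= 0.0 > f(t + 1.0 / N))
print(t_max, round(c, 5))  # c is within one grid step of the true zero
```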
6.8. A Duality Approach to Patience. We begin with an overview that will only make sense if you already know a fair amount of functional analysis. If you are not such a person, it would be good to come back to this after each of the subsequent developments.
Let (X, ‖·‖) be a normed vector algebra of R-valued functions containing the constant function 1 and partially ordered by ≤; let (X†, ‖·‖†) be the dual space of X with the associated dual norm and the partial order x† ≥ 0 if ⟨x, x†⟩ ≥ 0 for all x ≥ 0. The “duality approach” begins with a linear subspace L ⊂ X and uses it to define a subset D_L of non-negative elements of X† having norm 1 and annihilating L, specifically, D_L = {x† ∈ X† : x† ≥ 0, x†(1) = 1, x†(L) = 0}. With this set, we define the concave, homogeneous of degree 1 utility function U_L(x) = min{⟨x, x†⟩ : x† ∈ D_L}; this automatically has the property that [(x − y) ∈ L] ⇒ [U_L(x) = U_L(y)]. Alternatively, one can start with a set D of non-negative, norm 1 elements of X†, define L_D = D^⊥, and observe that U_{L_D}(x) = min{⟨x, x†⟩ : x† ∈ D}. When X is the set of bounded sequences of utilities, different choices of L having to do with long run behavior give different definitions of patience.
6.8.1. Preliminaries. Bounded sequences of utilities, (u_t)_{t=1}^∞, belong to ℓ_∞ := {u ∈ R^N : (∃B ∈ R_+)(∀n ∈ N)[|u_n| ≤ B]}, and we use the sup-norm, ‖u‖ = sup_t |u_t|. This means that (ℓ_∞, ‖·‖) is the Banach space (C_b(N), ‖·‖). We are interested in the properties of “patient” and “time invariant” preferences over ℓ_∞. We first gather the pertinent definitions.
Definition 6.8.1. A (complete, transitive) preference relation, ≿, on ℓ_∞ is continuous if it can be represented by a continuous utility function; it is monotonic if [u ≥ v] ⇒ [u ≿ v]; and it respects intertemporal smoothing if for all u ∼ v ∈ ℓ_∞ and all α ∈ (0, 1), αu + (1 − α)v ≿ u.
The last condition gives quasi-concavity of the utility function, but the kinds of preferences we will be looking at can all be represented by a concave function on ℓ_∞ that is homogeneous of degree 1.
6.8.2. Infinite Patience. There are various ways to approach the idea of a preference
ordering on ℓ∞ being infinitely patient that can be expressed using some useful linear
subspaces, L, of ℓ∞ . The idea will be that [(u − v) ∈ L] ⇒ [u ∼ v], which means that
the larger is L, the more restrictive the condition. In the following, F is mnemonic for Finite, K is mnemonic for Kronecker, and A is mnemonic for Average:
L_F = {u ∈ ℓ_∞ : u_t ≠ 0 for only finitely many t};
L_K = {u ∈ ℓ_∞ : ∑_{t=1}^∞ u_t/t exists}; and
L_A = {u ∈ ℓ_∞ : lim_{T→∞} (1/T) ∑_{t=1}^T u_t = 0}.
We will show that L_F ⊊ L_K ⊊ L_A ⊊ L_T, where L_T is as yet undefined.
Definition 6.8.2. We have the following kinds of patience:
(a) A preference relation is F-patient if [(u − v) ∈ L_F] ⇒ [u ∼ v], that is, if changes on any finite set of time indexes leave the decision maker indifferent.
(b) A more restrictive condition is K-patience, which is [(u − v) ∈ L_K] ⇒ [u ∼ v]; every K-patient preference relation is F-patient because L_F ⊂ L_K.
(c) A yet more restrictive condition is A-patience, which is [(u − v) ∈ L_A] ⇒ [u ∼ v]; every A-patient preference relation is K-patient because L_K ⊂ L_A.
(d) The most restrictive condition is T-patience, which replaces L_A with the as-yet-undefined larger linear subspace L_T.
Let (X, ‖·‖) be a normed linear space like (ℓ_∞, ‖·‖_∞), and let (X†, ‖·‖†) be the space of continuous linear functionals on X with the norm ‖x†‖ = sup{|x†(x)| : ‖x‖ ≤ 1}.
Recall that the kernel of a linear mapping x† is the set of x such that x†(x) = 0. Further, given a subset L ⊂ X, its “perpendicular complement” is L^⊥ = {x† ∈ X† : x†(L) = 0}, that is, the set of continuous linear functionals that have L as a subset of their kernel.
We can now say what the larger linear subspace is: L_T consists of the u’s in the kernel of every Translation invariant probability (and all we will need to do is to define translation invariant probabilities). The harder part of the following is the second inclusion, which is a version of Kronecker’s lemma; the example showing the inclusion is strict is rather painful.
Lemma 6.8.3. L_F ⊊ L_K ⊊ L_A, all three are linear subspaces, and L_A^⊥ ⊊ L_K^⊥ ⊊ L_F^⊥.
Proof. The second part, “all three are all linear subspaces,” is immediate.
L_F ⊊ L_K: If u ∈ L_F, then u_t = 0 for all sufficiently large t, hence u ∈ L_K. Taking e.g. u_t = 1/t gives an element of L_K that does not belong to L_F.
L_K ⊊ L_A: Suppose now that u ∈ L_K and pick an arbitrary infinite T′. We must show that (1/T′) Σ_{t=1}^{T′} *u_t ≃ 0. Because u ∈ L_K, we know that for all infinite T′ > T, Σ_{t=1}^{T} *u_t/t ≃ Σ_{t=1}^{T′} *u_t/t and both are nearstandard. Being infinitely close to each other, Σ_{t=T+1}^{T′} *u_t/t ≃ 0. Consider the internal set 𝒯 := {τ : (1/T′) Σ_{t=1}^{τ} |*u_t| < 1/τ}. Because T′ is infinite and u ∈ ℓ∞, 𝒯 contains arbitrarily large finite elements, hence, by overspill, an infinite element, call it T.
(35)   (1/T′) Σ_{t=1}^{T′} *u_t = (1/T′) [Σ_{t=1}^{T} *u_t] + (1/T′) [Σ_{t=T+1}^{T′} *u_t].
Now, the first term is infinitesimal because its absolute value is bounded above by (1/T′) Σ_{t=1}^{T} |*u_t| ≃ 0. For the second term, we know that Σ_{t=T+1}^{T′} *u_t/t ≃ 0 and, for each t ∈ {T + 1, . . . , T′}, 1/T′ ≤ 1/t.
To show that the inclusion is strict, we will check that (i) u_t = min{1, 1/log(t)} does not belong to L_K, i.e. Σ_{t=2}^{T} 1/(t log(t)) ≃ ∞ for any infinite T, but (ii) (1/T) Σ_{t=2}^{T} 1/log(t) ≃ 0 for any infinite T.
(i) Consider the integral ∫_2^T 1/(x log(x)) dx. Using the change of variable y = log(x), we have dy = (1/x) dx so that ∫_2^T 1/(x log(x)) dx = ∫_{log(2)}^{log(T)} (1/y) dy, and this is log(log(T)) minus a constant, hence goes to ∞ (albeit very slowly).
(ii) Consider (1/T) times the integral ∫_2^T 1/log(x) dx. The integral has a nasty form, log(log(T)) + log(T) + Σ_{k=2}^{∞} (log(T))^k/(k · k!) minus a constant. Now log(T)/T → 0 so that log(log(T))/T → 0, so all that we need to show is that (1/T) times the last term goes to 0. Ignoring the constant, we can use l'Hôpital's rule, which tells us that lim_T (1/T) Σ_{k=2}^{∞} (log(T))^k/(k · k!) is the limit of the derivatives top and bottom,
(36)   lim_T Σ_{k=2}^{∞} (log(T))^{k−1}(1/T)/k! = lim_T (e^{log(T)} − (1 + log(T)))/(T log(T)) = lim_T (T − 1 − log(T))/(T log(T)) = 0.
Finally, to show that L_A^⊥ ⊊ L_K^⊥ ⊊ L_F^⊥, it is sufficient to show that the strict inclusion relations hold for the closures, cl(L_F) ⊊ cl(L_K) ⊊ cl(L_A).
To see that there is an element of L_K at distance 1 from every element of L_F, let A be the infinite set {t² : t ∈ N}, set u = 1_A, and note that Σ_t u_t/t = Σ_t 1/t² < ∞.
To see that there is an element of L_A at distance 1 from every element of L_K, let B = {⌊t · log(t)⌋ : t ∈ N, t ≥ e²} and set u = 1_B. For large enough τ, Σ_{t≥τ} u_t/t ≥ Σ_{t≥τ} 1/(t log(t)) = ∞ while Σ_{t=2}^{T} u_t ≤ 0.9 Σ_{t=2}^{T} 1/log(t).
t=τ t log(t)
6.8.3. Continuous Linear Functionals. We will study monotonic preferences that that
can be represented by continuous concave functions on ℓ∞ . Continuous concave functions are the lower envelope of the continuous affine functions that majorize them.
This makes knowing how to represent continuous linear functions a crucial first step.
The dual space of ℓ∞ is denoted ℓ∞^†. It is strictly larger than ℓ₁ := {p ∈ R^N : Σ_t |p_t| < ∞}, where each p ∈ ℓ₁ defines/is identified with an element of ℓ∞^† by f_p(u) = ⟨u, p⟩ = Σ_t u_t p_t.
P
Example 6.8.1. For δ ≃ 1 pick T ∈ ∗ N\N such that (1−δ) Tt=1 δ t−1 ≃ 1 and consider
P
the preference relation represented by the function f (u) = ◦ (1 − δ) Tt=1 ut δ t−1 ≃ 1.
For T ∈ ∗ N \ N, consider the preference relation represented by the function g(u) =
PT
◦1
†
t=1 ut . These are continuous linear functionals on ℓ∞ , i.e. f, g ∈ ℓ∞ , and neither
T
is an element of ℓ1 as can be seen by considering the unit vectors ek = (ut )∞
t=1 with
uk = 1, and ut = 0 for t 6= k. For each k ∈ N, f (ek ) = g(ek ) = 0, but hek , pi = pk ,
and this is equal to 0 for all k ∈ N iff p = 0 in ℓ1 .
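A finite-horizon caricature of this example shows what is going on (δ = 0.999 and T = 20,000 are arbitrary finite stand-ins for δ ≃ 1 and an infinite T): both functionals assign the constant sequence 1 a value close to 1 while nearly annihilating every unit vector e_k.

```python
# delta close to 1 and a horizon T with (1 - delta) * Sum_{t<=T} delta^(t-1) ~ 1;
# 0.999 and 20_000 are arbitrary finite stand-ins for delta ~ 1 and T infinite
delta, T = 0.999, 20_000

def f(u):
    # discounted evaluation of the sequence t -> u(t), mirroring f in Example 6.8.1
    return (1 - delta) * sum(u(t) * delta ** (t - 1) for t in range(1, T + 1))

def g(u):
    # long-horizon average evaluation, mirroring g in Example 6.8.1
    return sum(u(t) for t in range(1, T + 1)) / T

ones = lambda t: 1.0                      # the constant sequence 1
e_5 = lambda t: 1.0 if t == 5 else 0.0    # the unit vector e_k with k = 5

mass_check = f(ones)                      # ~ 1, the normalization in the example
```

In the nonstandard version, the values on each e_k are exactly infinitesimal, so the standard parts vanish even though both functionals have "mass" 1.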
All continuous linear functionals on ℓ∞ can be represented as the standard part of
inner products like those in this example.
Theorem 6.8.4. If f ∈ ℓ∞^†, then there exists a star-finite set {1, . . . , T} ⊂ *N and η₁, . . . , η_T with Σ_{t=1}^{T} |η_t| limited and f(u) = °⟨*u, η⟩.
Proof. Let 𝒩 denote the class of all subsets of N. For any u ∈ ℓ∞ and any n ∈ N, consider the two sequences of simple functions,
(37)   U̲^n(t) := Σ_{k=−∞}^{+∞} ((k−1)/2^n) 1_{u^{−1}(((k−1)/2^n, k/2^n])}(t), and Ū^n(t) := Σ_{k=−∞}^{+∞} (k/2^n) 1_{u^{−1}(((k−1)/2^n, k/2^n])}(t).
We have U̲^n < u ≤ Ū^n and ‖U̲^n − Ū^n‖ ≤ 1/2^n so that either of these classes of simple functions is dense in ℓ∞.
If f ∈ ℓ∞^†, then it is Lipschitz with Lipschitz constant ‖f‖†, hence determined by its values on any dense subset, hence determined by its values on the class of simple functions. For any finite set of simple functions, U := {U_a : a = 1, . . . , A}, let P_U denote the partition of N generated by sets of the form U_a^{−1}(r), r ∈ R. By basic linear algebra, there exists a finitely supported measure, η = η(U), on N with f(U_a) = Σ_{t∈N} U_a(t) η(t) and Σ_t |η(t)| ≤ ‖f‖†. Let U′ be an exhaustive, *-finite set of *-simple functions, and let η = η(U′).
Another way to understand this theorem is to recall that ℓ∞ = C_b(N); the Riesz representation theorem tells us that the dual space of C_b(N) is the set of bounded, finitely additive measures on N, and this result is telling us that such measures have a star-finite representation.
6.8.4. Star-finitely Supported Probabilities. For our purposes, the probability measures
are enough.
Definition 6.8.5. η ∈ *ℓ₁ is a star-finite probability (sfp) if η ≥ 0, Σ_t η_t ≃ 1, and {t ∈ *N : η_t > 0} is star-finite. An sfp η is remote if η(A) ≃ 0 for all finite A.
Let L(η) denote the Loeb measure generated by η. Given that L(η) is countably
additive, η(A) ≃ 0 for all finite A is equivalent to L(η)(N) = 0. It turns out that the
remote sfp’s are the ones that let us get at F -patience.
Lemma 6.8.6. An sfp η is remote iff for all u, v ∈ ℓ∞ that differ in only finitely many time periods, ⟨*u, η⟩ ≃ ⟨*v, η⟩.
Remember, u and v differing in only finitely many time periods is (u − v) ∈ LF .
Proof. For any finite A ⊂ N, let u = 1_A and set v = 0, so that °⟨*u, η⟩ = L(η)(A) and ⟨*v, η⟩ = 0; the condition in the Lemma thus gives L(η)(A) = 0 for every finite A. Since N is the countable union of finite sets and L(η) is countably additive, L(η)(N) = 0.
6.8.5. Preferences with a lim inf Representation. Every concave function is the lower envelope of the affine functions that majorize it; to put it another way, if A is a set of affine functions and g(u) = inf_{a∈A} a(u), then g(·) is concave. If we majorize using linear functionals, we get functions that are concave and homogeneous of degree 1 (hd1). The following is the first of our interesting concave hd1 functionals on ℓ∞.
Theorem 6.8.7. For u ∈ ℓ∞, lim inf_{t→∞} u_t = inf_{η∈R} °⟨*u, η⟩ where R is the set of remote sfp's.
We will see in just a little bit that we can replace “inf” with “min” in this result.
Proof. Let u̲ = lim inf_{t→∞} u_t. For any remote sfp, °⟨*u, η⟩ ≥ u̲, so it is sufficient to find a remote sfp with °⟨*u, η⟩ = u̲.
For each n ∈ N, let T_n be the internal set {t ∈ *N : t ≥ n, *u_t < u̲ + 1/n}, so that T_n ⊂ T_{n−1} and T_n ≠ ∅. By the internal extension principle, the mapping n ↦ T_n has an internal extension to a mapping from *N to *P(N). For that extension, the internal set N of all n such that T_n ⊂ T_{n−1} and T_n ≠ ∅ contains all finite n, hence contains an infinite n′. Because any t ∈ T_{n′} is greater than or equal to n′, any η ∈ Δ(T_{n′}) is remote, and for any such η, |⟨*u, η⟩ − u̲| ≤ 1/n′ ≃ 0.
An Interpretation: If one values a sequence u by h∗ u, ηi for a remote sfp, then
one cares only about the far future. Using the utility function l(u) = lim inf t→∞ ut to
judge rewards corresponds to judging u by its worst behavior in the far future. This
seems unreasonably pessimistic.
Example 6.8.2. Let u be the sequence that begins with a single +1, then has its next 2² entries being −1, then has a single +1, then has the next 3³ entries being −1, then has a single +1, then has the next 4⁴ entries being −1, and so on. This u is overwhelmingly often negative while v := −u is overwhelmingly often positive, yet l(u) = l(v) = −1.
One way to get around such examples is to pay attention to the long run average of the payoffs, and we will turn to this very soon.
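A finite initial segment of the sequence in Example 6.8.2 makes the point concrete (the block lengths 2², 3³, 4⁴, 5⁵ are my reading of the garbled exponents; the qualitative conclusion survives other readings): both u and v = −u keep dipping to −1 arbitrarily late, while their running averages have opposite signs.

```python
# Build an initial segment of the Example 6.8.2 sequence: a single +1 followed
# by k^k entries of -1, for k = 2, 3, 4, 5.
u = []
for k in range(2, 6):
    u.append(1.0)
    u.extend([-1.0] * (k ** k))
v = [-x for x in u]

# min over a late stretch, chosen so that each kind of entry has recurred;
# a crude finite proxy for liminf
tail_min_u = min(u[100:])
tail_min_v = min(v[100:])

avg_u = sum(u) / len(u)   # u is overwhelmingly negative...
avg_v = sum(v) / len(v)   # ...and v overwhelmingly positive
```

Judging by the lim inf criterion l(·), the two sequences are equally bad; judging by averages, they are nearly opposite.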
Another Interpretation: Let Δ^fa denote the set of finitely additive probabilities on the integers. A sub-basis for the weak* topology on Δ^fa is given by sets of the form G(μ : u, ǫ) := {ν ∈ Δ^fa : |∫_N u dν − ∫_N u dμ| < ǫ}, u ∈ C_b(N), ǫ > 0. This means that all weak*-open sets are unions of finite intersections of such sets, and μ_α → μ iff for all u ∈ C_b(N), ∫ u dμ_α → ∫ u dμ. The next Theorem and Corollary will show that l(u) = min_{μ∈R} ⟨u, μ⟩ where R = st(R). This kind of preference has a Choquet integral representation. For E ⊂ N, define c(E) = min{μ(E) : μ ∈ R}.
Lemma 6.8.8. For all u ∈ ℓ∞, l(u) = ∫ u dc where the integral is in the sense of Choquet.
The following is a famous consequence of Alaoglu’s theorem.
Theorem 6.8.9. ∆f a is compact in the weak∗ -topology.
Proof. Let η ∈ *Δ^fa. We must show that st(η) ∈ Δ^fa. But it is immediate that ν defined by ν(A) = st(η(A)) belongs to Δ^fa. Finally, by linearity and the ‖·‖-denseness of the simple functions, ν = st(η).
Note that Δ^ca(X) is not weak* compact unless X is a well-behaved compact space.
Definition 6.8.10. µ ∈ ∆f a is weightless or purely finitely additive if for all
finite A ⊂ N, µ(A) = 0.
Here is the relation between weightless probabilities and the remote sfp’s.
Corollary 6.8.10.1. R = st (R) is a compact convex subset of ∆f a and µ ∈ R iff µ
is weightless.
Proof. Finite intersection property.
6.8.6. Concave F -Patient Preferences. Being the infimum of a collection of linear
functionals on ℓ∞ , the utility function l(u) = lim inf t→∞ ut is concave. In this context, concavity indicates a preference for intertemporal smoothing of utilities, i.e.
l(αu + (1 − α)v) ≥ αl(u) + (1 − α)l(v) ≥ min{l(u), l(v)}.
We can turn the analysis on its head a bit too: we just showed that having tangents
that are remote sfp’s means that we respect F -patience; being concave and respecting
F -patience means that the tangents are, up to a scalar factor, remote sfp’s.
Theorem 6.8.11. If V : ℓ∞ → R is continuous, monotonic, concave and F -patient,
then for any g ∈ DV(u), g = λf for some λ ≥ 0 and remote sfp f .
Proof. Straightforward.
6.8.7. Preferences with a lim inf-Average Representation.
Definition 6.8.12. A remote sfp is of Cesaro type if η₁ = · · · = η_T = 1/T for T ∈ *N \ N.
Theorem 6.8.13. For u ∈ ℓ∞, L(u) := lim inf_{T→∞} (1/T) Σ_{t=1}^{T} u_t = inf_{η∈C} °⟨*u, η⟩ where C is the set of remote sfp's of Cesaro type.
Proof. Let T_k be a sequence with (1/T_k) Σ_{t=1}^{T_k} u_t → L(u). Let T be the equivalence class of the sequence (T₁, T₂, . . .), and let η_T be the uniform distribution on {1, . . . , T}. It is immediate that °⟨*u, η_T⟩ = L(u), and no η of Cesaro type can have a lower integral against *u.
It is possible in ℓ∞ , that on average, one is living in good times, but that there will
be infinitely many bad times of arbitrarily long length. One could be more pessimistic
than the L(·) utility function by overwhelmingly paying attention to those longer and
longer stretches of bad news.
Example 6.8.3. Let u have the first 2² entries being +1, the next 2 entries being −1, then the next 3³ entries being +1, then the next 3 entries being −1, the next 4⁴ entries being +1, the next 4 entries being −1, etc.
6.9. HERE LIE DRAGONS. It is worth bearing in mind the observation (due to
Choquet?) that a Banach space is separable if and only if one can define a strictly
concave R-valued function on it. The space ℓ∞ is not separable, and the concave
functions that we have been looking at have very big flat spots. This is part of the
explanation for what we found in Theorem 6.8.11.
The shift operators on ℓ∞ are defined by
(38)   s₁(u₁, u₂, u₃, . . .) = (u₂, u₃, u₄, . . .) and s_{j+1}(u) = s₁(s_j(u)).
Note that if A ⊂ N then sj (1A ) = 1A⊖j where A ⊖ j := {t − j : t − j ≥ 1, t ∈ A}.
Definition 6.9.1. f ∈ ℓ†∞ is a Banach-Mazur limit if
1. [u ≥ 0] ⇒ [f (u) ≥ 0],
2. (∀u ∈ ℓ∞ )(∀j ∈ N)[f (u) = f (sj (u))], and
3. f (1) = 1 (where 1 = (1, 1, 1, . . .)).
Theorem 6.9.2. f is a Banach-Mazur limit iff f(u) = °(Σ_{t=1}^{T} *u_t η_t) where T ∈ *N \ N, η₁, . . . , η_T ≥ 0, Σ_{t=1}^{T} η_t ≃ 1, and Σ_{t=1}^{T} |η_t − η_{t−1}| ≃ 0 (where η₀ := 0).
Proof. To be inserted.
Notice that the η's of Example 6.8.1 are uniform, hence have |η₁ − η₀| = 1/T and, for T ≥ t ≥ 2, η_t − η_{t−1} = 1/T − 1/T = 0.
The η’s that appear in Theorem 6.9.2 are exactly the translation invariant probabilities.
Definition 6.9.3. η is translation invariant if for all j ∈ N and all A ⊂ N, ⟨1_{*A}, η⟩ ≃ ⟨s_j(1_{*A}), η⟩, that is, if η(*A) ≃ η(*(A ⊖ j)).
One direction of the following is easy; I am pretty sure that the other direction is true (from some standard results), but it still has the official status of a conjecture.
Theorem 6.9.4. An sfp η is translation invariant iff Σ_{t=1}^{T} |η_t − η_{t−1}| ≃ 0.
Proof. If Σ_{t=1}^{T} |η_t − η_{t−1}| ≃ 0, then |η(*A) − η(*(A ⊖ j))| ≤ Σ_{t∈A} |η_t − η_{t−j}| ≤ j · Σ_t |η_t − η_{t−1}| ≃ 0, so that η is translation invariant.
Now suppose that η is translation invariant. Let A′ = {2k ∈ N : η(2k − 1) < η(2k)}, so that |η(A′) − η(A′ ⊖ 1)| = Σ_t |η_t − η_{t−1}|. Some variant of over/underspill (and the Loeb space construction?) looking at the sets A′ ∩ {1, . . . , N} should complete the argument.
Definition 6.9.5. An sfp is of remote shifted Cesaro type if for some j ∈ *N, η_t = 1/T for all t ∈ {j + 1, j + 2, . . . , j + T}.
Theorem 6.9.6. For u ∈ ℓ∞, L_S(u) := lim inf_{T→∞} inf_{j≥1} (1/T) Σ_{t=1}^{T} u_{j+t} = inf_{η∈C_S} °⟨*u, η⟩ where C_S is the set of remote sfp's of shifted Cesaro type.
Proof. Let (T_k, j_k) be a sequence such that (1/T_k) Σ_{t=1}^{T_k} u_{j_k+t} → L_S(u), let (T, j) be the equivalence class of the sequence ((T₁, j₁), (T₂, j₂), . . .), and let η_{T,j} be the uniform distribution on {j + 1, j + 2, . . . , j + T} so that °⟨*u, η_{T,j}⟩ = L_S(u). There cannot be a remote shifted Cesaro η with a lower integral.
The big result (to be covered about here) is that the set of remote sfp's of the shifted Cesaro type is the set of extreme points of the translation invariant probabilities. The implication: if we know something about what is optimal when the utility function is given by any one of these, we know what is true for all of the infinitely patient preferences.
6.10. Some Hints. For the problems on Cauchy sequences: A sequence in X is a
function s : N → X, denoted above as n ↦ s_n. The *'d version of a function is what one uses to think about the values of s_m, s_{m′} for infinite m, m′, that is, m, m′ ∈ *N \ N. For a given sequence n ↦ s_n and M ∈ N, define δ_M = sup{d(s_m, s_{m′}) : m, m′ ≥ M}. Note that δ_{M+1} ≤ δ_M, and that the sequence is Cauchy iff δ_M ↓ 0.
• Therefore, for arbitrary ǫ ∈ R₊₊, *{M ∈ N : δ_M < ǫ} contains all infinite M when n ↦ s_n is Cauchy. This means that for any ǫ > 0 and any pair of infinite integers m, m′, d(s_m, s_{m′}) < ǫ, i.e. d(s_m, s_{m′}) ≃ 0.
• Now suppose that *s_n ≃ *s_m for all n, m ∈ *N \ N. For arbitrary ǫ ∈ R₊₊, the internal set {M ∈ *N : (∀m, m′ ≥ M)[d(s_m, s_{m′}) < ǫ]} contains arbitrarily small infinite elements, hence contains finite elements.
For the problems on Continuous functions: By definition, a function f : [0, 1] → R
is continuous at a ∈ [0, 1] if for all sequences xn → a, f (xn ) → f (a).
• Let x = ⟨x₁, x₂, . . .⟩ be the equivalence class of any sequence converging to a. For any ǫ ∈ R₊₊, {n ∈ N : d(f(x_n), f(a)) < ǫ} has only a finite complement, hence d(*f(x), f(a)) ≃ 0.
• Since [0, 1] is compact, for any t1 ≃ t2 ∈ ∗ [0, 1], there is a unique a ∈ [0, 1] such that
a = ◦ t1 = ◦ t2 , and d(∗ f (t1 ), f (a)) ≃ 0 and d(∗ f (t2 ), f (a)) ≃ 0.
• If T ∈ ∗ PF ([0, 1]) with dH (T, ∗ [0, 1]) ≃ 0, and f : [0, 1] → R is continuous and
f (0) > 0 > f (1), consider the internal set T++ = {t ∈ T : ∗ f (t) > 0}, set t′ = max T++ .
Let a = ◦ t′ and note that f (a) = 0.
Theorem 6.10.1 (Robinson). A metric space (X, d) is compact iff for every x ∈ ∗ X,
there exists an a ∈ X such that d(a, x) ≃ 0.
Proof. Recall that a metric space (X, d) is compact iff it is both totally bounded and
complete.
⇒: Let x = hx1 , x2 , . . .i ∈ ∗ X with X complete and totally bounded. By total
boundedness, there exists a finite subset F1 = {a1,1 , . . . , a1,M1 } such that for all a ∈ X,
d(a, F₁) < 1/2¹. Disjointify the finite open cover of X given by {B_{1/2¹}(a_{1,m}) : m ≤ M₁} into the sets A_{1,m}. The sets E_{1,m} := {n ∈ N : x_n ∈ A_{1,m}} partition N, hence exactly
one, say E1,m1 , of them has µ-mass 1. Let n1 be the first element in E1,m1 .
Disjointify an open cover of A_{1,m₁} by balls of radius 1/2² and repeat, letting n₂ be the first element of the set of integers E_{2,m₂} that has μ-mass 1.
Continuing gives a Cauchy subsequence x_{n_k} which, by completeness, has a limit in X, call it a. For every k ∈ N, d(a, x) < 1/2^k, hence d(a, x) ≃ 0.
⇐: If X is not totally bounded, then there exists ǫ ∈ R++ such that for every finite
set F , there exists an x ∈ X such that d(x, F ) ≥ ǫ. Let F1 = {x1 }, pick xn+1 such that
d(x_{n+1}, F_n) ≥ ǫ, and set F_{n+1} = F_n ∪ {x_{n+1}}. Let x = ⟨x₁, x₂, . . .⟩ ∈ *X. There can be
no a ∈ X that is the standard part of x because this would mean that d(a, x) < ǫ/3,
which would imply that along a subsequence xnk , d(xnk , xnk′ ) < d(xnk , a)+d(a, xnk′ ) <
ǫ/3 + ǫ/3 < ǫ.
If X is not complete, then there exists a Cauchy sequence xn that is not converging
to any a ∈ X. Let x ∈ ∗ X be the equivalence class hx1 , x2 , . . .i. If a ∈ X is the
standard part of x, then d(xnk , a) → 0 for some subsequence, but if any subsequence
of a Cauchy sequence converges, the whole sequence converges.
6.11. Related Readings. For infinitesimals and nonstandard analysis more generally, see [Lindstrøm, 1988]. For what we have used so far, see Ch. 11.1 - 11.2 in
[Corbae et al., 2009].
For the game theory material, see also Ch. 8.3 - 8.4 in [Fudenberg and Tirole, 1991].
7. Moderately Elementary Stochastic Process Theory
We’ll begin with a useful dynamic optimization problem with no stochastics, then
turn to the basics of random variables on ∗ -finite probability spaces, then to stochastic
processes, which are collections of random variables indexed by time, in our case, by a
∗
-finite time set with infinitesimal increments. The readings for this part of the course
are Chapters 1 - 8 of [Nelson, 1987].
7.1. Collections of Random Variables. Fix a ∗ -finite probability space (Ω, F, P )
with F = P(Ω) and P ∈ ∗ ∆◦ (Ω).
Definition 7.1.1. A random variable is an element of RΩ , typically denoted X or
Y . If T is an interval, then t 7→ Xt from T to RΩ is a stochastic process.
To put it slightly differently, the set of random variables is RΩ , while a set of random
variables indexed by a time set is a stochastic process.
7.1.1. Some Examples. The first example gives a random variable inducing the uniform
distribution on [0, 1]. This is a good starting point for many reasons.
Example 7.1.1. Ω = {1, . . . , n}, P(A) = #A/n. It is often useful to take n = m! for some infinite integer m. Define X(k) ∈ *[0, 1] by X(k) = k/n. For any 0 ≤ a < b ≤ 1, P(°X ∈ (a, b]) ≃ (b − a), which looks like the uniform distribution.
Comment: if (M, d) is any complete separable metric space and μ is any probability on M, then there exists a measurable f : [0, 1] → M such that μ(E) = Unif(f^{−1}(E)). We will see that this means that this probability space allows us to model all probabilities on all complete separable metric spaces.
Example 7.1.2 (A hyperfinite Brownian motion). Let T = {0, 1/N, 2/N, . . . , (N−1)/N, 1} be a *-finite set infinitely close to *[0, 1], i.e. with N an infinite integer. Let Ω = {−1, +1}^T and define P so that the canonical projection mappings proj_t(ω) := ω_t are an i.i.d. collection with P(ω_t = −1) = P(ω_t = +1) = 1/2. From this, define X(t, ω) as follows: X(0, ω) ≡ 0, X(1/N, ω) = (1/√N) ω₁, X(2/N, ω) = (1/√N)(ω₁ + ω₂), . . ., X(k/N, ω) = (1/√N) Σ_{i=1}^{k} ω_i. This is a random walk model that moves through time in step sizes dt := 1/N, and moves up and down ±√(1/N).
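A finite-N simulation of this walk (N = 400, the seed, and the Monte Carlo sample size are arbitrary choices) confirms the key moment calculation: X(k/N, ·) has mean ≈ 0 and variance ≈ k/N.

```python
import random

random.seed(0)
N = 400        # finite stand-in for the infinite integer N
paths = 4000   # Monte Carlo sample of omegas

def walk_value(k):
    # X(k/N, omega) = (1/sqrt(N)) * (omega_1 + ... + omega_k), omega_i fair +/-1 coins
    return sum(random.choice((-1.0, 1.0)) for _ in range(k)) / N ** 0.5

k = N // 2     # the time point t = k/N = 0.5
samples = [walk_value(k) for _ in range(paths)]
mean = sum(samples) / paths
var = sum((x - mean) ** 2 for x in samples) / paths   # should be close to k/N = 0.5
```

With N infinite, "≈" becomes "≃," which is the content of the Comment that follows.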
Comment: if r ∈ (0, 1] and k/N ≃ r, then X(k/N, ·) is the sum of infinitely many i.i.d. random variables that have been scaled so that Var(X(k/N, ·)) ≃ r, and the oldest (deMoivre) arguments for the central limit theorem should tell you that X(k/N, ·) is infinitely close to being a Gaussian distribution. Further, for k < k′ < k″, the random increments (X(k′/N, ·) − X(k/N, ·)) and (X(k″/N, ·) − X(k′/N, ·)) are independent. If you've seen a definition of a Brownian motion, this looks awfully close.
Example 7.1.3 (A hyperfinite Poisson process). Let T = {0, 1/N, 2/N, . . . , (N−1)/N, 1} be a *-finite set infinitely close to *[0, 1] as before. Let Ω′ = {0, 1}^T and define Q so that the canonical projection mappings proj_t(ω′) := ω′_t are an i.i.d. collection with Q(ω′_t = 1) = λ dt, where dt := 1/N is the infinitesimal size of the incremental steps in the time set, and λ is limited and strictly positive. Define Y(0, ω′) ≡ 0, Y(1/N, ω′) = ω′₁, Y(k/N, ω′) = Σ_{i≤k} ω′_i.
Comment: for r ∈ (0, 1] and k/N ≃ r, Y(k/N, ·) is infinitely close to having a Poisson(λr) distribution. Further, for k < k′ < k″, the random increments (Y(k′/N, ·) − Y(k/N, ·)) and (Y(k″/N, ·) − Y(k′/N, ·)) are independent. If you've seen a definition of a Poisson process, this looks awfully close.
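The same kind of finite-N check works here (N = 500, λ = 2, and the number of trials are arbitrary): Y(1, ·) is a sum of N Bernoulli(λ/N) draws, and it matches the Poisson(λ) mass at 0 and the Poisson mean.

```python
import math, random

random.seed(1)
N, lam, trials = 500, 2.0, 8000   # grid size, arrival intensity, Monte Carlo size

def count_at_time_one():
    # Y(1, omega'): number of successes among N Bernoulli(lam/N) draws
    return sum(1 for _ in range(N) if random.random() < lam / N)

counts = [count_at_time_one() for _ in range(trials)]
mean_count = sum(counts) / trials
frac_zero = sum(1 for c in counts if c == 0) / trials
poisson_zero = math.exp(-lam)   # a Poisson(lam) variable is 0 with probability e^{-lam}
```

As N grows, the Binomial(N, λ/N) law converges to Poisson(λ); with N infinite the two are infinitely close.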
Example 7.1.4. We can glue the previous two examples together via Ω × Ω′ so that P and Q are independent. After doing that, we can define Z(k/N, (ω, ω′)) = X(k/N, ω) + Y(k/N, ω′).
Comment: the central limit theorem has two parts; the one you are most likely to be
used to is like the X process, composed of infinitely many identical random pieces, all
of them very small, indeed infinitesimal; the other one is like the Y process, it allows
the largest of the infinitely many identical random pieces to have a non-infinitesimal
probability of being far away from 0, that is, not infinitesimal. This is the beginnings
of the study of infinitely divisible distributions and Levy processes.
7.1.2. Time Paths. We are going to be particularly interested in properties of the set
of time paths that arise. In the last three examples, pick an ω, an ω ′ , or a pair (ω, ω ′ ).
The time paths are the functions t 7→ X(t, ω), t 7→ Y (t, ω ′ ) and t 7→ Z(t, (ω, ω ′ )). After
taking care of the fact that T is a strict subset of ∗ [0, 1], we will see that the X paths
are nearstandard in C([0, 1]), and that the Y paths and the Z paths are nearstandard
in D([0, 1]) (the cadlag paths with a version of the Skorohod metric). In order to do
this, we need to understand what continuity looks like when stretched onto a set such
as T , what bounded fluctuations look like, and how infinite sums behave. This last
will get us into the ∗ -finite versions of integration theory.
A big part of the arguments behind these results when we work in V (S) is Ulam’s
theorem: every countably additive probability on a complete separable metric space is
tight, i.e. for all ǫ > 0, there is a compact Kǫ carrying at least mass 1 − ǫ. This means
that understanding probabilities on compact (subsets of) metric spaces is part of the
background we need. Behind that is what is called the Riesz representation theorem,
and we will see a lovely proof of it using nonstandard techniques.
7.2. Monitoring and Likelihood Ratios. We are now going to compare a Poisson
process model of monitoring with different intensities/arrival rates to a Brownian
process model of monitoring with different drifts. In both cases, the information
content of the signals is continuous. In the next subsection, we will consider Brownian
models with different volatilities, and in these models, the information content is
discontinuous.
For our *-finite models, we take T = {0, 1/N, 2/N, . . . , (N−1)/N, 1} with increments dt = 1/N.
Example 7.2.1 (Poisson process monitoring). If high effort is being exerted, then in any time interval the probability of excellent news arriving is λ_h dt; if low effort, it is λ_l dt, where λ_h > λ_l and both are limited. We have two hypotheses: high effort all the time versus low effort all the time. Now, suppose that at time k/N with °(k/N) = r, where r is non-zero, we have observed 0, 1, 2, . . ., or m instances of really excellent news. We are interested in the likelihood ratios, i.e. the factors by which we would update our prior probability that it was high effort or low effort. Let Y_k^λ count the number of instances of excellent news if λ dt is the probability, that is, Y_k^λ = Σ_{i≤k} X_i where the X_i are iid Bernoulli(λ dt) random variables.
(39)   P(Y_k^λ = 0) = (1 − λ/N)^k ≃ e^{−λr}
(40)   P(Y_k^λ = 1) = k (1 − λ/N)^{k−1} (λ/N) ≃ (rλ) e^{−λr}
(41)   P(Y_k^λ = 2) = (k!/((k−2)! 2!)) (1 − λ/N)^{k−2} (λ/N)² ≃ ((rλ)²/2!) e^{−λr}
(42)   P(Y_k^λ = m) = (k!/((k−m)! m!)) (1 − λ/N)^{k−m} (λ/N)^m ≃ ((rλ)^m/m!) e^{−λr}.
This means that the likelihood ratios are (λ_h/λ_l)^m e^{−rλ_h}/e^{−rλ_l} if Y_k = m. This has two interesting properties: first, the more events happen by time r, the more information there is in the signal; second, for a given number of events, the larger is r, the less information there is. It also means that the information content of the signal is both limited and a continuous function of time, that is, k/N ≃ k′/N implies that the likelihood ratios are infinitely close to each other and imperfectly informative. To put it another way, the contrast between observing in nonstandard time and observing in standard time is infinitesimal; there is, for all practical purposes, more or less the same amount of information to be had observing the nonstandard details of the paths as there is in observing the standard parts of the path.
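Taking the Poisson approximations at face value, the likelihood ratio for m events by time r is (λ_h/λ_l)^m e^{−r(λ_h − λ_l)}, and the two comparative statics just claimed are immediate to check numerically (the intensities λ_h = 3 and λ_l = 1 below are illustrative, not values from the text):

```python
import math

def likelihood_ratio(m, r, lam_h=3.0, lam_l=1.0):
    # P(m events by time r | high) / P(m events by time r | low)
    # under Poisson(lam * r) counts; the m! terms cancel
    return (lam_h / lam_l) ** m * math.exp(-r * (lam_h - lam_l))

# more events by a fixed time => stronger evidence for high effort
increasing_in_m = likelihood_ratio(0, 1.0) < likelihood_ratio(1, 1.0) < likelihood_ratio(2, 1.0)
# a fixed number of events over a longer window => weaker evidence for high effort
decreasing_in_r = likelihood_ratio(2, 0.5) > likelihood_ratio(2, 1.0) > likelihood_ratio(2, 2.0)
```

Note also that the ratio is bounded away from 0 and ∞ for limited m and r, which is the "limited and continuous information content" point.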
To get at Brownian monitoring as in Example 7.1.2 above, we change the distribution of the iid X_t so that at any non-infinitesimal τ := °(k/N), the distribution is ≃ N(τ·μ, τ), which is called a Brownian motion with a drift μ. The contrast between the nonstandard and the standard is again infinitesimal. Here, if we are trying to distinguish (say) a positive drift μ > 0 from a 0 drift, a larger number of upward jumps by time τ = °(k/N) is a stronger signal for the positive drift, and the larger the drift or the longer the time interval, the more information we have.
Example 7.2.2 (Brownian monitoring: I). Suppose that high effort leads to a positive drift μ > 0 and low effort leads to a drift of 0. If the X_t are iid with P(X_t = 1) = (1/2)(1 + γ) and P(X_t = −1) = (1/2)(1 − γ), then E X_t = γ, so that the average slope of the process is γ√dt/dt = γ/√dt, so that γ_μ = μ√dt delivers a Brownian drift of μ and a variance k(1/N − μ²/N²) ≃ τ := °(k/N) at time point k/N.
A sufficient statistic for the value of the process at k/N is Σ_{n≤k} I_n where I_n = 1_{X_n > 0}. This is counting the number of upward movements in k steps. To compare the hypothesis that the drift is 0 to the hypothesis that it is μ, we are comparing two binomial distributions, Bin(k, 1/2) versus Bin(k, (1/2)(1 + γ_μ)). The difference in means is kγ_μ = kμ(1/√N), the standard deviation is √k/2, and the ratio of the two is 2μ√(k/N) ≃ 2μ√τ where τ := °(k/N).
Brownian monitoring in V(S) has the property that at a time r > 0, we are comparing a N(rμ, r) to a N(0, r), and if we observe that the process is equal to x at time r, then (normalizing μ = 1) the likelihood ratio is
(43)   (κ e^{−(x−r)²/2r})/(κ e^{−x²/2r}) = e^{(1/2r)[x² − (x−r)²]} = e^{(1/2r)[r(2x−r)]} = e^{x − r/2},
where κ is the constant that makes the densities integrate to 1. This is continuous in x and r, and is less than or greater than 1 as x is less than or greater than r/2. Given the symmetry of the Gaussian density, this makes sense: 0 is the mean if effort is low, r the mean if it is high, r/2 is half way between the two. Therefore, observations below r/2 tend to support the hypothesis that effort is low, while observations above r/2 tend to support the opposite hypothesis.
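The algebra in (43) is easy to double-check: the ratio of the N(r, r) density to the N(0, r) density is e^{x − r/2}, above 1 exactly when x > r/2 (the value r = 0.7 below is arbitrary).

```python
import math

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def lr_exact(x, r):
    # density ratio of N(r, r) (high effort, unit drift) to N(0, r) (low effort)
    return normal_pdf(x, r, r) / normal_pdf(x, 0.0, r)

def lr_closed_form(x, r):
    # the right-hand side of (43)
    return math.exp(x - r / 2)

r = 0.7
checks = [abs(lr_exact(x, r) - lr_closed_form(x, r)) for x in (-1.0, 0.0, 0.35, 1.0)]
```

The crossover at x = r/2 is the symmetry point between the two Gaussian means.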
7.3. Monitoring Volatilities. The contrast with the case in which changes in effort lead to changes in the Brownian motion volatility is extreme: here there is an infinite amount of information in any time interval [0, ǫ) if ǫ > 0 is non-infinitesimal. We first go through how this works in V(S), then give two different *-finite models with the same property. A crucial piece of knowledge for what follows is that the fourth moment of a N(0, σ²) distribution is 3σ⁴.
Example 7.3.1 (Brownian monitoring: II). Working in V(S), suppose that we have a usual Brownian motion, at every t ∈ [0, 1], ξ_t ∼ N(0, t), or we have a Brownian motion with a different volatility, ξ_t^r ∼ N(0, rt) where r ≠ 1. Divide the interval [0, ǫ) into n equally sized pieces, and compute the squares of the increments Y₁ := (ξ_{1·ǫ/n} − ξ₀)², Y₂ := (ξ_{2·ǫ/n} − ξ_{1·ǫ/n})², . . ., Y_k := (ξ_{k·ǫ/n} − ξ_{(k−1)·ǫ/n})², as well as the squares of the increments Y_k^r := (ξ^r_{k·ǫ/n} − ξ^r_{(k−1)·ǫ/n})², k = 1, . . . , n. Under the standard Brownian motion, the average of the Y_k, (1/n) Σ_{k≤n} Y_k, is ǫ²/n²; with the changed volatility, the average of the Y_k^r is r²ǫ²/n², and the difference is (1 − r²)ǫ²(1/n²). The variance of (1/n) Σ_{k≤n} Y_k is (1/n²) · n · 3(ǫ²/n⁴) = 3ǫ²(1/n⁵). The ratio of the difference in means to the standard deviation is
(44)   ((r² − 1)ǫ²(1/n²))/(√3 ǫ(1/n^{2.5})) = (ǫ(r² − 1)/√3) √n →_{n↑∞} ∞.
In the previous example, for fixed ǫ > 0, all that matters is the number of times we subdivide the interval. So, if we take the continuity of the time set at which we can choose to make observations seriously enough, that is, if we believe that we can subdivide time arbitrarily finely, we are certain of any change in volatility by any ǫ > 0.⁴
⁴For those who have seen the notation, the change of volatility event belongs to F₀₊ where (F_t)_{t∈[0,1]} is the Brownian filtration.
A different summary of the previous example is that if W ∈ Δ(C([0, 1])) is the probability measure of the standard Brownian motion and W^r is the probability measure of the Brownian motion with a different volatility, then there exists a set of continuous functions, E, with the property that W(E) = 1 and W^r(E) = 0. See [Kakutani, 1948] for the grand-daddy of such results.
Example 7.3.2 (Brownian monitoring: III). One way to change the volatility of the *-finite Brownian walk is to change the increments from ±√(1/N) to ±√(r/N), r ≄ 1. This means that after one observation, at time 1/N, the change in volatility is revealed. Here the set of time paths changes from those with increments ±√(1/N) to those with increments ±√(r/N), and the disjointness of the supports is built in from the first step.
An alternative, ternary random walk does not build in a different set of time paths from the first step, but still has, as it must, the property that one learns of a change in volatility infinitely fast. Suppose that P(X_t^s = ±√(1/N)) = (1/2)(1 − s) and P(X_t^s = 0) = s, while P(X_t^r = ±√(1/N)) = (1/2)(1 − r) and P(X_t^r = 0) = r. The volatility of ξ^s(k/N, ·) := Σ_{t≤k} X_t^s is given by Var(ξ^s(k/N, ·)) ≃ (1 − s)°(k/N), while the volatility of ξ^r(k/N, ·) := Σ_{t≤k} X_t^r is given by Var(ξ^r(k/N, ·)) ≃ (1 − r)°(k/N). Supposing that r ≄ s, i.e. that the volatilities are not infinitesimally close to each other, the sufficient statistic for the difference between ξ^r(k/N, ·) and ξ^s(k/N, ·) is the number of size-0 increments by time k. This is revealed by any k ≃ ∞, and if k/N ≃ 0 at the same time, then we have learned the volatility difference before any standard time has elapsed.
7.4. A Brief Detour Through Queueing Theory. In studying queues, the non-homogeneous Poisson process has a central place. This involves the arrival rate, λ, varying with time. In our context, this would correspond to non-constant effort.
Example 7.4.1 (Non-constant effort). Suppose that effort at each t ∈ T yields a probability λ(t) dt of there being excellent news in the t'th interval. We can show that Π_{i≤k}(1 − λ(i/N)/N) ≃ e^{−(1/N) Σ_{i≤k} λ(i/N)} and, provided that t ↦ λ(t) is fairly well-behaved, (1/N) Σ_{i≤k} λ(i/N) ≃ ∫₀^{°(k/N)} λ(x) dx.
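The product-to-exponential-of-an-integral step is a one-line check in code (the rate function λ(t) = 2 + sin(6t), the grid size, and the stopping time 1/2 are all arbitrary illustrative choices):

```python
import math

N = 100_000                                  # finite stand-in for the infinite grid
lam = lambda t: 2.0 + math.sin(6.0 * t)      # an illustrative, well-behaved rate

k = N // 2                                   # observe up to standard time r = 1/2
prod_no_event = 1.0                          # Prod_{i<=k} (1 - lam(i/N)/N)
riemann = 0.0                                # (1/N) * Sum_{i<=k} lam(i/N)
for i in range(1, k + 1):
    prod_no_event *= 1.0 - lam(i / N) / N
    riemann += lam(i / N) / N

# exact integral of lam over [0, 1/2]
integral = 1.0 + (1.0 - math.cos(3.0)) / 6.0
```

The probability of no arrival by time r is within O(1/N) of e^{−∫₀^r λ(x)dx}, the familiar non-homogeneous Poisson survival function.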
7.5. Expectations, Norms, Inequalities. Recall that we have in mind a *-finite probability space (Ω, F, P) with P strictly positive. For A ∈ F, P(A) = Σ_{ω∈A} P(ω). For a random variable X, E X := Σ_ω X(ω)P(ω).
7.5.1. Expectations of some classic functions. Bilinear forms: for random variables X and Y, E XY = E YX, and E XX > 0 unless X = 0. The norm of X is defined as √(E XX) and denoted ‖X‖₂.
The class of constant random variables is a linear subspace of R^Ω. The mapping X ↦ E X is the orthogonal projection onto that subspace, and ε_X := X − E X is the projection onto its orthogonal complement.
For random variables $X$ and $Y$: $\mathrm{Var}(X) := E\,(X - E\,X)^2 = E\,(\varepsilon_X)^2$ is the variance of $X$; $\sqrt{\mathrm{Var}(X)}$ is the standard deviation of $X$; $\mathrm{Cov}(X,Y) := E\,(X - E\,X)(Y - E\,Y)$ is the covariance of $X$ and $Y$; and $\rho_{X,Y} := \frac{\mathrm{Cov}(X,Y)}{\sqrt{\mathrm{Var}(X)}\sqrt{\mathrm{Var}(Y)}}$ is the correlation, also known as the cosine of the angle between the centered vectors $\varepsilon_X$ and $\varepsilon_Y$.
7.5.2. Some norms. The $L^p$-norms, $p \in [1,\infty)$, are $\|X\|_p := (E\,|X|^p)^{1/p}$, and $\|X\|_\infty := \max_{\omega\in\Omega}|X(\omega)|$. Recall Jensen's inequality: for any convex $f : \mathbb{R} \to \mathbb{R}$,
$$f\left(\textstyle\sum_\omega X(\omega)P(\omega)\right) \le \textstyle\sum_\omega f(X(\omega))P(\omega), \qquad (45)$$
equivalently,
$$f(E\,X) \le E\,f(X), \qquad (46)$$
provable from the definition of convexity and induction.
Lemma 7.5.1. For all random variables $X$ and $\infty \ge p > q \ge 1$, $\|X\|_p \ge \|X\|_q$.
Proof. The case $p = \infty$ is immediate, so we suppose that $\infty > p$.

Suppose first that $p > q = 1$. From Jensen's inequality, using the convex function $f(r) = r^p$ for $r \ge 0$ on the random variable $|X|$, we have $(E\,|X|)^p \le E\,|X|^p$. Taking $p$'th roots on both sides, $E\,|X| \le (E\,|X|^p)^{1/p} = \|X\|_p$.

For the last case, $p > q \ge 1$, using the convex function $f(r) = r^{p/q}$ on the random variable $|X|^q$, we have $(E\,|X|^q)^{p/q} \le E\,(|X|^q)^{p/q} = E\,|X|^p$. Taking $p$'th roots on both sides, $(E\,|X|^q)^{1/q} \le (E\,|X|^p)^{1/p}$, that is, $\|X\|_p \ge \|X\|_q$.
This means that $p \mapsto \|X\|_p$ is an increasing function of $p \in [1,\infty)$, strictly increasing unless $|X|$ is a constant random variable. We know that all bounded monotonic functions on subsets of $\mathbb{R}$ have a supremum. We now ask what that supremum is. Let $\omega_0$ solve the problem $\max_\omega |X(\omega)|$. Because $\|X\|_p \ge (|X(\omega_0)|^p P(\omega_0))^{1/p} = \|X\|_\infty P(\omega_0)^{1/p}$ and $P(\omega_0)^{1/p} \uparrow 1$ as $p \uparrow \infty$, we have $\lim_{p\uparrow\infty}\|X\|_p = \|X\|_\infty$.
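On a small finite probability space, the monotonicity of $p \mapsto \|X\|_p$ and its convergence to $\|X\|_\infty$ can be checked directly. A sketch; the particular values and probabilities are arbitrary choices:

```python
def lp_norm(xs, ps, p):
    """||X||_p = (E|X|^p)^(1/p) on a finite probability space."""
    return sum(abs(x) ** p * q for x, q in zip(xs, ps)) ** (1.0 / p)

xs = [1.0, -2.0, 3.0, 0.5]           # values X(omega)
ps = [0.4, 0.3, 0.2, 0.1]            # strictly positive P(omega)
sup_norm = max(abs(x) for x in xs)   # ||X||_infinity

# p -> ||X||_p is increasing and approaches the sup norm from below.
norms = [lp_norm(xs, ps, p) for p in (1, 2, 4, 8, 64, 512)]
```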
7.5.3. The triangle inequality for norms. Recall that for vectors $x, y \in \mathbb{R}^\ell$, $xy = \cos(\theta)\sqrt{xx}\sqrt{yy}$. From this, one can conclude that $|xy| \le \sqrt{xx}\sqrt{yy}$, and that $\sqrt{xx} = \max\{xy : \sqrt{yy} = 1\}$. From this, we find for any vectors $r, s$, $\sqrt{(r+s)(r+s)} \le \sqrt{rr} + \sqrt{ss}$. This is the basis of the triangle inequality: take $r = x - y$ and $s = y - z$ and find that the distance between $x$ and $z$ is less than the sum of the distances between $x$ and $y$ and between $y$ and $z$, $\sqrt{(x-z)(x-z)} \le \sqrt{(x-y)(x-y)} + \sqrt{(y-z)(y-z)}$. We are after the same triangle inequality result for the $\|\cdot\|_p$-norms. It is called Minkowski's inequality. The starting point is the following; notice the part of the proof where the conditions for equality versus strict inequality appear.
Lemma 7.5.2 (Hölder). For any random variables $X$, $Y$, and any $p \in (1,\infty)$, if $\frac{1}{p} + \frac{1}{q} = 1$, then $|E\,XY| \le \|X\|_p\|Y\|_q$.
Proof. If $X = 0$ or $Y = 0$, the inequality is satisfied. For the other cases, we can divide each $X(\omega)$ by $\kappa_x := \|X\|_p$ and each $Y(\omega)$ by $\kappa_y := \|Y\|_q$. After we have done that, the left-hand side is also divided by $\kappa_x\kappa_y$, and we have reduced the problem to showing that $\sum_\omega |X(\omega)Y(\omega)|P(\omega) \le 1$ when we know that $\sum_\omega |X(\omega)|^p P(\omega) = \sum_\omega |Y(\omega)|^q P(\omega) = 1$. Here is an odd-looking observation that will make the argument go: since $\frac{1}{p} + \frac{1}{q} = 1$, we need only show that
$$\textstyle\sum_\omega |X(\omega)Y(\omega)|P(\omega) \le \frac{1}{p}\sum_\omega |X(\omega)|^p P(\omega) + \frac{1}{q}\sum_\omega |Y(\omega)|^q P(\omega). \qquad (47)$$
Since the logarithm is strictly concave, for any non-zero pair $X(\omega), Y(\omega)$, we have
$$\log\left(\tfrac{1}{p}|X(\omega)|^p + \tfrac{1}{q}|Y(\omega)|^q\right) \ge \tfrac{1}{p}\log\left(|X(\omega)|^p\right) + \tfrac{1}{q}\log\left(|Y(\omega)|^q\right) \qquad (48)$$
with equality iff $|X(\omega)|^p = |Y(\omega)|^q$. Now, $\frac{1}{p}\log(|X(\omega)|^p) + \frac{1}{q}\log(|Y(\omega)|^q) = \log(|X(\omega)Y(\omega)|)$, so we have
$$\log\left(\tfrac{1}{p}|X(\omega)|^p + \tfrac{1}{q}|Y(\omega)|^q\right) \ge \log\left(|X(\omega)Y(\omega)|\right). \qquad (49)$$
Since the logarithm is strictly monotonic, this means that $\frac{1}{p}|X(\omega)|^p + \frac{1}{q}|Y(\omega)|^q \ge |X(\omega)Y(\omega)|$. Taking the probability weighted convex combination of these inequalities yields what we were after,
$$\tfrac{1}{p}\textstyle\sum_\omega |X(\omega)|^p P(\omega) + \tfrac{1}{q}\sum_\omega |Y(\omega)|^q P(\omega) \ge \sum_\omega |X(\omega)Y(\omega)|P(\omega), \qquad (50)$$
because the inequality holds even if $X(\omega) = 0$ or $Y(\omega) = 0$.
Let us return to the part of the proof where we said that we have "equality iff $|X(\omega)|^p = |Y(\omega)|^q$." In more detail, what we showed is that $\sum_\omega |X(\omega)Y(\omega)|P(\omega) \le \|X\|_p\|Y\|_q$ with equality when, for each $\omega$, we have $X(\omega) = \mathrm{sgn}(Y(\omega))|Y(\omega)|^{q/p}$. Combining yields the following.
Lemma 7.5.3. For each $X \in \mathbb{R}^\Omega$, $\|X\|_p = \max_{\|Y\|_q = 1} E\,XY$.
From this we have the triangle inequality for the $\|\cdot\|_p$-norms.
Lemma 7.5.4 (Minkowski). For any $R, S \in \mathbb{R}^\Omega$ and any $p \in (1,\infty)$, $\|R + S\|_p \le \|R\|_p + \|S\|_p$.
Proof. Same logic as the vector case.
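Both inequalities are easy to spot-check on a finite probability space. A sketch with randomly drawn vectors and the conjugate pair $p = 3$, $q = 3/2$; all of the specific choices here are illustrative:

```python
import random

def lp_norm(xs, ps, p):
    """||X||_p = (E|X|^p)^(1/p) on a finite probability space."""
    return sum(abs(x) ** p * q for x, q in zip(xs, ps)) ** (1.0 / p)

def expectation(xs, ps):
    return sum(x * q for x, q in zip(xs, ps))

rng = random.Random(1)
n = 6
ps = [1.0 / n] * n                       # uniform, strictly positive P
X = [rng.uniform(-3.0, 3.0) for _ in range(n)]
Y = [rng.uniform(-3.0, 3.0) for _ in range(n)]
p, q = 3.0, 1.5                          # conjugate exponents: 1/p + 1/q = 1

# Hoelder: |E XY| <= ||X||_p ||Y||_q
holder_gap = lp_norm(X, ps, p) * lp_norm(Y, ps, q) \
    - abs(expectation([x * y for x, y in zip(X, Y)], ps))
# Minkowski: ||X + Y||_p <= ||X||_p + ||Y||_p
minkowski_gap = lp_norm(X, ps, p) + lp_norm(Y, ps, p) \
    - lp_norm([x + y for x, y in zip(X, Y)], ps, p)
```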
7.5.4. The Markov/Chebyshev Inequality. For $X \ge 0$ and $r > 0$, $X \ge r\mathbf{1}_{X\ge r}$ for every $\omega$, hence $E\,X \ge r\,E\,\mathbf{1}_{X\ge r}$. Turning it around, we have Markov's inequality,
$$P(X \ge r) \le \tfrac{1}{r}E\,X. \qquad (51)$$
For any random variable $X$, $r > 0$, and $p > 0$, $\{|X| \ge r\} = \{|X|^p \ge r^p\}$, so that
$$P(|X| \ge r) \le \tfrac{1}{r^p}E\,|X|^p. \qquad (52)$$
Often, the following variant of this is called Chebyshev's inequality,
$$P(|X - E\,X| \ge r) \le \tfrac{1}{r^2}\mathrm{Var}(X). \qquad (53)$$
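Because the probability space is finite, Markov's and Chebyshev's inequalities can be verified by direct enumeration over $\Omega$. A sketch on an arbitrary example; the values, probabilities, and threshold $r$ are illustrative:

```python
import random

rng = random.Random(2)
n = 8
xs = [rng.uniform(0.0, 10.0) for _ in range(n)]   # X >= 0
ps = [1.0 / n] * n                                # uniform strictly positive P

mean = sum(x * q for x, q in zip(xs, ps))
var = sum((x - mean) ** 2 * q for x, q in zip(xs, ps))

def prob(event):
    """P of an event, by direct enumeration over omega."""
    return sum(q for x, q in zip(xs, ps) if event(x))

r = 4.0
markov_ok = prob(lambda x: x >= r) <= mean / r
chebyshev_ok = prob(lambda x: abs(x - mean) >= r) <= var / r ** 2
```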
7.6. Vector Algebras of Random Variables. Recall that for a random variable $X$ and non-empty $A$, $E\,(X|A) = \frac{1}{P(A)}\sum_{\omega\in A}X(\omega)P(\omega) = \frac{1}{P(A)}E\,X\cdot\mathbf{1}_A$. When $\mathcal{A} = \{A_1,\dots,A_K\}$ is a partition of $\Omega$, $E\,(X|\mathcal{A})$ is, by definition, the random variable $\sum_k E\,(X|A_k)\cdot\mathbf{1}_{A_k}(\omega)$. Letting $X = \mathbf{1}_B$, we have $E\,(X|\mathcal{A}) = P(B|\mathcal{A})$ and $P(B|\mathcal{A}) = E\,(\mathbf{1}_B|\mathcal{A})$. This is another random variable.

The trick with vector algebras of functions is that they always take the form $\mathrm{span}\,\{\mathbf{1}_{A_k} : k = 1,\dots,K\}$ where $\{A_1,\dots,A_K\}$ is a partition of $\Omega$. This will mean that conditional expectations are orthogonal projections.

We are going to abuse notation and also use $\mathcal{A} \subset \mathbb{R}^\Omega$ to denote a vector algebra (of functions). Contrary to some usages, we will always assume that our vector algebras contain the constant functions.
Definition 7.6.1. $\mathcal{A} \subset \mathbb{R}^\Omega$ is a vector algebra if for all $X, Y \in \mathcal{A}$ and all $\alpha, \beta \in \mathbb{R}$,
a. $\alpha\cdot\mathbf{1}_\Omega \in \mathcal{A}$,
b. $\alpha X + \beta Y \in \mathcal{A}$, and
c. $XY \in \mathcal{A}$.
Atoms are, historically, the indissolubly small objects.
Definition 7.6.2. An atom of a vector algebra A is a maximal event on which all
elements of A are constant.
For any atom $A$ and $\omega \notin A$, let $Y \in \mathcal{A}$ have the property that $Y(\omega) \ne Y(A)$, and define
$$X_\omega(\cdot) = \frac{Y(\cdot) - Y(\omega)}{Y(A) - Y(\omega)}. \qquad (54)$$
Note that $X_\omega(\omega) = 0$, $X_\omega(A) = 1$, and $X_\omega \in \mathcal{A}$. Now consider the function
$$R(\cdot) = \Pi_{\omega\notin A}X_\omega(\cdot). \qquad (55)$$
What we have is that $R = \mathbf{1}_A$. That is the hard part of the argument behind the following.
Lemma 7.6.3. If A is a vector algebra and {A1 , . . . , AK } is its collection of atoms,
then A = span ({1Ak : k = 1, . . . , K}).
The following is a nearly immediate corollary.
Lemma 7.6.4. The mapping $X \mapsto E\,(X|\mathcal{A})$ is the orthogonal projection onto $\mathcal{A}$.
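The projection property can be made concrete: computing $E(X|\mathcal{A})$ atom by atom produces a random variable whose residual $X - E(X|\mathcal{A})$ is orthogonal, in the inner product $E\,YZ$, to every indicator $\mathbf{1}_{A_k}$, which is exactly what orthogonal projection onto $\mathrm{span}\,\{\mathbf{1}_{A_k}\}$ requires. A sketch on a six-point $\Omega$; the numbers are arbitrary:

```python
def cond_expectation(X, P, partition):
    """E(X|A): on each atom A_k, the P-weighted average of X over A_k."""
    out = [0.0] * len(X)
    for atom in partition:
        p_atom = sum(P[w] for w in atom)
        avg = sum(X[w] * P[w] for w in atom) / p_atom
        for w in atom:
            out[w] = avg
    return out

# Omega = {0,...,5} with strictly positive P and a three-atom partition.
P = [0.1, 0.2, 0.1, 0.3, 0.2, 0.1]
X = [4.0, -1.0, 2.5, 0.0, 3.0, 7.0]
partition = [(0, 1), (2, 3), (4, 5)]

EXA = cond_expectation(X, P, partition)
residual = [x - e for x, e in zip(X, EXA)]
# Orthogonality to each indicator 1_{A_k}: E(residual * 1_{A_k}) = 0.
inner_products = [sum(residual[w] * P[w] for w in atom) for atom in partition]
```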
7.6.1. Algebras and Information. Often, $\mathcal{A}$ will represent the available information in a decision problem. Specifically, if one "knows" $\mathcal{A} = \mathrm{span}\,(\{\mathbf{1}_{A_k} : k = 1,\dots,K\})$, then one solves the maximization problem
$$\max_{a_1,\dots,a_K}\int\textstyle\sum_k u(a_k,\omega)\mathbf{1}_{A_k}(\omega)\,dP(\omega). \qquad (56)$$
The bridge-crossing logic is at work here: $a^* = (a_1^*,\dots,a_K^*)$ solves this problem iff each $a_k^*$ solves
$$\max_a \int u(a,\omega)\,dP(\omega|A_k). \qquad (57)$$
The larger is $\mathcal{A}$, the smaller are the atoms, and the more functions $\omega \mapsto a(\omega) = (a_1(\omega),\dots,a_K(\omega))$ one can choose among.
7.7. Adapted Stochastic Processes. The time set, $T = \{t_0 < t_1 < t_2 < \dots < t_N\}$, is a member of ${}^*\mathcal{P}_F(\mathbb{R})$ with the property that $(t_{n+1} - t_n) \simeq 0$ for $n = 0,\dots,N-1$. We often take $t_0 = 0$ and $t_N = 1$; another frequent option has $t_N$ unlimited.
Definition 7.7.1. A filtration is an increasing mapping t 7→ At from T to vector
algebras in RΩ . A stochastic process ξ : T → RΩ is adapted to the filtration (At )t∈T
if for each t ∈ T , ξ(t, ·) ∈ At .
It is often useful to think of a stochastic process as being in the form $\xi : T \times \Omega \to \mathbb{R}$: time paths are the functions $t \mapsto \xi(t,\omega)$, $\omega \in \Omega$; picking $\omega$ according to $P$ gives us the random time path $\xi(\cdot,\omega)$. Given a stochastic process $\xi : T \times \Omega \to \mathbb{R}$, the canonical filtration associated with $\xi$ has $\mathcal{A}_t$ defined as the smallest vector algebra of functions containing $\{\xi(s,\cdot) : s \le t\}$.
8. Some Convergence Results in Probability Theory
8.1. The Weak Law of Large Numbers (WLLN). The weak law says that if we are averaging over a large number of independent/uncorrelated random variables, then it is "very likely" that the average is near the mean. The strong law says that it is "very likely" that the average settles down at the mean.
8.1.1. The easiest WLLN. Here it is.
Theorem 8.1.1 (WLLN: I). If $\{\xi(n,\cdot) : n \in \{1,\dots,N\}\}$ is an iid collection of random variables with mean 0 and variance 1, then for any non-infinitesimal $r > 0$ and any unlimited $n \le N$, $P(\{\omega : |\frac{1}{n}\sum_{i\le n}\xi(i,\omega)| < r\}) \simeq 1$.

Proof. $\mathrm{Var}(\frac{1}{n}\sum_{i\le n}\xi(i,\cdot)) = \frac{1}{n^2}\cdot n = \frac{1}{n}$. By Chebyshev's inequality,
$$P(\{\omega : |\tfrac{1}{n}\textstyle\sum_{i\le n}\xi(i,\omega)| \ge r\}) \le \tfrac{1}{r^2 n} \simeq 0. \qquad (58)$$
Notation: let us replace ξ(n, ·) with ξn and stop being so finicky about talking about
the set of ω such that something is true.
Some easy extensions:
a. If $r$ is an infinitesimal with $r^2 n$ unlimited, then $P(|\frac{1}{n}\sum_{i\le n}\xi_i| < r) \simeq 1$.
b. If the $\xi_n$ are not iid but merely a collection of uncorrelated random variables with mean 0 and variance 1, the same conclusion holds.
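Read back in $V(S)$, Theorem 8.1.1 says that for large finite $n$ the event $\{|\frac{1}{n}\sum_{i\le n}\xi_i| \ge r\}$ has probability bounded by $\frac{1}{r^2 n}$. A simulation sketch with $\pm 1$ increments; the sample sizes and the value of $r$ are illustrative:

```python
import random

def exceed_frequency(n, r, trials, rng):
    """Estimated P(|average of n iid +/-1 variables| >= r)."""
    count = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n))
        if abs(s / n) >= r:
            count += 1
    return count / trials

rng = random.Random(3)
n, r = 2500, 0.1
freq = exceed_frequency(n, r, 200, rng)
cheb_bound = 1.0 / (r ** 2 * n)     # Chebyshev bound: 1/(r^2 n) = 0.04
```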
8.1.2. Triangular arrays. The immediate impulse is to compare Theorem 8.1.1 to the following WLLN argument in $V(S)$: if $X_1, X_2,\dots$ is an iid sequence of mean 0, variance 1 random variables, then $\mathrm{Var}(\frac{1}{n}\sum_{i\le n}X_i) = \frac{1}{n}$, so that for any $r > 0$, $P(|\frac{1}{n}\sum_{i\le n}X_i| \ge r) \le \frac{1}{r^2 n} \to 0$. In fact, we have proved something much more general, formulated in terms of triangular arrays of random variables.

The unlimited $N$ in $\{1,\dots,N\}$ is $N = \langle N_1, N_2, N_3,\dots\rangle$. This means that
$$\{1,\dots,N\} = \langle\{1,\dots,N_1\},\{1,\dots,N_2\},\dots\rangle. \qquad (59)$$
This in turn means that $\{\xi_1,\dots,\xi_N\}$ is the equivalence class of a sequence of sets of random variables arrayed in the following triangular fashion,
$$S_1 = \{\xi_{1,1},\dots,\xi_{1,N_1}\} \qquad (60)$$
$$S_2 = \{\xi_{2,1},\dots,\xi_{2,N_1},\dots,\xi_{2,N_2}\} \qquad (61)$$
$$S_3 = \{\xi_{3,1},\dots,\xi_{3,N_1},\dots,\xi_{3,N_2},\dots,\xi_{3,N_3}\} \qquad (62)$$
$$\vdots$$
$$S_k = \{\xi_{k,1},\dots,\xi_{k,N_1},\dots,\xi_{k,N_2},\dots,\xi_{k,N_3},\dots,\xi_{k,N_k}\} \qquad (63)$$
$$\vdots$$
The random variables in the set $S_1$ are defined on $(\Omega_1,\mathcal{F}_1,P_1)$ and have law $L_1$, the random variables in the set $S_2$ are defined on $(\Omega_2,\mathcal{F}_2,P_2)$ and have law $L_2$, and, in general, the random variables in the set $S_k$ are defined on $(\Omega_k,\mathcal{F}_k,P_k)$ and have law $L_k$. The random variables $\xi_{k,n}$, $n = 1,\dots,N_k$, are iid with law $L_k$, but $L_k$ need not equal $L_{k'}$, and the random variables $\xi_{k,n}$ need not be independent of the $\xi_{k',n'}$.
One can go a bit further in allowing the distributions to be different across the sets. The proof of the following is also directly from Chebyshev's inequality.

Theorem 8.1.2 (WLLN: II). If $\{\xi_n : n \in \{1,\dots,N\}\}$ is an uncorrelated set of random variables with $\mathrm{Var}(\xi_n) = \sigma_n^2$, then, defining $s_n = \sum_{i\le n}\sigma_i^2$, for all $n \le N$ with $s_n$ unlimited, there exists an infinitesimal $r > 0$ such that $P(|\frac{1}{s_n}\sum_{i\le n}(\xi_i - E\,\xi_i)| < r) \simeq 1$. Defining $\mu_n = \sum_{i\le n}E\,\xi_i$, this can be re-written as $P(|\frac{1}{s_n}((\sum_{i\le n}\xi_i) - \mu_n)| < r) \simeq 1$.
8.2. Almost Everywhere. Define $A_n(\omega) = \frac{1}{n}\sum_{i\le n}\xi_i$. What the WLLN says is that, provided $\sum_{i\le n}\sigma_i^2$ is unlimited, $P(A_n \simeq 0) \simeq 1$. This does not imply that the sequence $A_1(\omega), A_2(\omega),\dots,A_N(\omega)$ is near convergent to 0 for all of, or even most of, the $\omega \in \Omega$. That, however, is what the Strong Law of Large Numbers (SLLN) says, and it is what we are building toward.

Let us say that a property $A(\omega)$ holds almost everywhere (a.e.) if for every non-infinitesimal $\epsilon > 0$, there is a set $N$ with $P(N) < \epsilon$ such that $A(\omega)$ holds for all $\omega \in N^c$. (The capital "N" in the set $N$ is a mnemonic for "null set.") The reason to formulate things this way is that the null set, $N'$, of exceptions to the property we care about will often fail to be an internal set. However, it will be contained in a larger internal set $N$ with $P(N) \simeq 0$, or, for any non-infinitesimal $\epsilon > 0$, in an internal set $N_\epsilon$ with $P(N_\epsilon) < \epsilon$. The simplicity of the following should not lead you to underestimate it.
Theorem 8.2.1. For any random variable X ∈ RΩ , the following are equivalent:
(a) X ≃ 0 a.e.,
(b) for any non-infinitesimal r > 0, P (|X| ≥ r) ≃ 0, and
(c) for some infinitesimal r > 0, P (|X| ≥ r) ≃ 0.
Proof. If $X \simeq 0$ a.e., then for any non-infinitesimal $\epsilon > 0$, there exists $N$ with $P(N) < \epsilon$ such that $X(\omega) \simeq 0$ for all $\omega \notin N$. Therefore, for any non-infinitesimal $r > 0$, $\{|X| \ge r\} \subset N$. This being true for any non-infinitesimal $\epsilon > 0$, $P(|X| \ge r) \simeq 0$.

If for any non-infinitesimal $r > 0$, $P(|X| \ge r) \simeq 0$, then the set of $r$ such that $P(|X| \ge r) \le r$ contains arbitrarily small non-infinitesimal elements, hence contains an infinitesimal element.

If for some infinitesimal $r > 0$, $P(|X| \ge r) \simeq 0$, then $N^c := \{\omega : |X(\omega)| \le r\}$ has $P(N^c) \simeq 1$.
8.3. Converging Almost Everywhere. We are after a statement that the averages $A_n(\omega) := \frac{1}{n}\sum_{i\le n}\xi_i$ converge to 0 almost everywhere when the $\xi_i$ have mean 0. The following tells us that knowing about the probabilities of the maxima of collections of random variables is going to matter a huge amount.
Theorem 8.3.1. The random variables $X_1,\dots,X_N$ converge to 0 almost everywhere iff for all non-infinitesimal $r > 0$ and all unlimited $n \le N$, $P(\max_{i\in\{n,\dots,N\}}|X_i| \ge r) \simeq 0$.
To understand what is going on back in $V(S)$: $N = \langle N_1, N_2, N_3,\dots\rangle$ and $n = \langle n_1, n_2, n_3,\dots\rangle$ with both sequences, $N_k$ and $n_k$, going to $\infty$, though with $n_k \le N_k$. Going back to the triangular array, we are looking, in the $k$'th row of (63), at the comparison of
$$X_{k,1},\dots,X_{k,n_k} \quad\text{and} \qquad (64)$$
$$X_{k,1},\dots,X_{k,n_k},\dots,X_{k,N_k}. \qquad (65)$$
One way to say that the random sequence $X_{n_k}$ has "settled down" is to say that the stuff between $X_{k,n_k}$ and $X_{k,N_k}$ contributes nothing but infinitesimals.
Proof of Theorem 8.3.1. Define $M(n,r) = \{\omega : \max_{i\in\{n,\dots,N\}}|X_i(\omega)| \ge r\}$.

Suppose that $X_1,\dots,X_N$ converges to 0 a.e. Pick non-infinitesimal $r > 0$ and $\epsilon > 0$. There exists $N$ such that $P(N) < \epsilon$ and, for all $\omega \in N^c$, $X_1(\omega),\dots,X_N(\omega)$ converges to 0. By the definition of convergence, $X_n \simeq 0$ for all unlimited $n$. Therefore, $M(n,r) \subset N$, so that $P(M(n,r)) < \epsilon$. Since $\epsilon$ was arbitrary, $P(M(n,r)) \simeq 0$, i.e. $P(\max_{i\in\{n,\dots,N\}}|X_i| \ge r) \simeq 0$.

Now suppose that $P(M(n,r)) \simeq 0$ for all non-infinitesimal $r > 0$ and all unlimited $n \le N$. Pick an arbitrary non-infinitesimal $\epsilon > 0$. For $j \in {}^*\mathbb{N}$, define $n_j \in {}^*\mathbb{N}$ as the smallest integer such that
$$P(M(n_j,\tfrac{1}{j})) \le \tfrac{\epsilon}{2^j}. \qquad (66)$$
If $j$ is limited, then overspill tells us that $n_j$ must be finite. Let $N = \cup_{j\in{}^*\mathbb{N}}M(n_j,\frac{1}{j})$, so that $P(N) \le \epsilon$. If $\omega \in N^c$, then for any unlimited $n \le N$, $|X_n(\omega)| \le \frac{1}{j}$ for every limited $j \in \mathbb{N}$, hence $|X_n(\omega)| \simeq 0$.
8.4. Weak Laws Versus Strong Laws. The reason the argument for the weak law does not deliver the strong law is that we could have $P(|A_i| \ge r) \simeq 0$ for all unlimited $i$ without the probability that the maximum is above $r$ being really small. The Poisson case is an example.

Example 8.4.1. Let $X_1,\dots,X_N$ be iid with $P(X_n = 1) = \lambda\,dt$ and $P(X_n = 0) = 1 - \lambda\,dt$, where $dt = \frac{1}{N} \simeq 0$. For each $i \in \{1,\dots,N\}$ and any non-infinitesimal $r > 0$, $P(|X_i| \ge r) \simeq 0$, but $P(\max_{i\in\{N/2,\dots,N\}}|X_i| \ge 1)$ is not infinitesimal because it is the probability that a Poisson $\lambda/2$ random variable is non-zero.
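A simulation sketch of the example: each individual $P(|X_i| \ge r)$ is tiny, yet the probability that the maximum over the second half of the indices equals 1 is approximately $1 - e^{-\lambda/2}$, the probability that a Poisson $\lambda/2$ variable is non-zero. The values of $N$, $\lambda$, and the trial count are illustrative:

```python
import math
import random

rng = random.Random(4)
big_n = 4000
lam = 2.0
p = lam / big_n                 # P(X_n = 1) = lambda * dt with dt = 1/N
trials = 1500

hits = 0
for _ in range(trials):
    # The max over i in {N/2,...,N} is 1 iff some second-half X_i fires.
    if any(rng.random() < p for _ in range(big_n // 2)):
        hits += 1

freq = hits / trials
target = 1.0 - math.exp(-lam / 2.0)   # P(Poisson(lambda/2) != 0)
```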
8.4.1. The Maximum of a Sum. Kolmogorov's inequality talks about the probability that the maximum of a sum of independent random variables is large. This will make possible some uses of Theorem 8.3.1.

Theorem 8.4.1 (Kolmogorov's inequality). If $X_1,\dots,X_n$ is a collection of independent mean 0 random variables with finite variance and $S_k := \sum_{i\le k}X_i$, then for any $r > 0$,
$$P(\max_{1\le k\le n}|S_k| \ge r) \le \tfrac{1}{r^2}\mathrm{Var}(S_n). \qquad (67)$$
This should be compared to Chebyshev's inequality, $P(|S_n| \ge r) \le \frac{1}{r^2}\mathrm{Var}(S_n)$. We expect that $\max_{1\le k\le n}|S_k| \ge |S_n|$, so one version of what this is telling us is that if $\max_{1\le k\le n}|S_k|$ is large, then $|S_n|$ is also likely to be large.
Proof of Kolomogorov’s inequality. Define τ as the minimum of the k such that |Sk | ≥
r, so that Ak := {τ = k}, k = 1, . . . , n, partitions the event {max1≤k≤n |Sk | ≥ r}.
XZ
2
E Sn ≥
Sn2 dP
(68)
Ak
k
=
XZ
k
=
XZ
k
≥
Ak
Ak
XZ
k
Ak
[Sk + (Sn − Sk )]2 dP
Sk2 + (Sn − Sk )2 + 2Sk (Sn − Sk ) dP
Sk2 + 2Sk (Sn − Sk ) dP.
Note that Ak and Sk depend on X1 , . . . , Xk , while (Sn − Sk ) depends on Xk+1 , . . . , Xn
and has mean 0. The two sets of random variables are independent. Therefore,
E 1Ak Sk (Sn − Sk ) = E (E (1Ak Sk (Sn − Sk ) X1 , . . . , Xk )) = E 1Ak Sk · 0 = 0.
Using the fact that for every ω ∈ Ak , Sk2 (ω) ≥ r2 , this yields
XZ
X
2
E Sn ≥
Sk2 dP ≥ r2
P (Ak )
k
= r2 P
Ak
(69)
(70)
k
max |Sk | ≥ r .
1≤k≤n
P
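Kolmogorov's inequality can be spot-checked by simulating random walks and comparing the empirical frequency of $\{\max_{1\le k\le n}|S_k| \ge r\}$ with the bound $\mathrm{Var}(S_n)/r^2$. A sketch; the walk length, threshold, and trial count are illustrative:

```python
import random

rng = random.Random(5)
n, trials, r = 400, 1000, 40.0

exceed = 0
for _ in range(trials):
    s, max_abs = 0.0, 0.0
    for _ in range(n):
        s += rng.choice((-1.0, 1.0))   # independent, mean 0, variance 1
        max_abs = max(max_abs, abs(s))
    if max_abs >= r:
        exceed += 1

freq = exceed / trials
bound = n / r ** 2                     # Var(S_n)/r^2 = 400/1600 = 0.25
```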
8.4.2. Sums with Random Signs. We know that for any unlimited $N$, $\sum_{n=1}^N\frac{1}{n}$ is unlimited. Compare that result to the following consequence of Theorem 8.3.1.

Corollary 8.4.1.1. If $R_1,\dots,R_N$ is an iid sequence with $P(R_n = +1) = P(R_n = -1) = \frac{1}{2}$, then $Y_n := \sum_{i\le n}R_i\frac{1}{i}$, $n = 1,\dots,N$, converges a.e. to an a.e. limited random variable $Y$.
Proof. Define $Y = Y_N$, so that $\mathrm{Var}(Y) \simeq \pi^2/6$. Having a limited variance, the probability that $|Y|$ is unlimited is infinitesimal, i.e. $Y$ is a.e. limited. What we want to show is that $X_n := Y_N - Y_{n-1}$ converges to 0 a.e. It is sufficient to show that $P(\max_{i\in\{n,\dots,N\}}|X_i| \ge r) \simeq 0$ for all unlimited $n$ and non-infinitesimal $r > 0$. Each $X_n$ is equal to $\sum_{i=n}^N R_i\frac{1}{i}$, so we are after the maximum of a set of sums of independent mean 0 random variables with finite variances. To make it fit exactly, we reverse the order of summation: define $Y'_1 = R_{N-1}\frac{1}{N-1}$, $Y'_2 = R_{N-2}\frac{1}{N-2}$, ..., $Y'_k = R_{N-k}\frac{1}{N-k}$, and $S'_k = \sum_{i\le k}Y'_i$ for $k = 1,\dots,(N-n)$. From Kolmogorov's inequality, $P(\max_{1\le k\le N-n}|S'_k| \ge r) \le \frac{1}{r^2}\mathrm{Var}(S'_{N-n})$. Now, $\mathrm{Var}(S'_{N-n}) = \sum_{i=n}^N\frac{1}{i^2} \simeq 0$.
This gives us a huge number of examples of sequences $x_n$ with $\sum|x_n|$ unlimited and $\sum x_n$ limited. One can show that $Y$ has full support, which gives us the result that for any limited $x \in \mathbb{R}$, there is a sequence of $\pm$ signs $r_n$ such that $\sum_{n\le N}r_n\frac{1}{n} \simeq x$.
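A simulation sketch of the corollary: along a single path, the partial sums $Y_n = \sum_{i\le n}R_i\frac{1}{i}$ barely move after a late index, and the total variance is close to $\pi^2/6$. The truncation level $N = 10^5$ is an illustrative stand-in for an unlimited integer:

```python
import math
import random

rng = random.Random(6)
big_n = 10**5
signs = [rng.choice((-1.0, 1.0)) for _ in range(big_n)]

partial, partials = 0.0, []
for i, r_i in enumerate(signs, start=1):
    partial += r_i / i
    partials.append(partial)

# After index N/2 the path has settled: the remaining movement is tiny,
# in line with the tail variance sum_{i>N/2} 1/i^2 being small.
tail_move = max(abs(p - partials[-1]) for p in partials[big_n // 2:])
var_gap = abs(sum(1.0 / i ** 2 for i in range(1, big_n + 1)) - math.pi ** 2 / 6.0)
```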
8.4.3. A First Strong Law.
Theorem 8.4.2 (Strong Law: I). If $X_1,\dots,X_N$ is a sequence of mean 0 random variables, $N$ unlimited, and $\sum_{n=1}^N\frac{\mathrm{Var}(X_n)}{n^2}$ is limited, then $A_N := \frac{1}{N}\sum_{n\le N}X_n$ converges to 0 almost everywhere.
Proof. We first show that $A_N \simeq 0$ almost everywhere, that is, that $P(|A_N| > r) \simeq 0$ for every non-infinitesimal $r > 0$: since $N$ is unlimited, so is $n'$, defined as the largest integer less than or equal to $\sqrt{N}$; this means that $\sum_{k=n'}^N\frac{\mathrm{Var}(X_k)}{k^2} \simeq 0$; therefore $A_N$ can be expressed as the sum of two independent random variables with infinitesimal variance, $A_N^{n'-} := \frac{1}{N}\sum_{k=1}^{n'-1}X_k$ and $A_N^{n'+} := \frac{1}{N}\sum_{k=n'}^N X_k$.

Pick unlimited $n \le N$, and for $m \in \{n,\dots,N\}$, define $Y_m = A_N - A_m$, so that
$$Y_m = (\tfrac{1}{N} - \tfrac{1}{m})\textstyle\sum_{k=1}^{m-1}X_k + \tfrac{1}{N}\textstyle\sum_{k=m}^N X_k. \qquad (71)$$
The first term and the second term converge to 0.
The usual proof uses the Borel-Cantelli lemmas to show that for all unlimited $n \le N$ and non-infinitesimal $r > 0$, $P(\max_{m\in\{n,\dots,N\}}|A_m| > r) \simeq 0$. The Borel-Cantelli lemmas are also useful in understanding how much experimentation there must be in learning models so as to get past the Fudenberg and Levine self-confirming equilibria [Fudenberg and Levine, 1993].
8.5. The Borel-Cantelli Lemmas. In $V(S)$ the Borel-Cantelli lemma has the following form. If $A_n$ is a sequence of events, define $[A_n \text{ i.o.}] = \cap_N\cup_{n\ge N}A_n$ and $[A_n \text{ a.a.}] = \cup_N\cap_{n\ge N}A_n$. Note that $([A_n \text{ i.o.}])^c = [A_n^c \text{ a.a.}]$ and $([A_n \text{ a.a.}])^c = [A_n^c \text{ i.o.}]$.

Lemma 8.5.1 (First Borel-Cantelli). If $A_n$ is a sequence of events and $\sum_{n\in\mathbb{N}}P(A_n) < \infty$, then $P([A_n \text{ i.o.}]) = 0$.

Proof. $P([A_n \text{ i.o.}]) \le \sum_{n\ge N}P(A_n) \downarrow 0$.

Lemma 8.5.2 (Second Borel-Cantelli). If $A_n$ is a sequence of independent events and $\sum_{n\in\mathbb{N}}P(A_n) = \infty$, then $P([A_n \text{ i.o.}]) = 1$.
Proof. $([A_n \text{ i.o.}])^c = [A_n^c \text{ a.a.}]$, so it is sufficient to show that $P(\cap_{n\ge N}A_n^c) = 0$ for any $N$. Using $(1 - x) \le e^{-x}$,
$$P\left(\cap_{k=N}^{N+j}A_k^c\right) = \Pi_{k=N}^{N+j}(1 - P(A_k)) \le \Pi_{k=N}^{N+j}e^{-P(A_k)} = e^{-\sum_{k=N}^{N+j}P(A_k)}. \qquad (72)$$
Since $\sum_{k=N}^{N+j}P(A_k) \to \infty$ as $j \uparrow \infty$, the right-hand side goes to 0.
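The two lemmas pull in opposite directions, and the contrast shows up clearly in simulation: with independent events, $P(A_n) = 1/n$ (divergent sum) keeps producing occurrences at arbitrarily late indices, while $P(A_n) = 1/n^2$ (convergent sum) essentially stops. A sketch; the horizon and starting index are illustrative choices:

```python
import random

rng = random.Random(7)
start, horizon = 100, 10**6

# Count occurrences at 'late' indices n >= start for independent events.
late_divergent = sum(
    1 for n in range(start, horizon) if rng.random() < 1.0 / n)
late_convergent = sum(
    1 for n in range(start, horizon) if rng.random() < 1.0 / n ** 2)
# Second Borel-Cantelli: the 1/n events keep occurring;
# First Borel-Cantelli: the 1/n^2 events almost surely stop.
```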
Here are the nonstandard versions of these results. They are key ingredients in the
proofs of some of the forms of strong law of large numbers. They’re also pretty cool,
and useful too.
The weak law told us that $A_n \simeq 0$ almost everywhere. By contrast, the strong law tells us that $n \mapsto A_n$ is near convergent to 0 almost everywhere.
8.6. Limited Fluctuation. Because $\mathbb{R}$ is complete in the usual metric, the convergent sequences in $V(S)$ are the Cauchy sequences. Cauchy sequences are the ones that do not vary by more than any $\epsilon \in \mathbb{R}_{++}$ more than finitely many times.

Suppose that $\xi : [0,\infty) \to \mathbb{R}$ in $V(S)$. Here is a Cauchy-style reformulation of $\xi$ converging: $\xi$ has $k$ $\epsilon$-fluctuations if there exist time points $t_0 < t_1 < \dots < t_k \in [0,\infty)$ such that $|\xi(t_1) - \xi(t_0)| \ge \epsilon$, $|\xi(t_2) - \xi(t_1)| \ge \epsilon$, ..., $|\xi(t_k) - \xi(t_{k-1})| \ge \epsilon$; $\xi$ converges iff for all $\epsilon > 0$, it has only finitely many $\epsilon$-fluctuations.
Near convergence and having a limited number of $\epsilon$-fluctuations are different for internal $\xi : T \to {}^*\mathbb{R}$.

Definition 8.6.1. An internal $\xi : T \to {}^*\mathbb{R}$ is of limited fluctuation if for all non-infinitesimal $\epsilon > 0$, $\xi$ does not have an unlimited number of $\epsilon$-fluctuations.
The set of limited fluctuation $\xi$'s is not an internal set. We are going to want to assign a probability to the set of $\omega$'s for which $n \mapsto \xi(n,\omega)$ has limited fluctuation.
Example 8.6.1. For a sequence $n \mapsto \xi(n)$, $n \in \{1,\dots,N\}$, $N$ unlimited, and for an unlimited even $m < N$, let $\xi(n) = 0$ for $n \le m/2$ and $\xi(n) = 1$ for $m/2 < n \le N$. On the initial interval $\{1,\dots,m/2\}$, $n \mapsto \xi(n)$ near converges to 0, but it does not near converge on $\{1,\dots,N\}$.
Initial intervals converging is the general pattern for sequences having limited fluctuations. To prove it, we are going to map from $V({}^*S)$ to $\mathbb{N}$: let $A(\cdot)$ be a statement, internal or external, about $n \in \mathbb{N}$; if $\{n \in \mathbb{N} : A(n)\} \ne \emptyset$, then it has a least element. The following is about the set of limited fluctuation time paths, that is, it is about a set of time paths that is not internal. That's okay, we're only working with one at a time.

Theorem 8.6.2. If $\xi : T \to {}^*\mathbb{R}$ is of limited fluctuation, then for some unlimited $m$, $\xi(t_0), \xi(t_1),\dots,\xi(t_m)$ converges.

Proof. The essential intuition is that, looking at the limited part of the index set, the sequence has to be Cauchy. The actual argument is a bit trickier.
8.7. Versions of Regularity for Time Paths.
8.7.1. Time Path Properties.
8.7.2. Asymptotic similarities. One way to talk about the asymptotic similarity of infinite time paths, $x_t, y_t \ge 0$ in $V(S)$, is to say that $x = O(y)$, read as "$x$ is big Oh of $y$," if $\limsup_{t\uparrow\infty}\frac{x_t}{y_t} < \infty$; $x = o(y)$, read as "$x$ is little Oh of $y$," if $\limsup_{t\uparrow\infty}\frac{x_t}{y_t} = 0$; and "$x$ is asymptotic to $y$" if $\lim_{t\uparrow\infty}\frac{x_t}{y_t} = 1$.

Definition 8.7.1. For $x, y \in {}^*\mathbb{R}$, $x$ is asymptotic to $y$ if $\frac{x}{y} \simeq 1$.
The "big O" and "little o" relations correspond to $\frac{x}{y}$ being limited and being infinitesimal, respectively. The following two lemmas and example tell us that an infinitesimal percentage error in an unlimited quantity can be infinitely large, but that an infinitesimal percentage error in a limited quantity is not large.
Lemma 8.7.2. For limited, non-infinitesimal $x, y \in {}^*\mathbb{R}$, $x$ is asymptotic to $y$ iff $x \simeq y$.
Example 8.7.1. If m is an unlimited integer, x = m! + m, and y = m!, then x is
asymptotic to y and |x − y| is unlimited.
Lemma 8.7.3. For all $n \in {}^*\mathbb{N}$, if for all $i \in \{1,\dots,n\}$, $x_i > 0$, $y_i > 0$, and $x_i$ is asymptotic to $y_i$, then $\sum_{i\le n}x_i$ is asymptotic to $\sum_{i\le n}y_i$.

Proof. For arbitrary $\epsilon \in \mathbb{R}_{++}$, $x_i \in [(1-\epsilon)y_i, (1+\epsilon)y_i]$, and summing over $i$ yields $\sum_{i\le n}x_i \in [(1-\epsilon)\sum_{i\le n}y_i, (1+\epsilon)\sum_{i\le n}y_i]$.
9. Time Paths, Near Continuity, Integrals, and Control
Throughout, we are going to work with a time set $T = \{t_1,\dots,t_N\} \subset {}^*\mathbb{R}$ that is a near interval, that is, one that satisfies $d_H(T, {}^*[a,b]) \simeq 0$ for a limited interval $[a,b]$, where $d_H(\cdot,\cdot)$ is the Hausdorff metric.5 This entails $T$ having infinitesimal increments, which in turn entails $N$ being unlimited. Further, we are going to assume that ${}^\circ t_1 = a$ and ${}^\circ t_N = b$.

To save on levels of subscripts in notation, a time path $(\xi(t_1,\omega), \xi(t_2,\omega),\dots,\xi(t_N,\omega))$ for $\omega \in \Omega$ will be denoted $(\xi(1,\omega), \xi(2,\omega),\dots,\xi(N,\omega))$, and even $(\xi_1,\dots,\xi_N)$.
9.1. Near Convergence. A first useful property of a time path is that it "converges."
Definition 9.1.1. For unlimited $N$, a sequence $n \mapsto \xi_n$, $n \in \{1,\dots,N\}$, is near convergent if for some $x \in {}^*\mathbb{R}$, $\xi_n \simeq x$ for all unlimited $n$.
This means that from $n$ to $N$, there is not very much movement in the sequence. In more detail, since the set $\{n, n+1,\dots,N\}$ is ${}^*$-finite, there is an $n^\circ$ that solves $\max\{|\xi_m - x| : m \in \{n,\dots,N\}\}$, and by assumption, $|\xi_{n^\circ} - x| \simeq 0$.
Lemma 9.1.2. If a sequence $n \mapsto \xi_n$ is both near convergent to an $x \in {}^*\mathbb{R}$ and limited for limited $n$, then $x$ is nearstandard.
Proof. If $n \mapsto \xi(n)$, $n \in \{1,\dots,N\}$, is near convergent to $x$, then $\{n \in \{1,\dots,N\} : |\xi(n) - x| < 1\}$ contains arbitrarily small infinite integers. Since this set is internal, it must contain finite integers. For any of those finite integers, $\xi(n)$ is limited, hence $x$ is limited.
Taking $\xi_n \equiv x$ for some unlimited $x$ shows that the assumption that $\xi_n$ is limited for limited $n$ is needed in the previous lemma.

Lemma 9.1.3. If $\xi_n$ near converges to an unlimited $x$, then it is unlimited for some finite $n$.

Proof. $\{n \in \{1,\dots,N\} : (\forall m \in \{n,\dots,N\})[|\xi_m - x| < 1]\}$ contains arbitrarily small infinite integers, hence contains a finite integer.
The following non-convergent example will help in understanding triangular arrays of random variables.

Example 9.1.1. Let $N = \langle 1, 2, 4, 8, 16,\dots\rangle$; for $k \in \mathbb{N}$, define $\xi_k(n) = \min\{k^n, n^n\}$ for $n = 1,\dots,2^k$, and let $n \mapsto \xi(n)$ be $\langle\xi_1(\cdot),\xi_2(\cdot),\xi_3(\cdot),\dots\rangle$. To describe this class of examples without using equivalence classes, let $k$ be an infinite integer, set $N = 2^k$, and define $\xi(n) = \min\{k^n, n^n\}$ for $n \in \{1,\dots,N\}$.
5 Let $\mathcal{K}(\mathbb{R})$ denote the non-empty compact subsets of $\mathbb{R}$; for $K \in \mathcal{K}(\mathbb{R})$ and $\epsilon > 0$, define the $\epsilon$-ball around $K$ as $K^\epsilon = \cup_{x\in K}B_\epsilon(x)$, and define the Hausdorff metric by $d_H(K_1,K_2) = \inf\{\epsilon > 0 : K_1 \subset K_2^\epsilon, K_2 \subset K_1^\epsilon\}$.
9.2. Near Continuity. Convergence and continuity go together like beer and tequila.
Definition 9.2.1. A function ξ : T → ∗ R is near continuous at t if [s ≃ t] ⇒
[ξ(s) ≃ ξ(t)], and it is near continuous on T if it is near continuous at every t ∈ T .
After rearranging the indexing a bit, this is a statement about the behavior of ξ(·)
along ∗ -finite sequences in T that converge to t. We will use this observation in the
proof of the following.
Lemma 9.2.2. If $\xi(\cdot)$ is limited for some $t \in T$ and is near continuous, then it is uniformly bounded by a limited number, that is, there exists a limited $B$ such that $|\xi(t)| \le B$ for all $t \in T$.

Proof. Let $t_K$ solve $\max_{t\in T}|\xi(t)|$, suppose that $\xi(t_K)$ is unlimited, and let $\xi(t_k)$ be limited. If $t_k < t_K$, consider the sequence $\xi(t_k), \xi(t_{k+1}),\dots,\xi(t_K)$; if $t_k > t_K$, consider the sequence $\xi(t_k), \xi(t_{k-1}),\dots,\xi(t_K)$. Near continuity and Lemma 9.1.2 tell us that $\xi(t_{k+m})$ is unlimited for some finite $m$; since $t_{k+m} \simeq t_k$, this contradicts near continuity at $t_k$.
We want to know that the nonstandard function $\xi : T \to {}^*\mathbb{R}$ is behaving like a sensible standard function on $[a,b]$. The problem is that every $\tau \in [a,b]$ corresponds to the many $t \in T$ that have ${}^\circ t = \tau$. We are now going to show that a near continuous $\xi$ that is limited at some $t \in T$ has a graph infinitesimally close to a standard $f \in C([a,b])$. Note that the boundedness guaranteed in Lemma 9.2.2 is necessary for this, since any standard continuous function on $[a,b]$ is bounded.

Theorem 9.2.3. If $T$ is a near interval, $\xi : T \to {}^*\mathbb{R}$ is near continuous on $T$, and $\xi(t)$ is limited for some $t \in T$, then the standard part of the graph of $\xi$ is the graph of a continuous function.

Another way to put this is that we can define a function $f$ in $V(S)$ by setting $f(t) = {}^\circ\xi(t')$ for any $t' \simeq t$, equivalently, $f(t) = \mathrm{st}(\xi(\mathrm{st}^{-1}(t)))$, where $\mathrm{st} : {}^*\mathbb{R} \to \mathbb{R}$ is the standard part mapping.
Proof. $\mathrm{gr}(\xi)$, the graph of $\xi$, is an internal subset of ${}^*[a,b] \times {}^*[r,s]$, hence ${}^\circ\mathrm{gr}(\xi)$ is a closed set, that is, it is the graph of an upper hemicontinuous, non-empty valued correspondence, call it $\Xi$. All that is needed for it to be the graph of a continuous function is that the correspondence be single-valued. Suppose not, i.e. suppose that for some $\tau \in [a,b]$, $x \ne x' \in \Xi(\tau)$. Pick a non-infinitesimal $\epsilon > 0$ such that $|x - x'| > \epsilon$. Let
$$D_\epsilon = \{\delta \in {}^*\mathbb{R}_{++} : (\forall t, t' \in T)[\,[|t - t'| < \delta] \Rightarrow [|\xi(t') - \xi(t)| < \epsilon]\,]\}. \qquad (73)$$
$D_\epsilon$ contains arbitrarily large infinitesimals, hence contains a non-infinitesimal $\delta > 0$. This means that if $x \in \Xi(\tau)$ then $x' \notin \Xi(\tau)$, because for every $\tau' \in B_\delta(\tau)$, $\Xi(\tau') \subset B_\epsilon(x)$.
Definition 9.2.4. If $(M,d)$ is a metric space in $V(S)$ and $E$ is an internal subset of ${}^*M$, then the standard part of $E$ is $\mathrm{st}(E) := \{x \in M : {}^*d(x,E) \simeq 0\}$.

Since we identify functions with their graphs, and their graphs are sets, we can write the conclusion of Theorem 9.2.3 as $\mathrm{st}(\xi) \in C([a,b])$.
9.3. Paths and Integrals I. Recall the first integrals you saw, from calculus class. A typical example is a continuous $f : [0,1] \to \mathbb{R}$, and we are interested in properties of $F(x) := \int_0^x f(t)\,dt$, $x \in [0,1]$. Remember that for any subdivision $0 = t_0 < t_1 < \dots < t_N = 1$,
$$F_- := \textstyle\sum_{n=1}^N\left(\min_{x\in[t_{n-1},t_n]}f(x)\right)(t_n - t_{n-1}) \le \int_0^1 f(t)\,dt \le F_+ := \textstyle\sum_{n=1}^N\left(\max_{x\in[t_{n-1},t_n]}f(x)\right)(t_n - t_{n-1}), \qquad (74)$$
that the upper and lower bounds converge to each other as the maximal length of the subdivisions goes to 0, and that the limit is independent of the sequence of subdivisions. This means that for any $f \in C([a,b])$, $\int_a^b f(t)\,dt \simeq \sum_{t\in T}{}^*f(t)\,dt$, where the symbol "$dt$" is being used in two different, but very similar, ways in the integral and the ${}^*$-finite sum.
Lemma 9.3.1. If $T$ is a near interval with endpoints $a$, $b$, $\xi : T \to {}^*\mathbb{R}$ is near continuous and limited on $T$, and $f = \mathrm{st}(\xi)$, then
$$\textstyle\sum_{t\in T}\xi(t)\,dt \simeq \int_a^b f(t)\,dt. \qquad (75)$$
Proof. From the discussion of the first integrals seen in calculus classes, $\int_a^b f(t)\,dt \simeq \sum_t {}^*f(t)\,dt$. Now, $|\sum_t \xi(t)\,dt - \sum_t {}^*f(t)\,dt| \le \sum_t|\xi(t) - {}^*f(t)|\,dt$, and this in turn is less than or equal to $N\cdot\max_{t\in T}|\xi(t) - {}^*f(t)|\cdot\frac{1}{N}$. Since $T$ is ${}^*$-finite, the maximum is achieved, and is $\simeq 0$.
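Read back in $V(S)$, Lemma 9.3.1 is the familiar statement that sums $\sum_t f(t)\,dt$ over partitions with small increments approximate the integral. A sketch with $f = \sin$ on $[0,\pi]$; the function and the number of increments are illustrative:

```python
import math

def near_interval_sum(f, a, b, big_n):
    """Sum over t in T of f(t) dt, for an equal-increment partition of [a, b]."""
    dt = (b - a) / big_n
    return sum(f(a + k * dt) * dt for k in range(big_n))

big_n = 10**5
approx = near_interval_sum(math.sin, 0.0, math.pi, big_n)
exact = 2.0          # the integral of sin over [0, pi]
```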
We will later be interested in two generalizations of Lemma 9.3.1.
1. If $f$ is a piecewise-continuous function, just replace $\int_0^1 f(t)\,dt$ with $\sum_{i=0}^{I-1}\int_{\tau_i}^{\tau_{i+1}}f(t)\,dt$, where $0 = \tau_0 < \tau_1 < \dots < \tau_I = 1$ and $\tau_1,\dots,\tau_{I-1}$ are the discontinuity points of $f$.
2. If $f$ is a measurable function, then a result called Lusin's theorem tells us that it is "nearly" a continuous function. This will allow us to show that integrals of measurable functions are infinitesimally close to ${}^*$-finite sums over near intervals for the case that $t \mapsto f(t)$ is a measurable function. This will be useful when we consider $f(t) = \dot{x}(t)$ for a calculus of variations or a control problem, and in these cases, we will have $x(t) = x_0 + \int_0^t\dot{x}(s)\,ds$.
9.4. A First Control Problem. Here is one of the simplest control problems we will ever see in $V(S)$,
$$\min \int_0^1[c_1\dot{x}^2(t) + c_2 x(t)]\,dt \quad\text{s.t.}\quad x(0) = 0,\ x(1) \ge B,\ \dot{x}(t) \ge 0. \qquad (76)$$
The idea is that, starting with none on hand, we need to produce a total amount $B$ of a good by time 1. There is a storage cost, $c_2$, per unit stored for a unit of time, and producing at a rate $r$, i.e. having $x'(t) = dx/dt = \dot{x} = r$, incurs costs at a rate $c_1(x'(t))^2$. The tradeoff between fast production rates and storage costs leads us to believe that the solution must involve starting production at a low level at some point in the interval and increasing the rate at which we produce as we near the end of the interval.
9.4.1. A Near Interval Formulation. Let us turn to the near interval formulation of the problem. We replace $[0,1]$ by a near interval $T$ with increments $dt$, and to make life simpler we suppose that the increments have equal size, $dt \simeq \frac{1}{N}$ for some unlimited $N$. Now $x'(t)$ is the action, $a_t$, chosen at $t$, $a_t = \frac{x(t+dt) - x(t)}{dt}$, that is, the discrete slope of the amount in stock over the interval of time between $t$ and $t + dt$. This means that if we choose actions $a_0, a_1,\dots,a_{N-1}$, then by any $t \in T$, $x(t) = \sum_{s<t}a_s\,ds$. The problem is replaced by
$$\min_{a_0,a_1,\dots,a_{N-1}}\textstyle\sum_t\left[c_1 a_t^2 + c_2\sum_{s<t}a_s\,ds\right]dt \quad\text{s.t.}\quad \textstyle\sum_{s<T}a_s\,ds = B,\ a_t \ge 0. \qquad (77)$$
9.4.2. Solving the Near Interval Formulation. Writing the Lagrangean, we have

(78)   L(a_0, . . . , a_{N−1}; λ) = Σ_t [c_1 a_t² + c_2 Σ_{s<t} a_s ds] dt + λ(B − Σ_{s<T} a_s ds).

With the non-negativity constraints, the Kuhn-Tucker conditions are ∂L/∂a_t ≥ 0,
a_t ≥ 0, a_t · (∂L/∂a_t) = 0. When the optimal a∗_t are positive, we have

(79)   ∂L/∂a_t = [2c_1 a_t + c_2 Σ_{k=t}^{N−1} 1 ds] dt − λ dt = 0.

Now, the dt’s are all equal to 1/N, and, being a common factor, we can take them out.
Further, Σ_{k=t}^{N−1} 1 ds ≃ (1 − t). Putting these together, for the positive a∗_t ’s, we have

(80)   a∗_t = (λ − c_2(1 − t))/(2c_1).
This means that the slope is an affine function of t with slope c_2/(2c_1): the larger is the
storage cost, c2 , relative to c1 , the steeper the slope and the less time will be spent
producing; the larger is the production cost, c1 , relative to c2 , the lower the slope, and
the more time will be spent producing. All that is left is to clear up the remaining
details.
To find the value of λ, we substitute back into the constraint Σ_{s<T} a∗_s ds = B, and
use our result about integrals, Lemma 9.3.1, to find that we determine λ by solving

(81)   ∫_0^1 (λ − c_2(1 − t))/(2c_1) dt = B,
which yields λ∗ = 2Bc_1 + c_2/2. Finding the t at which we start producing involves
setting λ − c_2(1 − t) = 0, i.e. t∗ = 1/2 − 2Bc_1/c_2 as long as this is non-negative, and t∗ = 0
otherwise, which agrees with the Kuhn-Tucker non-negativity versions of (80).
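Before carrying the solution back to V(S), the closed form can be sanity-checked at a finite N. The sketch below (the parameter values c_1 = c_2 = B = 1 are illustrative choices for which a∗_t > 0 on all of [0, 1]) builds the path from (80)-(81), checks the production constraint, and confirms that it beats another feasible path.

```python
# Finite-N check of the closed-form solution (80)-(81) to problem (77).
# Parameter values are illustrative, chosen so that a*_t > 0 on all of [0, 1].
c1, c2, B = 1.0, 1.0, 1.0
N = 1_000
dt = 1.0 / N

lam = 2 * B * c1 + 0.5 * c2                                       # lambda* from (81)
a_opt = [(lam - c2 * (1 - k * dt)) / (2 * c1) for k in range(N)]  # (80)

def cost(a):
    # sum_t [c1 a_t^2 + c2 x_t] dt with x_t = sum_{s<t} a_s ds, as in (77)
    total, x = 0.0, 0.0
    for a_t in a:
        total += (c1 * a_t ** 2 + c2 * x) * dt
        x += a_t * dt
    return total

produced = sum(a_opt) * dt               # should be within ~1/N of B
cost_gap = cost([B] * N) - cost(a_opt)   # constant-rate feasible path should cost more
```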
9.4.3. Checking that We’ve Solved the Original Problem. Now we check that we have
indeed solved the original problem in V(S), the one given in (76). The standard part
of the near continuous function t ↦ a∗_t is 0 until t∗ = max{0, 1/2 − 2Bc_1/c_2} and it increases linearly
with slope c_2/(2c_1) until t = 1. Suppose that there exists a piecewise continuous alternative
strategy b(t) = ẋ(t) that yields more than st(a∗(·)) in (76). This means that we can
do better by a non-infinitesimal. Consider the strategy ∗b : T → ∗R_+: its value must be
within an infinitesimal of the value of b in (76), which means that ∗b must beat a∗ by
a non-infinitesimal, a contradiction.
9.5. A Control Problem Without a Solution. Everything worked very smoothly
in the first problem. Here is the simplest example that I know of what can go
wrong.
Consider the problem in V(S),

(82)   max ∫_0^1 [ẋ²(t) − x²(t)] dt s.t. −1 ≤ ẋ(t) ≤ +1,
where the maximum is taken over piecewise continuous functions t 7→ ẋ(t). The first
term in the integrand tells us that we want to be moving as fast as possible, the
second term tells us that we want to minimize our displacement. These have a rather
contradictory feel to them. Let us examine just how contradictory.
9.5.1. Nonexistence. Divide the interval [0, 1] into N equally sized sub-intervals, N ∈
N, and consider the path that over each interval [k/N, (k+1)/N] has ẋ = +1 for the first half
and ẋ = −1 for the second half. This means that x(t) goes up at slope +1 over the
first half of each interval and down with slope −1 over the second half of each interval.
As N ↑, the value to this path in (82) converges up to 1. However, the value 1 cannot
be achieved by any path — that requires that ẋ(t) always be ±1 and x(t) always
be 0, contradictory requirements.
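The convergence of the values of these sawtooth paths to 1 is easy to see numerically; the sketch below (grid sizes are illustrative choices) evaluates the objective in (82) along the slope-±1 path for several N.

```python
# Value of (82) along the path with slope +1 then -1 on each of N sub-intervals.
# Each half-interval is cut into M fine steps for a left Riemann sum.
def sawtooth_value(N, M=500):
    h = 1.0 / (2 * N * M)          # fine time step
    x, val = 0.0, 0.0
    for _ in range(N):
        for slope in (1.0, -1.0):
            for _ in range(M):
                val += (slope ** 2 - x ** 2) * h   # integrand xdot^2 - x^2
                x += slope * h
    return val

values = [sawtooth_value(N) for N in (1, 4, 16)]   # increases toward 1
```

The exact value along this path is 1 − 1/(12N²), so the supremum 1 is approached but never attained.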
9.5.2. A Near Interval Formulation. Replace [0, 1] with a near interval T = {0, 1/N, . . . , (N−1)/N, 1}
with N unlimited and even. Reformulate (82) as

(83)   max_{a_0,a_1,...,a_{N−1}} Σ_t [a_t² − (Σ_{s<t} a_s ds)²] dt s.t. −1 ≤ a_t ≤ +1.
Notice that there is a pair of multipliers for each t ∈ T , one for the constraint −1 ≤ at
and one for the constraint at ≤ +1. Only one of each of these constraints can be binding
at any point in time. Often the time path of the multipliers is very informative about
when and where constraints are most strongly pinching the solution. Here, there is so
much symmetry that the pattern of the multipliers looks like the pattern we will see
in the solutions and has no further information.
One of the two solutions to this is a∗_{k/N} = +1 for the even k and a∗_{k/N} = −1 for
the odd k (the other solution reverses the signs). This gives a utility of 1 − (1/2)dt² ≃ 1.
We see a continuation of the pattern of approaching the supremum value of 1 — since
dt = 1/N, larger N yields smaller dt, yielding a higher value. Thus, the near interval
formulation has a solution, and it gives a value ≃ 1.
9.5.3. Trying to Take the Near Interval Solution Back to V(S). The optimal path
t ↦ a∗_t is not near continuous: it moves up or down by 2 between each t and t + dt.
This is a phenomenon known as chattering. Not only is there no continuous
function that behaves like this, there is no measurable function.
To see why, look up Lebesgue’s density theorem; it tells us that for any measurable
A ⊂ [0, 1], there is an A′ ⊂ A such that Unif(A \ A′) = 0 and, for each x ∈ A′,

(84)   lim_{ε↓0} Unif(A ∩ (x − ε, x + ε))/(2ε) = 1.

If A = [a, b], then A′ = (a, b); we just get rid of the end-points. The amazing part of
Lebesgue’s result is that this simple intuition of getting rid of end points works for all
measurable sets. In particular, this means that for each x ∈ A′, the derivative of the
function H(x) = Unif(A ∩ [0, x]) is equal to 1. Applying Lebesgue’s density theorem to
B := Aᶜ, for x ∈ B′, the derivative of H(x) is equal to 0. Since Unif(A′) + Unif(B′) =
1, this means that the derivative of H is, for Lebesgue almost every x ∈ [0, 1], either
equal to 0 or equal to 1.
Now, if there is a measurable function representing t ↦ a∗_t, then we can partition
[0, 1] into a set A on which ẋ(t) = +1 and B := Aᶜ on which ẋ(t) = −1. However,
for every non-infinitesimal x ∈ [0, 1], the proportion of the t ∈ T with t < x and
a∗_t = +1 is, up to an infinitesimal, equal to 1/2. This means that for our measurable
function we would have to have Unif(A ∩ [0, x]) = Unif(B ∩ [0, x]) = x/2 for each
x ∈ (0, 1]. Lebesgue’s density theorem tells us that this cannot happen: we must have
the derivative equal to 1 or 0 almost everywhere rather than equal to 1/2 everywhere.
9.5.4. Another Representation. If we look at subsets E := (a, b] × (c, d] in [0, 1] ×
[−1, +1], we could ask what proportion of the time the path (t, a∗_t) is in the set ∗E.
The answer gives a probability distribution that is half times the uniform distribution
on [0, 1] × {+1} and half times the uniform distribution on [0, 1] × {−1}. This is one
version of what is often called a Young measure. We are not going to spend a lot of
time on these right now; we will come back to them in some control problems, and a
little bit more extensively while doing continuous time games.
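At a finite even N, the half-and-half description can be checked by direct counting: for the alternating path, the fraction of time points t < x with a∗_t = +1 is x/2, up to an infinitesimal. A sketch (N and the test points x are illustrative):

```python
# Occupation measure of the chattering path a*_{k/N} = +1 (k even), -1 (k odd).
N = 10_000                                   # even, stands in for an unlimited N
a = [1 if k % 2 == 0 else -1 for k in range(N)]

def occupation(x, level):
    # fraction of time points t = k/N with t < x and a*_t == level
    hits = sum(1 for k in range(N) if k / N < x and a[k] == level)
    return hits / N

props = [occupation(x, +1) for x in (0.25, 0.5, 1.0)]   # each ~ x/2
```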
9.6. The Euler-Lagrange Necessary Conditions. We are going to be studying
two kinds of problems,

(85)   max_{x(·)} ∫_0^1 U(t, x(t), ẋ(t)) dt s.t. x(0) = b_0, x(1) = b_1, and

(86)   max_{a(·)} ∫_0^1 U(t, x(t), a(t)) dt s.t. x(0) = b_0, x(1) = b_1, (∀t ∈ [0, 1])[a(t) ∈ A],
            and (∀t ∈ [0, 1])[ẋ(t) = f(t, x(t), a(t))],
where the x(·) should be picked with ẋ(·) piecewise continuous, and the a(·) should
also be piecewise continuous. Since x(t) = x_0 + ∫_0^t ẋ(s) ds, we could reformulate the
first problem as a maximization over piecewise continuous a(·)’s with ẋ(t) = a(t). To
put it another way, taking f (t, x(t), a(t)) ≡ a(t) and A = R, the first problem becomes
a special case of the second problem.
We will, mostly, study these problems by studying their near interval formulations
and then either take the standard part of the optimal path, or work with the path
on the near interval. The necessary conditions for an optimum for the first class
of problems are called the Euler-Lagrange conditions, or sometimes the Euler
equation. We will set up the second class of problems with an infinite set of constraints
and a corresponding infinite set of multipliers, one or two for each t in a near interval
T depending on whether the a(t) ∈ A constraint is binding. When we use Lagrangean
multipliers to set up a constrained problem, the necessary conditions are those of an
unconstrained problem, so the Euler-Lagrange conditions will be useful there too.
9.6.1. The Near Interval Formulations. Replace [0, 1] with T = {k/N : k = 0, 1, . . . , N},
N ≃ ∞. We will adopt the following notations for the first class of problems: dt or
ds denotes 1/N; x_k denotes x(k/N); a_k denotes a(k/N); for k = 1, 2, . . . , N, x_k = Σ_{j<k} a_j dt;
and a_k = (x_{k+1} − x_k)/dt. This uses x_k as the state of the system at the beginning of the time
interval [k/N, (k+1)/N), and has the action, a_k, determine the value of x_{k+1}, with a_k being
the slope of the line segment joining (k/N, x_k) and ((k+1)/N, x_{k+1}), k ≤ (N − 1).
With these notations, the first of the problems can be formulated either in terms of
the optimal path of states or in terms of the optimal path of actions, that is, either as

(87)   max_{x_k : k=1,...,N−1} Σ_{k=0}^{N−1} U(k/N, x_k, (x_{k+1} − x_k)/dt) dt s.t. x_0 = b_0, x_N = b_1, or as

(88)   max_{a_k : k=0,...,N−1} Σ_{k=0}^{N−1} U(k/N, Σ_{j<k} a_j ds, a_k) dt s.t. x_N = b_0 + Σ_{j<N} a_j ds = b_1.
The first formulation focuses our attention on the path that the state takes, t ↦ x_t for
t ∈ T; the second focuses our attention on the path that the actions take. In looking
at the associated first order conditions, each x_k appears in two of the summation
terms, while each a_k appears in each of the summation terms from k + 1 onwards.
Note that we are omitting the point 1 from the summation, but this does not
matter: at time 1, decisions have no effect because there is no future left for any action
to change.
In a similar fashion, the second class of problems can be formulated as

(89)   max_{a_k : k=0,...,N−1} Σ_{k=0}^{N−1} U(k/N, x_k, a_k) dt s.t. x_0 = b_0, x_N = b_1, a_k ∈ A,
            and x_{k+1} = x_k + f(k/N, x_k, a_k) dt, k = 0, . . . , N − 1.

Again, taking f(t, x, a) ≡ a and A = R recovers the first class of problems.
9.6.2. Necessary Conditions. There are many ways to arrive at the necessary conditions
for problem (87). The easiest is the most direct: at an optimal path x∗_k, k =
1, . . . , N − 1, the derivative of the objective function must equal 0. Since x_k appears
three times in the summation, this yields, after removing the common factor of dt,

(90)   [∂U((k−1)/N, x∗_{k−1}, (x∗_k − x∗_{k−1})/dt)/∂ẋ] (1/dt)
            + ∂U(k/N, x∗_k, (x∗_{k+1} − x∗_k)/dt)/∂x
            − [∂U(k/N, x∗_k, (x∗_{k+1} − x∗_k)/dt)/∂ẋ] (1/dt) = 0.

Rearranging and suppressing some arguments, this is

(91)   ∂U/∂x = (d/dt)(∂U/∂ẋ),

and this has to hold at every point along the optimal path.
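As a quick consistency check (this worked example is ours, not from the text), apply (91) to the production problem (76). Writing it in max form, the integrand is U(t, x, ẋ) = −[c_1ẋ² + c_2x], so along any stretch where the non-negativity constraint is slack,

```latex
\frac{\partial U}{\partial x} = -c_2, \qquad
\frac{\partial U}{\partial \dot{x}} = -2c_1 \dot{x}, \qquad
\text{so (91) reads}\quad
-c_2 = \frac{d}{dt}\bigl(-2c_1 \dot{x}\bigr),
\quad\text{i.e.}\quad \ddot{x} = \frac{c_2}{2c_1}.
```

This matches the slope c_2/(2c_1) of the optimal production rate found in (80).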
9.7. Some Examples of Using the Euler-Lagrange Conditions.
9.7.1. A Squared Law of Resistance.

(92)   min_{a_k : k=0,...,N−1} Σ_k a_k² dt s.t. x_N = 0 + Σ_{j<N} a_j ds = B.

The Lagrangean is L(a, λ) = Σ_k a_k² dt + λ(B − Σ_{j<N} a_j ds), and from the FOC one sees that
at every k, 2a_k dt = λ ds, i.e. 2a_k = λ: the a_k are constant. The E-L necessary conditions are, at
every point along the optimal path,

(93)   0 = (d/dt)(∂U/∂ẋ),

that is, ẋ∗(t) = a∗(t) cannot change with t.
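The conclusion that a∗ is constant can be probed numerically: by convexity, every feasible path costs at least as much as the constant one. A sketch (N, B, the random perturbations, and the seed are illustrative assumptions):

```python
import random

# Check that the constant path minimizes sum_k a_k^2 dt s.t. sum_k a_k dt = B,
# as the E-L condition (93) predicts.
random.seed(2)
N, B = 200, 1.0
dt = 1.0 / N
cost = lambda a: sum(a_k ** 2 for a_k in a) * dt

a_const = [B] * N          # the E-L solution: constant rate with sum a_k dt = B
no_better_path_found = True
for _ in range(100):
    raw = [random.uniform(0.1, 2.0) for _ in range(N)]
    scale = B / (sum(raw) * dt)          # rescale so the constraint holds
    a = [scale * r for r in raw]         # a feasible perturbed path
    no_better_path_found = no_better_path_found and (cost(a) >= cost(a_const) - 1e-9)
```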
9.8. Control Problems with Bang-Bang Solutions. So far the solutions t ↦ a∗_t have
either been near continuous, or they have been wildly discontinuous. Intermediate
between these are solutions where the graph of a∗t looks like the graph of a piecewise
continuous function in V (S). When the solution bounces between extreme points of
the constraint set, we call it a bang-bang solution. The idea is that the optimal way
to control the process is to slam the controls between extremes at optimally chosen
points in time, and the slamming results in a banging sound (at least inside our heads).
9.9. Savings/Investment Problems.
10. The Basics of Brownian Motion
For any infinite n set N = n!, let T = {k/N : k ∈ {0, 1, . . . , N²}} with t_k denoting the
k’th time point in T, i.e. t_k = k/N, and dt, as usual, denoting the time increment 1/N.
Let Ω = {−1, +1}^T so that |Ω| = 2^{N²+1}, F the class of internal subsets of Ω, and let
P be the uniform distribution, P(A) = |A|/2^{N²+1}. For each ω ∈ Ω and k ∈ {0, 1, . . . , N²},
define ω_k = proj_{t_k}(ω).
10.1. Two Versions of the Time Lines. We have two ways to define the nonstandard
process for Brownian motion, one that specifies the value at each t ∈ T, and
one that specifies the value for each t in the ∗-infinite, internal interval [0, N]. The
second is just linear interpolation of the first. The first is defined at each t_k ∈ T by

(94)   χ_d(t_k, ω) = (1/√N) Σ_{j=1}^{k} ω_j,

the second is defined at each t ∈ [0, N] by

(95)   χ_c(t, ω) = (1/√N) [Σ_{j=1}^{⌊Nt⌋} ω_j + (Nt − ⌊Nt⌋)ω_{⌊Nt⌋+1}],

where for x ∈ ∗R_+, ⌊x⌋ is the largest integer less than or equal to x.
We will now show that either of the following defines a Brownian motion on (Ω, L(F), L(P)):
for standard t ∈ [0, ∞) and ω ∈ Ω, either

(96)   b(t, ω) = ◦χ_d(t+, ω) or β(t, ω) = ◦χ_c(t, ω),

where for any standard t ∈ [0, ∞), t+ is the first element of T greater than or equal to t,
t+ := min{t_k ∈ T : t ≤ t_k}.
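A finite-N simulation of (94) already displays the scaling behind these definitions: the sketch below (the values of N, the number of trials, and the seed are our illustrative choices) samples χ_d at t = 1 and checks that the sample mean is near 0 and the sample variance near t = 1.

```python
import random

# Simulate chi_d(1, .): a sum of N fair +-1 coin flips scaled by 1/sqrt(N), as in (94).
random.seed(0)
N, trials = 400, 2000

def chi_d_at_one(rng):
    return sum(rng.choice((-1, 1)) for _ in range(N)) / N ** 0.5

samples = [chi_d_at_one(random) for _ in range(trials)]
mean = sum(samples) / trials                           # should be ~ 0
var = sum((x - mean) ** 2 for x in samples) / trials   # should be ~ t = 1
```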
10.2. Showing Brownian-ness. An implication of the proofs is that for a set of ω
having probability 1, the paths t ↦ b(t, ω) are equal to the paths t ↦ β(t, ω).
Lemma 10.2.1. For standard 0 ≤ s < t, b(t, ·) − b(s, ·) ∼ N (0, (t − s)).
Proof. From the theory of characteristic functions, a random variable, Y, has a N(0, σ)
distribution iff E e^{irY} = e^{−(1/2)r²σ}. Now,

(97)   ∫ e^{ir(b(t,ω)−b(s,ω))} dL(P)(ω) ≃ ∫ ∗e^{ir(χ_d(t+,ω)−χ_d(s+,ω))} dP(ω)

(98)        = ∫ ∗e^{ir(1/√N) Σ_{k: s+<t_k≤t+} ω_k} dP(ω)

(99)        = Π_{k: s+<t_k≤t+} ∫ ∗e^{irω_k/√N} dP(ω)

(100)       = Π_{k: s+<t_k≤t+} ∗cos(r/√N),

where the switch to cos(·) comes from e^{ix} = cos(x) + i sin(x) and the symmetry of the
distribution of the ω_k, which implies that E sin(rω_k/√N) = 0 because sin(x) = −sin(−x).
Now, the Taylor expansion of cos(x) is cos(x) = 1 − x²/2! + x⁴/4! + o(x⁵). This means that
Π_{k: s+<t_k≤t+} ∗cos(r/√N) ≃ e^{−(1/2)r²(t−s)}, as required.
Lemma 10.2.2. For any s < t, s, t ∈ T, E(χ_d(t, ·) − χ_d(s, ·))² = (t − s) and
E(χ_d(t, ·) − χ_d(s, ·))⁴ = 3(t − s)² − 2(t − s)dt.

Proof. For the standard parts, these are just the second and fourth central moments
of a N(0, t − s). Direct calculations, using the independence of increments to cancel
out the odd power terms, deliver these results.
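Because Ω is a finite coin-flip space, the moment formulas in Lemma 10.2.2 can be verified exactly at small N by enumerating every sign path; the values N = 8, s = 0, t = 6/8 below are illustrative.

```python
from itertools import product

# Exact check of Lemma 10.2.2 on the finite coin-flip space: with s = 0 and
# t = m/N, the increment chi_d(t, .) - chi_d(s, .) is (w_1 + ... + w_m)/sqrt(N).
N, m = 8, 6                                    # illustrative small values
dt = 1.0 / N
t_minus_s = m / N

paths = list(product((-1, 1), repeat=m))       # all 2^m equally likely sign paths
second = sum((sum(w) / N ** 0.5) ** 2 for w in paths) / len(paths)
fourth = sum((sum(w) / N ** 0.5) ** 4 for w in paths) / len(paths)
formula = 3 * t_minus_s ** 2 - 2 * t_minus_s * dt   # fourth moment from the lemma
```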
Now, if ω_1 = ω_2 = · · · = ω_N = 1, then χ_d(t_N, ω) = N · (1/√N) = √N ≃ ∞; for this
ω, the time path has no standard part. The next result is crucial: it says that the
probability of this kind of oddity happening is 0.
Theorem 10.2.3. L(P )({ω : t 7→ b(t, ω) is continuous }) = 1.
The observation behind the proof is that a standard function f : [0, ∞) → R
is continuous iff for all integers n and k, for large enough m, we do not have an
s ∈ [i/m, (i+1)/m] ⊂ [0, k] such that (f(i/m) − f(s))⁴ ≥ 1/n. This is so because a continuous
function on [0, ∞) is uniformly continuous on each [0, k].
Proof. For each standard k ∈ N, define T_k = [0, k] ∩ T. For any standard triple of
integers, m, n, k, here is a set that we would like to have small probability,

(101)   Ω_{m,n,k} = ∪_{i∈N} {ω : (∃s ∈ T_k ∩ [i/m, (i+1)/m]) [|χ_d(i/m, ω) − χ_d(s, ω)|⁴ ≥ 1/n]}.

We show that lim_{m↑∞} P(Ω_{m,n,k}) = 0, the limit being taken along N.
The lovely (or sneaky) part of the argument is the observation that

(102)   P({ω : |χ_d(i/m, ω) − χ_d(s, ω)|⁴ ≥ 1/n for some s ∈ T_k ∩ [i/m, (i+1)/m]})
             ≤ 2 · P({ω : |χ_d(i/m, ω) − χ_d((i+1)/m, ω)|⁴ ≥ 1/n}).
This comes from a “reflection” argument for random walks. Pick any ω such that
|χ_d(i/m, ω) − χ_d((i+1)/m, ω)|⁴ < 1/n but for some s ∈ T ∩ [i/m, (i+1)/m], |χ_d(i/m, ω) − χ_d(s, ω)|⁴ ≥ 1/n.
For this ω, let s_ω be the smallest time in T ∩ [i/m, (i+1)/m] at which |χ_d(i/m, ω) − χ_d(s_ω, ω)|⁴ ≥ 1/n.
Look at the path that comes from switching the sign of each ω_k for t_k ≥ s_ω. The new
ω′ has exactly the same probability, and because |χ_d(i/m, ω) − χ_d((i+1)/m, ω)|⁴ < 1/n, we
know that |χ_d(i/m, ω′) − χ_d((i+1)/m, ω′)|⁴ > 1/n. Hence, if we ‘double count’ the number
of ω’s with |χ_d(i/m, ω) − χ_d((i+1)/m, ω)|⁴ ≥ 1/n, we must have counted all of the ω’s with
|χ_d(i/m, ω) − χ_d(s, ω)|⁴ ≥ 1/n for some s ∈ T ∩ [i/m, (i+1)/m].
Now we use a counting argument, the moments above, and Chebyshev,

(103)   P(Ω_{m,n,k}) ≤ 2 · Σ_{i=0}^{km−1} P({ω : |χ_d(i/m, ω) − χ_d((i+1)/m, ω)|⁴ ≥ 1/n})

(104)              ≤ 2 · Σ_{i=0}^{km−1} n E|χ_d(i/m, ·) − χ_d((i+1)/m, ·)|⁴

(105)              ≤ 6n · Σ_{i=0}^{km−1} 1/m² ≤ 6kn/m.

As (6kn)/m → 0 as m ↑ ∞, L(P)(∪_{n,k} ∩_m Ω_{m,n,k}) = 0.
10.3. Derivatives and Bounded Variation. For a standard function f : [0, ∞) →
R and t ∈ [0, ∞), the upper and lower right derivatives at t are

(106)   D⁺f(t) := lim_{ε↓0} sup_{s∈(t,t+ε)} (f(s) − f(t))/(s − t), and

(107)   D₊f(t) := lim_{ε↓0} inf_{s∈(t,t+ε)} (f(s) − f(t))/(s − t).

If these two are equal and finite, f has a right derivative at t. The following tells us
that Brownian motion paths are never differentiable.

Lemma 10.3.1. L(P)({ω : D⁺b(t, ω) = ∞}) = L(P)({ω : D₊b(t, ω) = −∞}) = 1.
The variation of a standard function f : [0, ∞) → R over an interval [s, t] is the
supremum of the sums of the form Σ_i |f(t_i) − f(s_i)| where s = s_1 < t_1 ≤ s_2 < t_2 ≤
· · · ≤ s_I < t_I = t. A function has locally bounded variation if its variation over any
compact interval is finite, and it has some locally bounded variation if it has bounded
variation over some compact non-degenerate interval. The following tells us that Brownian
motion paths have, with probability 1, infinite variation over every compact interval.

Lemma 10.3.2. L(P)({ω : b(·, ω) has some locally bounded variation}) = 0.
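On the near interval, both variation computations are mechanical, since every increment of χ_d is exactly ±1/√N: the absolute increments over [0, 1] sum to √N, which is unlimited, while the squared increments sum to exactly 1 whatever the signs. A sketch (N and the seed are illustrative):

```python
import math, random

# First variation vs. quadratic variation of the walk chi_d over [0, 1].
# Every increment is +-1/sqrt(N), so both sums are determined up to rounding.
random.seed(1)
N = 10_000
increments = [random.choice((-1, 1)) / math.sqrt(N) for _ in range(N)]

first_variation = sum(abs(d) for d in increments)     # = sqrt(N): blows up with N
quadratic_variation = sum(d * d for d in increments)  # = 1, independent of the signs
```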
10.4. Itô’s Lemma. The relation between bounded variation and integration is taught
in high-end calculus classes. For a piecewise continuous function h : [0, 1] → R and a
G : [0, 1] → R having bounded variation on [0, 1], the Riemann-Stieltjes integral is

(108)   ∫_0^1 h(t) dG(t) := lim Σ_i (sup_{x∈[s_i,t_i]} h(x)) (G(t_i) − G(s_i)),

the limits being taken over the same kinds of s_i, t_i pairs. The basic result is that
∫_0^1 h(t) dG(t) = lim Σ_i (inf_{x∈[s_i,t_i]} h(x)) (G(t_i) − G(s_i)) for all piecewise continuous h(·)
and does not depend on the sequence of subdivisions iff G(·) is of bounded variation.
Despite this, we would like to be able to define

(109)   ∫_0^1 h(t) db(t, ω) and ∫_0^1 h(t, ω) db(t, ω).
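Although the paths have unbounded variation, the ∗-finite sums behind (109) are still well behaved: for χ_d, summation by parts gives Σ_k χ(t_k)Δχ(t_k) = (χ(1)² − Σ_k(Δχ(t_k))²)/2 = (χ(1)² − 1)/2 exactly, a discrete form of the Itô correction. A sketch (N and the seed are illustrative):

```python
import math, random

# *-finite stochastic integral: sum_k chi(t_k) (chi(t_{k+1}) - chi(t_k)) over [0, 1].
# Summation by parts gives exactly (chi(1)^2 - sum of squared increments)/2,
# and the squared increments sum to 1, whatever the signs.
random.seed(3)
N = 10_000
dchi = [random.choice((-1, 1)) / math.sqrt(N) for _ in range(N)]

chi, integral = 0.0, 0.0
for d in dchi:
    integral += chi * d     # left-endpoint evaluation, as in a Stieltjes sum
    chi += d

ito_form = 0.5 * (chi ** 2 - 1.0)   # (chi(1)^2 - 1)/2
```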