Every model in this book has assumed rational agents — consumers who maximize expected utility, firms that minimize costs, traders with consistent time preferences and correct beliefs. These assumptions are powerful: they yield sharp predictions, clean welfare theorems, and elegant mathematics. But are they true?
This chapter confronts the evidence. Behavioral economics documents predictable, systematic deviations from the standard rational model. These are patterned biases that survive repetition, incentives, and even expertise, not random errors that wash out in aggregation.
We begin with the cracks in expected utility theory — the Allais and Ellsberg paradoxes — and build toward prospect theory, the leading descriptive alternative. We then examine intertemporal choice under present bias, social preferences that violate pure self-interest, bounded rationality and heuristics, experimental methodology, nudge theory, and behavioral finance. Throughout, the approach is formal: we write down utility functions, derive predictions, and test them against data.
Prerequisites: Expected utility theory (Ch. 6), game theory (Ch. 7), consumer theory (Ch. 6/10), econometrics basics (Ch. 9), mechanism design familiarity (Ch. 11).
Recall from Chapter 6 that under the axioms of completeness, transitivity, continuity, and the independence axiom, preferences over lotteries can be represented by expected utility: $EU(L) = \sum_i p_i u(x_i)$ (Eq. 19.1).
Independence is elegant and normatively appealing. It says your preference between two gambles should not be swayed by an irrelevant common component. But as Maurice Allais demonstrated in 1953, most human beings violate it consistently.
Consider two pairs of lotteries:
Pair 1: Gamble 1A: \$1M with certainty. Gamble 1B: \$5M with prob 0.10, \$1M with prob 0.89, \$0 with prob 0.01.
Pair 2: Gamble 2A: \$1M with prob 0.11, \$0 with prob 0.89. Gamble 2B: \$5M with prob 0.10, \$0 with prob 0.90.
The modal pattern: most people choose 1A over 1B and 2B over 2A. This joint choice $\{1A, 2B\}$ violates the independence axiom.
By independence, replacing the common consequence (a 0.89 chance of \$1M in Pair 1, a 0.89 chance of \$0 in Pair 2) should not change the ranking: if $1A \succ 1B$, then $2A \succ 2B$. The observed reversal reveals a certainty effect.
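As a quick numerical check of this argument, the sketch below computes the expected-utility gap in each pair for a few utility functions. The function names and the particular utility functions are illustrative assumptions, not part of the text. Under expected utility the two gaps are the same number, so no $u$ can rank 1A above 1B while ranking 2B above 2A.

```python
import math

# Allais lotteries as lists of (probability, payoff in $ millions).
G1A = [(1.00, 1)]
G1B = [(0.10, 5), (0.89, 1), (0.01, 0)]
G2A = [(0.11, 1), (0.89, 0)]
G2B = [(0.10, 5), (0.90, 0)]

def expected_utility(lottery, u):
    """Probability-weighted sum of utilities (Eq. 19.1)."""
    return sum(p * u(x) for p, x in lottery)

# Illustrative utility functions (assumptions chosen for this sketch).
utilities = {
    "linear": lambda x: x,
    "sqrt": lambda x: x ** 0.5,
    "log(1+x)": lambda x: math.log(1 + x),
    "very risk-averse": lambda x: 1 - math.exp(-3 * x),
}

def allais_gaps(u):
    """Return (EU(1A) - EU(1B), EU(2A) - EU(2B)) for utility function u."""
    gap1 = expected_utility(G1A, u) - expected_utility(G1B, u)
    gap2 = expected_utility(G2A, u) - expected_utility(G2B, u)
    return gap1, gap2

for name, u in utilities.items():
    gap1, gap2 = allais_gaps(u)
    print(f"{name:>16}: pair 1 gap = {gap1:+.4f}, pair 2 gap = {gap2:+.4f}")
```

Whatever the sign of the gap, it is shared by both pairs, which is exactly what the modal choice pattern contradicts.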
Why it matters: The certainty effect is the whole story without the algebra: a sure thing feels categorically different from a 99% thing. Going from 99% to 100% buys you peace of mind that going from 10% to 11% does not — even though the extra one percentage point is identical. Expected utility says those two extra points should be worth exactly the same. Most people, faced with the Allais pairs, prove they are not. That gap between “almost certain” and “certain” is what the independence axiom cannot see and what every behavioral model after it is built to capture.
Consider an urn with 30 red balls and 60 balls that are black or yellow in unknown proportions. Gamble A: win \$100 if red (prob 1/3, known). Gamble B: win \$100 if black (prob unknown). Most choose A.
But then: Gamble C: win \$100 if red or yellow. Gamble D: win \$100 if black or yellow. Most choose D. Under EU, $A \succ B$ requires $C \succ D$. The joint choice $\{A, D\}$ violates the Sure-Thing Principle.
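To see the contradiction explicitly, suppose the agent held some subjective probability $q$ that a drawn ball is black, so that yellow has probability $2/3 - q$. The symbol $q$ and the normalization $u(0) = 0$, $u(100) = 1$ are introduced here for illustration only.

$$
\begin{aligned}
A \succ B &\iff \tfrac{1}{3} > q, \\
D \succ C &\iff q + \left(\tfrac{2}{3} - q\right) > \tfrac{1}{3} + \left(\tfrac{2}{3} - q\right) \iff q > \tfrac{1}{3}.
\end{aligned}
$$

No single $q$ satisfies both inequalities, so the pattern $\{A, D\}$ cannot come from expected utility with any fixed belief about the urn.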
These paradoxes reveal that the independence axiom fails descriptively. We need a theory that accommodates these violations.
Figure 19.3. Allais Paradox Detector. Select your preferred gamble in each pair, then check whether your choices violate the independence axiom.
Problem. Consider the two Allais lottery pairs above. Assume CRRA utility $u(x) = x^{0.5}$ (with $x$ in millions). (a) Compute the EU of each gamble. (b) Which gamble does EU recommend in each pair? (c) Show that $\{1A, 2B\}$ violates independence.
Solution.
(a) $EU(1A) = 1.0 \times 1^{0.5} = 1.000$. $EU(1B) = 0.89(1) + 0.10(2.236) + 0.01(0) = 1.1136$. $EU(2A) = 0.11(1) = 0.11$. $EU(2B) = 0.10(2.236) = 0.2236$.
(b) EU recommends 1B (1.114 > 1.000) and 2B (0.224 > 0.110). EU-consistent pairs: {1A, 2A} or {1B, 2B}.
(c) $1A \succ 1B$ requires $0.11 \, u(1) > 0.10 \, u(5) + 0.01 \, u(0)$. $2B \succ 2A$ requires $0.10 \, u(5) + 0.01 \, u(0) > 0.11 \, u(1)$. These directly contradict. No $u(\cdot)$ satisfies both.
Kahneman and Tversky (1979) proposed prospect theory as a descriptive alternative, later refined as cumulative prospect theory (1992). It modifies EU in four ways: reference dependence, loss aversion, diminishing sensitivity, and probability weighting.
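The value function replaces $u(x)$ defined over final wealth with $v(x)$ defined over gains and losses $x$ relative to a reference point $r$: $v(x) = x^{\alpha}$ for $x \geq 0$ and $v(x) = -\lambda(-x)^{\beta}$ for $x < 0$ (Eq. 19.2).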
The parameters estimated by Tversky and Kahneman (1992) are $\alpha = \beta = 0.88$ and $\lambda = 2.25$.
Three properties: (1) Reference dependence — outcomes are coded as gains or losses relative to $r$. (2) Diminishing sensitivity — $\alpha, \beta < 1$ gives concavity for gains and convexity for losses. (3) Loss aversion — $\lambda > 1$ makes the value function steeper for losses.
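The three properties can be read directly off a small implementation. This is a minimal sketch of the value function in Eq. 19.2 with the parameter values quoted above; the function name `pt_value` is an assumption made for illustration, and $x$ is already measured as a gain or loss from the reference point.

```python
def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect theory value of a gain or loss x measured from the reference point."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta

for x in (-1000, -100, 100, 1000):
    print(f"v({x:+5d}) = {pt_value(x):8.1f}")

# Loss aversion: a loss hurts lam times as much as an equal-sized gain pleases
# (exactly so when alpha == beta).
ratio = -pt_value(-100) / pt_value(100)
print(f"|v(-100)| / v(100) = {ratio:.2f}")

# Diminishing sensitivity: doubling a gain less than doubles its value.
print(pt_value(200) < 2 * pt_value(100))
```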
Why it matters: Two facts about how outcomes feel do all the work here. First, losses loom larger than gains — losing \$100 stings about 2.25 times as much as winning \$100 feels good — which is why the curve below the reference point drops faster than it rises above it. Second, what counts as a gain or a loss depends entirely on where you start: the same \$50,000 salary is a triumph after \$30,000 and a wound after \$70,000. The kink at the reference point is loss aversion; the bend of each arm is the fading of sensitivity as you move away from it. Drag the sliders on the figure and watch the steepness of the loss arm change — that steepness is $\lambda$.
Figure 19.1. Prospect theory value function. The S-shaped curve is concave for gains and convex for losses, with a steeper slope for losses (loss aversion). At $\alpha = \beta = \lambda = 1$ it collapses to linear (EU). Drag sliders to explore.
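The weighting function is $w(p) = p^\delta / (p^\delta + (1-p)^\delta)^{1/\delta}$ (Eq. 19.3). Tversky and Kahneman (1992) estimate the curvature parameter at 0.61 for gains and 0.69 for losses; this chapter uses the single value $\delta \approx 0.65$. When $\delta = 1$, $w(p) = p$ (EU). When $\delta < 1$, the function overweights small probabilities and underweights large ones, crossing the 45-degree line near $p \approx 0.36$.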
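The shape of the weighting function is easy to inspect numerically. The sketch below implements Eq. 19.3 and locates the crossover by bisection; the names `tk_weight` and `crossover` and the bisection bracket are assumptions made for this illustration.

```python
def tk_weight(p, delta=0.65):
    """Tversky-Kahneman (1992) one-parameter probability weighting function."""
    num = p ** delta
    return num / (num + (1 - p) ** delta) ** (1 / delta)

def crossover(delta=0.65, lo=0.05, hi=0.95, tol=1e-10):
    """Bisection for the interior p with w(p) = p (assumes delta < 1)."""
    # For delta < 1, w(p) - p is positive at small p and negative at large p.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tk_weight(mid, delta) > mid:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    print(f"w({p:.2f}) = {tk_weight(p):.3f}")
print(f"crossover near p = {crossover():.3f}")
```

Setting `delta=1.0` in `tk_weight` returns `p` itself, which is the expected-utility benchmark.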
Why it matters: People do not treat probabilities the way a calculator does. A tiny chance feels bigger than it is — which is why we buy lottery tickets and insurance against rare disasters — while a near-sure thing feels less sure than it is, draining the value out of the last few percentage points before certainty. The inverse-S curve on the figure is just this: small probabilities lifted up, middling-to-large ones pushed down. It is the probability-side companion to loss aversion, and together the two produce the fourfold pattern of risk-taking that follows.
Figure 19.2. Tversky-Kahneman (1992) probability weighting function. The inverse-S curve overweights small probabilities and underweights large ones. At $\delta = 1$ it collapses to the 45-degree line (EU). Drag the slider.
Note: This is the original Prospect Theory formulation (Kahneman & Tversky, 1979), which applies decision weights to individual probabilities. Cumulative Prospect Theory (Tversky & Kahneman, 1992) applies decision weights to cumulative probabilities of ranked outcomes, resolving certain anomalies such as violations of stochastic dominance.
The fourfold pattern: small $p$ + gains = risk seeking (lotteries); small $p$ + losses = risk aversion (insurance); large $p$ + gains = risk aversion (certainty effect); large $p$ + losses = risk seeking (desperate gambling).
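The fourfold pattern can be reproduced numerically. The sketch below compares a simple gamble (outcome $x$ with probability $p$, otherwise nothing) with its expected value received for sure, using the value and weighting functions with the parameters quoted in this chapter. Applying the same weighting function to gains and losses and the \$100 stake are simplifying assumptions made for this illustration.

```python
def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value of a gain or loss x relative to the reference point."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

def tk_weight(p, delta=0.65):
    """One-parameter inverse-S probability weighting function."""
    num = p ** delta
    return num / (num + (1 - p) ** delta) ** (1 / delta)

def attitude(x, p):
    """Compare the gamble (x with prob p, else 0) with its expected value for sure."""
    gamble = tk_weight(p) * pt_value(x)
    sure = pt_value(p * x)
    return "risk seeking" if gamble > sure else "risk averse"

for label, x, p in [
    ("small p, gain", 100, 0.05),
    ("small p, loss", -100, 0.05),
    ("large p, gain", 100, 0.95),
    ("large p, loss", -100, 0.95),
]:
    print(f"{label}: {attitude(x, p)}")
```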
Problem. A gamble offers $+\$1{,}000$ with prob 0.5 and $-\$800$ with prob 0.5. Reference point $r = 0$. (a) CE under EU with CRRA $u(x) = x^{0.5}$, $W = \$10{,}000$. (b) PT valuation with standard parameters. (c) Why does loss aversion reverse the evaluation?
Solution.
(a) $EU = 0.5(11{,}000)^{0.5} + 0.5(9{,}200)^{0.5} = 0.5(104.88) + 0.5(95.92) = 100.40$. $CE = 100.40^2 \approx 10{,}080$, so the certainty equivalent exceeds initial wealth by about \$80. Agent accepts.
(b) $v(+1000) = 1000^{0.88} = 436.5$. $v(-800) = -2.25 \times 800^{0.88} = -2.25 \times 358.7 = -807.1$. With $w(0.5) \approx 0.439$: $V = 0.439(436.5) + 0.439(-807.1) = -162.6$. Agent rejects.
(c) Loss aversion ($\lambda = 2.25$) makes the \$800 loss weigh far more than the \$1,000 gain, flipping the evaluation.
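The two evaluations in this solution can be reproduced in a few lines. This is a sketch under the problem's stated assumptions (square-root utility over final wealth for EU, the chapter's parameters for prospect theory); the variable names are illustrative.

```python
import math

def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value of a gain or loss x relative to the reference point r = 0."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

def tk_weight(p, delta=0.65):
    """One-parameter inverse-S probability weighting function."""
    num = p ** delta
    return num / (num + (1 - p) ** delta) ** (1 / delta)

W, gain, loss, p = 10_000, 1_000, -800, 0.5

# (a) Expected utility with u(x) = sqrt(x), evaluated over final wealth.
eu = p * math.sqrt(W + gain) + (1 - p) * math.sqrt(W + loss)
ce_change = eu ** 2 - W  # certainty equivalent minus initial wealth

# (b) Prospect theory over gains and losses, weighting each outcome's
#     probability separately as in Eq. 19.4.
V = tk_weight(p) * pt_value(gain) + tk_weight(1 - p) * pt_value(loss)

print(f"EU certainty-equivalent change: {ce_change:+.1f}")
print(f"PT value of the gamble:         {V:+.1f}")
```

The first number is positive and the second negative, which is the reversal discussed in part (c).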
Standard theory assumes exponential discounting with discount factor $\delta \in (0,1)$. The key property is time consistency: a plan made at $t=0$ remains optimal at every future date.
Experimental evidence overwhelmingly rejects constant discounting. People exhibit declining impatience: the discount rate between today and tomorrow is much higher than between day 100 and day 101.
The quasi-hyperbolic discount factors are $\{1, \beta\delta, \beta\delta^2, \ldots\}$. The immediate period receives weight 1, but all future periods are additionally discounted by $\beta$. When $\beta < 1$, there is a discrete drop between “now” and “the future.”
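A short computation makes the discrete drop visible. The sketch below builds the weights and then the one-period discount factor between consecutive dates as seen from $t = 0$; the values $\beta = 0.7$ and $\delta = 0.95$ are illustrative choices, and the function name is an assumption.

```python
def beta_delta_weights(beta, delta, horizon):
    """Quasi-hyperbolic weights {1, beta*delta, beta*delta**2, ...}."""
    return [1.0] + [beta * delta ** t for t in range(1, horizon)]

weights = beta_delta_weights(beta=0.7, delta=0.95, horizon=5)

# One-period discount factor between consecutive dates, seen from t = 0.
step_factors = [weights[t + 1] / weights[t] for t in range(len(weights) - 1)]

print([round(w, 3) for w in weights])
print([round(f, 3) for f in step_factors])
```

Only the first step carries the extra factor $\beta$; every later step is discounted by $\delta$ alone, which is the declining impatience described above.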
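At $t=0$, the FOC between $c_0$ and $c_1$ is $u'(c_0) = \beta\delta u'(c_1)$, and the planned trade-off between $c_1$ and $c_2$ is $u'(c_1) = \delta u'(c_2)$, because $\beta$ multiplies both future periods and cancels. At $t=1$, re-optimization gives $u'(c_1) = \beta\delta u'(c_2)$. The $\beta$ has shifted onto the new “now versus later” margin, so the $t=1$ self consumes more than the $t=0$ self planned: the plan is time-inconsistent.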
Why it matters: You discount the gap between today and tomorrow far more steeply than the gap between a year out and a year-and-a-day out — even though both are one-day delays. That single extra penalty on “now versus not-now” is present bias, and it is why you set the alarm for 6 a.m. and then hit snooze, why you plan to start the diet Monday and break it Monday night. The plan your today-self makes is not the plan your tomorrow-self wants to keep. A sophisticated person who sees this coming pays for a commitment device — the locked retirement account, the gym contract — to bind the future self the present self cannot trust. Drag the present-bias slider toward 1 and the conflict disappears.
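A naive agent, who expects future selves to carry out today's plan, can procrastinate indefinitely when there is no deadline. A sophisticated agent, who correctly anticipates future present bias, solves the problem by backward induction and may employ commitment devices.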
Figure 19.4. Beta-delta discounting explorer. The naive agent perpetually delays; the sophisticated agent uses backward induction. At $\beta = 1$, all lines collapse (no present bias). Drag sliders.
Problem. A student must complete a project. Cost today = 6 utils, benefit in 2 periods = 10 utils. $\beta = 0.7$, $\delta = 0.95$, 5 periods. (a) When does a naive agent act? (b) A sophisticated agent?
Solution.
(a) Naive: At each $t$, net of acting now $= -6 + 0.7 \times 0.95^2 \times 10 = -6 + 6.32 = +0.32$. Net of waiting (perceived) $= 0.7 \times 0.95 \times (-6) + 0.7 \times 0.95^3 \times 10 = -3.99 + 6.00 = +2.01$. Since $2.01 > 0.32$, always delays. Procrastinates until the deadline.
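(b) Sophisticated: Backward induction. At $t = 2$ (last feasible), net $= +0.32 > 0$, so the $t=2$ self acts. At $t = 1$: net now $= +0.32$, net of waiting for $t=2$ to act $= +2.01 > 0.32$, so waits. At $t = 0$: net now $= +0.32$, net of waiting for $t=2$ to act $= 0.7 \times 0.95^2 \times (-6) + 0.7 \times 0.95^4 \times 10 = -3.79 + 5.70 = +1.91 > 0.32$, so waits. The sophisticated agent acts at $t = 2$, the same date at which the deadline forces the naive agent to act. Sophistication does not bring the task forward here, because each earlier self correctly expects the $t=2$ self to act and prefers to wait for it.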
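The timing of both agents can be checked with a small routine. This is a sketch, assuming the cost is paid immediately, the benefit arrives `lag` periods later, and $t = 2$ is the last feasible date as in the solution; the function names and the modelling of naive beliefs (future selves are expected to act at whichever later date looks best today) are assumptions made for illustration.

```python
def value(t, tau, beta, delta, cost, benefit, lag):
    """Value to the period-t self of the task being done at date tau >= t."""
    if tau == t:
        return -cost + beta * delta ** lag * benefit
    return beta * delta ** (tau - t) * (-cost + delta ** lag * benefit)

def naive_date(last, **kw):
    """Naive self expects future selves to act at the later date that looks best today."""
    for t in range(last):
        options = [value(t, tau, **kw) for tau in range(t + 1, last + 1)]
        perceived_wait = max([0.0] + options)  # 0.0 = never doing the task
        if value(t, t, **kw) >= perceived_wait:
            return t
    return last if value(last, last, **kw) >= 0 else None

def sophisticated_date(last, **kw):
    """Backward induction over the dates at which each self would really act."""
    next_act = last if value(last, last, **kw) >= 0 else None
    for t in range(last - 1, -1, -1):
        wait = value(t, next_act, **kw) if next_act is not None else 0.0
        if value(t, t, **kw) >= wait:
            next_act = t
    return next_act

params = dict(beta=0.7, delta=0.95, cost=6, benefit=10, lag=2)
print("naive acts at t =", naive_date(last=2, **params))
print("sophisticated acts at t =", sophisticated_date(last=2, **params))
```

With the problem's parameters both functions return 2. Raising `beta` toward 1 shrinks the gap between acting now and waiting, which is where the two agents' behavior can start to differ in other parameterizations.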
Problem. Agent with $\beta = 0.7$, $\delta = 0.95$, log utility, income $Y = 100$ over 3 periods. (a) Savings without commitment. (b) With commitment. (c) Welfare gain.
Solution.
(a) Without: $t=0$ allocates $c_0 = 100/(1+0.665+0.632) = 43.54$, leaving 56.46. At $t=1$ re-optimization: $c_1 = 56.46/1.665 = 33.91$, $c_2 = 22.55$.
(b) With: $c_1 = 0.665 \times 100/2.297 = 28.95$, $c_2 = 0.632 \times 100/2.297 = 27.51$.
(c) Without: $U = 3.774 + 2.344 + 1.967 = 8.085$. With: $U = 3.774 + 2.237 + 2.095 = 8.106$. Gain $= 0.020$ utils. The committed agent achieves a smoother consumption path.
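The allocation and welfare numbers can be reproduced as follows. The sketch assumes a zero interest rate and follows the solution in using the same $c_0$ in both cases; with log utility, each self splits remaining wealth in proportion to its own discount weights. Variable names are illustrative.

```python
import math

beta, delta, Y = 0.7, 0.95, 100.0
bd, bd2 = beta * delta, beta * delta ** 2

# With log utility, consumption shares are proportional to the discount weights.
c0 = Y / (1 + bd + bd2)

# Commitment: the t=0 self fixes the whole path using weights (1, bd, bd2).
c1_commit = bd * Y / (1 + bd + bd2)
c2_commit = bd2 * Y / (1 + bd + bd2)

# No commitment: the t=1 self re-splits what is left using weights (1, bd).
left = Y - c0
c1_free = left / (1 + bd)
c2_free = left - c1_free

def u0(c0, c1, c2):
    """Lifetime utility as evaluated by the t=0 self."""
    return math.log(c0) + bd * math.log(c1) + bd2 * math.log(c2)

gain = u0(c0, c1_commit, c2_commit) - u0(c0, c1_free, c2_free)
print(f"no commitment: c = ({c0:.2f}, {c1_free:.2f}, {c2_free:.2f})")
print(f"commitment:    c = ({c0:.2f}, {c1_commit:.2f}, {c2_commit:.2f})")
print(f"welfare gain to the t=0 self: {gain:.3f} utils")
```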
Decades of experimental evidence show people systematically deviate from pure self-interest: rejecting unfair offers, giving to strangers, cooperating in one-shot games, and punishing free-riders.
The constraints $\alpha_i \geq \beta_i$ and $\beta_i < 1$ are empirically motivated: envy hurts more than guilt, and no one destroys money just to equalize.
In the ultimatum game, the minimum acceptable offer $s^*$ satisfies $s - \alpha_R(100-2s) \geq 0$, giving $s^* = 100\alpha_R / (1+2\alpha_R)$. For $\alpha_R = 2$: $s^* = 40$.
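The threshold formula can be verified against the utility function directly. This is a minimal sketch of Eq. 19.7 for two players and a \$100 pie; the function names and the responder's $\beta = 0.6$ (which plays no role for offers below half) are assumptions made for illustration.

```python
def fs_utility(own, other, alpha, beta):
    """Fehr-Schmidt utility: own payoff minus envy and guilt terms (Eq. 19.7)."""
    envy = alpha * max(other - own, 0)
    guilt = beta * max(own - other, 0)
    return own - envy - guilt

def min_acceptable_offer(alpha_r, pie=100):
    """Closed form s* = pie * alpha / (1 + 2 * alpha), valid for offers below half."""
    return pie * alpha_r / (1 + 2 * alpha_r)

for alpha_r in (0.0, 0.5, 1.0, 2.0):
    s = min_acceptable_offer(alpha_r)
    # At s*, accepting yields the same utility as rejecting (both get 0).
    u_accept = fs_utility(s, 100 - s, alpha_r, beta=0.6)
    print(f"alpha_R = {alpha_r}: s* = {s:.2f}, utility from accepting = {u_accept:.2f}")
```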
Why it matters: People will pay, out of their own pocket, to punish someone who treats them unfairly. Offer a stranger a lopsided split of \$100 — \$80 for you, \$20 for them — and they will often reject it, walking away with nothing just to deny you the \$80. Pure self-interest says take the \$20; fairness says refuse. The model adds two feelings to the payoff: the sting of getting less than someone else (envy), which hurts more, and the discomfort of getting more (guilt), which hurts less. That asymmetry is why real offers cluster near a fair split rather than the textbook prediction of one cent. Drag the responder's envy slider and watch the minimum acceptable offer climb.
Figure 19.6. Fehr-Schmidt inequality aversion. Higher $\alpha$ (envy) raises the minimum acceptable offer. At $\alpha = \beta = 0$, standard theory: any positive offer is accepted. Drag sliders.
Figure 19.5. Ultimatum Game Simulator. Play as the proposer against different responder strategies. Track your earnings over rounds.
In dictator games, the average allocation is 20-30%. In public goods games, adding punishment sustains cooperation.
The trust game and these Fehr-Schmidt social-preference experiments are the microdata behind a walkthrough on whether trust functions as a form of capital — Trust as capital.
Problem. \$100 ultimatum game. Proposer: $\alpha_P = 0.5$, $\beta_P = 0.3$. Responder: $\alpha_R = 2.0$, $\beta_R = 0.6$. (a) Min acceptable offer. (b) Optimal offer. (c) Compare to standard Nash.
Solution.
(a) $U_R = s - 2.0(100-2s) = 5s - 200 \geq 0 \Rightarrow s^* = 40$.
(b) $U_P = (100-s) - 0.3(100-2s) = 70 - 0.4s$, decreasing in $s$. Minimize $s$ subject to $s \geq 40$: optimal offer $s^* = 40$. $U_P = 54$, $U_R = 0$.
(c) Standard preferences ($\alpha = \beta = 0$): offer \$1, accepted. Fehr-Schmidt: offer \$40. Much closer to experimental modal offers of 40-50%.
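The proposer's problem in this solution can also be solved by brute force. The sketch below searches over positive whole-dollar offers, assuming a responder accepts when indifferent and that rejection pays both players zero; the function names and the restriction to whole dollars are assumptions made for illustration.

```python
def fs_utility(own, other, alpha, beta):
    """Fehr-Schmidt utility: own payoff minus envy and guilt terms (Eq. 19.7)."""
    return own - alpha * max(other - own, 0) - beta * max(own - other, 0)

def best_offer(alpha_p, beta_p, alpha_r, beta_r, pie=100):
    """Grid search over positive whole-dollar offers s made to the responder."""
    best_s, best_u = None, float("-inf")
    for s in range(1, pie + 1):
        # The responder accepts if accepting is at least as good as 0 from rejecting.
        accepted = fs_utility(s, pie - s, alpha_r, beta_r) >= 0
        u_p = fs_utility(pie - s, s, alpha_p, beta_p) if accepted else 0.0
        if u_p > best_u:
            best_s, best_u = s, u_p
    return best_s, best_u

print(best_offer(alpha_p=0.5, beta_p=0.3, alpha_r=2.0, beta_r=0.6))
print(best_offer(alpha_p=0.0, beta_p=0.0, alpha_r=0.0, beta_r=0.0))
```

The first call uses the problem's parameters; the second sets every parameter to zero to recover the standard self-interested benchmark.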
Herbert Simon (1955) argued that agents satisfice rather than optimize: searching until they find an acceptable option, then stopping.
Tversky and Kahneman (1974) identified three core heuristics: representativeness (judging probability by resemblance), availability (estimating frequency by ease of recall), and anchoring (adjusting insufficiently from an initial value).
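Gabaix (2014) formalized bounded rationality as an optimization problem: agents maximize utility net of an attention cost $\theta$ per dimension attended to. The agent perceives $\hat{p}_k = \bar{p}_k + m_k(p_k - \bar{p}_k)$, where $\bar{p}_k$ is the default value of price $k$ and $m_k \in [0,1]$ is the attention paid to it: $m_k = 1$ recovers the true price and $m_k = 0$ leaves the agent at the default.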
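A small numerical illustration of the perceived-price formula may help. The numbers are hypothetical (a \$10 default price and a true price of \$12 once a fee is included), and the function name is an assumption; the attention level `m` is simply taken as given here rather than derived from the attention cost.

```python
def perceived_price(p, p_default, m):
    """Perceived price: default plus an attention-weighted share of the true deviation."""
    return p_default + m * (p - p_default)

# Hypothetical numbers: default (sticker) price 10, true price 12 including a fee.
for m in (0.0, 0.25, 1.0):
    print(f"m = {m:.2f}: perceived price = {perceived_price(12.0, 10.0, m):.2f}")
```

With partial attention the agent registers only a fraction of the fee, so demand responds less than it would under full attention.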
Why it matters: Attention is scarce, and thinking is costly, so people do not optimize over the real world — they optimize over a simplified mental cartoon of it, paying attention only to the dimensions that seem to matter and ignoring the small stuff. This is not stupidity; it is economizing on a genuinely limited resource. It explains why shoppers notice the sticker price but miss the shipping fee, why we anchor on the first number we hear, why a tax that is folded into the price changes behavior less than one added at the register. Simon called it satisficing — good enough, not perfect. Gabaix put a price tag on the attention the perfect version would have required.
Lab experiments feature real monetary incentives, randomization, and control. Strength: internal validity. Weakness: external validity.
Field experiments embed manipulations in real-world settings: natural behavior, no awareness, large scale. Trade-off: less control for greater realism.
Demand effects: subjects may alter behavior because they know they are observed or infer experimenter intent. The deception debate: economics has a strong norm against deception, unlike psychology.
The replication crisis: only 36% of the psychology studies re-run by the Open Science Collaboration (2015) replicated; the rate for experimental economics is higher (~60%) but still concerning. Pre-registration addresses p-hacking and publication bias.
If choices depend on framing and defaults, then choice architecture — the way choices are presented — matters.
The most powerful nudge is the default. Organ donation: 15-20% in opt-in countries, 85-99% in opt-out. Retirement enrollment jumps from ~50% to over 90% with opt-out.
Under opt-in ($d=0$): $P = \Phi((v-k)/\sigma)$. Under opt-out ($d=1$): $P = \Phi(v/\sigma)$. The gap is largest when $v$ is positive but moderate and $k/\sigma$ is non-trivial.
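The enrollment formula is straightforward to evaluate with the standard normal CDF. The sketch below uses illustrative values $v = 3$ and $\sigma = 2$ and varies the switching cost $k$; the function names are assumptions made for this example.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def enroll_prob(v, k, sigma, d):
    """Enrollment probability: d = 1 if enrollment is the default (opt-out), else 0."""
    return normal_cdf((v - k * (1 - d)) / sigma)

v, sigma = 3.0, 2.0
for k in (0.0, 1.0, 2.0, 4.0):
    opt_in = enroll_prob(v, k, sigma, d=0)
    opt_out = enroll_prob(v, k, sigma, d=1)
    print(f"k = {k:.0f}: opt-in = {opt_in:.3f}, opt-out = {opt_out:.3f}, "
          f"gap = {opt_out - opt_in:.3f}")
```

At `k = 0` the two regimes coincide, and the gap widens as the switching cost grows.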
Why it matters: Whatever the default is, most people keep it. Make retirement saving automatic-with-an-opt-out and enrollment jumps from about half to over ninety percent; flip organ donation from opt-in to opt-out and consent rates leap from the teens to the high nineties. The small effort of switching — finding the form, making the decision, acting — is enough to leave most people wherever they were placed. That hands enormous, quiet power to whoever sets the default, which is the entire premise of nudging. Drag the switching-cost slider toward zero and the opt-in and opt-out lines converge: when changing is truly effortless, the default stops mattering.
Figure 19.7. Default effect simulator. Higher switching costs widen the gap between opt-in and opt-out enrollment. At $k = 0$ the default does not matter. Drag the slider.
The EAST framework: Easy (reduce friction), Attractive (make salient), Social (leverage norms), Timely (prompt at receptive moments).
Sludge is friction that discourages desirable behavior. Reducing sludge is often as effective as introducing new nudges.
Bernheim and Rangel (2009): evaluate welfare based on choices free from behavioral distortions — when agents are well-informed, attentive, and undistorted.
When Thaler and Sunstein published Nudge in 2008, it seemed like a policy cheat code: redesign defaults and people save more, eat better, donate organs — all without restricting choice. Governments loved it. The UK created a "Nudge Unit," and Obama hired Sunstein as regulatory czar. But the backlash was fierce. Gilles Saint-Paul called it "the tyranny of utility" — technocrats deciding what's good for you while pretending to respect your freedom. Op-eds called nudging "manipulation by the state." Is libertarian paternalism a brilliant synthesis, or a contradiction in terms?
Advanced. The efficient market hypothesis holds that prices fully reflect all information. Behavioral finance challenges this: many traders are not rational, and rational arbitrageurs face limits.
Overconfidence generates excess trading. Barber and Odean (2000): the households that traded most earned 11.4% per year net of costs, against a market return of 17.9%, a shortfall of 6.5 percentage points.
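The disposition effect: investors sell winners too early and hold losers too long. The reference point is the purchase price. Gains fall in the concave region of the value function (risk-averse, sell early); losses fall in the convex region (risk-seeking, hold).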
Recent winners keep outperforming recent losers over the next 3-12 months (momentum, Jegadeesh-Titman 1993), while long-run past winners underperform long-run past losers over the following 3-5 years (reversal, DeBondt-Thaler 1985).
Even rational traders may not correct mispricing: noise trader risk, implementation costs, and agency problems constrain them.
DeLong, Shleifer, Summers, and Waldmann (1990): higher $\mu$ pushes price from fundamentals; higher $\rho$ amplifies deviation; higher $\gamma$ (arbitrageur risk aversion) means less aggressive trading against mispricing, so the deviation increases.
Why it matters: The classic defense of efficient markets is that smart money fixes mistakes: if irrational traders push a price too high, arbitrageurs sell until it snaps back. This model shows why that defense leaks. Betting against a crowd of optimists is itself risky — the optimists can stay optimistic, and grow more so, long enough to wipe out the trader who bet against them (Keynes: markets can stay irrational longer than you can stay solvent). Knowing a price is wrong is not the same as being able to profit from it. So noise traders not only survive, they can move prices and earn high returns by bearing the very risk they create. Drag the arbitrageur-risk-aversion slider up and watch the price drift further from fundamentals — that gap is the “as if rational” story failing in the one market where it should have been strongest.
The paradox: noise traders can earn higher expected returns by bearing the risk they themselves created.
Figure 19.8. DSSW noise trader model. Noise trader sentiment pushes prices away from fundamentals. Risk-averse arbitrageurs cannot fully correct the mispricing. Drag sliders.
Problem. $f = 100$, $\rho = 0.30$, $\mu = 20$ (bullish), $r = 0.05$, $\gamma = 2$. (a) Compute equilibrium price. (b) Price deviation. (c) What if $\gamma = 0$?
Solution.
(a) $p = 100 + \frac{2 \times 0.30 \times 20}{1.05} = 100 + \frac{12}{1.05} = 100 + 11.43 = 111.43$.
(b) Deviation: $p - f = 11.43$. The asset is overpriced because noise traders push prices above fundamentals and risk-averse arbitrageurs don't fully counteract them.
(c) With $\gamma = 0$: $p = 100 + 0 = 100$. Risk-neutral arbitrageurs trade aggressively enough to eliminate mispricing entirely. The key DSSW insight: it is arbitrageur risk aversion ($\gamma > 0$) that allows noise-trader-driven deviations to persist.
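The comparative statics in this problem can be tabulated quickly. The sketch below implements the chapter's simplified pricing rule (Eq. 19.10) as written, not the full equilibrium expression from the original DSSW paper; the function name is an assumption.

```python
def dssw_price(f, rho, mu, r, gamma):
    """Simplified pricing rule of Eq. 19.10: p = f + gamma * rho * mu / (1 + r)."""
    return f + gamma * rho * mu / (1 + r)

for gamma in (0, 1, 2, 4):
    p = dssw_price(f=100, rho=0.30, mu=20, r=0.05, gamma=gamma)
    print(f"gamma = {gamma}: price = {p:.2f}, deviation = {p - 100:+.2f}")
```

In this rule the deviation is linear in $\gamma$, so doubling arbitrageur risk aversion doubles the mispricing.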
The 2008 crisis is the event behavioral finance is most often applied to — the economic-history book narrates it: B-Ch.19 — The GFC and after.
Maya bundled a free cookie with every lemonade purchase as a summer promotion. Sales increased modestly, up 8%. When Maya removes the free cookie (returning to the original offer at the original price), customer backlash is disproportionate: complaints, negative reviews, lost regulars. Sales drop 15%, to below the pre-promotion baseline.
Prospect theory analysis. During the promotion, customers' reference point shifted from “lemonade” to “lemonade + cookie.” The gain from adding the cookie was $v(+\text{cookie}) = (\text{cookie\_value})^{0.88}$. But the loss from removing it is $v(-\text{cookie}) = -2.25 \times (\text{cookie\_value})^{0.88}$. The perceived loss is 2.25× the original gain. The promotion was a one-way ratchet: easy to give, painful to take away.
Maya designs a nudge experiment. For her loyalty program, Maya tests two enrollment designs as a field experiment: Treatment A (opt-in): customers can sign up at the counter. Treatment B (opt-out): every customer automatically gets a card; they can opt out. Using Eq. 19.9 with $v = 3$, $\sigma = 2$, $k = 2$: opt-in $P = \Phi(0.5) = 0.69$; opt-out $P = \Phi(1.5) = 0.93$. Maya's field experiment confirms the prediction. She switches to opt-out for the full rollout.
Kahneman and Tversky (1979). “Prospect Theory: An Analysis of Decision under Risk” is one of the most cited papers in economics. Published in Econometrica, it formalized experimental findings into a coherent mathematical framework. Kahneman received the Nobel Prize in 2002; Tversky had passed away in 1996.
Maurice Allais (1953). The French economist presented his paradox directly to Leonard Savage. Legend has it Savage himself fell into the Allais pattern. Allais received the Nobel Prize in 1988.
Richard Thaler (2017 Nobel). Thaler's “Anomalies” column systematically catalogued behavioral deviations. His 2008 book Nudge (with Sunstein) brought behavioral insights to policy, leading to “nudge units” worldwide.
David Laibson (1997). “Golden Eggs and Hyperbolic Discounting” formalized the beta-delta model and explained why people simultaneously hold credit card debt at 18% interest and illiquid savings at 5%.
Shleifer and Vishny (1997). “The Limits of Arbitrage” showed why rational traders cannot eliminate mispricing when they manage other people's money and face capital constraints.
| Label | Equation | Description |
|---|---|---|
| Eq. 19.1 | $EU(L) = \sum p_i u(x_i)$ | Expected utility |
| Eq. 19.2 | $v(x) = x^\alpha$ (gains), $-\lambda(-x)^\beta$ (losses) | Prospect theory value function |
| Eq. 19.3 | $w(p) = p^\delta / (p^\delta + (1-p)^\delta)^{1/\delta}$ | Tversky-Kahneman probability weighting |
| Eq. 19.4 | $V(L) = \sum w(p_i) v(x_i - r)$ | Prospect theory valuation |
| Eq. 19.5 | $U_0 = u(c_0) + \beta \sum_{t \geq 1} \delta^t u(c_t)$ | Quasi-hyperbolic discounting |
| Eq. 19.6 | $\beta\delta u'(c_1) = u'(c_0) \neq \delta u'(c_1)$ | Time inconsistency |
| Eq. 19.7 | $U_i = x_i - \alpha_i \max(x_j-x_i,0) - \beta_i \max(x_i-x_j,0)$ | Fehr-Schmidt inequality aversion |
| Eq. 19.8 | $\max u(c) - \theta\|m\|_1$ s.t. $p \cdot c \leq w$ | Gabaix sparse maximization |
| Eq. 19.9 | $P_{\text{enroll}} = \Phi((v - k(1-d))/\sigma)$ | Default-sensitive enrollment |
| Eq. 19.10 | $p_t = f_t + \gamma \rho_t \mu_t / (1+r)$ | DSSW noise trader pricing |
Kahneman & Tversky (1979); Tversky & Kahneman (1992); Thaler (1980, 2015); Laibson (1997); Fehr & Schmidt (1999); Gabaix (2014); Shleifer & Vishny (1997); DeLong, Shleifer, Summers & Waldmann (1990).