Chapter 19: Behavioral and Experimental Economics

Introduction

Every model in this book has assumed rational agents — consumers who maximize expected utility, firms that minimize costs, traders with consistent time preferences and correct beliefs. These assumptions are powerful: they yield sharp predictions, clean welfare theorems, and elegant mathematics. But are they true?

This chapter confronts the evidence. Behavioral economics documents predictable, systematic deviations from the standard rational model. These are patterned biases that survive repetition, incentives, and even expertise, not random errors that wash out in aggregation.

We begin with the cracks in expected utility theory — the Allais and Ellsberg paradoxes — and build toward prospect theory, the leading descriptive alternative. We then examine intertemporal choice under present bias, social preferences that violate pure self-interest, bounded rationality and heuristics, experimental methodology, nudge theory, and behavioral finance. Throughout, the approach is formal: we write down utility functions, derive predictions, and test them against data.

By the end of this chapter, you will be able to:
  1. Identify the Allais and Ellsberg paradoxes as formal violations of expected utility axioms
  2. State prospect theory's value function and probability weighting function with their parametric forms
  3. Model present bias using the quasi-hyperbolic (beta-delta) framework and derive time inconsistency
  4. Compute equilibrium outcomes under Fehr-Schmidt inequality aversion preferences
  5. Distinguish satisficing from optimizing and characterize sparse maximization
  6. Evaluate experimental design choices — lab vs field, demand effects, replication concerns
  7. Apply nudge theory and choice architecture to policy design
  8. Explain why rational arbitrage fails to eliminate behavioral mispricing in financial markets

Prerequisites: Expected utility theory (Ch. 6), game theory (Ch. 7), consumer theory (Ch. 6/10), econometrics basics (Ch. 9), mechanism design familiarity (Ch. 11).

19.1 Violations of Expected Utility

The Expected Utility Benchmark

Recall from Chapter 6 that under the axioms of completeness, transitivity, continuity, and the independence axiom, preferences over lotteries can be represented by expected utility:

Expected utility. A theory of decision under risk where an agent chooses the lottery $L$ that maximizes $EU(L) = \sum_{i=1}^{n} p_i \, u(x_i)$, where $p_i$ are objective probabilities, $x_i$ are outcomes, and $u(\cdot)$ is a Bernoulli utility function defined over final wealth. EU is the benchmark against which all behavioral deviations are measured.
$$EU(L) = \sum_{i=1}^{n} p_i \, u(x_i)$$ (Eq. 19.1)
Independence axiom. If lottery $A$ is preferred to $B$, then a mixture $pA + (1-p)C$ must be preferred to $pB + (1-p)C$ for any lottery $C$ and any probability $p \in (0,1)$. Mixing in a common component $C$ should not reverse your ranking of $A$ versus $B$. The Allais paradox demonstrates systematic violations of this axiom.

Independence is elegant and normatively appealing. It says your preference between two gambles should not be swayed by an irrelevant common component. But as Maurice Allais demonstrated in 1953, most human beings violate it consistently.

The Allais Paradox (1953)

Allais paradox. The empirical finding (Allais, 1953) that most people prefer a certain \$1 million over a risky gamble with higher expected value (the certainty effect), yet simultaneously prefer a riskier gamble when both options involve uncertainty. These joint preferences violate the independence axiom of expected utility theory.

Consider two pairs of lotteries:

Pair 1: Gamble 1A: \$1M with certainty. Gamble 1B: \$5M with prob 0.10, \$1M with prob 0.89, \$0 with prob 0.01.

Pair 2: Gamble 2A: \$1M with prob 0.11, \$0 with prob 0.89. Gamble 2B: \$5M with prob 0.10, \$0 with prob 0.90.

The modal pattern: most people choose 1A over 1B and 2B over 2A. This joint choice $\{1A, 2B\}$ violates the independence axiom.

Common consequence effect. A special case of the independence axiom violation in which preferences reverse when a common consequence shared by both options in a pair is altered. In the Allais paradox, the 0.89 probability component common to both gambles in each pair is the common consequence.

By independence, replacing the common consequence (\$1M in Pair 1, \$0 in Pair 2) should not change the ranking: if $1A \succ 1B$, then $2A \succ 2B$. The observed reversal reveals a certainty effect.

Intuition

Why it matters: The certainty effect is the whole story without the algebra: a sure thing feels categorically different from a 99% thing. Going from 99% to 100% buys you peace of mind that going from 10% to 11% does not — even though the extra one percentage point is identical. Expected utility says those two extra points should be worth exactly the same. Most people, faced with the Allais pairs, prove they are not. That gap between “almost certain” and “certain” is what the independence axiom cannot see and what every behavioral model after it is built to capture.

The Ellsberg Paradox (1961)

Ellsberg paradox. The empirical finding (Ellsberg, 1961) that people prefer gambles with known probabilities over gambles with unknown (ambiguous) probabilities, even when EU theory predicts indifference. This reveals ambiguity aversion.

Consider an urn with 30 red balls and 60 balls that are black or yellow in unknown proportions. Gamble A: win \$100 if red (prob 1/3, known). Gamble B: win \$100 if black (prob unknown). Most choose A.

But then: Gamble C: win \$100 if red or yellow. Gamble D: win \$100 if black or yellow. Most choose D. Under EU, $A \succ B$ requires $C \succ D$. The joint choice $\{A, D\}$ violates the Sure-Thing Principle.
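The Ellsberg argument can be checked by brute force. The sketch below assumes the agent holds a point belief about the number of black balls (anywhere from 0 to 60) and ranks gambles by winning probability, which is what expected utility implies when every gamble pays the same prize. The names `win_probabilities` and `rationalizing` are illustrative, not from the text.

```python
from fractions import Fraction

TOTAL, RED, UNKNOWN = 90, 30, 60  # 30 red balls, 60 black-or-yellow balls


def win_probabilities(black):
    """Winning probability of gambles A-D if the urn holds `black` black balls."""
    yellow = UNKNOWN - black
    return {
        "A": Fraction(RED, TOTAL),              # win on red
        "B": Fraction(black, TOTAL),            # win on black
        "C": Fraction(RED + yellow, TOTAL),     # win on red or yellow
        "D": Fraction(black + yellow, TOTAL),   # win on black or yellow
    }


# Beliefs under which A is strictly better than B AND D is strictly better than C.
rationalizing = []
for b in range(UNKNOWN + 1):
    prob = win_probabilities(b)
    if prob["A"] > prob["B"] and prob["D"] > prob["C"]:
        rationalizing.append(b)

print(rationalizing)
```

Choosing A over B needs fewer than 30 black balls, while choosing D over C needs more than 30, so no single belief supports both choices.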

Ambiguity aversion. The preference for known probabilities over unknown ones. An ambiguity-averse agent prefers a 50/50 gamble from a known urn over an equivalent gamble from an urn with unknown composition. This violates the Savage axioms underlying subjective expected utility.

These paradoxes reveal that the independence axiom fails descriptively. We need a theory that accommodates these violations.

Figure 19.3. Allais Paradox Detector. Select your preferred gamble in each pair, then check whether your choices violate the independence axiom.

Example 19.1 — Allais Paradox Calculation

Problem. Two lottery pairs. Assume CRRA utility $u(x) = x^{0.5}$ (with $x$ in millions). (a) Compute EU of each gamble. (b) Which does EU recommend? (c) Show {1A, 2B} violates independence.

Solution.

(a) $EU(1A) = 1.0 \times 1^{0.5} = 1.000$. $EU(1B) = 0.89(1) + 0.10(2.236) + 0.01(0) = 1.1136$. $EU(2A) = 0.11(1) = 0.11$. $EU(2B) = 0.10(2.236) = 0.2236$.

(b) EU recommends 1B (1.114 > 1.000) and 2B (0.224 > 0.110). EU-consistent pairs: {1A, 2A} or {1B, 2B}.

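(c) $1A \succ 1B$ requires $u(1) > 0.10 \, u(5) + 0.89 \, u(1) + 0.01 \, u(0)$, that is, $0.11 \, u(1) > 0.10 \, u(5) + 0.01 \, u(0)$. $2B \succ 2A$ requires $0.10 \, u(5) + 0.90 \, u(0) > 0.11 \, u(1) + 0.89 \, u(0)$, that is, $0.10 \, u(5) + 0.01 \, u(0) > 0.11 \, u(1)$. These directly contradict. No $u(\cdot)$ satisfies both.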
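The calculation in Example 19.1 can be reproduced numerically. The sketch below is a minimal illustration: besides the square-root utility from the example, it tries two other utility functions (chosen here only for illustration) and shows that the expected-utility gap in Pair 1 always equals the gap in Pair 2. The helper names are assumptions of this sketch.

```python
import math


def expected_utility(lottery, u):
    """Expected utility of a lottery given as (probability, outcome) pairs."""
    return sum(p * u(x) for p, x in lottery)


# Outcomes in millions of dollars, as in Example 19.1.
G1A = [(1.00, 1)]
G1B = [(0.10, 5), (0.89, 1), (0.01, 0)]
G2A = [(0.11, 1), (0.89, 0)]
G2B = [(0.10, 5), (0.90, 0)]


def allais_gaps(u):
    """Return (EU(1A) - EU(1B), EU(2A) - EU(2B)) for utility function u."""
    gap1 = expected_utility(G1A, u) - expected_utility(G1B, u)
    gap2 = expected_utility(G2A, u) - expected_utility(G2B, u)
    return gap1, gap2


# Square-root utility from the example plus two illustrative alternatives.
utilities = {
    "sqrt": math.sqrt,
    "log(1+x)": math.log1p,
    "linear": lambda x: x,
}

for name, u in utilities.items():
    gap1, gap2 = allais_gaps(u)
    print(f"{name:>9}: EU(1A)-EU(1B) = {gap1:+.4f}, EU(2A)-EU(2B) = {gap2:+.4f}")
```

Because the two gaps coincide for every utility function, an expected-utility maximizer who picks 1A must also pick 2A, and one who picks 2B must also pick 1B.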


19.2 Prospect Theory

From EU to Prospect Theory

Kahneman and Tversky (1979) proposed prospect theory as a descriptive alternative, later refined as cumulative prospect theory (1992). It modifies EU in four ways: reference dependence, loss aversion, diminishing sensitivity, and probability weighting.

Prospect theory. A descriptive theory of decision under risk (Kahneman and Tversky, 1979) that replaces expected utility with four key modifications: reference dependence, loss aversion, diminishing sensitivity (concave for gains, convex for losses), and probability weighting.

The Value Function

The value function replaces $u(x)$ defined over final wealth with $v(x)$ defined over gains and losses relative to a reference point:

Value function (S-shaped, kinked). The prospect theory analog of the utility function, defined over deviations from a reference point rather than final wealth levels. It is concave for gains, convex for losses, and displays a kink at the reference point where the slope for losses exceeds the slope for gains by the loss aversion coefficient $\lambda$.
$$v(x) = \begin{cases} x^{\alpha} & \text{if } x \geq 0 \\ -\lambda(-x)^{\beta} & \text{if } x < 0 \end{cases}$$ (Eq. 19.2)

The parameters estimated by Tversky and Kahneman (1992) are $\alpha = \beta = 0.88$ and $\lambda = 2.25$.

Three properties: (1) Reference dependence — outcomes are coded as gains or losses relative to $r$. (2) Diminishing sensitivity — $\alpha, \beta < 1$ gives concavity for gains and convexity for losses. (3) Loss aversion — $\lambda > 1$ makes the value function steeper for losses.

Intuition

Why it matters: Two facts about how outcomes feel do all the work here. First, losses loom larger than gains — losing \$100 stings about 2.25 times as much as winning \$100 feels good — which is why the curve below the reference point drops faster than it rises above it. Second, what counts as a gain or a loss depends entirely on where you start: the same \$50,000 salary is a triumph after \$30,000 and a wound after \$70,000. The kink at the reference point is loss aversion; the bend of each arm is the fading of sensitivity as you move away from it. Drag the sliders on the figure and watch the steepness of the loss arm change — that steepness is $\lambda$.

Loss aversion. The empirical finding that losses loom larger than equivalent gains: $|v(-x)| > v(x)$ for $x > 0$. The loss aversion coefficient $\lambda \approx 2.25$ means losing \$100 feels about 2.25 times worse than gaining \$100 feels good.
Reference dependence. The principle that outcomes are evaluated as gains or losses relative to a reference point, not as final wealth states. The reference point is typically the status quo, but can be expectations, aspirations, or social comparisons.

Figure 19.1. Prospect theory value function. The S-shaped curve is concave for gains and convex for losses, with a steeper slope for losses (loss aversion). At $\alpha = \beta = \lambda = 1$ it collapses to linear (EU). Drag sliders to explore.

The Probability Weighting Function

Probability weighting function. The function $w(p)$ that transforms objective probabilities into subjective decision weights. It overweights small probabilities ($w(p) > p$ for small $p$), underweights moderate-to-large probabilities, and satisfies $w(0)=0$, $w(1)=1$.
$$w(p) = \dfrac{p^{\delta}}{(p^{\delta} + (1-p)^{\delta})^{1/\delta}}$$ (Eq. 19.3)

The Tversky-Kahneman (1992) parameter $\delta \approx 0.65$. When $\delta = 1$, $w(p) = p$ (EU). When $\delta < 1$, the function overweights small probabilities and underweights large ones. Crossover at $p \approx 0.37$.
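A short sketch of Eq. 19.3 makes the inverse-S shape concrete. The function name `tk_weight` and the bisection helper `crossover` are illustrative. The bisection assumes a single interior crossing between 0.1 and 0.9, which holds for the default $\delta = 0.65$.

```python
def tk_weight(p, delta=0.65):
    """Probability weighting function of Eq. 19.3."""
    num = p ** delta
    return num / (num + (1 - p) ** delta) ** (1 / delta)


def crossover(delta=0.65, lo=0.1, hi=0.9, tol=1e-10):
    """Bisection for the interior p with w(p) = p (assumes one sign change)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if tk_weight(mid, delta) > mid:
            lo = mid   # still overweighted: the crossing lies to the right
        else:
            hi = mid   # underweighted: the crossing lies to the left
    return (lo + hi) / 2


for p in (0.01, 0.10, 0.50, 0.90, 0.99):
    print(f"p = {p:.2f}  w(p) = {tk_weight(p):.3f}")
print(f"crossover near p = {crossover():.3f}")
```

Setting `delta=1.0` returns `w(p) = p`, the expected-utility case.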

Intuition

Why it matters: People do not treat probabilities the way a calculator does. A tiny chance feels bigger than it is — which is why we buy lottery tickets and insurance against rare disasters — while a near-sure thing feels less sure than it is, draining the value out of the last few percentage points before certainty. The inverse-S curve on the figure is just this: small probabilities lifted up, middling-to-large ones pushed down. It is the probability-side companion to loss aversion, and together the two produce the fourfold pattern of risk-taking that follows.

Figure 19.2. Tversky-Kahneman (1992) probability weighting function. The inverse-S curve overweights small probabilities and underweights large ones. At $\delta = 1$ it collapses to the 45-degree line (EU). Drag the slider.

The Prospect Theory Valuation

$$V(L) = \sum_{i} w(p_i) \, v(x_i - r)$$ (Eq. 19.4)

Note: This is the original Prospect Theory formulation (Kahneman & Tversky, 1979), which applies decision weights to individual probabilities. Cumulative Prospect Theory (Tversky & Kahneman, 1992) applies decision weights to cumulative probabilities of ranked outcomes, resolving certain anomalies such as violations of stochastic dominance.

The Endowment Effect

Endowment effect. The tendency to value an item more highly once you own it than you would pay to acquire it. Follows from loss aversion: giving up an owned item is coded as a loss.

The Fourfold Pattern of Risk Attitudes

Fourfold pattern of risk attitudes. The combination of the S-shaped value function and probability weighting generates four distinct risk attitudes: risk seeking for small-probability gains, risk aversion for small-probability losses, risk aversion for high-probability gains, and risk seeking for high-probability losses.

The fourfold pattern: small $p$ + gains = risk seeking (lotteries); small $p$ + losses = risk aversion (insurance); large $p$ + gains = risk aversion (certainty effect); large $p$ + losses = risk seeking (desperate gambling).
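The fourfold pattern can be verified by comparing certainty equivalents with expected values. This is a minimal sketch under stated assumptions: simple prospects that pay $x$ with probability $p$ and zero otherwise, the parametric forms of Eq. 19.2 and Eq. 19.3, a reference point of zero, and the same weighting parameter for gains and losses. All function names are illustrative.

```python
ALPHA = BETA = 0.88   # curvature for gains and losses
LAMBDA = 2.25         # loss aversion
DELTA = 0.65          # probability-weighting curvature


def weight(p):
    """Probability weighting function from Eq. 19.3."""
    num = p ** DELTA
    return num / (num + (1 - p) ** DELTA) ** (1 / DELTA)


def value(x):
    """Value function from Eq. 19.2 with reference point 0."""
    return x ** ALPHA if x >= 0 else -LAMBDA * (-x) ** BETA


def value_inverse(val):
    """Outcome whose value equals val (inverse of the value function)."""
    return val ** (1 / ALPHA) if val >= 0 else -((-val / LAMBDA) ** (1 / BETA))


def certainty_equivalent(x, p):
    """Sure amount valued the same as 'x with probability p, else 0'."""
    return value_inverse(weight(p) * value(x))


def attitude(x, p):
    """Risk seeking if the certainty equivalent exceeds the expected value."""
    return "risk seeking" if certainty_equivalent(x, p) > p * x else "risk averse"


for p in (0.05, 0.95):
    for x in (100, -100):
        print(f"p = {p:.2f}, x = {x:+d}: {attitude(x, p)}")
```

For these single-outcome prospects $\lambda$ cancels out of the certainty equivalent, so the pattern here comes from probability weighting and curvature rather than from loss aversion.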

Framing Effects and Mental Accounting

Framing effect. The phenomenon where the way a choice is presented (framed) affects decisions, even when the objective outcomes are identical.
Mental accounting. The cognitive process of organizing financial decisions into separate “accounts” rather than treating wealth as fungible.
Example 19.2 — Prospect Theory vs EU Valuation

Problem. A gamble offers $+\$1{,}000$ with prob 0.5 and $-\$800$ with prob 0.5. Reference point $r = 0$. (a) CE under EU with CRRA $u(x) = x^{0.5}$, $W = \$10{,}000$. (b) PT valuation with standard parameters. (c) Why does loss aversion reverse the evaluation?

Solution.

(a) $EU = 0.5(11{,}000)^{0.5} + 0.5(9{,}200)^{0.5} = 0.5(104.88) + 0.5(95.92) = 100.40$. CE: $100.40^2 \approx 10{,}080$, so the certainty equivalent exceeds current wealth by about \$80. Agent accepts.

(b) $v(+1000) = 1000^{0.88} = 436.5$. $v(-800) = -2.25 \times 800^{0.88} = -2.25 \times 358.7 = -807.1$. With $w(0.5) \approx 0.439$: $V = 0.439(436.5) + 0.439(-807.1) = -162.6$. Agent rejects.

(c) Loss aversion ($\lambda = 2.25$) makes the \$800 loss weigh far more than the \$1,000 gain, flipping the evaluation.
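The two valuations in Example 19.2 can be reproduced with a few lines of Python. The sketch uses the example's parameters. The names `value`, `weight`, `ce_change` and `pt_value` are illustrative.

```python
import math

W0 = 10_000                              # initial wealth for the EU calculation
GAMBLE = [(0.5, 1_000), (0.5, -800)]     # (probability, change in wealth)

# (a) Expected utility over final wealth with u(x) = sqrt(x).
expected_utility = sum(p * math.sqrt(W0 + x) for p, x in GAMBLE)
ce_change = expected_utility ** 2 - W0   # certainty equivalent minus wealth


def value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Value function of Eq. 19.2 with reference point 0."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta


def weight(p, delta=0.65):
    """Probability weighting function of Eq. 19.3."""
    num = p ** delta
    return num / (num + (1 - p) ** delta) ** (1 / delta)


# (b) Prospect theory valuation, Eq. 19.4 with r = 0.
pt_value = sum(weight(p) * value(x) for p, x in GAMBLE)

print(f"EU certainty-equivalent change: {ce_change:+.1f}")
print(f"Prospect theory value:          {pt_value:+.1f}")
```

The signs differ: the expected-utility agent accepts the gamble while the prospect-theory agent rejects it.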


19.3 Intertemporal Choice and Present Bias

The Exponential Benchmark

Standard theory assumes exponential discounting with discount factor $\delta \in (0,1)$. The key property is time consistency: a plan made at $t=0$ remains optimal at every future date.

Hyperbolic and Quasi-Hyperbolic Discounting

Experimental evidence overwhelmingly rejects constant discounting. People exhibit declining impatience: the discount rate between today and tomorrow is much higher than between day 100 and day 101.

Hyperbolic discounting. A model of time preference in which the discount function takes the form $D(t) = (1+kt)^{-1}$ rather than the exponential $\delta^t$. Generates declining discount rates and time-inconsistent preferences.
Quasi-hyperbolic (beta-delta) discounting. A tractable model of present bias (Laibson, 1997) where the discount function is $\{1, \beta\delta, \beta\delta^2, \ldots\}$ with $\beta \leq 1$. When $\beta = 1$, reduces to exponential discounting.
$$U_0 = u(c_0) + \beta \sum_{t=1}^{T} \delta^t \, u(c_t)$$ (Eq. 19.5)

The quasi-hyperbolic discount factors are $\{1, \beta\delta, \beta\delta^2, \ldots\}$. The immediate period receives weight 1, but all future periods are additionally discounted by $\beta$. When $\beta < 1$, there is a discrete drop between “now” and “the future.”
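The three discount functions can be compared through their one-period discount factors $D(t+1)/D(t)$. The sketch below uses $\beta = 0.7$ and $\delta = 0.95$ as illustrative values. The hyperbolic parameter $k = 0.1$ is purely an assumption of this sketch.

```python
def exponential(t, delta=0.95):
    """Exponential discount function delta^t."""
    return delta ** t


def hyperbolic(t, k=0.1):
    """Hyperbolic discount function 1 / (1 + k t); k is a hypothetical value."""
    return 1 / (1 + k * t)


def quasi_hyperbolic(t, beta=0.7, delta=0.95):
    """Beta-delta discount function: 1 today, beta * delta^t afterwards."""
    return 1.0 if t == 0 else beta * delta ** t


def one_period_factor(discount, t):
    """Weight on period t+1 relative to period t."""
    return discount(t + 1) / discount(t)


for t in (0, 1, 10):
    print(
        f"t = {t:2d}: "
        f"exponential {one_period_factor(exponential, t):.3f}, "
        f"hyperbolic {one_period_factor(hyperbolic, t):.3f}, "
        f"quasi-hyperbolic {one_period_factor(quasi_hyperbolic, t):.3f}"
    )
```

The exponential factor is constant, the hyperbolic factor rises smoothly with $t$, and the quasi-hyperbolic factor jumps from $\beta\delta$ at $t = 0$ to $\delta$ thereafter.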

Present bias. The tendency to give disproportionate weight to immediate payoffs relative to future payoffs, beyond what exponential discounting would imply. Captured by $\beta < 1$ in the beta-delta model.

Time Inconsistency

$$\beta\delta u'(c_1) = u'(c_0) \neq \delta u'(c_1)$$ (Eq. 19.6)

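At $t=0$, the FOC between $c_0$ and $c_1$ is $u'(c_0) = \beta\delta u'(c_1)$, while the planned trade-off between $c_1$ and $c_2$ is $u'(c_1) = \delta u'(c_2)$, because $\beta$ multiplies both future periods and cancels. At $t=1$, re-optimization gives $u'(c_1) = \beta\delta u'(c_2)$. The $\beta$ has shifted onto the new present, so the $t=1$ self wants to consume more than the $t=0$ self planned: the plan is time-inconsistent.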

Intuition

Why it matters: You discount the gap between today and tomorrow far more steeply than the gap between a year out and a year-and-a-day out — even though both are one-day delays. That single extra penalty on “now versus not-now” is present bias, and it is why you set the alarm for 6 a.m. and then hit snooze, why you plan to start the diet Monday and break it Monday night. The plan your today-self makes is not the plan your tomorrow-self wants to keep. A sophisticated person who sees this coming pays for a commitment device — the locked retirement account, the gym contract — to bind the future self the present self cannot trust. Drag the present-bias slider toward 1 and the conflict disappears.

Naive vs Sophisticated Agents

Naive agent. A present-biased agent who incorrectly believes their future selves will behave as exponential discounters ($\beta=1$ in the future). Perpetually postpones costly actions.
Sophisticated agent. A present-biased agent who correctly anticipates future present bias. Uses backward induction and may seek commitment devices.
Commitment device. Any mechanism an agent voluntarily adopts to restrict their own future choice set. Examples: illiquid retirement accounts, deadline commitments, automatic payroll deductions.

A naive agent procrastinates indefinitely. A sophisticated agent uses backward induction and may employ commitment devices.

Figure 19.4. Beta-delta discounting explorer. The naive agent perpetually delays; the sophisticated agent uses backward induction. At $\beta = 1$, all lines collapse (no present bias). Drag sliders.

Example 19.3 — Beta-Delta Procrastination

Problem. A student must complete a project. Cost today = 6 utils, benefit in 2 periods = 10 utils. $\beta = 0.7$, $\delta = 0.95$, 5 periods. (a) When does a naive agent act? (b) A sophisticated agent?

Solution.

(a) Naive: At each $t$, net of acting now $= -6 + 0.7 \times 0.95^2 \times 10 = -6 + 6.32 = +0.32$. Net of waiting one period (perceived) $= 0.7 \times 0.95 \times (-6) + 0.7 \times 0.95^3 \times 10 = -3.99 + 6.00 = +2.01$. Since $2.01 > 0.32$, the agent always delays, procrastinating until the deadline.

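(b) Sophisticated: Backward induction. At $t = 2$ (last feasible), net $= +0.32 > 0$, so the $t=2$ self acts. At $t = 1$: net now $= +0.32$, net of waiting for $t=2$ to act $= +2.01 > 0.32$, so waits. At $t = 0$: net now $= +0.32$, net of waiting for $t=2$ to act $= 0.7 \times 0.95^2 \times (-6) + 0.7 \times 0.95^4 \times 10 = -3.79 + 5.70 = +1.91 > 0.32$, so waits. The sophisticated agent acts at $t = 2$, the same last feasible date at which the naive agent finally acts: with these payoffs, correctly anticipating that the $t=2$ self will act makes waiting attractive to every earlier self.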
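The timing logic of Example 19.3 can be sketched in Python. This is a minimal sketch under stated assumptions: dates run $t = 0, \ldots, 4$, so $t = 2$ is the last date at which the benefit still arrives. Never acting yields zero, and the project is worthwhile to an exponential discounter. The naive self predicts that the next self behaves like an exponential discounter. All names are illustrative.

```python
COST, BENEFIT, DELTA = 6.0, 10.0, 0.95
LAST = 2                                   # last feasible action date
LONG_RUN = -COST + DELTA ** 2 * BENEFIT    # value of acting to a beta = 1 self


def payoff_from(t, s, beta):
    """Value to the date-t self of the project being done at date s >= t."""
    if s == t:
        return -COST + beta * DELTA ** 2 * BENEFIT
    return beta * DELTA ** (s - t) * LONG_RUN


def naive_action_date(beta=0.7):
    """Naive self expects later selves to act as exponential discounters."""
    for t in range(LAST + 1):
        act_now = payoff_from(t, t, beta)
        if t == LAST:                      # no later date left to wait for
            return t if act_now > 0 else None
        # Date an exponential (beta = 1) future self would choose.
        believed = max(range(t + 1, LAST + 1),
                       key=lambda s: DELTA ** (s - t - 1) * LONG_RUN)
        if act_now >= payoff_from(t, believed, beta):
            return t
    return None


def sophisticated_action_date(beta=0.7):
    """Backward induction over what each present-biased self actually does."""
    acts = [False] * (LAST + 1)
    for t in range(LAST, -1, -1):
        later = next((s for s in range(t + 1, LAST + 1) if acts[s]), None)
        wait_value = payoff_from(t, later, beta) if later is not None else 0.0
        acts[t] = payoff_from(t, t, beta) > wait_value
    return next((t for t in range(LAST + 1) if acts[t]), None)


print("naive acts at t =", naive_action_date())
print("sophisticated acts at t =", sophisticated_action_date())
```

With the example's parameters both functions return 2. Calling either with `beta=1.0` returns 0, since without present bias there is no reason to delay.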

Example 19.4 — Commitment Device Value

Problem. Agent with $\beta = 0.7$, $\delta = 0.95$, log utility, income $Y = 100$ over 3 periods. (a) Savings without commitment. (b) With commitment. (c) Welfare gain.

Solution.

(a) Without: $t=0$ allocates $c_0 = 100/(1+0.665+0.632) = 43.54$, leaving 56.46. At $t=1$ re-optimization: $c_1 = 56.46/1.665 = 33.91$, $c_2 = 22.55$.

(b) With: $c_1 = 0.665 \times 100/2.297 = 28.95$, $c_2 = 0.632 \times 100/2.297 = 27.51$.

(c) Without: $U = 3.774 + 2.344 + 1.967 = 8.085$. With: $U = 3.774 + 2.237 + 2.095 = 8.106$. Gain $= 0.020$ utils. The committed agent achieves a smoother consumption path.
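The consumption paths in Example 19.4 follow from a single rule: with log utility, a budget is split in proportion to the discount weights. The sketch below applies that rule twice. It assumes a zero interest rate, matching the example's arithmetic, and evaluates welfare from the $t = 0$ perspective. Names such as `split` and `u0` are illustrative.

```python
import math

BETA, DELTA, INCOME = 0.7, 0.95, 100.0


def split(budget, weights):
    """Log utility: spend the budget in proportion to the discount weights."""
    total = sum(weights)
    return [budget * wgt / total for wgt in weights]


weights_t0 = [1.0, BETA * DELTA, BETA * DELTA ** 2]

# With commitment: the t = 0 plan is carried out as written.
committed = split(INCOME, weights_t0)

# Without commitment: c0 follows the plan, then the t = 1 self re-optimizes.
c0 = committed[0]
c1, c2 = split(INCOME - c0, [1.0, BETA * DELTA])
uncommitted = [c0, c1, c2]


def u0(path):
    """Utility of a consumption path as evaluated by the t = 0 self."""
    return sum(wgt * math.log(c) for wgt, c in zip(weights_t0, path))


gain = u0(committed) - u0(uncommitted)
print("committed:  ", [round(c, 2) for c in committed])
print("uncommitted:", [round(c, 2) for c in uncommitted])
print(f"welfare gain from commitment: {gain:.3f}")
```

The $t = 1$ self shifts consumption from period 2 into period 1, which the $t = 0$ self regards as a loss.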


19.4 Social Preferences

Beyond Self-Interest

Decades of experimental evidence show people systematically deviate from pure self-interest: rejecting unfair offers, giving to strangers, cooperating in one-shot games, and punishing free-riders.

Inequality aversion. A preference model in which agents dislike unequal outcomes — both when behind (envy) and when ahead (guilt). Fehr and Schmidt (1999) formalized this with parameters $\alpha$ for envy and $\beta$ for guilt.

The Fehr-Schmidt Utility Function

Fehr-Schmidt utility. A utility function that modifies self-interested payoffs by subtracting disutility from inequality: $U_i = x_i - \alpha_i \max(x_j - x_i, 0) - \beta_i \max(x_i - x_j, 0)$.
$$U_i(x) = x_i - \alpha_i \max(x_j - x_i, 0) - \beta_i \max(x_i - x_j, 0)$$ (Eq. 19.7)

The constraints $\alpha_i \geq \beta_i$ and $\beta_i < 1$ are empirically motivated: envy hurts more than guilt, and no one destroys money just to equalize.

The Ultimatum Game

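In a \$100 ultimatum game, the proposer offers $s$ and keeps $100 - s$; if the responder rejects, both get zero. For offers $s < 50$ the responder is behind by $100 - 2s$, so the offer is accepted only if $s - \alpha_R(100-2s) \geq 0$. The minimum acceptable offer is therefore $s^* = 100\alpha_R / (1+2\alpha_R)$. For $\alpha_R = 2$: $s^* = 40$.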

Intuition

Why it matters: People will pay, out of their own pocket, to punish someone who treats them unfairly. Offer a stranger a lopsided split of \$100 — \$80 for you, \$20 for them — and they will often reject it, walking away with nothing just to deny you the \$80. Pure self-interest says take the \$20; fairness says refuse. The model adds two feelings to the payoff: the sting of getting less than someone else (envy), which hurts more, and the discomfort of getting more (guilt), which hurts less. That asymmetry is why real offers cluster near a fair split rather than the textbook prediction of one cent. Drag the responder's envy slider and watch the minimum acceptable offer climb.

Figure 19.6. Fehr-Schmidt inequality aversion. Higher $\alpha$ (envy) raises the minimum acceptable offer. At $\alpha = \beta = 0$, standard theory: any positive offer is accepted. Drag sliders.

Figure 19.5. Ultimatum Game Simulator. Play as the proposer against different responder strategies. Track your earnings over rounds.


Dictator Games and Public Goods

In dictator games, proposers give away 20-30% of the endowment on average. In public goods games, adding punishment sustains cooperation.

Example 19.5 — Fehr-Schmidt Ultimatum Game

Problem. \$100 ultimatum game. Proposer: $\alpha_P = 0.5$, $\beta_P = 0.3$. Responder: $\alpha_R = 2.0$, $\beta_R = 0.6$. (a) Min acceptable offer. (b) Optimal offer. (c) Compare to standard Nash.

Solution.

(a) $U_R = s - 2.0(100-2s) = 5s - 200 \geq 0 \Rightarrow s^* = 40$.

(b) $U_P = (100-s) - 0.3(100-2s) = 70 - 0.4s$, decreasing in $s$. Minimize $s$ subject to $s \geq 40$: optimal offer $s^* = 40$. $U_P = 54$, $U_R = 0$.

(c) Standard preferences ($\alpha = \beta = 0$): offer \$1, accepted. Fehr-Schmidt: offer \$40. Much closer to experimental modal offers of 40-50%.
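Example 19.5 can be checked by searching over offers. The sketch below is a minimal illustration, assuming offers in whole dollars and a responder who accepts whenever the Fehr-Schmidt utility of Eq. 19.7 is at least zero, the payoff from rejecting. Function names are illustrative.

```python
PIE = 100


def fs_utility(own, other, alpha, beta):
    """Fehr-Schmidt utility of Eq. 19.7 for a two-player allocation."""
    return own - alpha * max(other - own, 0) - beta * max(own - other, 0)


def responder_accepts(offer, alpha_r, beta_r):
    """Accept if utility from the split is at least the rejection payoff of 0."""
    return fs_utility(offer, PIE - offer, alpha_r, beta_r) >= 0


def optimal_offer(alpha_p, beta_p, alpha_r, beta_r):
    """Acceptable whole-dollar offer that maximizes the proposer's utility."""
    acceptable = [s for s in range(PIE + 1)
                  if responder_accepts(s, alpha_r, beta_r)]
    return max(acceptable, key=lambda s: fs_utility(PIE - s, s, alpha_p, beta_p))


offer = optimal_offer(alpha_p=0.5, beta_p=0.3, alpha_r=2.0, beta_r=0.6)
print("optimal offer:", offer)
print("proposer utility:", fs_utility(PIE - offer, offer, 0.5, 0.3))
print("responder utility:", fs_utility(offer, PIE - offer, 2.0, 0.6))
```

The search covers offers above \$50 as well, where the proposer would be the one who is behind, and confirms that the lowest acceptable offer is still the best choice.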


19.5 Bounded Rationality and Heuristics

Simon's Satisficing

Herbert Simon (1955) argued that agents satisfice rather than optimize: searching until they find an acceptable option, then stopping.

Satisficing. A decision procedure (Simon, 1955) in which the agent sets an aspiration level and chooses the first option that meets it, rather than comparing all alternatives.
Bounded rationality. The recognition (Simon, 1955) that human decision-making is constrained by cognitive limitations — finite memory, limited attention, and computational costs.

Heuristics and Biases

Tversky and Kahneman (1974) identified three core heuristics: representativeness (judging probability by resemblance), availability (estimating frequency by ease of recall), and anchoring (adjusting insufficiently from an initial value).

Gabaix's Sparse Maximization

Sparse maximization. A model of bounded rationality (Gabaix, 2014) in which the agent maximizes utility minus the cost of attention. The agent allocates attention optimally, attending more to dimensions that matter.
$$\max_{\mathbf{c}} \, u(\mathbf{c}) - \theta \|\mathbf{m}\|_1 \quad \text{s.t. } \mathbf{p} \cdot \mathbf{c} \leq w$$ (Eq. 19.8)

Gabaix (2014) formalized bounded rationality as an optimization problem: agents maximize utility subject to attention cost $\theta$ per dimension. The agent perceives $\hat{p}_k = \bar{p}_k + m_k(p_k - \bar{p}_k)$.
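The perceived-price formula is easy to evaluate directly. The sketch below treats the attention weight $m_k$ as given rather than solving for it, and the shipping-fee numbers are hypothetical values chosen only for illustration.

```python
def perceived_price(p, p_default, m):
    """Perceived price: default plus attention-weighted deviation, m in [0, 1]."""
    return p_default + m * (p - p_default)


# Hypothetical example: a 5.00 shipping fee when the shopper's default is 0.
for m in (0.0, 0.25, 1.0):
    print(f"attention m = {m:.2f}: perceived fee = {perceived_price(5.0, 0.0, m):.2f}")
```

At $m = 0$ the shopper sees only the default, and at $m = 1$ the true price. Intermediate attention dampens the response to the fee.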

Intuition

Why it matters: Attention is scarce, and thinking is costly, so people do not optimize over the real world — they optimize over a simplified mental cartoon of it, paying attention only to the dimensions that seem to matter and ignoring the small stuff. This is not stupidity; it is economizing on a genuinely limited resource. It explains why shoppers notice the sticker price but miss the shipping fee, why we anchor on the first number we hear, why a tax that is folded into the price changes behavior less than one added at the register. Simon called it satisficing — good enough, not perfect. Gabaix put a price tag on the attention the perfect version would have required.


19.6 Experimental Design and Methodology

Lab Experiments

Lab experiments feature real monetary incentives, randomization, and control. Strength: internal validity. Weakness: external validity.

Field Experiments

Field experiments embed manipulations in real-world settings: natural behavior, no awareness, large scale. Trade-off: less control for greater realism.

Methodological Challenges

Demand effects: subjects may alter behavior because they know they are observed or infer experimenter intent. The deception debate: economics has a strong norm against deception, unlike psychology.

The replication crisis: only 36% of psychology studies replicated (Open Science Collaboration, 2015); economics is higher (~60%) but still concerning. Pre-registration addresses p-hacking and publication bias.


19.7 Nudge Theory and Libertarian Paternalism

Choice Architecture

If choices depend on framing and defaults, then choice architecture — the way choices are presented — matters.

Choice architecture. The design of the environment in which people make decisions, including the order of options, default settings, information display, and physical arrangement.
Nudge. Any aspect of choice architecture that alters behavior predictably without forbidding options or significantly changing economic incentives (Thaler and Sunstein, 2008).
Libertarian paternalism. A philosophy preserving freedom of choice while steering choices toward welfare-improving outcomes through nudges rather than mandates.

Default Effects

Default effect. The disproportionate tendency to stick with the pre-selected option, even when switching is easy and costless.

The most powerful nudge is the default. Organ donation consent rates run about 15-20% in opt-in countries versus 85-99% in opt-out countries. Retirement plan enrollment jumps from roughly 50% under opt-in to over 90% under opt-out.

$$P_{\text{enroll}} = \Phi\!\left(\frac{v - k \cdot (1-d)}{\sigma}\right)$$ (Eq. 19.9)

Under opt-in ($d=0$): $P = \Phi((v-k)/\sigma)$. Under opt-out ($d=1$): $P = \Phi(v/\sigma)$. The gap is largest when $v$ is positive but moderate and $k/\sigma$ is non-trivial.
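Eq. 19.9 can be evaluated with the standard normal CDF built from `math.erf`. The parameter values below ($v = 3$, $k = 2$, $\sigma = 2$) are illustrative. The function names are assumptions of this sketch.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def enroll_probability(v, k, sigma, d):
    """Eq. 19.9: d = 0 is opt-in (switching cost applies), d = 1 is opt-out."""
    return normal_cdf((v - k * (1 - d)) / sigma)


v, k, sigma = 3.0, 2.0, 2.0
opt_in = enroll_probability(v, k, sigma, d=0)
opt_out = enroll_probability(v, k, sigma, d=1)
print(f"opt-in:  {opt_in:.2f}")
print(f"opt-out: {opt_out:.2f}")
print(f"gap:     {opt_out - opt_in:.2f}")
```

Setting `k = 0` makes the two probabilities identical, which is the case where the default does not matter.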

Intuition

Why it matters: Whatever the default is, most people keep it. Make retirement saving automatic-with-an-opt-out and enrollment jumps from about half to over ninety percent; flip organ donation from opt-in to opt-out and consent rates leap from the teens to the high nineties. The small effort of switching — finding the form, making the decision, acting — is enough to leave most people wherever they were placed. That hands enormous, quiet power to whoever sets the default, which is the entire premise of nudging. Drag the switching-cost slider toward zero and the opt-in and opt-out lines converge: when changing is truly effortless, the default stops mattering.

Figure 19.7. Default effect simulator. Higher switching costs widen the gap between opt-in and opt-out enrollment. At $k = 0$ the default does not matter. Drag the slider.

The EAST Framework

EAST framework. A practical guide for nudge design: make desired behaviors Easy, Attractive, Social, and Timely.

The EAST framework: Easy (reduce friction), Attractive (make salient), Social (leverage norms), Timely (prompt at receptive moments).

Sludge

Sludge. Friction deliberately or inadvertently added to a process that discourages desirable behavior. The opposite of a nudge.

Sludge is friction that discourages desirable behavior. Reducing sludge is often as effective as introducing new nudges.

Behavioral Welfare Economics

Bernheim and Rangel (2009): evaluate welfare based on choices free from behavioral distortions — when agents are well-informed, attentive, and undistorted.


Take

'Libertarian paternalism is just paternalism with better PR' — Gilles Saint-Paul, The Tyranny of Utility

When Thaler and Sunstein published Nudge in 2008, it seemed like a policy cheat code: redesign defaults and people save more, eat better, donate organs — all without restricting choice. Governments loved it. The UK created a "Nudge Unit," and Obama hired Sunstein as regulatory czar. But the backlash was fierce. Gilles Saint-Paul called it "the tyranny of utility" — technocrats deciding what's good for you while pretending to respect your freedom. Op-eds called nudging "manipulation by the state." Is libertarian paternalism a brilliant synthesis, or a contradiction in terms?

Advanced

19.8 Behavioral Finance

Market Efficiency and Its Challengers

The efficient market hypothesis holds that prices fully reflect all information. Behavioral finance challenges this: many traders are not rational, and rational arbitrageurs face limits.

Overconfidence and Excess Trading

Overconfidence generates excess trading. Barber and Odean (2000): the households that traded most earned about 6.5 percentage points less per year than the market (11.4% versus 17.9%).

The Disposition Effect

Disposition effect. The tendency to sell winning assets too early and hold losing assets too long. Follows from prospect theory: gains are in the concave (risk-averse) region, losses in the convex (risk-seeking) region.

The reference point is the purchase price. Gains in the concave region (risk-averse, sell early); losses in the convex region (risk-seeking, hold).

Momentum and Reversal Anomalies

Recent winners continue to outperform recent losers over the next 3-12 months (momentum, Jegadeesh-Titman 1993), while long-run winners underperform long-run losers over the following 3-5 years (reversal, DeBondt-Thaler 1985).

Limits to Arbitrage

Limits to arbitrage. The conditions under which rational arbitrageurs cannot fully eliminate mispricing: fundamental risk, noise trader risk, implementation costs, and agency problems.
Noise trader. An investor who trades on sentiment rather than fundamental analysis. Introduces unpredictable price distortions. In the DSSW model, noise traders can survive and even prosper.

Even rational traders may not correct mispricing: noise trader risk, implementation costs, and agency problems constrain them.

The DSSW Noise Trader Model

$$p_t = f_t + \frac{\gamma \, \rho_t \, \mu_t}{1+r}$$ (Eq. 19.10)

DeLong, Shleifer, Summers, and Waldmann (1990): higher $\mu$ pushes price from fundamentals; higher $\rho$ amplifies deviation; higher $\gamma$ (arbitrageur risk aversion) means less aggressive trading against mispricing, so the deviation increases.

Intuition

Why it matters: The classic defense of efficient markets is that smart money fixes mistakes: if irrational traders push a price too high, arbitrageurs sell until it snaps back. This model shows why that defense leaks. Betting against a crowd of optimists is itself risky — the optimists can stay optimistic, and grow more so, long enough to wipe out the trader who bet against them (Keynes: markets can stay irrational longer than you can stay solvent). Knowing a price is wrong is not the same as being able to profit from it. So noise traders not only survive, they can move prices and earn high returns by bearing the very risk they create. Drag the arbitrageur-risk-aversion slider up and watch the price drift further from fundamentals — that gap is the “as if rational” story failing in the one market where it should have been strongest.

The paradox: noise traders can earn higher expected returns by bearing the risk they themselves created.

Figure 19.8. DSSW noise trader model. Noise trader sentiment pushes prices away from fundamentals. Risk-averse arbitrageurs cannot fully correct the mispricing. Drag sliders.

Example 19.6 — Noise Trader Pricing

Problem. $f = 100$, $\rho = 0.30$, $\mu = 20$ (bullish), $r = 0.05$, $\gamma = 2$. (a) Compute equilibrium price. (b) Price deviation. (c) What if $\gamma = 0$?

Solution.

(a) $p = 100 + \frac{2 \times 0.30 \times 20}{1.05} = 100 + \frac{12}{1.05} = 100 + 11.43 = 111.43$.

(b) Deviation: $p - f = 11.43$. The asset is overpriced because noise traders push prices above fundamentals and risk-averse arbitrageurs don't fully counteract them.

(c) With $\gamma = 0$: $p = 100 + 0 = 100$. Risk-neutral arbitrageurs trade aggressively enough to eliminate mispricing entirely. The key DSSW insight: it is arbitrageur risk aversion ($\gamma > 0$) that allows noise-trader-driven deviations to persist.
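The pricing rule in Eq. 19.10, as stated in this chapter, takes one line to evaluate. The sketch below reproduces Example 19.6 and varies $\gamma$ to show the comparative static. The function name `dssw_price` is illustrative.

```python
def dssw_price(f, rho, mu, r, gamma):
    """Eq. 19.10: fundamental value plus the noise-trader-driven deviation."""
    return f + gamma * rho * mu / (1 + r)


base = dssw_price(f=100, rho=0.30, mu=20, r=0.05, gamma=2)
print(f"price: {base:.2f}, deviation: {base - 100:.2f}")

# Higher arbitrageur risk aversion widens the gap from fundamentals.
for gamma in (0, 1, 2, 4):
    price = dssw_price(f=100, rho=0.30, mu=20, r=0.05, gamma=gamma)
    print(f"gamma = {gamma}: price = {price:.2f}")
```

The deviation is linear in $\gamma$, so doubling arbitrageur risk aversion doubles the mispricing in this specification.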


Thread Example: Maya's Enterprise

Maya bundled a free cookie with every lemonade purchase as a summer promotion. Sales rose modestly, by 8%. When Maya later removes the free cookie (returning to the original offer), customer backlash is disproportionate: complaints, negative reviews, lost regulars. Sales drop 15%, ending below the pre-promotion baseline.

Prospect theory analysis. During the promotion, customers' reference point shifted from “lemonade” to “lemonade + cookie.” The gain from adding the cookie was $v(+\text{cookie}) = (\text{cookie\_value})^{0.88}$. But the loss from removing it is $v(-\text{cookie}) = -2.25 \times (\text{cookie\_value})^{0.88}$. The perceived loss is 2.25× the original gain. The promotion was a one-way ratchet: easy to give, painful to take away.
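
The asymmetry in this analysis can be illustrated numerically. The sketch below implements the value function of Eq. 19.2 with the parameters used above ($\alpha = \beta = 0.88$, $\lambda = 2.25$). The dollar figure `cookie_value = 1.50` is a hypothetical number chosen only for illustration; the chapter does not specify one.

```python
ALPHA = 0.88   # curvature for gains and losses, as used in the text
LAMBDA = 2.25  # loss-aversion coefficient, as used in the text


def value(x, alpha=ALPHA, beta=ALPHA, lam=LAMBDA):
    """Prospect theory value function (Eq. 19.2).
    x is the outcome measured relative to the reference point."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta


cookie_value = 1.50  # hypothetical value of the cookie to a customer

gain = value(cookie_value)    # cookie added to the bundle
loss = value(-cookie_value)   # cookie removed after the reference point shifted
ratio = abs(loss) / gain

print(f"gain={gain:.3f}  loss={loss:.3f}  |loss|/gain={ratio:.2f}")
```

Because gains and losses share the same curvature here, the ratio equals $\lambda$ whatever value is assigned to the cookie.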

Maya designs a nudge experiment. For her loyalty program, Maya runs a field experiment comparing two enrollment designs. Under Treatment A (opt-in), customers can sign up at the counter. Under Treatment B (opt-out), every customer automatically gets a card and can opt out. Using Eq. 19.9 with $v = 3$, $\sigma = 2$, $k = 2$: opt-in ($d = 0$) gives $P = \Phi((3-2)/2) = \Phi(0.5) \approx 0.69$; opt-out ($d = 1$) gives $P = \Phi(3/2) = \Phi(1.5) \approx 0.93$. Maya's field experiment confirms the prediction. She switches to opt-out for the full rollout.
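
The two enrollment probabilities follow directly from Eq. 19.9. This is a minimal sketch using only the Python standard library; it assumes $d = 1$ denotes the opt-out design (enrollment is the default) and $d = 0$ the opt-in design, which is the reading consistent with the numbers above. The function names are illustrative.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def enroll_prob(v, k, sigma, d):
    """Eq. 19.9: P = Phi((v - k * (1 - d)) / sigma).
    Assumption: d = 1 for opt-out (enrolled by default), d = 0 for opt-in."""
    return normal_cdf((v - k * (1 - d)) / sigma)


# Maya's parameters from the thread example.
v, sigma, k = 3.0, 2.0, 2.0

opt_in = enroll_prob(v, k, sigma, d=0)
opt_out = enroll_prob(v, k, sigma, d=1)

print(f"opt-in={opt_in:.2f}  opt-out={opt_out:.2f}  gap={opt_out - opt_in:.2f}")
```

The gap between the two designs comes entirely from the term $k(1-d)$: the default removes the cost $k$ from the enrollment decision without changing $v$ or $\sigma$.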

Historical Lens

Kahneman and Tversky (1979). “Prospect Theory: An Analysis of Decision under Risk” is one of the most cited papers in economics. Published in Econometrica, it formalized experimental findings into a coherent mathematical framework. Kahneman received the Nobel Prize in 2002; Tversky had passed away in 1996.

Maurice Allais (1953). The French economist presented his paradox directly to Leonard Savage. Legend has it Savage himself fell into the Allais pattern. Allais received the Nobel Prize in 1988.

Richard Thaler (2017 Nobel). Thaler's “Anomalies” column systematically catalogued behavioral deviations. His 2008 book Nudge (with Sunstein) brought behavioral insights to policy, leading to “nudge units” worldwide.

David Laibson (1997). “Golden Eggs and Hyperbolic Discounting” formalized the beta-delta model and explained why people simultaneously hold credit card debt at 18% interest and illiquid savings at 5%.

Shleifer and Vishny (1997). “The Limits of Arbitrage” showed why rational traders cannot eliminate mispricing when they manage other people's money and face capital constraints.

Summary

  1. Expected utility violations. The Allais paradox (certainty effect) and Ellsberg paradox (ambiguity aversion) demonstrate that EU axioms fail descriptively.
  2. Prospect theory. Kahneman and Tversky's alternative features reference dependence, loss aversion ($\lambda \approx 2.25$), diminishing sensitivity, and probability weighting. The fourfold pattern explains simultaneous lottery-ticket buying and insurance purchasing.
  3. Present bias. The quasi-hyperbolic model ($\beta < 1$) captures disproportionate weight on immediate payoffs, generating time inconsistency, procrastination, and demand for commitment devices.
  4. Social preferences. Fehr-Schmidt inequality aversion explains rejections of unfair offers, positive giving in dictator games, and conditional cooperation.
  5. Bounded rationality. Heuristics (representativeness, availability, anchoring) produce systematic biases. Gabaix's sparse maximization formalizes bounded rationality as optimal attention allocation.
  6. Experimental methodology. Lab experiments offer internal validity; field experiments offer external validity. The replication crisis has driven pre-registration and more rigorous standards.
  7. Nudge theory. Choice architecture is inevitable; libertarian paternalism uses defaults, framing, and simplification to improve welfare without restricting choice. The EAST framework operationalizes this.
  8. Behavioral finance. Overconfidence drives excess trading. The disposition effect follows from prospect theory. Limits to arbitrage (Shleifer-Vishny) and noise trader risk (DSSW) explain why mispricing persists.

Key Equations

| Label | Equation | Description |
|---|---|---|
| Eq. 19.1 | $EU(L) = \sum p_i u(x_i)$ | Expected utility |
| Eq. 19.2 | $v(x) = x^\alpha$ (gains), $-\lambda(-x)^\beta$ (losses) | Prospect theory value function |
| Eq. 19.3 | $w(p) = p^\delta / (p^\delta + (1-p)^\delta)^{1/\delta}$ | Tversky-Kahneman probability weighting |
| Eq. 19.4 | $V(L) = \sum w(p_i) v(x_i - r)$ | Prospect theory valuation |
| Eq. 19.5 | $U_0 = u(c_0) + \beta \sum \delta^t u(c_t)$ | Quasi-hyperbolic discounting |
| Eq. 19.6 | $\beta\delta u'(c_1) = u'(c_0) \neq \delta u'(c_1)$ | Time inconsistency |
| Eq. 19.7 | $U_i = x_i - \alpha_i \max(x_j-x_i,0) - \beta_i \max(x_i-x_j,0)$ | Fehr-Schmidt inequality aversion |
| Eq. 19.8 | $\max u(c) - \theta \lVert m \rVert_1$ s.t. $p \cdot c \leq w$ | Gabaix sparse maximization |
| Eq. 19.9 | $P_{\text{enroll}} = \Phi((v - k(1-d))/\sigma)$ | Default-sensitive enrollment |
| Eq. 19.10 | $p_t = f_t + \gamma \rho_t \mu_t / (1+r)$ | DSSW noise trader pricing |
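
Two of these formulas are easy to mis-transcribe, so a small numerical sketch may help. The code below implements the probability weighting function (Eq. 19.3) and Fehr-Schmidt utility (Eq. 19.7) as written in the table. The default $\delta = 0.65$ is the value used in the Practice problems; the Fehr-Schmidt parameters ($\alpha = 1$, $\beta = 0.25$) and the \$10 split are hypothetical values chosen only for illustration.

```python
def weight(p, delta=0.65):
    """Tversky-Kahneman probability weighting (Eq. 19.3)."""
    num = p ** delta
    return num / (num + (1.0 - p) ** delta) ** (1.0 / delta)


def fehr_schmidt(own, other, alpha, beta):
    """Fehr-Schmidt utility for a two-player allocation (Eq. 19.7)."""
    envy = max(other - own, 0.0)    # disadvantageous inequality
    guilt = max(own - other, 0.0)   # advantageous inequality
    return own - alpha * envy - beta * guilt


# Small probabilities are overweighted, large ones underweighted.
for p in (0.01, 0.50, 0.99):
    print(f"p={p:.2f}  w(p)={weight(p):.3f}")

# Hypothetical ultimatum offer: responder gets 2 of 10, proposer keeps 8.
# Rejecting leaves both players with 0, which has utility 0.
u_accept = fehr_schmidt(own=2.0, other=8.0, alpha=1.0, beta=0.25)
print("accept" if u_accept >= 0.0 else "reject", u_accept)
```

With these illustrative parameters the responder's utility from accepting is negative, so rejecting the offer (utility 0) is preferred.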

Practice

  1. A lottery pays $+\$500$ with probability 0.6 and $-\$300$ with probability 0.4. Compute the valuation under (a) expected utility with $u(x) = \ln(W+x)$, $W = 10{,}000$, and (b) prospect theory with $\alpha = \beta = 0.88$, $\lambda = 2.25$, $\delta = 0.65$. Does the agent accept or reject under each model?
  2. Prove algebraically that the choice pattern $\{1A, 2B\}$ in the Allais paradox violates the independence axiom. Write the expected utility of each gamble in terms of a general $u(\cdot)$ and show no $u$ can rationalize both preferences.
  3. An agent has $\beta = 0.8$, $\delta = 0.90$. Compare the present-discounted value of receiving \$100 at $t = 3$ under (a) beta-delta and (b) exponential discounting. By what percentage does present bias reduce the perceived value?
  4. In a \$200 ultimatum game, the responder has $\alpha_R = 1.5$, $\beta_R = 0.4$. Compute the minimum acceptable offer $s^*$. What fraction of the total is this?

Apply

  1. Using prospect theory, explain why the same person buys lottery tickets and insurance (the fourfold pattern). Compute subjective valuations of: (a) a \$5 lottery ticket with 1-in-10,000 chance of \$50,000, and (b) \$200/year insurance against 1-in-10,000 chance of losing \$500,000. Use standard parameters.
  2. A present-biased student ($\beta = 0.6$, $\delta = 0.95$) has three homework assignments due on days 1, 2, 3. Each costs 4 utils and yields 8 utils. Using backward induction for the sophisticated agent, determine the work schedule. Compare to the naive agent.
  3. Compute opt-in vs opt-out enrollment rates using Eq. 19.9 with $v=2$, $\sigma=3$, $k=4$. What is the enrollment gap? If there are 5,000 employees, how many additional enrollments from switching to opt-out?
  4. In the DSSW model with $f = 50$, $\rho = 0.20$, $\mu = 10$, $r = 0.04$: compute equilibrium price for $\gamma = 1$, $\gamma = 3$, $\gamma = 10$. What happens to the deviation as risk aversion increases?

Challenge

  1. Derive the disposition effect using prospect theory. An investor bought at $P_0 = 50$. The stock is at $P_1 = 70$ (gain) or $P_1 = 30$ (loss). Each can rise or fall \$10 with equal probability. Compute utility of selling vs holding for each. Show the investor sells the winner and holds the loser.
  2. A planner considers mandatory commitment (illiquid savings locking 20% of income). Population: 60% have $\beta = 1$, 40% have $\beta = 0.6$. All have $\delta = 0.95$, log utility. (a) Welfare change for each type. (b) When does the mandate reduce aggregate welfare? (c) Why might opt-in be superior?
  3. Apply Gabaix sparse maximization to $n=3$ goods with $p_1 = 10$, $p_2 = 10.50$, $p_3 = 50$, default $\bar{p} = 10$, $\theta = 0.5$, Cobb-Douglas utility, wealth $w = 100$. (a) Which dimensions get attention? (b) Demand under full vs sparse attention. (c) Why does the consumer overspend on good 3?
  4. Show that DSSW noise traders can earn higher expected returns than arbitrageurs. With mean misperception $\mu > 0$ and variance $\sigma_\mu^2$: (a) derive expected excess return; (b) show the condition for noise traders to outperform; (c) explain the paradox.

Sources

Kahneman & Tversky (1979); Tversky & Kahneman (1992); Thaler (1980, 2015); Laibson (1997); Fehr & Schmidt (1999); Gabaix (2014); Shleifer & Vishny (1997); DeLong, Shleifer, Summers & Waldmann (1990).