Superforecasting

  • Good forecasters aggregate perspectives
  • During periods of extreme volatility (randomness), forecasts should regress to the mean.

Introduction

Forecasting should involve far more measurement than it currently does. A reasonable model is forecast-measure-revise, but forecasters are rarely subjected to it. The author is an optimistic skeptic: skeptical because chaos theory bounds what can be forecast, but optimistic because so much actually is predictable. He expects that we will likely see, and need, computer-based forecasting blended with subjective human judgment; computer-based forecasting can help overcome cognitive limitations and biases. Forecasting often serves competing goals beyond predicting the future (partisan bias, entertainment, financial incentives, etc.).

Psychology

Measurement

  • Forecasts need a timeline
  • Wording matters greatly, because hedging words like ‘might’, ‘may’, and ‘likely’ can mean very different things to different people
  • Sherman Kent developed a probability/wording mapping that was never adopted by the intelligence community (a sketch of encoding it follows the table):
    100%           Certain
    93% (+/- 6%)   Almost certain
    75% (+/- 12%)  Probable
    50% (+/- 10%)  Chances about even
    30% (+/- 10%)  Probably not
    7%  (+/- 5%)   Almost certainly not
    0%             Impossible
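
A minimal sketch of how Kent's scheme could be encoded; the ranges come from the table above, and the function name is illustrative, not anything Kent proposed.

    # Sherman Kent's proposed mapping from hedging words to probability ranges.
    KENT_SCALE = {
        "certain":              (1.00, 1.00),
        "almost certain":       (0.87, 0.99),
        "probable":             (0.63, 0.87),
        "chances about even":   (0.40, 0.60),
        "probably not":         (0.20, 0.40),
        "almost certainly not": (0.02, 0.12),
        "impossible":           (0.00, 0.00),
    }

    def words_for(p):
        """Return the Kent phrase whose range contains probability p."""
        for phrase, (lo, hi) in KENT_SCALE.items():
            if lo <= p <= hi:
                return phrase
        return "no standard phrase"  # Kent's scale left gaps between ranges

    print(words_for(0.75))  # probable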

Calibration (forecasts)

Calibration is how well forecast probabilities match the actual frequency of events. An under-confident forecaster assigns probabilities lower than the rate at which events actually occur; an over-confident forecaster assigns probabilities higher than the rate at which events actually occur.
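
A minimal sketch of a calibration check, assuming a list of probability forecasts paired with 0/1 outcomes; the bucketing scheme and names are my own.

    from collections import defaultdict

    def calibration_table(forecasts, outcomes, n_buckets=10):
        """Bucket forecasts and compare each bucket's mean forecast to the
        observed frequency of the event (perfect calibration: columns match)."""
        buckets = defaultdict(list)
        for p, o in zip(forecasts, outcomes):
            buckets[min(int(p * n_buckets), n_buckets - 1)].append((p, o))
        for b in sorted(buckets):
            pairs = buckets[b]
            mean_p = sum(p for p, _ in pairs) / len(pairs)
            freq = sum(o for _, o in pairs) / len(pairs)
            print(f"mean forecast {mean_p:.2f} -> observed {freq:.2f} (n={len(pairs)})")

    # A well-calibrated forecaster: events called at 0.7 happen ~70% of the time.
    calibration_table([0.7] * 10 + [0.2] * 10,
                      [1,1,1,1,1,1,1,0,0,0] + [0,0,0,0,0,0,0,0,1,1])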

Resolution (forecasts)

Resolution is how often decisive predictions are made, i.e. that an event will or will not occur. Predictions made only around the fifty/fifty mark are cautious estimates; estimates pushed further toward the ends of the scale (this will occur, this will not occur) are decisive estimates.
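
This informal idea has a formal counterpart in the Murphy decomposition of the Brier score: resolution measures how far each forecast bucket's observed frequency sits from the overall base rate. A minimal sketch under that definition; the bucket width and names are my own.

    def resolution(forecasts, outcomes, n_buckets=10):
        """Murphy-decomposition resolution term: weighted squared distance of
        each bucket's observed frequency from the overall base rate.
        Higher = forecasts that actually separate events from non-events."""
        base_rate = sum(outcomes) / len(outcomes)
        buckets = {}
        for p, o in zip(forecasts, outcomes):
            buckets.setdefault(min(int(p * n_buckets), n_buckets - 1), []).append(o)
        return sum(len(outs) * (sum(outs) / len(outs) - base_rate) ** 2
                   for outs in buckets.values()) / len(outcomes)

    events = [1, 1, 1, 0, 0, 0]
    print(resolution([0.9, 0.9, 0.9, 0.1, 0.1, 0.1], events))  # 0.25: decisive, informative
    print(resolution([0.5, 0.5, 0.5, 0.5, 0.5, 0.5], events))  # 0.0: hedged calls carry no information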

Brier Score

A measurement of the distance between a forecast and what actually happened. Note that a Brier score should be judged relative to the difficulty of the task. For example, in cases of stability or rare events, a simple algorithm like predict-no-change (or always predict the stable value) will do quite well. You need to benchmark against other forecasters.

0 is perfect; 0.5 is a hedged 50/50 call or random guessing; 2 is perfectly wrong (every time something happens, it was predicted not to, and vice versa).
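
A minimal sketch of the two-outcome Brier score on the 0-to-2 scale the book uses: for each question, sum the squared error over both outcomes, then average across questions.

    def brier(forecasts, outcomes):
        """Two-outcome Brier score, range 0 (perfect) to 2 (perfectly wrong)."""
        return sum((p - o) ** 2 + ((1 - p) - (1 - o)) ** 2
                   for p, o in zip(forecasts, outcomes)) / len(forecasts)

    print(brier([1.0, 1.0], [1, 1]))  # 0.0  perfect
    print(brier([0.5, 0.5], [1, 0]))  # 0.5  permanent hedging
    print(brier([0.0, 1.0], [1, 0]))  # 2.0  perfectly wrong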

Hedgehogs and Foxes

“The fox knows many things, but the hedgehog knows one big thing” - Archilochus

Hedgehogs have “one big idea” and view the world through it. Their views are easy to understand, and they are often confident. Foxes’ views are eclectic and varied, and foxes aggregate those views; the result is often complex and nuanced.

The Wisdom of Crowds

The crowd aggregates its members’ individual pieces of information. Valid information points in the same direction, while errors are uncorrelated and largely cancel each other out in the aggregate. This is the power of aggregation.
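
A minimal simulation of the cancellation argument: if individual errors are independent, averaging drives the crowd's error well below a typical individual's. The noise model below is an assumption for illustration.

    import random

    random.seed(42)
    truth = 0.72  # the quantity everyone is independently estimating

    # Each individual's estimate = truth + independent noise.
    estimates = [truth + random.gauss(0, 0.15) for _ in range(1000)]

    crowd = sum(estimates) / len(estimates)
    avg_individual_error = sum(abs(e - truth) for e in estimates) / len(estimates)

    print(f"crowd error:              {abs(crowd - truth):.4f}")
    print(f"average individual error: {avg_individual_error:.4f}")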

IARPA Project and Finding Superforecasters

Reiterates much of the introduction and describes the formation of the IARPA tournament and the Good Judgment Project.

Intelligence and Superforecasters

  • Superforecasters score higher than roughly 80% of the general population on intelligence tests, and forecasters (self-selected volunteers) score higher than roughly 70%.
  • Superforecasters start with the “outside view”, i.e. the one not specific to the particular case. In most cases this means starting with base rate: how common something is in the broader class.
  • Aggregate several perspectives (a pooling sketch follows this list):
    • outside view
    • inside view
    • others’ outside/inside views
    • your own second opinions (generated by tweaking the question)
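
The Good Judgment Project pushed aggregation one step further by "extremizing" the pooled probability (pushing it away from 0.5), on the logic that each individual holds only part of the available information. A minimal sketch; the log-odds form and the exponent a = 2.5 are illustrative choices, not the project's exact algorithm.

    def extremize(p, a=2.5):
        """Push an aggregated probability away from 0.5; a > 1 extremizes.
        The odds-power form keeps the result inside (0, 1)."""
        odds = (p / (1 - p)) ** a
        return odds / (1 + odds)

    pooled = sum([0.7, 0.65, 0.8]) / 3  # simple average of three forecasts
    print(round(pooled, 3), round(extremize(pooled), 3))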

Order of Magnitude Estimation (Fermi)
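
Fermi-style decomposition: break an unknowable-seeming quantity into sub-quantities you can roughly bound, then combine them, caring only about the order of magnitude. A minimal sketch using the classic piano-tuners question; every number below is an assumed rough guess, not data.

    # Classic Fermi estimate: how many piano tuners are in Chicago?
    population             = 3_000_000   # people in Chicago (rough guess)
    people_per_household   = 2.5
    households_with_piano  = 1 / 20      # assume 1 in 20 households owns a piano
    tunings_per_year       = 1           # assume each piano tuned about once a year
    tunings_per_tuner_year = 2 * 5 * 50  # ~2/day, 5 days/week, 50 weeks/year

    pianos = population / people_per_household * households_with_piano
    tuners = pianos * tunings_per_year / tunings_per_tuner_year
    print(f"~{tuners:.0f} piano tuners")  # only the order of magnitude matters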

Numeracy and Superforecasters

  • Superforecasters are typically highly numerate
  • Most people think in terms of: “going to happen”, “not going to happen”, “maybe”
  • In reality, there are few if any certainties (in either direction), and therefore most questions lie in the “maybe” region and require probabilistic thinking
  • Granularity predicts accuracy in forecasting (see the simulation below)
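
The granularity claim can be illustrated with a small simulation, assuming a well-calibrated forecaster: collapsing fine-grained probabilities onto a no/maybe/yes scale measurably worsens the Brier score. The generative setup is an assumption for illustration.

    import random

    random.seed(1)

    def brier(forecasts, outcomes):
        # Two-outcome Brier score (0 = perfect, 2 = perfectly wrong).
        return sum(2 * (p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)

    # Simulated well-calibrated forecaster: true probabilities drawn at random,
    # outcomes generated from those probabilities.
    probs    = [random.random() for _ in range(100_000)]
    outcomes = [1 if random.random() < p else 0 for p in probs]

    fine   = brier(probs, outcomes)                              # full granularity
    coarse = brier([round(p * 2) / 2 for p in probs], outcomes)  # collapse to 0 / 0.5 / 1
    print(f"granular: {fine:.4f}  no/maybe/yes: {coarse:.4f}")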

Reaction to new information

Under- and over-reaction are defined relative to commitment to your forecast. Under-reacting to new information is often caused by stickiness to your original idea or forecast. Over-reactors lack commitment to their ideas and are swayed too easily by potentially irrelevant information.

  • Superforecasters update their forecasts frequently, in small increments, over time (a Bayesian-updating sketch follows)
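
The ideal the book points to is Bayesian: weight new evidence by how much more likely it is under your hypothesis than under its negation. A minimal sketch; the probabilities are invented for illustration.

    def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
        """Posterior probability of a hypothesis after one piece of
        evidence, via Bayes' rule."""
        numerator = prior * p_evidence_if_true
        return numerator / (numerator + (1 - prior) * p_evidence_if_false)

    # Start at 30%; each signal is twice as likely if the event is coming.
    p = 0.30
    for _ in range(3):  # three independent, similar signals
        p = bayes_update(p, 0.6, 0.3)
        print(f"{p:.3f}")  # 0.462, 0.632, 0.774: frequent, incremental updates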

Counter Arguments

Not everything that counts is countable. Not everything that can be counted counts.

Tetlock brings up black swan events and their originator, Nassim Taleb, who posits that these critical, history-altering events are exceedingly rare and also not predictable. The criticism is that superforecasters are good at predicting mundane, short-range events, but those don't really matter given the rarity and importance of black swans. Tetlock's response is twofold:

  1. Black swan events are defined not only by the acute event but also by the subsequent events. E.g., 9/11 was a dramatic acute event, but had Afghanistan handed over Bin Laden before the US invasion, would the 2000s have been defined by the wars in Iraq and Afghanistan? These trailing events are often exactly the kinds of questions forecasters are asked.
  2. Black swan events are not as rare or as unpredictable as they seem; e.g., 9/11 was anticipated, and several similar plots had been foiled in the past.
