Memoization in Haskell

Vignesh Karthikeyan
Jul 15, 2020
Two questions come up again and again about memoization in Haskell:
  1. How is caching not considered a side effect (in the context of pure functions)?
  2. Aren’t all values functions?

Taking the second question first: all functions are values, but not all values are functions. For example, 5 :: Int is a value that is not a function, while succ :: Int -> Int is both a value and a function.

The first question is really about operational semantics, which is sometimes tricky to discuss in Haskell because the language is defined only in terms of its denotational semantics: what value an expression evaluates to, rather than how that evaluation happens. Caching is not a side effect because the “stateful” nature of memoization stays hidden behind the abstraction of purity: while there is some internal state (represented in the partial graph reduction of the program), there is no way for your program to observe that state in a way that would distinguish it from the non-memoized version.

A subtlety here is that these memoization strategies are not actually required to memoize anything; all that is guaranteed is the result they will give after some unspecified finite amount of time.
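
To make this concrete: a pure program cannot tell whether a shared expression was evaluated once or many times, because purity guarantees equal results either way. Here is a minimal sketch, where expensive is just an illustrative stand-in for any costly pure computation:

expensive :: Integer
expensive = sum [1 .. 10000000]       -- some costly pure computation

shared :: Integer
shared = let x = expensive in x + x   -- call-by-need evaluates x once

unshared :: Integer
unshared = expensive + expensive      -- call-by-name may evaluate it twice

main :: IO ()
main = print (shared == unshared)     -- always True; only timing may differ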

There is no requirement for a Haskell implementation to memoize anything. It could use pure call-by-name, for example, which doesn’t memoize values and instead recomputes everything. Here’s an example of call-by-name evaluation:

let f x = x * x in f (2 + 2)
= (2 + 2) * (2 + 2)
= 4 * (2 + 2)
= 4 * 4
= 16

Here 2 + 2 is evaluated twice. Most Haskell implementations (optimizations aside) would instead create a thunk so that it is computed at most once; this strategy is called call-by-need. But a call-by-name implementation that evaluated 2 + 2 twice would be technically conforming: because Haskell is pure, there would be no difference in the result. In the real world, though, call-by-name ends up being much too expensive to be practical.
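
We can actually peek behind the purity abstraction with GHC’s Debug.Trace, an impure debugging escape hatch that prints a message when the traced value is demanded. The sketch below mirrors the example above; under GHC’s call-by-need evaluation the message prints once, while a call-by-name implementation would print it twice, with the same final result either way:

import Debug.Trace (trace)

main :: IO ()
main = do
  let f y = y * y
  -- The argument is a thunk; trace reports each time it is forced.
  print (f (trace "evaluating 2 + 2" (2 + 2 :: Int)))
  -- GHC prints "evaluating 2 + 2" once (to stderr), then 16.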

As for the choice not to memoize functions, the logic is the same. It would be perfectly conforming for a compiler to aggressively memoize every function, using something like optimal evaluation, and Haskell’s purity means there would be no difference in the result. But in real-world applications, memoizing every function takes a great deal of memory, and the overhead of an optimal evaluator is too high to give good practical performance.

A Haskell compiler could also choose to memoize some functions but not others in order to maximize performance. That would be great, but my understanding is that it is not really known how to do this reliably: it is very hard for a compiler to tell in advance which computations will be cheap and which will be expensive, or which results will probably be reused and which will not.

So a balance is struck: values, whose evaluated forms are usually smaller than their thunks, are memoized, while functions, whose memoized forms are usually larger than their definitions (since they need a whole memo table), are not. We then get techniques like those in the article to switch back and forth between these two representations, according to our own judgment as programmers.
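
One such technique is to turn a function into a lazily built data structure, since Haskell memoizes values but not functions. Here is a classic sketch of that idea: the recursive calls of a Fibonacci function are routed through a lazy list, whose cells are values and are therefore computed at most once:

fib :: Int -> Integer
fib 0 = 0
fib 1 = 1
fib n = fibs !! (n - 1) + fibs !! (n - 2)

-- A value, not a function: each cell is a memoized thunk of fib at
-- that index, so every subproblem is evaluated only once.
fibs :: [Integer]
fibs = map fib [0 ..]

main :: IO ()
main = print (fib 50)   -- fast; the naive recursion would take far too long

Indexing a list with !! is linear, so a real implementation might use an array or a tree instead, but the representation switch is the point: a function is traded for a value that Haskell will happily memoize.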
