A Seriously Slow Fibonacci Function
I recently wrote an article which was ostensibly about the Fibonacci series but was really about optimization techniques. I wanted to follow up on its (extremely moderate) success by going in the exact opposite direction: by writing a Fibonacci function which is as slow as possible.
This is not as easy as it sounds: any program can trivially be made slower, but this is boring. How can we make it slow in a fair and interesting way? The answer is to use a model of computation which is not deliberately designed to be slow but which in practice is quite slow, usually because the designer had something quite different than performance in mind.
While there are several to choose from, I selected the \(\lambda\)-calculus (read “lambda calculus”) as being particularly easy to write a spectacularly inefficient implementation in.
Some of you no doubt will be having flashbacks at the mention of the name, while others have already started slowly edging their mouse towards the close-tab icon. But don’t worry - if you’ve done any programming before, you’ve already seen all the “hard” ideas associated with it, and what’s left is a “toy” language that can be learned in a few minutes. By the end of this article you will see clearly how you could write your own non-trivial programs directly in the \(\lambda\)-calculus.
In fact, its main problem is that it’s too simple: it is difficult at first to see how anyone could do anything with it. Luckily for us, there exists a set of macros which turn the \(\lambda\)-calculus into a much higher-level language. While Alonzo Church was inventing the \(\lambda\)-calculus itself, he also developed this set of macros in parallel so that he could convince himself and others that it really could compute. These macros provide a simple and concrete way of encoding numbers, mathematical operators, Boolean logic, and even data structures like lists, maps, and trees. Today, this technique is called Church encoding. If the \(\lambda\)-calculus is machine code, then the Church encoding is C.
Goal
Stephen Covey says to begin with the end in mind, so let’s take a look at what we’re shooting for. As a blueprint, let’s first look at a performance-naive implementation of a function which finds the \(n\)-th Fibonacci number, implemented in vanilla Python:
def fibonacci(n):
    if n <= 1:
        return n
    else:
        return fibonacci(n-1) + fibonacci(n-2)
Let’s put together a shopping list of features we need to implement this function:
- natural numbers
- addition
- subtraction
- less than comparison
- if/else branch
- recursion
Well, we certainly have our work cut out for us, don’t we? The first four require a model of the Peano axioms, the if/else branch requires a model of Boolean algebra, and recursion is usually regarded as a non-trivial language feature.
For the purposes of this article, we’re going to take a non-standard approach and not use the original notation. Instead, we’ll use a subset of Python which has a one-to-one correspondence with the \(\lambda\)-calculus but which is hopefully both more familiar and more accessible to readers: all of the code in this article will run in any Python 3 interpreter so that readers may follow along and try their own experiments if they like. I shall call this subset the “Pythonic” \(\lambda\)-calculus when I want to specifically refer to the lambda calculus implemented as a subset of the Python grammar.
In the next section I will describe this subset more formally. In this section, I’ll just do a quick high-level overview so you have some idea of where we are heading.
Basically, we’ll be writing Python code, but restricting ourselves to only using the anonymous function syntax (e.g., lambda x: ...) and function calls (e.g., f(x)).
In addition, we will expand our language with definitions, which are basically macros with a human readable name that expand out to \(\lambda\)-calculus expressions. Here is an example of a definition, the contents of which we will return to later:
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
Definitions get substituted into other expressions very much like running “Find and Replace All”:
s/plus/(lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x)))/g
By introducing definitions, we will gradually build up a surprisingly high-level and expressive language on top of the \(\lambda\)-calculus. We can then revert to proper \(\lambda\)-calculus at any time by simply expanding out all the definitions like macros.
With those preliminaries out of the way, let me show you our goal. Although we will spend the rest of this article understanding in detail how it works, we will end up with a Fibonacci function that looks something like this:
fibonacci = Y(lambda f: PIR(
# define function f(n) as:
lambda n: (
# if n < 2
less_than(n)(two))
# then n
(n)
# else f(n-1) + f(n-2)
(plus
(f(minus(n)(one)))
(f(minus(n)(two))))))
As always in Python, # indicates a non-executable comment.
It shouldn’t be too hard to see the basic shape of the familiar, friendly Fibonacci function in there. If you know any LISP, it will even look vaguely familiar, although the parentheses are in slightly different places. (In LISP, function application looks like (minus n 2) instead of minus(n)(two).)
The Pythonic Lambda Calculus
In the \(\lambda\)-calculus there are only two operations: abstraction and application. These two can be composed to write any program (computable function) that has been or ever will be written.
What do we mean by the term “abstraction?” Let’s say we have a concrete calculation, such as the sum of squares of a vector like (3, 4). We could type the following into a Python interpreter:
3*3 + 4*4
In an interactive programming session, this might suffice: you type something in and hit ENTER to see the answer. But in most cases the values 3 and 4 are too hard-coded to be useful. So we can abstract this by replacing 3 with a placeholder variable x and introducing a function of x:
def half_ss(x):
    return x*x + 4*4
Now that we have an abstraction, a.k.a. a function, how do we use it? Before we can do any concrete computation, we need to know what x is supposed to be. This is called function application, or simply application, and we use the familiar f(x) syntax common to math and almost all programming languages:
half_ss(3)
But this “half” function isn’t very satisfactory. The 4 is still hard-coded. If we want to do the same thing again and abstract 4 into y, we have a couple of choices. We could greatly complicate our language and add another syntactic primitive, the comma:
def ss(x, y):
    return x*x + y*y

ss(3, 4)
But if we want to keep things simple, why don’t we simply perform abstraction twice?
def ss(x):
    def half_ss(y):
        return x*x + y*y
    return half_ss
Note that when we call ss(3), what is returned is again a function, so we can use a second application to pass the second argument:
ss(3)(4)
The two applications “cancel out” the two abstractions, and we are left with the concrete value 3*3 + 4*4, more commonly known as 25.
This works because we call ss with the first argument and, instead of resolving to a value, it returns a function that we can call with the second argument to get the result. We can repeat this operation as many times as necessary, but it gets inconvenient to make up a silly name like half_ss each time; instead we’ll use an anonymous lambda function:
ss = lambda x: lambda y: x*x + y*y
ss(3)(4)
So to recap, we implement the abstraction operation of the \(\lambda\)-calculus in Python by using lambda expressions with a single argument each, and if we want to define a function that takes more than one argument we stack up lambda expressions like lambda x: lambda y: lambda z: ....
Note that we can write our entire program without recourse to any names if we treat ss as a definition and replace the string ss with its right-hand side wherever it appears. The final program is:
(lambda x: lambda y: x*x + y*y)(3)(4)
You can run this and verify that it gives the answer 25.
In the \(\lambda\)-calculus, application is the only way we can do anything, so we should re-write the binary expressions as more primitive functions:
(lambda x: lambda y: plus(mult(x)(x))(mult(y)(y)))(3)(4)
One specific trick deserves attention. We could define a constant function like so:
lambda x: c
When called, this function ignores the argument x and always returns the constant c. This expression, however, has c as a free variable: there is no lambda c: above it. We can remedy that with another abstraction:
lambda c: lambda x: c
This gives us a way to construct constant functions out of any value. This kind of thing is called a closure. Closures have the property that they can “remember” values assigned to them at runtime. In fact, the function ss defined above also returned a closure, which “remembered” that x was supposed to equal 3 until the time came when a concrete calculation could be carried out. Closures are extremely important in the \(\lambda\)-calculus because they are our only way of defining data structures and passing state around.
Setting aside closures, let’s next discuss how computation is actually carried out. You may well have noticed that the operation I’ve called “application” is merely syntax saying that a function should be called with a certain argument; I haven’t said anything about how the calling of a function should actually be carried out!
It turns out this is exactly the reverse of abstraction: we replace abstract variables with concrete values. This is called the \(\beta\)-reduction rule (read “beta reduction”). For example, we may call the function lambda x: x(x)(x) with the concrete value t by writing (lambda x: x(x)(x))(t). The \(\beta\)-reduction rule allows us to remove the lambda x: and replace every x in the body of the function with t, which leaves us with t(t)(t). Since t is a free variable, we cannot reduce this any further, so we stop. Let’s do the same thing to our sums-of-squares example:
(lambda x: lambda y: plus(mult(x)(x))(mult(y)(y)))(3)(4)
(lambda y: plus(mult(3)(3))(mult(y)(y)))(4)
plus(mult(3)(3))(mult(4)(4))
Since plus, mult, and even 3 and 4 are just definitions, we could expand those definitions and continue the \(\beta\)-reduction process until only a single concrete value remains. Below, we will study the Church encoding for natural numbers and learn exactly how to do this.
So abstraction introduces lambda expressions, application tells us when we should call a function and which argument to pass in, and \(\beta\)-reduction tells us how to carry out that function call.
There is one other rule, called \(\alpha\)-replacement (read “alpha replacement”), which tells us that we can change the variable names of functions whenever we want, as long as we are consistent and change it everywhere inside the body of the function too while also avoiding conflicts with other variable names. For example, these two lambda expressions are the same, and we can use the \(\alpha\)-replacement rule to transform one into the other and vice versa:
lambda x: x === lambda y: y
lambda x: x !== lambda x: y
The lambda expression lambda x: y, on the other hand, would not be the same, because we cannot replace x with y without a conflict. We could change lambda x: y into lambda z: y, but that would be a different function. This rule should be intuitively obvious. It’s also not particularly important, because we could equally well have used an infinite sequence of variable names to avoid conflicts; this would eliminate the need for the \(\alpha\)-replacement rule. The \(\beta\)-reduction rule captures the very essence of what it means to carry out a computation; the \(\alpha\)-replacement rule is book-keeping.
The above gives the flavor of the \(\lambda\)-calculus: abstraction, application, \(\alpha\)-replacement and \(\beta\)-reduction. Since the \(\lambda\)-calculus is Turing complete, this can be interpreted as implying that all programming can be reduced to abstraction and application, and all computation can be reduced to the \(\beta\)-reduction rule; all else is vanity and grasping at wind.
But the examples I’ve used share the weakness that if you go far enough down, you are relying on other operations. For example, mult = lambda x: lambda y: x * y is ultimately defined in terms of the built-in multiplication operator *, and x and y are of type int. This won’t do; indeed, this is a serious defect, because the whole point of the lambda calculus is to prove that these two operations suffice to define any computable function. Copping out halfway through and relying on native operations proves nothing. To correct this defect, we need to start from scratch and scrupulously avoid using any operation except abstraction and application. Happily, Church encoding provides a roadmap… it will, however, lead us to types and data structures very different from the native Python int and bool!
Formal Grammar
The subset of Python which constitutes the Pythonic \(\lambda\)-calculus can be fully described by this BNF specification:
<expression> ::= <variable>
               | "(" "lambda" <variable> ":" <expression> ")"
               | <expression> "(" <expression> ")"
In plain English, we build up expressions recursively out of variables, lambda functions, and function applications.
In some cases, where it causes no ambiguity, we will omit the parentheses around lambda functions, or add parentheses around function applications, e.g. (f(x)). Note that the placement of parentheses is rather different than in the original \(\lambda\)-calculus syntax! Where we might have written \((f x y)\) in the conventional notation we will write f(x)(y), and where we would have written \((f (x y))\) we will write f(x(y)). Hopefully, the Pythonic notation for function application will actually be more familiar to those of you who have studied modern high-level programming languages or pure mathematics.
As a convenience, we will also allow ourselves definitions, although we will later show that these definitions are merely a convenience and can be done away with whenever we choose through the simple process of string substitution. (The development of the Church encoding proceeds mainly by means of such definitions.) A definition looks like this:
<definition> ::= <definition-name> "=" <expression>
Where the <expression> on the right is restricted to have no free variables. That is to say, every variable used is in fact inside the body of a lambda function with that variable as a parameter. For example, lambda x: x(x)(x(x)) has no free variables, but lambda x: y(x) has y as a free variable. Furthermore, while definitions can contain other definitions, they cannot ever contain themselves, not even implicitly hidden away in some other definition. Otherwise, the simple string-substitution macro expansion of a definition would never terminate! (Later, when we need recursion to implement the Fibonacci function, we will have to find a way around this restriction!)
These restrictions exist so that we can think of our extended grammar as nothing more than a set of lightweight macros on top of the \(\lambda\)-calculus. In principle, for any given program written in the extended grammar, we can simply substitute in the body of the definition wherever we find the name of the definition. This takes only a finite number of steps and is guaranteed to terminate. At the end of this process, we are left with an equivalent program with no definitions or definition names. Furthermore, we can complete this “macro preprocessing” step entirely before beginning to run the program. In this sense, it is ancillary to the real calculation.
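To make this concrete, here is a minimal sketch of such a preprocessor (the expand helper and its names are my own illustration; a robust version would match whole names rather than raw substrings):

def expand(expression: str, definitions: dict) -> str:
    # Keep substituting bodies for names until no definition names remain.
    # This terminates because a definition can never contain itself.
    changed = True
    while changed:
        changed = False
        for name, body in definitions.items():
            if name in expression:
                expression = expression.replace(name, "(" + body + ")")
                changed = True
    return expression

print(expand("succ(zero)", {
    "succ": "lambda n: lambda f: lambda x: f(n(f)(x))",
    "zero": "lambda f: lambda x: x",
}))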
BTW, these rules for what constitutes a valid definition (as opposed to an axiom) can be traced back to Frege, who needed them because he was in the process of adding quantifiers over bound variables to logic. It turned out to be a very fruitful principle; modern mathematics is 99% definitions, with only a handful of axioms holding up the foundation. It’s also very broad - even though the \(\lambda\)-calculus is not a logic, the concept of “definition” remains much the same.
To wrap up, the full extended grammar looks like this:
<definition> ::= <definition-name> "=" <expression>
<expression> ::= <variable>
               | "(" "lambda" <variable> ":" <expression> ")"
               | <expression> "(" <expression> ")"
               | "(" <definition-name> ")"
Both the original grammar and the extended grammar are strict subsets of Python, and as such are runnable on a Python interpreter. This is, perhaps, the most extreme example of programming “into” a language instead of programming “in” a language, following McConnell’s distinction.
Note that if we run an expression in the extended grammar directly in Python, the interpreter does not do the macro expansions as described above… but the results of the calculations will always be identical if we’ve carefully followed the rules for introducing definitions! Later, we will show examples of running programs both ways.
Church Booleans
We’ll start with the simplest item on our shopping list: Boolean logic. There are only two Boolean values:
# Church Booleans
true = lambda x: lambda y: x
false = lambda x: lambda y: y
Note that these are not the same as Python’s built-in True and False constants; our lowercase true and false appear to Python as callable objects of type function, not objects of type bool.
Some of you may object that these are not values, these are functions. Of course they are; the \(\lambda\)-calculus is made out of nothing but functions! But that doesn’t prevent us from thinking of some functions as values when it is convenient for us. Consider the “objects” of object-oriented programming - an object is nothing but a collection of functions, but we usually think of objects as values.
More generally, it is often convenient to talk about the “type” of different lambda expressions. However, because we are technically working in the “untyped” \(\lambda\)-calculus, we will have to keep the concept of “type” high-level and informal for now. (There are also “typed” versions, but we don’t need them to actually compute stuff.)
Since we are keeping things informal, we can use an intuitive definition: the “type” of an expression is roughly the number and types of the parameters it would normally be called with. In high-level languages this is often called the function signature. The two Church Booleans we defined both have the same type - they expect to be called with two arguments and will then return one or the other of those two arguments unchanged.
Consider the following function, which takes a Boolean argument:
lambda p: p(a)(b)
This piece of code will work equally well if p is true or false - only the behavior will be different. If p is true, it will return a, and if p is false it will return b. In other words, it has the same semantics as an if/else statement or the ?: ternary operator.
In general, if we have a predicate (an expression which evaluates to a Boolean), we can always write an if/else statement like:
((<predicate>)
(<if-expression>)
(<else-expression>))
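Since Python happily lets us pass arbitrary values to our Church Booleans, we can watch the selection happen directly (a quick demonstration of my own):

print(true("if-branch")("else-branch"))   # if-branch
print(false("if-branch")("else-branch"))  # else-branch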
While we will have to wait until after we’ve defined the Church numerals to get really useful predicates like less_than, we can go ahead and define the usual Boolean operators purely in terms of our above definitions for true and false:
# Boolean logic
AND = lambda x: lambda y: x(y)(false)
OR = lambda x: lambda y: x(true)(y)
NOT = lambda x: x(false)(true)
XOR = lambda x: lambda y: x(NOT(y))(y)
Each of these expects to be passed Boolean values and then returns a single Boolean value. We can unpack these using the above equivalence to if/else; for example, AND reads as “if x is true, then return y, else return false”. This will return true only if x and y are both true, so this function acts like “and”. You can read the other definitions in the same way.
To more easily interface with these functions, let’s write some bridge code:
from typing import Callable, Any

def church_to_bool(b: Callable) -> bool:
    return b(True)(False)

def bool_to_church(b: bool) -> Callable:
    return true if b else false
Now that we have these bridge functions, it’s fairly easy to write a test demonstrating that we’ve correctly implemented the truth tables for each of Boolean operation:
for x in (True, False):
    for y in (True, False):
        x_c = bool_to_church(x)
        y_c = bool_to_church(y)
        z_c = AND(x_c)(y_c)
        z = church_to_bool(z_c)
        print(x, "AND", y, "=", z)
True AND True = True
True AND False = False
False AND True = False
False AND False = False
At this point I encourage you to test some of the other Boolean operators, and to write your own, such as NAND, or the SUM and CARRY of a full adder for more of a challenge; one possible solution is sketched below.
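Here is one way to do it (my own sketch, not the only possible definitions):

# "Not both": NAND is just the negation of AND.
NAND = lambda x: lambda y: NOT(AND(x)(y))

# A full adder combines bits x and y with a carry-in c.
SUM = lambda x: lambda y: lambda c: XOR(XOR(x)(y))(c)
CARRY = lambda x: lambda y: lambda c: OR(AND(x)(y))(AND(c)(XOR(x)(y)))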
This has been our first taste of computing in the \(\lambda\)-calculus. The pattern (which we’ll soon see more of) is simple: use Church encoding to translate your input values into lambda expressions. Pass those into a lambda expression which represents your program. Finally, reverse the Church encoding to recover meaningful values.
Church Numerals
The Church encoding of the natural numbers, called Church numerals, defines the number \(n\) to be a binary function (here, “binary” means taking two arguments) which takes a function f and an arbitrary value x and applies f to x exactly \(n\) times:
zero = lambda f: lambda x: x
one = lambda f: lambda x: f(x)
two = lambda f: lambda x: f(f(x))
three = lambda f: lambda x: f(f(f(x)))
# ... and so on
How do you apply a function zero times? Well, you don’t; you just return the value x right away. To call it once is f(x), twice is f(f(x)), and so on. This means that Church numerals are not an arbitrary sequence of symbols that only gain semantics because of the relations defined on them (as they are in other models) but actually have behavior which is directly related to their meaning: the \(n\)-th Church numeral has the behavior of repeating a computation \(n\) times.
For example, suppose we have a function called hello_world and we want to call it 3 times. How would we implement the equivalent of a for or while loop in the Pythonic lambda calculus? Just so:
def hello_world(n):
    print(f"Iteration #{n}: Hello, Lambda Calculus!")
    return n+1

three(hello_world)(1)
Iteration #1: Hello, Lambda Calculus!
Iteration #2: Hello, Lambda Calculus!
Iteration #3: Hello, Lambda Calculus!
4
The first time hello_world() is called, it is called with the 1 we passed in. Each subsequent call is passed the return value from the previous call. The function hello_world() will be called 3 times in total, printing a message each time. Finally, it returns a value of 4.
Of all the high-level programming languages I am aware of, I think only Ruby comes close to the idea that numbers should literally be their own for loops. Even LISP and Haskell require recursion, a separate map function, or a loop macro. (For code readability alone this is probably a good thing, though. In writing this article I’ve found the lack of traditional signpost statements more confusing than elegant, and have had to use comments to indicate where such control flow statements were being used.)
This one-to-one correspondence between Church numerals and behaviors makes it relatively easy to define mathematical operations on Church numerals. Following Peano, the first thing we need is a successor function which can increment a number by one:
succ = lambda n: lambda f: lambda x: f(n(f)(x))
Why does this work? Well, given an original number n, it first uses n to apply f to x \(n\) times. It then applies f once more itself. Thus, the final value will be f applied to x \(n+1\) times, which is \(n+1\) by definition.
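We can check this directly by exploiting the “repeat \(n\) times” behavior with an ordinary Python function (a quick illustration of my own):

add_bang = lambda s: s + "!"
print(succ(two)(add_bang)("hi"))  # hi!!! - f was applied three times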
This successor function allows us to easily construct new numbers ad infinitum:
# 0-10 for convenience
zero = lambda f: lambda x: x
one = lambda f: lambda x: f(x)
two = succ(one)
three = succ(two)
four = succ(three)
five = succ(four)
six = succ(five)
seven = succ(six)
eight = succ(seven)
nine = succ(eight)
ten = succ(nine)
church_digits = [zero, one, two, three, four, five, six, seven, eight, nine]
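As a quick sanity check, we can hand ten an increment function and count how many times it runs; this is the same counting trick we will formalize as a bridge function below:

print(ten(lambda x: x + 1)(0))  # 10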
We can now define other mathematical operators:
# church numerals
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
mult = lambda m: lambda n: lambda f: lambda x: m(n(f))(x)
exp = lambda m: lambda n: n(m)
pred = (lambda n:
lambda f: lambda x:
n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u))
minus = lambda m: lambda n: n(pred)(m)
Of these, plus and mult are the easiest to understand. plus first applies f to x \(n\) times, then applies f to the result \(m\) more times, for a total of \(n+m\) applications. The multiplication operator mult is similar, but does things in a slightly different order: it first defines a new function g = n(f) which applies f to some value \(n\) times, and then applies g to x \(m\) times. Since each call to g ends up calling f \(n\) times, the result is that f is applied to x \(m \times n\) times.
Try to figure out exp for yourself. It’s equivalent to \(m^n\). It’s not on the main line of functionality we need for the Fibonacci function, but it’s very clever!
pred is the predecessor function: it subtracts one from a number if possible. (Zero is just mapped to zero, as negative numbers are not defined.) It’s more complex than succ, but studying it is extremely rewarding, because it leads to understanding how data structures can be represented in the \(\lambda\)-calculus. The basic idea is that we are going to put the value x in a box and replace f with a different function, which I’ll call skip_first. The first time skip_first is called, it sees that the box has not been opened, so it opens it. After that, it sees that the box is already open, so it takes the value out of the box, applies f to it once, and puts it back in the box. It does this \(n\) times. At the end, it takes the value out of the box. The ultimate result is that f is applied to x \(n-1\) times, because nothing happened the first time. In this analogy, the initial closed “box” is lambda u: x, the new box that is created after each step is lambda h: h(g(f)), and the lambda u: u at the end is the final act of taking the value out of the box.
pred is tricky to understand, especially in this elementary form. The Wikipedia article also has a pretty good explanation. A good exercise is to manually work out the \(\beta\)-reduction of pred(two) to get a feel for it. If it still gives you trouble, I suggest you leave it aside and study the rest of the theory until you learn how to encode the “pair” data structure; then pred may be defined in a much more natural way. I only needed enough to implement the Fibonacci function, so I decided to take the straight path through the mud. Nevertheless, there is a switchback trail with a very gentle slope right over there.
Defining pred was the hard part. The definition of minus in terms of pred is much easier: n applies the function pred to m \(n\) times, so we subtract one \(n\) times, which is the same as subtracting n from m. Easy, yes?
Now that we have a reasonable set of mathematical operations, let’s do some practical examples. We can do basic operations like \(2+2\) or \(6 \times 7\):
plus(two)(two)
mult(six)(seven)
The problem with these is that what they return is a Church numeral, which is a Python Callable. All I see on my screen when I run the above snippets is opaque and ambiguous output like this:
<function __main__.<lambda>.<locals>.<lambda>.<locals>.<lambda>(x)>
To translate this back into something we can understand, we need to write another bridge function. But how can we “unpack” a Church numeral? Recall that the Church encoding of \(n\) is a function taking two arguments, a function \(f\) and a value \(x\), and it applies \(f\) to \(x\) \(n\) times: \(f^n(x)\). In our Python environment, this works even if \(x\) and \(f\) are not written in the lambda calculus. Therefore we can follow the spy pattern often used in unit testing and pass in a function which will report how often it was called.
def church_to_int(n: Callable) -> int:
    return n(lambda x: x+1)(0)
Going the other way requires no such tricks. If I want to encode the number 42 in the \(\lambda\)-calculus, I can use the first ten digits (defined above) and our mathematical operations to build up the Church number:
plus(mult(four)(ten))(two)
We can use the same strategy for any number. All we really need to do is parse the base 10 representation of a number and build up the Church number in stages:
def int_to_church(n: int) -> Callable:
    church_number = church_digits[int(str(n)[0])]
    for digit in str(n)[1:]:
        church_digit = church_digits[int(digit)]
        church_number = plus(mult(church_number)(ten))(church_digit)
    return church_number
We can now perform non-trivial calculations entirely in the \(\lambda\)-calculus:
> print("2 + 2 =", church_to_int(plus(two)(two)))
4
> print("6 * 7 =", church_to_int(mult(six)(seven)))
42
Here is a much larger (and slower) example:
> a = int_to_church(1001)
> b = int_to_church(999)
> ab = mult(a)(b)
> print("1001 * 999 =", church_to_int(ab))
999999
Now, finally, we can implement our sums-of-squares method:
> a = int_to_church(3)
> b = int_to_church(4)
> ss = plus(exp(a)(two))(exp(b)(two))
> print("3**2 + 4**2 =", church_to_int(ss))
25
This isn’t all of number theory, of course, but it’s enough to implement our little Fibonacci function!
Predicates Involving Numbers
The first and most basic predicate we need is a check for zero. This will form the foundation of all the other predicates:
is_zero = lambda n: n(lambda x: false)(true)
This works because if lambda x: false is called even once, the result will be false, and this can only be avoided if the function is never called, in which case the original value true will be returned. But the only Church numeral which never calls its function argument is zero, so the above function returns true only for zero, and false for every other number.
By the way, a function which returns a value of type Church Boolean is the definition of a “predicate” in this context. The word carries no logical or semantic content here.
The fact that pred stops at zero (i.e., pred(zero) == zero) implies that minus(x)(y) == zero if and only if y is greater than or equal to x. We can use this fact to define various comparison tests:
leq = lambda m: lambda n: is_zero(minus(m)(n))
less_than = lambda m: lambda n: leq(succ(m))(n)
eq = lambda m: lambda n: AND(leq(m)(n))(leq(n)(m))
These functions are interesting because while they expect their arguments n and m to be Church numerals, their return value is a Church Boolean. The main thing we wanted was less_than, which we will need for our Fibonacci function.
Recursion
Rather than jumping straight into implementing recursion in the \(\lambda\)-calculus, let’s take it slow and develop the idea in stages. Let’s start with vanilla Python recursion:
def factorial(n):
    if n <= 1:
        return 1
    else:
        return n * factorial(n-1)
This only works because, by the time the interpreter reaches the statement factorial(n-1), the global symbol table already contains an entry for factorial, so Python happily resolves factorial to that function and calls it. In other words, it works because Python is doing all the heavy lifting for us!
In the \(\lambda\)-calculus, there is no global symbol table. Even if there were, lambda functions are all anonymous: they don’t have names, so what would you even query the symbol table for? The workaround is to pass the function into itself as an argument. This is totally legal; x(x) is a perfectly cromulent expression in the Pythonic \(\lambda\)-calculus. Continuing with our factorial example a little further, we have:
def _factorial(f, n):
    if n <= 1:
        return 1
    else:
        return n * f(f, n-1)

def factorial(n):
    return _factorial(_factorial, n)
Neat. But it’s a little ugly that we have to make the recursive call as f(f, n-1) and explicitly pass f back into itself. Why not make f a closure which curries that first argument for us?
def _factorial(f, n):
    if n <= 1:
        return 1
    else:
        return n * f(n-1)

def factorial(n):
    f = lambda m: _factorial(f, m)
    return f(n)
We can make this more generic and reduce reliance on the global namespace by passing _factorial in as an argument:
def call_recursively(f, n):
    g = lambda m: f(g, m)
    return g(n)

def _factorial(f, n):
    if n <= 1:
        return 1
    else:
        return n * f(n-1)

def factorial(n):
    return call_recursively(_factorial, n)
Finally, the call_recursively function can be abstracted entirely as a Python decorator. (If you’re not familiar with the decorator syntax, @decorator placed on the line above def f(...) is simply syntactic sugar for f = decorator(f) and is a convenient way to apply a functor to a function.)
def recursive(f):
    def recursive(n):
        g = lambda m: f(g, m)
        return g(n)
    return recursive
@recursive
def factorial(f, n):
    if n <= 1:
        return 1
    else:
        return n * f(n-1)
So, in vanilla Python, we’ve implemented a reusable utility which enables us to conveniently do recursion while avoiding any reference to the global symbol table. As we make the jump to the final form purely in the Pythonic \(\lambda\)-calculus, we will rename recursive to Y - it is indeed the famous Y-combinator, the higher-order function which makes recursion (and therefore also iteration) possible in the \(\lambda\)-calculus. As for why it is called Y, I have no idea - it’s just the standard symbol.
Y = lambda f: (lambda x: x(x))(lambda y: f(lambda z: y(y)(z)))
We can also apply exactly one application of the \(\beta\)-reduction rule to bring this into an equivalent symmetrical form:
Y = lambda f: (lambda y: f(lambda z: y(y)(z)))(lambda y: f(lambda z: y(y)(z)))
While the symmetrical form is more often seen, I prefer the first version because I think it more clearly expresses the idea of currying a function with itself. However, both do exactly the same thing. Furthermore, neither has any free variables, and so both meet our requirements for a proper definition.
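Before using Y in the pure setting, we can sanity-check that it really ties the knot by pairing it with a native-integer factorial (a quick test of my own; it leans on Python ints and the ternary operator only because it is a test, not part of the Pythonic \(\lambda\)-calculus):

fact = Y(lambda f: lambda n: 1 if n <= 1 else n * f(n - 1))
print(fact(5))  # 120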
In an ideal world we could now define the recursive factorial function entirely in Pythonic \(\lambda\)-calculus like so:
factorial = Y(lambda f:
lambda n:
((leq)(n)(one)
(one)
(mult(n)(f(pred(n))))))
However, there is a problem: when we run the above function we get a stack overflow error. Why is this so? Is the algorithm wrong? No: if you executed this lambda expression with true \(\beta\)-reduction, it would work fine. The problem is that our Church Boolean pseudo-if/else statement is not quite a proper if/else! In a language like C or Python, the code inside the selected branch will be executed, but the code in the other branch will not even be run. However, in the Pythonic \(\lambda\)-calculus, if we write:
((some-predicate)
(if-branch)
(else-branch))
Then both the if-branch and the else-branch will need to be evaluated completely before some-predicate can be called, regardless of the value of some-predicate!
This is called “eager” evaluation, and Python always eagerly evaluates all arguments of a function call before performing the function call. Therefore in Python, we will always compute both branches, and only at the very end will we discard one of the values. Normally, this wouldn’t cause serious problems, because the answer would always be the same (it would just be a little slower, as it takes time to do the work which gets thrown away). It becomes a serious problem in the case of recursion, because the else branch is always evaluated, which calls the function again, which calls the function again, and so on forever.
One response would be to give up on Python and go abuse some other language which has lazy evaluation. (It’s not a coincidence that functional languages like Haskell generally support lazy evaluation!) Alternatively, we could write an interpreter for the \(\lambda\)-calculus which implements \(\beta\)-reduction rule, side-stepping Python entirely.
These are good options, and for someone writing a very complete implementation of the \(\lambda\)-calculus they would be necessities. But today we’re only concerned with writing one function, the Fibonacci function, so we can use a much simpler hack to prevent infinite recursion, which I will abbreviate PIR:
# cheap hack because true and false don't short-circuit.
def PIR(f):
    def wrapper_function(n):
        if church_to_bool(is_zero(n)):
            return zero
        else:
            return f(n)
    return wrapper_function
With our little hack in place, we can now implement a working version of the factorial function which does not cause a stack overflow:
factorial = Y(lambda f: PIR(
lambda n:
((leq)(n)(one)
(one)
(mult(n)(f(pred(n)))))))
Long stacks of parentheses appear to be an operational hazard when working with \(\lambda\)-calculus inspired languages.
In any case, the above function is a little inconvenient to call because it requires a Church numeral as input and returns an opaque Church numeral. Let’s wrap it in bridge code:
def slow_factorial(n):
    n = int_to_church(n)
    fac = factorial(n)
    return church_to_int(fac)
This little function correctly computes factorials:
for n in range(1, 11):
    print(n, slow_factorial(n))
1 1
2 2
3 6
4 24
5 120
6 720
7 5040
8 40320
9 362880
10 3628800
What about our main design objective? I am pleased to report that the above function is indeed extremely slow: it takes over a minute to calculate \(10!\) on a fairly new laptop. That works out to 36 million times slower than the vanilla Python implementation.
We can profile the code to figure out why. Here, I’ve edited the profiler output from %prun slow_factorial(9) to give names to the most common calls; in the original, the function name was always just <lambda>, distinguished only by line number.
ncalls tottime percall cumtime percall function
1564014 0.849 0.000 0.849 0.000 succ
362889 0.354 0.000 0.542 0.000 plus
362880 0.188 0.000 0.188 0.000 church_to_int
260669 0.152 0.000 0.152 0.000 zero
79211 0.052 0.000 0.052 0.000 pred
9/1 0.000 0.000 0.003 0.003 factorial
So, we actually spend almost all of our time simply incrementing numbers by one. Church numbers are easy to define and work with, but they are gloriously inefficient, especially as the numbers grow. A more efficient encoding could be defined by using the Boolean algebra we developed earlier to define operations on binary strings, but that is not what we are about today.
Final Fibonacci
Our shopping list is complete: we now have all the necessary tools to proceed to the endgame. All that remains is to implement the Fibonacci algorithm:
fibonacci = Y(lambda f: PIR(
lambda n:
less_than(n)(two)
(n)
(plus
(f(minus(n)(one)))
(f(minus(n)(two))))))
The less_than(n)(two) is a predicate that resolves to a Church Boolean. This Boolean is then used as an if/else statement, returning n for the if branch or the recurrence relation for the Fibonacci sequence for the else branch. The else branch is simply the sum of \(F_{n-1}\) and \(F_{n-2}\). The Y-combinator ensures that f is in fact the same fibonacci function, so we can call f(minus(n)(one)) and f(minus(n)(two)) to calculate \(F_{n-1}\) and \(F_{n-2}\).
Because we make two separate recursive calls and don’t do any caching, the number of calls to fibonacci() will grow roughly as \(\mathcal{O}(2^n)\). This is of course a terrible algorithm; it’s the same one I called naive_fib() in my earlier article on optimizing the Fibonacci function. However, this is entirely in keeping with our goal of writing the slowest possible version!
As before, we’ll wrap this in bridge code to handle the translation between native Python integers and Church numerals:
def slow_fibonacci(n: int) -> int:
    n = int_to_church(n)
    fib = fibonacci(n)
    return church_to_int(fib)
We can test that it is correct by exhibiting the first 20 Fibonacci numbers:
for n in range(21):
    print(n, slow_fibonacci(n))
0 0
1 1
2 1
3 2
4 3
5 5
6 8
7 13
8 21
9 34
10 55
11 89
12 144
13 233
14 377
15 610
16 987
17 1597
18 2584
19 4181
20 6765
How Slow is Slow?
As expected, this is rather slow, over 6 seconds to calculate \(F_{20}\):
%time slow_fibonacci(20)
CPU times: user 6.59 s, sys: 20 ms, total: 6.61 s
Wall time: 6.6 s
This is one thousand times slower than the same naive algorithm implemented in vanilla Python, and about a million times slower than a good algorithm. Note, however, that this is still faster than a human could work it out on paper! As recently as 80 years ago this would have been state of the art.
A straight line on a log-scale plot shows that the algorithm scales as \(\mathcal{O}(2^n)\).
The profiler shows where we were spending our time:
1922492 function calls (1833945 primitive calls) in 1.858 seconds
Ordered by: internal time
ncalls tottime percall cumtime percall function
1133885 0.513 0.000 0.513 0.000 pred
17710/1 0.199 0.000 31.615 31.615 fibonacci
95316 0.173 0.000 3.507 0.000 two
59896 0.159 0.000 3.636 0.000 mult
53131 0.118 0.000 30.497 0.001 is_zero
53130 0.110 0.000 0.280 0.000 minus
35421/1 0.092 0.000 31.615 31.615 PIR-wrapper
35420/2 0.083 0.000 31.615 15.807 Y
119792 0.074 0.000 0.074 0.000 three
35421 0.072 0.000 0.114 0.000 church_to_bool
17710 0.057 0.000 10.598 0.001 less_than
35420 0.052 0.000 0.076 0.000 fibonacci-wrapper
64077 0.041 0.000 0.041 0.000 zero
35420 0.024 0.000 0.024 0.000 PIR
17710 0.022 0.000 0.033 0.000 one
28655 0.014 0.000 0.014 0.000 false
24476 0.012 0.000 0.012 0.000 true
17710 0.012 0.000 0.012 0.000 leq
17711 0.012 0.000 0.012 0.000 plus
17710 0.012 0.000 0.012 0.000 succ
6765 0.006 0.000 0.006 0.000 church_to_int
Unlike slow_factorial(), which deals with numbers that blow up very quickly, slow_fibonacci() deals with relatively smaller numbers, which means that we spend less time simply iterating through succ and more time doing interesting things. Nevertheless, it spends a lot of time doing simple subtractions - this is one of the weak points of the Church numerals.
Slower Than Slow
How could we make this even slower? Again, in a fair way, not just by sprinkling no-ops and sleep statements throughout.
One interesting approach would be to implement what is sometimes called a meta-circular evaluator: an interpreter for the \(\lambda\)-calculus written entirely within the \(\lambda\)-calculus itself. We could then stack interpreters indefinitely, with each layer costing us another factor of a thousand. It would be reminiscent of this art project where a long gear chain is used to build a machine which will take 13.7 billion years for the final gear to complete one rotation:
The machine has an electric motor happily whirring away at one end (click the image for a video showing it in action) and a solid block of concrete at the other. Normally that would be a recipe for disaster, but because each gear steps the revolution speed down by a factor of 10, and because so many gears are chained together, the motion is infinitesimal by the end.
We’re not actually going to do that, of course. This article is already way too long. But we could.
Macro Expansion
Perhaps you don’t believe that this is really a \(\lambda\)-calculus program; after all, it has all those “definitions” which look suspiciously like named functions and in fact are being treated as named functions by the Python interpreter! Very suspicious.
We can see this is not the case by doing a search for each definition’s name and replacing it with its body, and repeating until no named definitions are left. Below is the step-by-step expansion process, with one version per line; the last line contains no definitions (except our PIR hack) and instead is composed entirely of lambda functions, function calls, and bound variable names.
fibonacci = Y(lambda f: PIR(lambda n: (less_than)(n)(two)(n)(plus(f(minus(n)(one)))(f(minus(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: leq(succ(m))(n))(n)(two)(n)(plus(f(minus(n)(one)))(f(minus(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: is_zero(minus(m)(n)))(succ(m))(n))(n)(two)(n)(plus(f(minus(n)(one)))(f(minus(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: false)(true))(minus(m)(n)))(succ(m))(n))(n)(two)(n)(plus(f(minus(n)(one)))(f(minus(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: (lambda x: lambda y: y))(lambda x: lambda y: x))(minus(m)(n)))(succ(m))(n))(n)(two)(n)(plus(f(minus(n)(one)))(f(minus(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: (lambda x: lambda y: y))(lambda x: lambda y: x))((lambda m: lambda n: n(pred)(m))(m)(n)))(succ(m))(n))(n)(two)(n)(plus(f((lambda m: lambda n: n(pred)(m))(n)(one)))(f((lambda m: lambda n: n(pred)(m))(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: (lambda x: lambda y: y))(lambda x: lambda y: x))((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(m)(n)))(succ(m))(n))(n)(two)(n)(plus(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(one)))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: (lambda x: lambda y: y))(lambda x: lambda y: x))((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(m)(n)))(succ(m))(n))(n)(two)(n)((lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x)))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(one)))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(two))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: (lambda x: lambda y: y))(lambda x: lambda y: x))((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(m)(n)))(succ(m))(n))(n)(succ(one))(n)((lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x)))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(lambda f: lambda x: f(x))))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(succ(one)))))))
fibonacci = Y(lambda f: PIR(lambda n: (lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: (lambda x: lambda y: y))(lambda x: lambda y: x))((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(m)(n)))((lambda n: lambda f: lambda x: f(n(f)(x)))(m))(n))(n)((lambda n: lambda f: lambda x: f(n(f)(x)))(lambda f: lambda x: f(x)))(n)((lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x)))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(lambda f: lambda x: f(x))))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)((lambda n: lambda f: lambda x: f(n(f)(x)))(lambda f: lambda x: f(x))))))))
fibonacci = (lambda f: (lambda x: x(x))(lambda y: f(lambda z: y(y)(z))))(lambda f: PIR(lambda n:(lambda m: lambda n: (lambda m: lambda n: (lambda n: n(lambda x: (lambda x: lambda y: y))(lambda x: lambda y: x))((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(m)(n)))((lambda n: lambda f: lambda x: f(n(f)(x)))(m))(n))(n)((lambda n: lambda f: lambda x: f(n(f)(x)))(lambda f: lambda x: f(x)))(n)((lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x)))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)(lambda f: lambda x: f(x))))(f((lambda m: lambda n: n((lambda n: lambda f: lambda x: n(lambda g: lambda h: h(g(f)))(lambda u: x)(lambda u: u)))(m))(n)((lambda n: lambda f: lambda x: f(n(f)(x)))(lambda f: lambda x: f(x))))))))
The ultimate expanded version of this function works just as well and runs just as fast. Although it does nothing but create closures and call functions, somehow in the end it carries out the computation of a mathematical function. True, it is a little hard to read in this form, but so is machine code.
Here is a visualization of the full Fibonacci program (click for a larger image.) This shows every node (either a variable, function call, or lambda abstraction) as a binary tree.
It reminds me a lot of this XKCD comic strip about LISP, where cons is taken as atomic. Of course, the pair data structure can easily be defined in the \(\lambda\)-calculus as well, making lambda abstractions even more fundamental than cons. The essential insight remains the same: all programs and data structures can be reduced to binary trees, and all information about the program is somehow contained in the very structure of the tree itself.
Conclusion
All models of computation are equal, but some are more equal than others. In theory, the \(\lambda\)-calculus is only a constant factor away from any other model of computation, but the same is not true for the Church encoding: Church numerals are useful because they are very easy to “bootstrap;” that is to say, to implement in terms of lower level primitives, while on the other hand it is very difficult to implement more efficient numbers without data structures and a ready supply of distinct symbols, which the Church numerals provide.
It took us only a few dozen definitions to go from something so spartan that it seemed to be missing every convenience of modern programming, to a useful language with for loops, if/else statements, and recursive function calls. Things like closures, Currying, and functors/decorators, which are considered advanced features in other languages, we somehow got for free.
If we had carried on defining signed numbers, pairs, lists, associative maps, and so on, this parsimony would continue and after a few hundred definitions - rather smaller than the standard library of most languages - we would have a perfectly functional language. (Pun absolutely intended.)
Some languages, like Forth and Io, in fact take this exact approach: the language is absolutely minimal and almost everything is moved to the standard library, including the definition of standard constructs like for and if/else.
If computation is so simple, why are modern languages so complicated? It mostly boils down to the impedance mismatch between the \(\lambda\)-calculus and the kinds of electronic computers we can actually build. We can’t actually make Church numerals fast, but we can make a machine which can perform mathematical operations on two 64-bit numbers in a single cycle. We can’t write a performant \(\beta\)-reducer, but we can write a block of machine code which calls functions by using a calling convention: popping a certain number of arguments off the stack when called and returning a value on the stack or in a particular register. And so on.
For reasons that aren’t entirely clear to me, the \(\lambda\)-calculus has been a good model for thinking about high-level, expressive ways of computing, while the Turing machine has been a good model for designing actual computing machines. While in theory we can always switch from one model of computation to another, in practice this can involve a three or four order of magnitude reduction in performance. Practical programming languages have to walk the line between exposing the real capabilities of the machine and providing useful high-level abstractions.
Which isn’t to say there aren’t practical languages very much inspired by and very close to the \(\lambda\)-calculus; LISP and Haskell come to mind. After watching some of the SICP lectures on YouTube, I’m convinced the only reason Harold Abelson didn’t just teach the whole course in pure \(\lambda\)-calculus is that computers back then were still slow enough that it would have been just a little too painful, and LISP was chosen as a compromise. (Unfortunately, as we’ve seen today, this is still true. But maybe someday…) Many JavaScript programs too, particularly those that create lots of closures to handle asynchronous callbacks, seem to me to be much closer in spirit to the \(\lambda\)-calculus (as opposed to something like C, which uses a pointer-machine model only slightly higher level than a Turing machine). It’s never been more important to expose CS students to the ideas and modes of thinking inspired by the \(\lambda\)-calculus, and I think its importance will only grow as we can better afford the cost of abstractions.
As always, all the source code for this article is free and open source.
Postscript
Reading the “macro expanded version” of the function out loud, the similarity between “lambda” and “llama” made me think of the children’s book “Llama Llama Red Pajama”, a poem about a baby llama and his mama. I translated the above function declaration into a made-up agglutinative language to produce this nonsense bedtime rhyme… perhaps the only one in the world which is also an executable program.
Hey llama baby hey llama mama hey
Ma sleep sleep hey llama ba baby
Hey llama ta ba hey ba sleep hey
Ta sleep sleep sleep sleep hey
Llama baby llama na hey llama ga
Llama na hey llama ga llama na
Hey llama na na hey llama ma hey
Llama ma llama ba ba sleep sleep
Hey llama ma llama ba ma sleep
Sleep hey hey llama ga llama na
Na hey hey llama na llama baby
Llama ma na hey llama red pajama
Llama haha hey red pajama hey
Baby sleep sleep sleep hey llama
Drama sleep hey llama dra dra
Sleep sleep sleep hey ga sleep
Sleep hey ga sleep hey na sleep
Sleep sleep hey hey llama na
Llama baby llama ma baby hey na
Hey baby sleep hey ma sleep sleep
Sleep hey ga sleep sleep hey na
Sleep sleep hey na sleep hey hey
Llama na llama baby llama ma baby
Hey na hey baby sleep hey ma
Sleep sleep sleep hey llama baby
Llama ma baby hey ma sleep sleep
Sleep hey na sleep hey hey llama
Ga llama na llama baby llama ma
Ga hey baby sleep hey na hey baby
Sleep hey ma sleep sleep sleep
Hey baby hey hey llama ga llama
Na na hey hey llama na llama baby
Llama ma na hey llama red pajama
Llama haha hey red pajama hey
Baby sleep sleep sleep hey llama
Drama sleep hey llama dra dra
Sleep sleep sleep hey ga sleep
Sleep hey na sleep hey llama baby
Llama ma baby hey ma sleep sleep
Sleep sleep hey baby hey hey
Llama ga llama na na hey hey
Llama na llama baby llama ma na
Hey llama red pajama llama haha
Hey red pajama hey baby sleep
Sleep sleep hey llama drama sleep
Hey llama dra dra sleep sleep
Sleep hey ga sleep sleep hey na
Sleep hey hey llama na llama baby
Llama ma baby hey na hey baby
Sleep hey ma sleep sleep sleep
Hey llama baby llama ma baby hey ma
Sleep sleep sleep sleep sleep sleep sleep