Nov 14, 2015 · Edited Apr 18
Optimising Tail Recursion and Performance
by Ryan Welch · 3 Min Read
Recursion is a common programming paradigm where a function calls itself. Tail recursion, a special case, occurs when the recursive call is the function’s last operation. It can be optimized into a loop, improving performance by eliminating function call overhead and stack growth.
Modern compilers are incredibly complex and good at their jobs, some modern compilers will optimise tail recursion for us, meaning that we can enjoy the benefits of writing recursively whilst everything is taken care for us under the hood.
To prove that tail recursion is optimised we will set up a simple experiment.Below is a simple C program that recursively calls the function count() and adds 1 to our total num each time until we reach 500000 where we return.
#include <stdio.h>
int count(int num) { if(num == 500000) { printf("We did it!\n"); return 0; }
num = num + 1;
return count(num);}
int main() { count(0); return 0;}The program should print out We did it once we reach 500000. So let’s compile it with gcc, and then test it.
$ gcc recursion.c -o recursion$ ./recursionSegmentation fault (core dumped)That’s not what we expected, we get the very descriptive error message Segmentation fault instead.
Taking a closer look and debugging our program in gdb shows the function count() called thousands of times, this is not what we expected. We expected the compiler to optimise our recursive function into a loop so we should not be seeing the count() function being called more than once.
It turns out after reading the documentation that the compiler doesn’t optimise our code by default, so let’s try again with optimisations turned on. The flag -O3 tells gcc to use maximum optimisation and apply all of them to our code. A list of optimisations and what they do is here ↗.
$ gcc recursion.c -O3 -o recursion$ ./recursionWe did it!Success! Our program runs perfectly without any core dumps.
We’re not done yet though! Let’s take a look at how the compiler actually optimises these calls. I’m dumping the assembly code generated by gcc by using the -S flag which tells gcc to stop just before it compiles the assembly code.
I’ve stripped away some of the assembly code and only left the relevant bits to make it easier to read.
$ gcc recursion.c -S -o recursioncount:.LFB0: .cfi_startproc pushq %rbp .cfi_def_cfa_offset 16 .cfi_offset 6, -16 movq %rsp, %rbp .cfi_def_cfa_register 6 subq $16, %rsp movl %edi, -4(%rbp) cmpl $500000, -4(%rbp) jne .L2 movl $.LC0, %edi call puts movl $0, %eax jmp .L3.L2: addl $1, -4(%rbp) movl -4(%rbp), %eax movl %eax, %edi call count.L3: leave .cfi_def_cfa 7, 8 ret .cfi_endproc.LFE0: .size count, .-count .globl main .type main, @functionAs you can see in this unoptimised version, we have a call to count from within the count function.
The cmpl instruction computes a conditional (whether the current count is equal to 500000) and sets a flag. The instruction jne is a conditional branch that jumps to the location L2 if the flag is false, L2 then calls count.
The jmp call is an unconditional branch and terminates the program once the flag is true. This is clearly unoptimised and will add a new level to the stack each time count is called eventually leading to a stack overflow.
$ gcc recursion.c -S -O3 -o recursioncount:.LFB24: .cfi_startproc subq $8, %rsp .cfi_def_cfa_offset 16 movl $.LC0, %edi call puts xorl %eax, %eax addq $8, %rsp .cfi_def_cfa_offset 8 ret .cfi_endproc.LFE24: .size count, .-count .section .text.startup,"ax",@progbits .p2align 4,,15 .globl main .type main, @functionThe compiler does a much better job with the optimisations on, there are no branches as in the previous code and there are no calls to the count function from within itself. So it has successfully optimised the code to exclude tail recursion.
Whilst gcc does a good job at optimising it does not apply optimisations by default. Languages such as Java do not optimise tail recursion at all. Therefore where recusrion is critical I strongly recommend you to not rely on the compilers optimisations. Having said that, there’s plenty of cases where using recursion makes your life much easier.
Liked the post?
Get in touch
Have a question? Spotted a mistake? Or just want to say thank you?
Send me an email at hello@ryanwelch.co.uk - seriously! I love hearing from you.