Here is how you add up a billion numbers in Rust using a fold:

const N: u64 = 1000000000;

fn main() {
    let r = (0..N).fold(0, |sum, i| sum + i);
    println!("{}", r);
}


(0..N) is a Rust iterator representing the range 0 to N-1 (similar to xrange(N) in Python). The second argument to fold is a Rust closure that takes two parameters and returns their sum; fold works much like reduce in Python.
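To make the mechanics concrete, here is a tiny sketch (my own illustration, not from the original program) of fold on a small range:

```rust
fn main() {
    // fold threads an accumulator through the iterator, starting at 0:
    // ((((0 + 0) + 1) + 2) + 3) + 4
    let total = (0..5).fold(0, |sum, i| sum + i);
    assert_eq!(total, 10);
    println!("{}", total);
}
```

The first argument is the initial accumulator value; the closure receives the running accumulator and the next item.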

We have a good number of high-level abstractions here: a generic fold function, closures, iterators. Abstraction always comes with a cost - this code is surely going to be slow compared to a raw loop that just sums up the integers from 0 to N-1.
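For reference, the raw-loop baseline might look something like this (my own sketch, with the loop pulled into a helper so it is easy to check on small inputs):

```rust
const N: u64 = 1_000_000_000;

// Hand-written baseline: no iterator, no closure, just a plain loop
// summing the integers from 0 to n-1.
fn sum_below(n: u64) -> u64 {
    let mut sum = 0;
    let mut i = 0;
    while i < n {
        sum += i;
        i += 1;
    }
    sum
}

fn main() {
    println!("{}", sum_below(N));
}
```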

Let’s try it out.

First, compile the code with optimizations enabled:

rustc -O sum.rs


And do a crude time measurement:

time ./sum


Here is what I got on my old AMD Athlon PC running Ubuntu:

499999999500000000

real    0m0.028s
user    0m0.000s
sys     0m0.008s


You can’t get faster than that … cool!

One big advantage of Rust is that you can look at what is happening under the hood by compiling the code to assembly:

rustc -O sum.rs --emit=asm


Here is a part of the code I got:

.Ltmp0:
.cfi_def_cfa_offset 80
movabsq $499999999500000000, %rax
movq    %rax, 64(%rsp)


Seems like Rust/LLVM has pulled a clever Carl and evaluated the series at compile time!

One of the exciting things about Rust is Zero Cost Abstractions. You can write functional-style code with generics, iterators, closures etc. and be assured that most of it will get compiled into blazing fast assembly with very little run-time overhead. In the above case, the Rust compiler broke the fold down into a simple loop and inlined the closure; LLVM did the magic of evaluating the loop as an (N*(N-1))/2 expression at compile time (that is what I feel is happening behind the scenes - I have no idea of compiler internals)!
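As a sanity check, Gauss's closed form for 0 + 1 + ... + (N-1) does match the constant that showed up in the assembly:

```rust
const N: u64 = 1_000_000_000;

fn main() {
    // (N * (N - 1)) / 2 is the sum of the integers 0 to N-1;
    // the intermediate product fits comfortably in a u64.
    let closed_form = N * (N - 1) / 2;
    assert_eq!(closed_form, 499_999_999_500_000_000);
    println!("{}", closed_form);
}
```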

I tried the same experiment on a Raspberry Pi - the series once again got evaluated in constant time. But I couldn't see the final result in the asm code generated by rustc; it seems the generated code is doing (N*(N-1))/2 (or something similar - I didn't try to decipher the ARM asm code) at run time!

Just for fun, let's look at the figures for Python (running on the same PC):


reduce(lambda x,y: x+y, xrange(1000000000))



Here is what I got:

499999999500000000

real    1m50.307s
user    1m50.044s
sys     0m0.080s



That is around 110 seconds (replacing the reduce with a sum brought down the run time to around 10 seconds).
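Incidentally, Rust has an equivalent of Python's sum shortcut too - Iterator::sum - which I would expect to get the same treatment from LLVM as the fold version (I haven't checked its assembly, though):

```rust
const N: u64 = 1_000_000_000;

fn main() {
    // Same computation as the fold, written with Iterator::sum.
    let r: u64 = (0..N).sum();
    println!("{}", r);
}
```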

This was one of the code examples I showed during my first Rust workshop recently.

[Discuss on HN]