Iterating in Rust

I finished reading the Rust book a few days ago. My first program calculates the edge connectivity of a graph. It relies on iteration quite a lot, and I had the occasion to dig a bit deeper into Rust iterators.

Silomem II. Huile sur toile, 2009, Odile Ferron-Verron.

Iterators are a simple, idiomatic way to iterate over lazily generated sequences. Rust didn’t invent them, but they are absent from enough popular languages to warrant a discussion.

Powers of 2

Let’s pick a simple example: the powers of 2. The simplest way to iterate is a for loop:

let powers_first_five = [1, 2, 4, 8, 16];

println!("First 5 powers of 2");
for x in &powers_first_five {
    println!("{}", x);
}

What if we want more than five numbers? We can create a function that takes a count argument:

fn powers_count(count: usize) -> Vec<u32> {
    let mut x = 1;
    let mut v = Vec::with_capacity(count);
    for _ in 0..count {
        v.push(x);
        x *= 2;
    }
    v
}

println!("First 10 powers of 2");
for x in powers_count(10) {
    println!("{}", x);
}

What if we want all the powers of 2 up to 100? We can create a function that takes a max argument:

fn powers_max(max: u32) -> Vec<u32> {
    let mut x = 1;
    let mut v = vec![];
    while x <= max {
        v.push(x);
        x *= 2;
    }
    v
}

println!("Powers of 2 up to 100");
for x in powers_max(100) {
    println!("{}", x);
}

Alarms should start ringing: the code that generates powers of two is now duplicated. It’s not a big deal in our example because generating powers of two is simple. But imagine if it was something more complicated, like a graph traversal. We really want to write, debug and maintain it only once.

Only one piece of code should generate numbers. This piece of code should not worry about how many numbers to generate. This responsibility is better left to the code that uses the numbers, because it knows best what it needs:

println!("First 10 powers of 2");
let mut count = 0;
for x in powers() {
    if count > 9 {
        break;
    }
    count += 1;
    println!("{}", x);
}

println!("Powers of 2 up to 100");
for x in powers() {
    if x > 100 {
        break;
    }
    println!("{}", x);
}

There is only one powers() function, reused everywhere. Replacing count > 9 or x > 100 with any other arbitrary condition is simple.

But what does powers() return? We can return a Vec<u32> containing all the powers of 2 that fit in a u32:

fn powers() -> Vec<u32> {
    let mut x = 1u32;
    let mut v = vec![x];
    while let Some(next) = x.checked_mul(2) {
        v.push(next);
        x = next;
    }
    v
}

It would work quite well in our example because there are only 32 such values. But there is room for improvement. We could avoid allocating a Vec. We could also generate lazily only as many numbers as we need—why generate 32 numbers if we only need 10?

Iterators to the rescue

Let’s go straight to the code:

struct Powers { next_value: Option<u32> }

impl Iterator for Powers {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let saved = self.next_value;
        if let Some(x) = self.next_value {
            self.next_value = x.checked_mul(2);
        }
        saved
    }
}

fn powers() -> Powers {
    Powers { next_value: Some(1) }
}

println!("Powers of 2 up to 100");
for x in powers() {
    if x > 100 {
        break;
    }
    println!("{}", x);
}

Powers contains the state of the iteration, and can be iterated over because it implements the Iterator trait. Each iteration calls next.

If the next value doesn’t fit in a u32, next returns None and the for loop exits. Otherwise, next returns Some(x: u32) and the loop keeps going. If we break out of the loop, next stops being called and no more numbers are generated. Win!

This allows more complicated operations to be exposed in an familiar and performant way. For example, here’s how to do a breadth-first-search in my little program (variables renamed for clarity):

for Edge { x, y } in graph.breadth_first_search(start) {
    // Do something with x and y. Break out of the loop
    // if y is the vertex we were looking for.
}

Convenience methods

Iterators have many convenience methods. For example, printing the first 10 powers of 2 can be shortened to:

println!("First 10 powers of 2");
for x in powers().take(10) {
    println!("{}", x);
}

Unfold

The unstable Unfold iterator helps shorten the code, but I find the type declarations overly verbose:

// Unfold is unstable as of 2015/05/25.
// Select the nightly channel to run in the Rust playground.

#![feature(core)]
use std::iter::Unfold;

fn powers() -> Unfold<Option<u32>,
                      fn(&mut Option<u32>) -> Option<u32>> {
    Unfold::new(Some(1), unfold_powers)
}

fn unfold_powers(next_value: &mut Option<u32>) -> Option<u32> {
    let saved = *next_value;
    if let Some(x) = *next_value {
        *next_value = x.checked_mul(2);
    }
    saved
}

The type declarations can be elided using a closure, but it makes powers difficult to reuse outside of the current scope. In most cases it’s probably not a good choice:

// Unfold is unstable as of 2015/05/25.
// Select the nightly channel to run in the Rust playground.

#![feature(core)]
use std::iter::Unfold;

let powers = || {
    Unfold::new(Some(1u32), |next_value| {
        let saved = *next_value;
        if let Some(x) = *next_value {
            *next_value = x.checked_mul(2);
        }
        saved
    })
};

For reference and tinkering, here is the fully working code, ready to run in the Rust playground:

struct Powers { next_value: Option<u32> }

impl Iterator for Powers {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        let saved = self.next_value;
        if let Some(x) = self.next_value {
            self.next_value = x.checked_mul(2);
        }
        saved
    }
}

fn powers() -> Powers {
    Powers { next_value: Some(1) }
}

fn main() {
    println!("First 10 powers of 2");
    for x in powers().take(10) {
        println!("{}", x);
    }

    println!("Powers of 2 up to 100");
    for x in powers() {
        if x > 100 {
            break;
        }
        println!("{}", x);
    }
}