Home About Me

Day 11 - Plutonian Pebbles

2024-12-11

Part 1 starts of with a simple problem to solve naively, part 2 makes the solution unfeasible.

Link To The Code On GitHub

There are “physics-defying stones” arranged in a straight line, each engraved with a number, with each blink, each stone will be replaced based on the rules:

Input

Today’s input is the shortest I’ve ever seen, just a few numbered stones with a space in between the numbers.

Part 1

How many stones after 25 blinks?
The naive solution is fairly simple: replace the old stone vector with a new stone vector where each stone was replaced with the corresponding next step.
First, the outer function that parses the numbers and calls the step function:

RUST
pub fn stone_stepper_naive(mut input: &[u8], steps: u8) -> u64 {
    let mut stones = Vec::<u64>::with_capacity(16);
    loop {
        let (num, remainder) = fast_parse(input);
        stones.push(num);
        if remainder.is_empty() {
            break;
        }
        input = &remainder[1..];
    }
    for _ in 0..steps {
        stones = step(stones);
    }
    stones.len() as u64
}

And then the actual step function:

RUST
fn step(stones: Vec<u64>) -> Vec<u64> {
    let mut new_stones = Vec::<u64>::with_capacity(stones.len());
    for stone in stones {
        if stone == 0 {
            new_stones.push(1);
        } else {
            let digits = stone.ilog10() + 1;
            if digits % 2 == 0 {
                let tenpow = 10u64.pow(digits / 2);
                new_stones.push(stone / tenpow);
                new_stones.push(stone % tenpow);
            } else {
                new_stones.push(stone * 2024);
            }
        }
    }
    new_stones
}

There are obvious efficiency issues here, but it will do for part 1.

Part 2

How many stones after 75 blinks?

That’s all? No new rules? Can I just run the same solution with a bigger number?
Of course not, this is Advent of Code, running the same code slows down to a crawl at around ~40 steps, and crashes due to being out of memory(on my 32GiB system) at around ~50 steps.

Not surprising, time for a new solution.

It was obvious from the start some sort of cache will do good here, so I’ve written a simple cached recursive function, also known as “memoization”:

RUST
fn cached_step(stone: u64, steps: u8, cache: &mut FxHashMap<(u64, u8), u64>) -> u64 {
    if steps == 0 {
        return 1;
    }
    if let Some(expanded_amount) = cache.get(&(stone, steps)) {
        *expanded_amount
    } else {
        let amount = if stone == 0 {
            cached_step(1, steps - 1, cache)
        } else {
            let digits = stone.ilog10() + 1;
            if digits % 2 == 0 {
                let tenpow = 10u64.pow(digits / 2);
                let amount1 = cached_step(stone / tenpow, steps - 1, cache);
                let amount2 = cached_step(stone % tenpow, steps - 1, cache);
                amount1 + amount2
            } else {
                cached_step(stone * 2024, steps - 1, cache)
            }
        };
        cache.insert((stone, steps), amount);
        amount
    }
}

And the outer function simply calls this function once, and now it doesn’t need to store the stones, it can just parse them and call the function.

That’s part 2 solved faster than the old solution could complete ~30 steps.
Of course it can also solve part 1 a lot faster.

Optimizations

As always, this is where the CPU clock gets locked.
First, how much faster is the cached solution compared to the naive one?

Day11 - Part1/naive     time:   [5.0976 ms 5.1092 ms 5.1223 ms]
Day11 - Part1/cached    time:   [135.03 µs 135.14 µs 135.24 µs]

A lot faster.

And part 2(obviously only with the cached solution):

Day11 - Part2/cached    time:   [10.313 ms 10.336 ms 10.362 ms]

My first optimization was to split the cache to 75 different HashMaps, each for a different step count, the only difference is that now a specific cached value is accessed using cache[(steps-1) as usize].get(stone) instead of cache.get((stone,steps)).

This is a little faster for part 2 and a little slower for part 1(I didn’t bother making part 1 create only 25 maps, 75 are created and it uses only 25)

Day11 - Part1/cached_multicache time:   [145.63 µs 147.01 µs 148.82 µs]
Day11 - Part2/cached_multicache time:   [8.6433 ms 8.6509 ms 8.6587 ms]

This next approach I have taken from a Discord conversation:
Instead of checking stones individually, stones with the same number can be grouped together and updated together, for example, 5 stones with the number 1234, will always become 5 stones with the number 12, and 5 stones with the number 34, it is irrelevant to the total amount of stones how these 10 new stones are placed within the entire line.
To implement this approach, I use a HashMap to track the amount of each number:

RUST
fn step_grouped(mut stones: FxHashMap<u64, u64>, steps: u8) -> u64 {
    let mut next_stones = FxHashMap::<u64, u64>::default();
    for _ in 0..steps {
        // advance all groups one step
        for (&stone, &count) in &stones {
            if stone == 0 {
                next_stones
                    .entry(1)
                    .and_modify(|existing_count| *existing_count += count)
                    .or_insert(count);
            } else {
                let digits = stone.ilog10() + 1;
                if digits % 2 == 0 {
                    let tenpow = 10u64.pow(digits / 2);
                    next_stones
                        .entry(stone / tenpow)
                        .and_modify(|existing_count| *existing_count += count)
                        .or_insert(count);
                    next_stones
                        .entry(stone % tenpow)
                        .and_modify(|existing_count| *existing_count += count)
                        .or_insert(count);
                } else {
                    next_stones
                        .entry(stone * 2024)
                        .and_modify(|existing_count| *existing_count += count)
                        .or_insert(count);
                }
            };
        }
        // swap the maps, double buffer approach
        stones.clear();
        swap(&mut stones, &mut next_stones);
    }
    stones.into_values().sum()
}

Some things to note:

This is even faster than the cached solution:

Day11 - Part1/grouped   time:   [105.27 µs 105.36 µs 105.45 µs]
Day11 - Part2/grouped   time:   [4.9255 ms 4.9305 ms 4.9365 ms]

At this point I tried to reduce allocations by preallocating big enough HashMaps, the exact amount of slots varies based on the algorithm and part(for example, the single hash map solution for part 2 was set to 150k to avoid any reallocations, and the other part 2 HashMaps were set to 5k).
This significantly improved the solutions of all the cached solutions:

Day11 - Part1/cached(OLD)            time:   [135.03 µs 135.14 µs 135.24 µs]
Day11 - Part1/cached                 time:   [95.527 µs 95.740 µs 95.993 µs]
Day11 - Part1/cached_multicache(OLD) time:   [145.63 µs 147.01 µs 148.82 µs]
Day11 - Part1/cached_multicache      time:   [78.156 µs 78.462 µs 78.840 µs]
Day11 - Part2/cached(OLD)            time:   [10.313 ms 10.336 ms 10.362 ms]
Day11 - Part2/cached                 time:   [5.1566 ms 5.1696 ms 5.1894 ms]
Day11 - Part2/cached_multicache(OLD) time:   [8.6433 ms 8.6509 ms 8.6587 ms]
Day11 - Part2/cached_multicache      time:   [4.7614 ms 4.7750 ms 4.7904 ms]

And even helped the grouped solutions a little:

Day11 - Part1/grouped   time:   [100.12 µs 100.22 µs 100.32 µs]
Day11 - Part2/grouped   time:   [4.7336 ms 4.7438 ms 4.7564 ms]

At this point the multi-cache solution for part 2 is about as fast as the grouped solution, and the multi-cache solution for part 1 is faster than the grouped solution.

Final Times

As usual, time to unlock the CPU clock:

Day11 - Part1/cached            time:   [53.077 µs 53.438 µs 54.021 µs]
Day11 - Part1/cached_multicache time:   [48.077 µs 48.771 µs 49.463 µs]
Day11 - Part1/grouped           time:   [57.347 µs 57.507 µs 57.659 µs]
Day11 - Part2/cached            time:   [2.9647 ms 2.9701 ms 2.9764 ms]
Day11 - Part2/cached_multicache time:   [2.8079 ms 2.8120 ms 2.8163 ms]
Day11 - Part2/grouped           time:   [3.1182 ms 3.1223 ms 3.1268 ms]
Up: Advent Of Code 2024
Previous: Day 10 - Hoof It Next: Day 12 - Garden Groups