Rust: Fearless at all Levels

Felix Klock (@pnkfelix), Mozilla

space: next slide; esc: overview; arrows navigate https://bit.ly/2pWTjC6

What is Rust?

New programming language

(... and ecosystem, and development community)

Emphasizing control, safety, and speed

(... especially when it comes to concurrency)

Low-level hacking without fear of segfaults nor data races

or more simply: "Hack without Fear"

Why use Rust?

Safety and Parallelism

Safety

No segmentation faults
No undefined behavior
No data races

(Multi-paradigm) Parallelism

msg passing via channels
shared state (R/W-capabilities controlled via types)
use native threads... or scoped threads... or work-stealing...

Lets talk performance

Demo

Example: Sum of (even) numbers

Add even values in an integer range.

Rust:

fn do_partial_sum(start: i64, to_do: i64) -> i64 {
    let mut result = 0;
    for i in start..(start + to_do) {
        if i % 2 == 0 { result += i; }
    }
    result
}

C++:

int64_t do_partial_sum(int64_t start, int64_t to_do) {
    int64_t result = 0;
    for (int64_t i = start; i < start + to_do; i++) {
        if (i % 2 == 0) { result += i; }
    }
    return result;
}

(avoiding reduction to closed form solution; we're benchmarking!)

Parallelism

Parallelism: Rust (v0)

play: https://bit.ly/2PH9te8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(|| {
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Parallelism: C++

int64_t each = (MAX + thread_count - 1) / thread_count;
result = 0;
std::vector<std::thread> threads;
for (int64_t val = 0; val < MAX; val += each) {
    int64_t incr = (each + val > MAX) ? (MAX - val) : each;
    threads.push_back(std::thread([&, val, incr] () {
                result += do_partial_sum(val, incr);
            }));
}
for (int i = 0; i < thread_count; i++) {
    threads[i].join();
}

Let's try our C++ code

threads	average time	notes
1	(TBD)
4	(TBD)
1000	(TBD)

Four runs, 1 thread

.
.
.
.
.
.
.
.

Let's try our C++ code

threads	average time	notes
1	(TBD)
4	(TBD)
1000	(TBD)

Four runs, 1 thread

sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.07s
sum of evens < 2000000000 result: 999999999000000000
1.07s

Let's try our C++ code

threads	average time	notes
1	1.075s
4	(TBD)
1000	(TBD)

Four runs, 1 thread

sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.08s
sum of evens < 2000000000 result: 999999999000000000
1.07s
sum of evens < 2000000000 result: 999999999000000000
1.07s

Let's try our C++ code

threads	average time	notes
1	1.075s
4	(TBD)
1000	(TBD)

Four runs, 4 threads

.
.
.
.
.
.
.
.

Let's try our C++ code

threads	average time	notes
1	1.075s
4	(TBD)
1000	(TBD)

Four runs, 4 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s

Let's try our C++ code

threads	average time	notes
1	1.075s
4	0.290s
1000	(TBD)

Four runs, 4 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.29s

Let's try our C++ code

threads	average time	notes
1	1.075s
4	0.290s
1000	(TBD)

Four runs, 1000 threads

.
.
.
.
.
.
.
.

Let's try our C++ code

threads	average time	notes
1	1.075s
4	0.290s
1000	(TBD)

Four runs, 1000 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.30s
sum of evens < 2000000000 result: 999598999001000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.30s

Let's try our C++ code

threads	average time	notes
1	~~1.075s~~	(but
4	~~0.290s~~	its all
1000	~~0.295s~~	bogus!)

Four runs, 1000 threads

sum of evens < 2000000000 result: 999999999000000000
0.29s
sum of evens < 2000000000 result: 999999999000000000
0.30s
sum of evens < 2000000000 result: 999598999001000000
0.29s                                ---------[uh oh]
sum of evens < 2000000000 result: 999999999000000000
0.30s

Data Race!

            threads.push_back(std::thread([&, val, incr] () {
                                        // ~
                        result += do_partial_sum(val, incr);
                     // ~~~~~~
                    }));

Revisiting Rust version

play: https://bit.ly/2PH9te8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(|| {
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Revisiting Rust version

play: https://bit.ly/2PH9te8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(|| {
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Why no timings for Rust version

play: https://bit.ly/2PH9te8

error[E0373]: closure may outlive the current function, but it 
      borrows `result`, which is owned by the current function
  --> partsum/src/v0.rs:47:37
   |
47 |     threads.push(std::thread::spawn(|| {
   |                                     ^^ may outlive borrowed
   |                                        value `result`
48 |         result += do_partial_sum(start, incr);
   |         ------ `result` is borrowed here
help: to force the closure to take ownership of `result` (and any
      other referenced variables), use the `move` keyword

The next version

Version 1

(our array indexes start at 0)

Lets try doing exactly what the compiler suggests

Parallelism: Rust, v1

play: https://bit.ly/2OuTOC8

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(move || { // <-- `move` is new
        result += do_partial_sum(start, incr);
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

It compiles!

Let's try our Rust code (v1)

play: https://bit.ly/2OuTOC8

Four runs, 1 thread

.
.
.
.
.
.
.
.

Let's try our Rust code (v1)

play: https://bit.ly/2OuTOC8

Four runs, 1 thread

result: 0
0.03s
result: 0
0.03s
result: 0
0.03s
result: 0
0.03s

"Once it compiles, it works"

Not this time

(sorry to disappoint)

Rust will not prove your code correct

What went wrong in v1

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + val > MAX { MAX - val } else { each };
    threads.push(std::thread::spawn(move || { // <-- `move` is new
        result += do_partial_sum(start, incr); // <-- `result` copied!
    }));
    start += sum_per_thread;
}
for t in threads {
    t.join().unwrap();
}

Ownership

Rust types

Generic Types, Bounded Polymorphism

fn choose<T>(x: T, y: T) -> T { return x; }

fn sum<T: Add>(x: T, y: T) -> T::Output { return x + y; }

fn sum3<T>(x: T, y: T, z: T) -> T where T: Add<Output=T> {
    return x + y + z;
}

Type Constructions

Move	Copy	Copy if `T:Copy`
`Vec<T>`, `String`, ...	`i32`, `char`, ...	`[T; n]`, `(T1,T2,T3)`, ...

Ownership is important

Ownership enables:	which removes:
RAII-style destructors	a source of memory leaks (or fd leaks, etc)
no dangling pointers	many resource management bugs
no data races	many multithreading heisenbugs

Do I need to take ownership here, accepting the associated resource management responsibility? Would temporary access suffice?

Good developers ask this already!

Rust forces function signatures to encode the answers

(and they are checked by the compiler)

More Rust types

References to data

&mut T, &T
(plus library reference types: Box<T>, Cow<T>, Rc<T>, ...)

Version 2

Parallelism: Rust, v2

play: https://bit.ly/2Crmsx1

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let result_ptr = &mut result; // <-- *borrowing* explictly
    let incr = if each + start > MAX { MAX - start } else { each };
    threads.push(std::thread::spawn(move || {
        *result_ptr += do_partial_sum(start, incr);
    }));
    start += each;
}
for t in threads {
    t.join().unwrap();
}

No go

play: https://bit.ly/2Crmsx1

error[E0597]: `result` does not live long enough
  --> partsum/src/v2.rs:46:35
   |
46 |             let result_ptr = &mut result;
   |                                   ^^^^^^ borrowed value does not
   |                                          live long enough
...
59 | }
   | - borrowed value only lives until here
   |
   = note: borrowed value must be valid for the static lifetime...

Borrows cannot outlive their owners

and std::thread::spawn creates a thread that may outlive its parent!

Must threads do this?

Do all threads terminate at arbitrary times?

Alternative threading APIs

Cargo and crates.io

Rayon

https://crates.io/crates/rayon

Crossbeam

https://crates.io/crates/crossbeam

Version 3

Parallelism: Rust, v3

play: https://bit.ly/2pZOTe4

let mut result = 0;
crossbeam::thread::scope(|the_scope| { // <-- create scope here
    let mut threads = Vec::new();
    let mut start = 0;
    let incr = if each + start > MAX { MAX - start } else { each };
    while start < MAX {
        let result_ptr = &mut result; // <-- (still borrowing explicitly)
        threads.push(the_scope.spawn(move || {
            *result_ptr += do_partial_sum(start, incr);
        }));
        start += each;
    }
    for t in threads {
        t.join().unwrap();
    } // now we *know* all threads finish while `result` still on stack
});

Still no go

play: https://bit.ly/2pZOTe4

error[E0499]: cannot borrow `result` as mutable more than once at a
              time
  --> src/main.rs:24:30
   |
24 |    let result_ptr = &mut result;
   |                     ^^^^^^^^^^^ mutable borrow starts here in
   |                                 previous iteration of loop

Why multiple `&`-reference types?

Distinguish exclusive access from shared access
Enables safe, parallel API's

Ownership	`T`
Exclusive access	`&mut T`	("mutable")
Shared access	`&T`	("read-only")

Exclusive access

Rust does not allow simultaneous mutable borrows of unsynchronized data

This is usually ensured via static analysis.

(But some standard library types enforce this rule via dynamic checks.)

So what can we do?

In both C++ and in Rust, you can easily eliminate the data race

common approach: can wrap a mutex around result
better: use atomic operation (e.g. compare-and-swap, fetch-and-add)
better still: use separate result receiver for each thread, then add those up at end.

Point is: Rust forces you to resolve this bug
C++ allows silent, scheduler dependent failure (which may arise only rarely)

Version 4

Parallelism: Rust, v4

play: https://bit.ly/2QXsCsq

let mut threads = Vec::new();
result = 0;
let mut start = 0;
while start < MAX {
    let incr = if each + start > MAX { MAX - start } else { each };
    threads.push(std::thread::spawn(move || {
        return do_partial_sum(start, incr);
    }));
    start += each;
}
for t in threads {
    // updates result solely on main thread, after each join.
    result += t.join().unwrap();
}

In case of pedantry

Rust timings

play: https://bit.ly/2QXsCsq

threads	average time	notes
1	0.820s	Slightly faster
4	0.220s	than earlier C++ code;
1000	0.235s	dunno why

Four runs, 1000 threads

sum of evens < 2000000000 result: 999999999000000000
0.23s
sum of evens < 2000000000 result: 999999999000000000
0.23s
sum of evens < 2000000000 result: 999999999000000000
0.24s
sum of evens < 2000000000 result: 999999999000000000
0.24s

Other cool stuff

Many things in Rust that leverage or extend ownership and borrowing to get fast, flexible, safe code

Send and Sync traits

Lifetimes

Object-orientation via trait-references

Unboxed closures

FFI interop

Growing community

http://rustconf.com/, https://www.rust-belt-rust.com/ (USA)

https://rustfest.eu/ (EU)

Rust Paris: https://www.meetup.com/Rust-Paris/

Rust as enabler of individuals

From "mere script programmer"

to "lauded systems hacker"

Programming in Rust has made me look at C++ code in a whole new light

Thanks

www.rust-lang.org Hack Without Fear

Rust: Fearless at all Levels

What is Rust?

What is Rust?

New programming language

Emphasizing control, safety, and speed

Low-level hacking without fear of segfaults nor data races

Why use Rust?

Safety and Parallelism

Safety

(Multi-paradigm) Parallelism

Lets talk performance

Demo

Example: Sum of (even) numbers

Parallelism

Parallelism: Rust (v0)

Parallelism: C++

Let's try our C++ code

Let's try our C++ code

Let's try our C++ code

Let's try our C++ code

Let's try our C++ code

Let's try our C++ code

Let's try our C++ code

Let's try our C++ code

Let's try our C++ code

Data Race!

Data Race!

Revisiting Rust version

Revisiting Rust version

Why no timings for Rust version

The next version

Version 1

Lets try doing exactly what the compiler suggests

Parallelism: Rust, v1

It compiles!

Let's try our Rust code (v1)

Let's try our Rust code (v1)

"Once it compiles, it works"

What went wrong in v1

Ownership

Rust types

Generic Types, Bounded Polymorphism

Ownership is important

Sharing Data: Ownership and References

More Rust types

Version 2

Parallelism: Rust, v2

No go

Borrows cannot outlive their owners

Must threads do this?

Alternative threading APIs

Cargo and crates.io

Rayon

Crossbeam

Version 3

Parallelism: Rust, v3

Still no go

Why multiple &-reference types?

Exclusive access

So what can we do?

Version 4

Parallelism: Rust, v4

In case of pedantry

Rust timings

Other cool stuff

Other cool stuff

Growing community

Rust as enabler of individuals

From "mere script programmer"

to "lauded systems hacker"

Thanks

Why multiple `&`-reference types?