@pnkfelix
), Mozilla ResearchSlides: http://bit.ly/2cMFbYB
space | next slide |
shiftspace | previous slide |
esc | overview |
arrows | navigate |
Any sufficiently advanced technology is indistinguishable from magic.
Third Law, Arthur C. Clarke
The only way of discovering the limits of the possible is to venture a little way past them into the impossible.
Second Law, Arthur C. Clarke
(Clarke's first law is an impossible statement about elderly scientists.)
Lots of interesting tech in Rust
Borrow Checker
Trait and Object System
Destructor Semantics
L-value match
-ing
Any of above can look like magic
But, subtyping has characteristics of a magic show
"We all already know what subtyping is..."
"You can lie as entertainment, but still be honest and moral"
Penn Jillette, attributed to James Randi
I'm probably going to lie to you several times during this presentation.
(Sometimes exploratory science, sometimes magic show.)
Raise your hand if you ...
T
(i.e. &[T]
) and a Vec<T>
?PhantomData<T>
is?Insisting types of inputs exactly match expected types leads compiler to reject programs that seem obviously well-behaved.
"Have a &mut Vec<T>
, but this function wants a &[T]
slice..."
"Have a String
, but this method is only defined on str
..."
"Want to return Err(IOErr)
but the return type is Result<(), ParseErr>
..."
Amazingly, "try it and see" often works.
"Have a specific error, but result's error is more general. What now?"
fn phase_1(x: Ast) -> Result<Mir, PhaseOneErr> { /* ... */ }
fn phase_2(y: Mir) -> Result<Out, PhaseTwoErr> { /* ... */ }
fn composed(a: Ast) -> Result<Out, EndToEndErr> {
let b = try!(phase_1(a));
let c = try!(phase_2(b));
return Ok(c);
}
(we will look more at an instance of this example later)
Vec<T>
and [T]
look like subtyping
"Have a Vec<i32>
, but code expects a [i32]
slice. What now?"
fn rotate(nums: &mut [i32]) {
let len = nums.len();
let first = nums[0]; // save first number
for i in 1..len { nums[i-1] = nums[i]; } // shift up all other numbers
nums[len-1] = first; // put saved number at end
}
fn demo_rotate() {
let v = vec![1, 2, 3];
rotate(v);
assert_eq!(v, &[2, 3, 1]);
}
does it compile?
Vec<T>
and [T]
look like subtyping
"Have a Vec<i32>
, but code expects a [i32]
slice. What now?"
fn rotate(nums: &mut [i32]) {
let len = nums.len();
let first = nums[0]; // save first number
for i in 1..len { nums[i-1] = nums[i]; } // shift up all other numbers
nums[len-1] = first; // put saved number at end
}
fn demo_rotate() {
let v = vec![1, 2, 3];
rotate(v);
assert_eq!(v, &[3, 1, 2]);
}
fn demo_rotate() {
let mut v = vec![1, 2, 3];
rotate(&mut v);
assert_eq!(v, &[2, 3, 1]);
}
mod demo_vec_slice {
fn provide(m: &mut Vec<i32>) { expect(m); }
fn expect(_nums: &mut [i32]) { unimplemented!() }
}
"If it compiles, it works"
The reference types &mut T
and &T
look like candidates for having some sort of compatibility relationship.
"Have a &mut [i32]
, but code expects a &[i32]
slice. What now?"
mod demo_mut_slice_imm_slice {
fn provide(m: &mut [i32]) { expect(m); }
fn expect(_nums: &[i32]) { unimplemented!() }
}
If you have self
or Self
, you're (probably) talking about a method
"Have an X, but code expects a Y",
for every X,Y from { Self, &Self, &mut Self }
Let us try "something simple": Methods on char
arrays.
Review: Extending types with new methods
trait AsciiIncr {
/// Increments `self` by one unit. (Only works for 7-bit ASCII characters though.)
fn incr(&mut self);
}
impl AsciiIncr for char {
fn incr(&mut self) { *self = (*self as u8 + 1) as char }
}
#[test]
fn check_ascii_incr() {
let mut c: char = 'a';
c.incr();
assert_eq!(c, 'b');
}
Exploration: Mock up trivially simple traits, rather than get bogged down with code of "useful" traits (like AsciiIncr
).
trait Receiver {
fn by_ref(&self); fn by_mut(&mut self); fn by_val(self) where Self: Sized;
}
impl Receiver for [char; 2] {
fn by_ref(&self) { println!("ref: {:?}", self[0]); }
fn by_mut(&mut self) { println!("mut: {:?}", self[0]); self[1].incr(); }
fn by_val(mut self) { println!("val: {:?}", self[0]); self[1].incr(); }
}
trait Receiver {
fn by_ref(&self); fn by_mut(&mut self); fn by_val(self) where Self: Sized;
}
impl Receiver for [char; 2] {
fn by_ref(&self) { println!("ref: {:?}", self[0]); }
fn by_mut(&mut self) { println!("mut: {:?}", self[0]); self[1].incr(); }
fn by_val(mut self) { println!("val: {:?}", self[0]); self[1].incr(); }
}
fn demo_obvious_cases() {
let a = ['a', '1']; let b = &['b', '4']; let c = &mut ['c', '7'];
// [char; 2] &[char; 2] &mut [char; 2]
a.by_val();
b.by_ref();
c.by_mut();
println!("obvious: (a,b,c): {:?}", (a,b,c));
}
prints:
val: 'a'
ref: 'b'
mut: 'c'
obvious: (a,b,c): (['a', '1'], ['b', '4'], ['c', '8'])
trait Receiver {
fn by_ref(&self); fn by_mut(&mut self); fn by_val(self) where Self: Sized;
}
impl Receiver for [char; 2] {
fn by_ref(&self) { println!("ref: {:?}", self[0]); }
fn by_mut(&mut self) { println!("mut: {:?}", self[0]); self[1].incr(); }
fn by_val(mut self) { println!("val: {:?}", self[0]); self[1].incr(); }
}
#[test]
fn demo_interesting_cases() {
let mut a = ['a', '1']; let b = &['b', '4']; let c = &mut ['c', '7'];
// [char; 2] &[char; 2] &mut [char; 2]
a.by_val(); b.by_val(); c.by_val();
a.by_ref(); b.by_ref(); c.by_ref();
a.by_mut(); /* ... */ c.by_mut();
println!("interesting: (a,b,c): {:?}", (a,b,c));
}
Only b.by_mut()
rejected by compiler.
For receiver for a method (i.e. the x
in x.m(...)
):
&[char; n]
to &mut [char; n]
is disallowed[char; n]
to &mut [char; n]
&[char; n]
to [char; n]
Does this make sense?
(How can we go from a reference to a value?)
How could Vec<T>
possibly be a subtype of &[T]
? They have incompatible representations.
(However, a language hacker's view may differ...)
(Temporarily putting aside previous examples)
Classic example: "Records"
"Have a record with these fields, want one with a subset of the fields"
A Rust analogue: "Have (A, B, C)
, this function wants (A, B)
"
mod demo_subtuple {
fn provide<A, B, C>(tup: (A, B, C)) { expect(tup); }
fn expect<A, B>(_tup: (A, B)) { unimplemented!(); }
}
(Q: Why might disallowing this be a good idea for a language like Rust?)
Academic example: Functions
Int
, Real
where Int <: Real
"Have a function f
of type Real -> Int
, want one like Real -> Real
"
i.e., is it legal to pass f
as if it is a Real -> Real
?
What about a function g
of type Int -> Int
? Can that be passed as Real -> Real
?
Int -> Int
, want one like Real -> Real
"g: Int -> Int
assumes input is always an Int
, yet client taking Real -> Real
is free to apply it to -7.2 or ⅓ or π.twice: (Real -> Real) Real -> Real
twice(f, r) = f(f(r))
num_divisors: Int -> Int
num_divisors(4) = |{1, 2, 4}| = 3
num_divisors(12) = |{1, 2, 3, 4, 6, 12}| = 6
twice(num_divisors, 2.3) = num_divisors(num_divisors(2.3))
= num_divisors(???)
= ill-defined!
Int -> Int
<: Real -> Real
Real -> Int
, want one like Real -> Real
"Int -> Int
<: Real -> Real
"Have a function of type Real -> Int
, want one like Real -> Real
"
twice: (Real -> Real) Real -> Real
twice(f, r) = f(f(r))
ceil_plus: Real -> Int
ceil_plus(x) = ⌈x⌉ + 1
"Client says they can handle a production of any real number, so it is safe to provide something that will only produce integer values."
Real -> Int
, want one like Real -> Real
""Client says they can handle a production of any real number, so it is safe to provide something that will only produce integer values."
class Real { Real increment() { /* ... */ } }
class Int extends Real { Int increment() { /* ... */ } }
or, more accurately,
class RealFun { Real operation(Real x) { /* ... */ } }
class IntFun extends RealFun { Int operation(Real x) { /* ... */ } }
Example:
aka ->
is covariant with respect to its return type.
Example instance:
All caller can do is get an X
from calling the function;
So it is safe to narrow and use a Y
as the return value.
"Have a function of type Real -> Int
, want one like Int -> Int
"
Sometimes unintuitive
"Client says they will only feed integer values into the function, so it is safe to provide something that can consume any real number."
aka ->
is covariant with respect to its return type, and ->
is contravariant with respect to its argument type.
All caller can do is feed in more specific B
(and get out more general X
).
So it is safe to be more liberal and accept any A
at all, and guarantee the more specific Y
as return value.
Aside: What about when domain = range?
mod demo_variance_and_vec_slice {
fn provide(m: &Vec<i32>) { expect(m); }
fn expect(_nums: &[i32]) { unimplemented!() }
}
mod demo_variance_and_vec_slice_hof {
fn provide_hof(f: fn (&usize) -> &Vec<i32>) { expect_hof(f); }
fn expect_hof(_f: fn (&usize) -> &[i32]) { unimplemented!(); }
}
error: mismatched types [E0308]
fn provide_hof(f: fn (&usize) -> &Vec<i32>) { expect_hof(f); }
^
note: expected type `fn(&usize) -> &[i32]`
note: found type `fn(&usize) -> &std::vec::Vec<i32>`
("hof" stands for "higher-order function")
mod demo_variance_and_mut_slice {
fn provide(m: &mut [i32]) { expect(m); }
fn expect(_nums: &[i32]) { unimplemented!() }
}
mod demo_variance_and_mut_slice_hof {
fn provide_hof(f: fn (&usize) -> &mut [i32]) { expect_hof(f); }
fn expect_hof(_f: fn (&usize) -> &[i32]) { unimplemented!(); }
}
error: mismatched types [E0308]
fn provide_hof(f: fn (&usize) -> &mut [i32]) { expect_hof(f); }
^
note: expected type `fn(&usize) -> &[i32]`
note: found type `fn(&usize) -> &mut [i32]`
Simon Sapin (paraphrased):
I had some code that does not compile. I do not understand why, so to try to understand it, I narrowed it down to a particular use of
Cell
. But I didn't understand what was happening, so I tried making my tiny version ofCell
, and the problem went away.
Let's try to implement std::cell::Cell
, in user code.
struct MyCell<T> {
value: T,
}
impl<T: Copy> MyCell<T> {
fn new(x: T) -> MyCell<T> { MyCell { value: x } }
fn get(&self) -> T { self.value }
fn set(&self, value: T) {
use std::ptr;
unsafe {
ptr::write(&self.value as *const _ as *mut _, value);
}
}
}
Is this use of unsafe
sound?
(Actually completely broken; reasons include compiler details regarding aliasing info; we'll focus on another subtle one here.)
static X: i32 = 10;
#[test]
fn test_mycell_short_lifetime() {
let cell = MyCell::new(&X);
step1(&cell);
fn step1<'a>(r_c1: &MyCell<&'a i32>) { let val: i32 = 13;
step2(&val, r_c1);
println!("step1 value: {}", r_c1.value); }
fn step2<'b>(r_val: &'b i32, r_c2: &MyCell<&'b i32>) { r_c2.set(r_val); }
println!(" end value: {}", cell.value);
}
(DON'T PANIC: We're going to step through this.)
Output (some run on my machine):
step1 value: 13
end value: 28672
(28672
??? Bogus!) Where does this garbage come from?
static X: i32 = 10;
#[test]
fn test_mycell_short_lifetime() {
let cell = MyCell::new(&X);
step1(&cell);
fn step1<'a>(r_c1: &MyCell<&'a i32>) { let val: i32 = 13; step2(&val, r_c1); }
fn step2<'b>(r_val: &'b i32, r_c2: &MyCell<&'b i32>) { r_c2.set(r_val); }
}
before step1(&cell);
:
static X: i32 = 10;
#[test]
fn test_mycell_short_lifetime() {
let cell = MyCell::new(&X);
step1(&cell);
fn step1<'a>(r_c1: &MyCell<&'a i32>) { let val: i32 = 13; step2(&val, r_c1); }
fn step2<'b>(r_val: &'b i32, r_c2: &MyCell<&'b i32>) { r_c2.set(r_val); }
}
before step2(&val, r_c1);
:
static X: i32 = 10;
#[test]
fn test_mycell_short_lifetime() {
let cell = MyCell::new(&X);
step1(&cell);
fn step1<'a>(r_c1: &MyCell<&'a i32>) { let val: i32 = 13; step2(&val, r_c1); }
fn step2<'b>(r_val: &'b i32, r_c2: &MyCell<&'b i32>) { r_c2.set(r_val); }
}
before r_c2.set(r_val)
:
static X: i32 = 10;
#[test]
fn test_mycell_short_lifetime() {
let cell = MyCell::new(&X);
step1(&cell);
fn step1<'a>(r_c1: &MyCell<&'a i32>) { let val: i32 = 13; step2(&val, r_c1); }
fn step2<'b>(r_val: &'b i32, r_c2: &MyCell<&'b i32>) { r_c2.set(r_val); }
}
after r_c2.set(r_val)
:
static X: i32 = 10;
#[test]
fn test_mycell_short_lifetime() {
let cell = MyCell::new(&X);
step1(&cell);
fn step1<'a>(r_c1: &MyCell<&'a i32>) { let val: i32 = 13; step2(&val, r_c1); }
fn step2<'b>(r_val: &'b i32, r_c2: &MyCell<&'b i32>) { r_c2.set(r_val); }
}
after step2
returns:
static X: i32 = 10;
#[test]
fn test_mycell_short_lifetime() {
let cell = MyCell::new(&X);
step1(&cell);
fn step1<'a>(r_c1: &MyCell<&'a i32>) { let val: i32 = 13; step2(&val, r_c1); }
fn step2<'b>(r_val: &'b i32, r_c2: &MyCell<&'b i32>) { r_c2.set(r_val); }
}
after step1
returns:
Either MyCell
is broken, or the compiler is.
Cell
also broken?mod test_stdcell_short_lifetime {
use std::cell::Cell;
static X: i32 = 10;
#[test]
fn test_stdcell_short_lifetime() {
let cell = Cell::new(&X);
step1(&cell);
}
fn step1<'a>(r_c1: &Cell<&'a i32>) { let val: i32 = 13; step2(&val, r_c1); }
fn step2<'b>(r_val: &'b i32, r_c2: &Cell<&'b i32>) { r_c2.set(r_val); }
}
Compiler output:
error: `val` does not live long enough
step2(&val, r_c1);
^~~
note: reference must be valid for the lifetime 'a as defined on the block at 775:39...
fn step1<'a>(r_c1: &Cell<&'a i32>) {
^
note: ...but borrowed value is only valid for the block suffix following statement 0 at 776:26
let val: i32 = 13;
^
MyCell
and Cell
?struct MyCell<T> {
value: T,
}
pub struct Cell<T> {
value: UnsafeCell<T>,
}
#[lang = "unsafe_cell"]
pub struct UnsafeCell<T: ?Sized> {
value: T,
}
MyCell
just holds a T
, while Cell
holds an UnsafeCell<T>
, which is a "language item" known specially to the compiler.
MyCell
and reject Cell
?/// Picks either `x` or `y`, based on some internal choice.
fn pick<'a>(x: &'a i32, y: &'static i32) -> &'a i32 {
if *x > 0 { x } else { y }
}
static GLOBAL: i32 = 100;
#[test] fn pick_test() { let temp: i32 = 200; pick(&temp, &GLOBAL); }
The value y: &'static i32
is able to be returned as a &'a i32
.
This is sound because 'static
outlives 'a
'static: 'a
, pronounced "outlives")fn promote<'a>(x: &'a i32) -> &'static i32 {
return x;
}
#[test] fn promote_test() { let temp: i32 = 200; promote(&temp); }
Not legal to return &'a i32
as a &'static i32
for arbitrary 'a
; otherwise, dangling pointers.
Insight: For any type T
and any lifetime 'a
, clearly &'static T
should be valid anywhere that &'a T
is.
Gentzen style
For any type T
and lifetimes 'a
, 'b
, if 'b
outlives 'a
, then &'b T
should be valid anywhere &'a T
is.
Already established we should have &'static T <: &'a T
.
What about a reference to a reference?
Should &'a &'static T <: &'a &'a T
...?
Intuition: All you can do with a &X
is read data from the X
it holds.
Analogous to a function A -> Y <: A -> X
&X
is covariant with respect to X
But that's &X
; what about &mut X
?
Should &'a mut &'static T <: &'a mut &'a T
...?
Allowing that exposes:
mod promote_short_to_static {
static X: i32 = 10;
fn test() {
let mut ptr: &'static i32 = &mut X;
step1(&mut ptr);
}
fn step1<'a>(r1: &'a mut &'static i32) { let val: i32 = 13; step2(r1, &val); }
fn step2<'b, T>(r2: &'b mut &'b T, r_val: &'b T) { *r2 = r_val; }
}
Insight: for any type X
and any lifetime 'a
, &'static mut X
is valid anywhere &'a mut X
is.
But we cannot generalize to Y <: X
Intuition: Once you allow mutation through a reference, the type itself must remain fixed.
Java example: Does this compile? (FYI Float <: Number
)
static void modify_array(Number[] numArray) { numArray[0] = new Float(3.14); }
Java array T[]
is covariant with respect to T
.
What happens here?
static void mut_array() {
Integer[] intArray = new Integer[1]; modify_array(intArray); }
Exception in thread "main" java.lang.ArrayStoreException: java.lang.Float
at Examples.modify_array(Examples.java:25)
at Examples.mut_array(Examples.java:21)
at Examples.main(Examples.java:9)
(Old wart historical artifact; predates Java support for generics.)
mod demo_variance_and_static_ref {
fn provide(m: &'static i32) { let val = 13; expect(&val, m); }
fn expect<'a>(_: &'a i32, _r: &'a i32) { unimplemented!() }
}
mod demo_variance_and_static_ref_hof {
fn prov_hof(f: fn(&usize) -> &'static i32) { let val = 13; exp_hof(&val, f); }
fn exp_hof<'a>(_: &'a i32, _f: fn (&'a usize) -> &'a i32) { unimplemented!() }
}
Functions even contravariant with respect to their argument types
(Rust does not have method overloading so no conflict there)
Compiler deduces the variance of a type (with respect to its type parameters) based on the structure of that type.
struct OuterInline<T> { one: T, inner: InnerInline<T> }
struct InnerInline<T> { data: T }
InnerInline
and OuterInline
both covariant with respect to T
struct OuterRef<'a, T: 'a> { one: &'a mut T, inner: InnerRef<'a, T> }
struct InnerRef<'a, T: 'a> { data: &'a T }
InnerRef
is covariant w.r.t. T
, while OuterRef
is invariant w.r.t. T
If compiler sees a PhantomData<SomeType>
, it traverses the structure of SomeType
as if it were embedded directly.
Cell
, UnsafeCell
UnsafeCell<T>
is invariant with respect to T
, (and that bubbles out to Cell<T>
).
MyCell
Structural definition of MyCell
alone implies it is covariant w.r.t. T
This (broken) method violates rules associated with covariance:
fn set(&self, value: T) {
use std::ptr;
unsafe {
ptr::write(&self.value as *const _ as *mut _, value);
}
}
Must impose invariance
Use PhantomData<fn (T) -> T>
in MyCell<T>
: one way
Use UnsafeCell<T>
in MyCell<T>
: better way
The earlier examples are not kind of subtyping that references have.
For receiver for a method (i.e. the x
in x.m(...)
):
Compiler will automatically insert a borrow or dereferences to find a type that provides method m
.
See also StackOverflow Q
That, combined with dereference of a Copy
type, explains
let mut a = ['a', '1']; let b = &['b', '4']; let c = &mut ['c', '7'];
// [char; 2] &[char; 2] &mut [char; 2]
a.by_val(); b.by_val(); c.by_val();
a.by_ref(); b.by_ref(); c.by_ref();
a.by_mut(); /* ... */ c.by_mut();
println!("interesting: (a,b,c): {:?}", (a,b,c));
(Note: Parameters other than receiver do not get this special treatment)
When you have a type &Y
, where Y
can be dereferenced to yield X
, then compiler will automatically coerce &Y
to &X
by inserting necessary derefers.
See RFC 271: "Deref Conversions"
That, combined with impl
such that Vec<T>: Deref<Target=[T]>
, explains
As if things weren't confusing enough ...
If you pass a mutable value
reference into an expression context that is itself inferred to be expecting a reference, the compiler will automatically insert &mut *value
.
Main effect of this: passing a &mut T
does not always move it; sometimes it implicitly reborrows (so that you can mutate the &mut
after the call, rather than losing access to it).
How did we get EndToEndErr
from the results of phase_1
/phase_2
?
fn phase_1(x: Ast) -> Result<Mir, PhaseOneErr> { /* ... */ }
fn phase_2(y: Mir) -> Result<Out, PhaseTwoErr> { /* ... */ }
fn composed(a: Ast) -> Result<Out, EndToEndErr> {
let mir = try!(phase_1(a));
let out = try!(phase_2(mir));
return Ok(out);
}
Are PhaseOneErr
and PhaseTwoErr
subtypes of EndToEndErr
?
Answer: Magic is hidden behind the try!
(and some trait impls)
Answer: Magic is hidden behind the try!
(and some trait impls)
let mir = try!(phase_1(a));
expands to:
let mir =
match phase_1(a) {
::std::result::Result::Ok(val) => val,
::std::result::Result::Err(err) => {
return ::std::result::Result::Err(::std::convert::From::from(err))
}
};
(New ?
form today is same as try!
; easier to show try!
expansion)
inserts conversion via std::convert::From
transforming specific error to whatever error is expected in context of expression
::std::result::Result::Err(err) => {
return ::std::result::Result::Err(::std::convert::From::from(err))
}
inserts conversion via std::convert::From
transforming specific error to whatever error is expected in context of expression
enum EndToEndErr {
P1(PhaseOneErr),
P2(PhaseTwoErr),
}
impl From<PhaseOneErr> for EndToEndErr {
fn from(phase_1_err: PhaseOneErr) -> EndToEndErr {
EndToEndErr::P1(phase_1_err)
}
}
impl From<PhaseTwoErr> for EndToEndErr {
fn from(phase_2_err: PhaseTwoErr) -> EndToEndErr {
EndToEndErr::P2(phase_2_err)
}
}
(So, all the magic here is in macros and trait system.)
Compiler needs the expected type.
E.g. this compiles and runs:
fn process1(input: &[i32]) { }
fn foo() { process1(&vec![1, 2, 3]); }
but this fails at compile time:
trait Input { }
impl Input for [i32] { }
fn process2<I>(input: &I) where I: Input { }
fn bar() { process2(&vec![1, 2, 3]); }
fn process1(input: &[i32]) { }
Compiler sees input + expected types
&Vec -> &[i32]
coercionfn process2<I>(input: &I) where I: Input { }
Compiler decides I
is Vec<i32>
impl Input for Vec<i32>
&'static mut T |
<: ? |
&'a mut T |
&'a &'static mut T |
<: ? |
&'a &'a mut T |
&'a mut &'static T |
<: ? |
&'a mut &'a T |
One answer: How often do you actually answer such questions?
(be aware of subtyping, but need not be foremost in your mind)
In case you care: Yes, Yes, No
Getting covariance vs invariance right matters for soundness
But you should not need to think about it at all if you are not writing unsafe { ... }
(it is solely compiler's job until unsafe
gets involved)
See RFCs
The Book
and the Rustonomicon
And contribute back to them!
Keep asking questions
"Show me the code!"
Do the experiments
Magic: As real as you want it to be