Understanding Ownership of Rust: Best Practices
In my last story, Understanding Ownership of Rust: Rules, I introduced what ownership is and the basic rules for working with it.
The ownership mechanism makes your program memory safe. But it also increases the time you need to spend thinking about how to use memory and how to satisfy the ownership rules.
In this article, I am going to share some tools that may make coding in Rust under the ownership mechanism easier.
Box
How do you allocate memory from the heap? Using Box is one of the ways.
Here is an example using Box:
struct DataOnHeap {
    x: u32,
    y: u32,
}

fn main() {
    let b0 = Box::new(5);
    let b1 = Box::new(DataOnHeap { x: 0, y: 0 });
}
As the name suggests, you can think of a Box as a container that holds some heap memory. Here b0 holds 4 bytes of memory allocated from the heap (an integer literal defaults to i32), and the value 5 is written to that memory. b1 holds the heap memory for the struct DataOnHeap.
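To make the sizes concrete, here is a minimal sketch that measures the heap part of each box with std::mem::size_of_val. The helper heap_size is a hypothetical name introduced only for this illustration.

```rust
use std::mem::size_of_val;

struct DataOnHeap {
    x: u32,
    y: u32,
}

// Hypothetical helper: size in bytes of the value the box points to.
fn heap_size<T>(b: &Box<T>) -> usize {
    size_of_val(&**b)
}

fn main() {
    let b0 = Box::new(5); // integer literal defaults to i32
    let b1 = Box::new(DataOnHeap { x: 0, y: 0 });
    println!("{}", heap_size(&b0)); // 4
    println!("{}", heap_size(&b1)); // 8 (two u32 fields)
    // The box variable itself is just one pointer wide.
    println!("{}", size_of_val(&b0) == std::mem::size_of::<usize>()); // true
    println!("{}, {}", b1.x, b1.y); // 0, 0
}
```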
Box implements the Drop trait, so when the boxes you created go out of scope (at the end of main in this case), they are dropped and their heap memory is deallocated.
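The drop at the end of a scope can be observed with a small sketch. The type Tracked and its counter are assumptions made up for this example. They only record when the boxed value is dropped.

```rust
use std::cell::Cell;

// Hypothetical type that counts how many times it has been dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

// Returns the drop count seen inside the scope and after leaving it.
fn drops_around_scope() -> (u32, u32) {
    let drops = Cell::new(0);
    let before;
    {
        let _boxed = Box::new(Tracked { drops: &drops });
        before = drops.get(); // the box is still alive here
    } // `_boxed` goes out of scope: content dropped, heap memory freed
    (before, drops.get())
}

fn main() {
    let (before, after) = drops_around_scope();
    println!("{}, {}", before, after); // 0, 1
}
```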
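Box also implements Deref, so you can use a box like a reference. You can access the content of a box with the dereference operator *, but that is not mandatory, because the compiler can do that for you when necessary (for example on field access and method calls). Box also forwards Display to its content, so a box can be printed directly. See the following example: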
fn main() {
    let b0 = Box::new(5);
    println!("{}, {}", b0, *b0); // Prints: 5, 5
}
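The same automatic dereferencing applies to fields and methods of a boxed struct. A minimal sketch follows. The method sum is a hypothetical addition used only to show a method call through the box.

```rust
struct DataOnHeap {
    x: u32,
    y: u32,
}

impl DataOnHeap {
    // Hypothetical method, added only for this illustration.
    fn sum(&self) -> u32 {
        self.x + self.y
    }
}

fn main() {
    let b = Box::new(DataOnHeap { x: 1, y: 2 });
    println!("{}", b.x); // field access through the box: 1
    println!("{}", b.sum()); // method call through the box: 3
    println!("{}", (*b).sum()); // explicit dereference, same result: 3
}
```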
Please note that the ownership rules still apply to Box. Once you move the content out of a box, the box is invalid from that point on.
struct DataOnHeap {
    x: u32,
    y: u32,
}

fn main() {
    let b1 = Box::new(DataOnHeap { x: 0, y: 0 });
    let b1_real = *b1;
    println!("{}, {}", b1_real.x, b1_real.y);
    println!("{}, {}", b1.x, b1.y);
}
The code fails to build, because with let b1_real = *b1; the content of the box has been moved out by the dereference, and the ownership of the struct DataOnHeap has been transferred to b1_real. The box no longer holds a valid value, so it can no longer be borrowed.
   |
8  |     let b1_real = *b1;
   |                   --- value moved here
9  |     println!("{}, {}", b1_real.x, b1_real.y);
10 |     println!("{}, {}", b1.x, b1.y);
   |                        ^^^^ value borrowed here after move
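One possible way to keep the box valid is to borrow its content instead of moving it out. The sketch below is an assumption about what the code was meant to do, not the only fix. It replaces the move with a shared reference named b1_ref.

```rust
struct DataOnHeap {
    x: u32,
    y: u32,
}

fn main() {
    let b1 = Box::new(DataOnHeap { x: 0, y: 0 });
    let b1_ref = &*b1; // borrow the content; the box keeps ownership
    println!("{}, {}", b1_ref.x, b1_ref.y);
    println!("{}, {}", b1.x, b1.y); // the box is still usable
}
```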
Rc
Box makes it convenient to manage heap memory, but it is not enough.
When you need to access the heap data in different functions or different threads, you still need to manage the ownership and the lifetime of the Box variable itself.
The following code obviously does not compile:
struct DataOnHeap {
    x: u32,
    y: u32,
}

fn func(b: Box<DataOnHeap>) {
    println!("{}, {}", b.x, b.y);
}

fn main() {
    let b = Box::new(DataOnHeap { x: 0, y: 0 });
    func(b);
    println!("{}, {}", b.x, b.y); // <-- Invalid borrowing
}
You can change the parameter of func to a reference to the Box, rather than the box itself. Yes, that works. But using references is not always the best approach, for example when the data has to outlive the scope that created it.
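For comparison, the reference-based version could look like the sketch below. Here func is changed to return a String rather than print. That is a small variation made for this illustration so the result is easy to check.

```rust
struct DataOnHeap {
    x: u32,
    y: u32,
}

// Borrows the data instead of taking ownership of the box.
fn func(b: &DataOnHeap) -> String {
    format!("{}, {}", b.x, b.y)
}

fn main() {
    let b = Box::new(DataOnHeap { x: 0, y: 0 });
    println!("{}", func(&b)); // &Box<DataOnHeap> coerces to &DataOnHeap
    println!("{}, {}", b.x, b.y); // still valid: `b` was only borrowed
}
```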
Can it be easier? Yes, using Rc makes things easier.
Rc means Reference Counted. It is comparable to a reference-counted smart pointer in C++, such as std::shared_ptr. With Rc, the ownership is shared, so you don't need to take care of the lifetime of the variable; the data is dropped automatically when the last reference goes out of scope.
Here is an example:
use std::rc::Rc;

struct DataOnHeap {
    x: u32,
    y: u32,
}

fn func(rc: Rc<DataOnHeap>) {
    println!("{}, {}", rc.x, rc.y);
}

fn main() {
    let rc = Rc::new(DataOnHeap { x: 0, y: 0 });
    func(rc.clone());
    println!("{}, {}", rc.x, rc.y);
}
Using Rc is almost the same as using Box. When you need one more reference to the data, you call its clone() method. That gives you another Rc instance equivalent to the original one, but the heap memory inside the Rc container is not copied: both Rc variables point to the same heap memory, and only the reference count is incremented. The ownership of the heap memory is shared among all the Rc variables.
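The sharing can be checked with Rc::strong_count and Rc::ptr_eq from the standard library. This is a minimal sketch. The function counts is a hypothetical helper that reports the reference count before, during, and after the lifetime of a clone.

```rust
use std::rc::Rc;

struct DataOnHeap {
    x: u32,
    y: u32,
}

// Returns the strong count before cloning, while a clone is alive,
// and after the clone has been dropped.
fn counts() -> (usize, usize, usize) {
    let rc = Rc::new(DataOnHeap { x: 0, y: 0 });
    let initial = Rc::strong_count(&rc);
    let during;
    {
        let rc2 = Rc::clone(&rc); // same as rc.clone()
        assert!(Rc::ptr_eq(&rc, &rc2)); // both point to the same allocation
        during = Rc::strong_count(&rc);
    } // `rc2` dropped here, the count goes back down
    println!("{}, {}", rc.x, rc.y);
    (initial, during, Rc::strong_count(&rc))
}

fn main() {
    println!("{:?}", counts()); // (1, 2, 1)
}
```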
Because Rc makes the ownership of the heap memory shareable, you no longer have to manage the lifetime of the heap data yourself.
Arc
Rc is good, but it also comes with a problem. Imagine that Rc instances cloned from the same source are cloned and dropped in different threads: because the counter is not updated atomically, concurrent updates can corrupt the reference count. For this reason, Rust doesn't allow an Rc to be moved into another thread (Rc does not implement Send); see a discussion here.
Then how can you enjoy the convenience of shared ownership in a thread-safe way? The answer is Arc, which stands for Atomically Reference Counted.
With Arc, the previous example becomes:
use std::sync::Arc;

struct DataOnHeap {
    x: u32,
    y: u32,
}

fn func(rc: Arc<DataOnHeap>) {
    println!("{}, {}", rc.x, rc.y);
}

fn main() {
    let rc = Arc::new(DataOnHeap { x: 0, y: 0 });
    func(rc.clone());
    println!("{}, {}", rc.x, rc.y);
}
Arc works almost the same way as Rc, but it uses atomic operations for its reference counting, so it is thread safe.
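The example above still runs on a single thread. As a sketch of what Arc makes possible, the code below moves clones into several threads that only read the data. The function sum_in_threads and its parameter n are hypothetical names chosen for this illustration.

```rust
use std::sync::Arc;
use std::thread;

struct DataOnHeap {
    x: u32,
    y: u32,
}

// Spawns `n` threads; each reads the shared data through its own Arc clone.
fn sum_in_threads(data: Arc<DataOnHeap>, n: usize) -> u32 {
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let local = Arc::clone(&data); // bumps the atomic counter
            thread::spawn(move || local.x + local.y)
        })
        .collect();
    // Wait for every thread and add up their results.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    let rc = Arc::new(DataOnHeap { x: 1, y: 2 });
    println!("{}", sum_in_threads(Arc::clone(&rc), 4)); // 12
    println!("{}, {}", rc.x, rc.y); // the original Arc is still usable
}
```

Replacing Arc with Rc in this sketch would be rejected by the compiler, because the closure passed to thread::spawn must be Send.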
Please note that when I say Arc is thread safe, I mean the reference counting only. Across multiple threads, Arc guarantees that incrementing and decrementing the reference count is always correct, but it provides no synchronization for modifying the referenced heap memory; on its own it only gives shared, read-only access to the data.
If you need to modify the heap memory from multiple threads as well, combine Arc with Mutex. The code becomes:
use std::sync::{Arc, Mutex};

struct DataOnHeap {
    x: u32,
    y: u32,
}

fn func(rc: Arc<Mutex<DataOnHeap>>) {
    let d = rc.lock().unwrap();
    println!("{}, {}", d.x, d.y);
}

fn main() {
    let rc = Arc::new(Mutex::new(DataOnHeap { x: 0, y: 0 }));
    func(rc.clone());
    let d = rc.lock().unwrap();
    println!("{}, {}", d.x, d.y);
}
To access the shared heap content, you need to lock the mutex first. When a thread succeeds in obtaining the lock, access attempts from other threads block until the lock is released, which happens when the guard returned by lock() goes out of scope.
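To see the locking at work across real threads, here is a minimal sketch in which several threads modify the same struct. The function increment_in_threads and the increments it applies are assumptions invented for this example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

struct DataOnHeap {
    x: u32,
    y: u32,
}

// Spawns `n` threads; each adds 1 to `x` and 2 to `y` under the lock.
fn increment_in_threads(n: u32) -> (u32, u32) {
    let shared = Arc::new(Mutex::new(DataOnHeap { x: 0, y: 0 }));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let local = Arc::clone(&shared);
            thread::spawn(move || {
                let mut d = local.lock().unwrap(); // blocks until the lock is free
                d.x += 1;
                d.y += 2;
            }) // guard dropped at the end of the closure: lock released
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let d = shared.lock().unwrap();
    (d.x, d.y)
}

fn main() {
    println!("{:?}", increment_in_threads(8)); // (8, 16)
}
```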
With Arc and Mutex, access to the heap memory is easy and safe, but it comes at the price of performance. You need to find the right balance.