Re: [RFC PATCH] rust: types: Add explanation for ARef pattern

Hi Benno,

Thanks for taking a look.

On Thu, Jul 25, 2024 at 06:51:56PM +0000, Benno Lossin wrote:
> On 10.07.24 05:24, Boqun Feng wrote:
> > As the usage of `ARef` and `AlwaysRefCounted` is growing, it makes sense
> > to add an explanation of the "ARef pattern" to cover the most common "DO"
> > and "DO NOT" cases when wrapping a self-refcounted C type.
> > 
> > Hence an "ARef pattern" section is added in the documentation of `ARef`.
> > 
> > Signed-off-by: Boqun Feng <boqun.feng@xxxxxxxxx>
> > ---
> > This is motivated by:
> > 
> > 	https://lore.kernel.org/rust-for-linux/20240705110228.qqhhynbwwuwpcdeo@vireshk-i7/
> > 
> >  rust/kernel/types.rs | 156 +++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 156 insertions(+)
> > 
> > diff --git a/rust/kernel/types.rs b/rust/kernel/types.rs
> > index bd189d646adb..70fdc780882e 100644
> > --- a/rust/kernel/types.rs
> > +++ b/rust/kernel/types.rs
> > @@ -329,6 +329,162 @@ pub unsafe trait AlwaysRefCounted {
> >  ///
> >  /// The pointer stored in `ptr` is non-null and valid for the lifetime of the [`ARef`] instance. In
> >  /// particular, the [`ARef`] instance owns an increment on the underlying object's reference count.
> > +///
> > +/// # [`ARef`] pattern
> > +///
> > +/// "[`ARef`] pattern" is preferred when wrapping a C struct which has its own refcounting
> 
> I would have written "[...] struct which is reference-counted, because
> [...]", is there a specific reason you wrote "its own"?
>

"its own" indicates that the reference counter lives inside the object
(i.e. the object is self-refcounted), which is different from `Arc<T>`,
where the reference counters are "attached" to `T`. Your version looks
good to me as well.
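
To make the difference concrete, here is a rough userspace sketch (not
kernel code; `foo`, `get_foo()`, `put_foo()` and `Plain` are hypothetical
stand-ins). The counter of `foo` is a field of the object itself, whereas
`Arc<T>` keeps its counters in its own allocation, outside of `T`, so a
plain `T` knows nothing about being counted:

```rust
use std::cell::Cell;
use std::sync::Arc;

/// Hypothetical stand-in for a self-refcounted C `struct foo`: the counter is
/// a field of the object itself.
#[allow(non_camel_case_types)]
pub struct foo {
    refcount: Cell<usize>,
    pub bar: u32,
}

/// Hypothetical `get_foo()`: takes one more reference on the object.
pub fn get_foo(f: &foo) {
    f.refcount.set(f.refcount.get() + 1);
}

/// Hypothetical `put_foo()`: drops one reference and returns `true` if it was
/// the last one (a real C implementation would free the object at that point).
pub fn put_foo(f: &foo) -> bool {
    let n = f.refcount.get() - 1;
    f.refcount.set(n);
    n == 0
}

/// A plain Rust type with no counter of its own.
pub struct Plain {
    pub val: u32,
}

/// `Arc` stores the strong count in its own allocation, outside of `Plain`.
pub fn arc_count(p: &Arc<Plain>) -> usize {
    Arc::strong_count(p)
}
```

Because the count lives inside `foo`, whoever holds a pointer to it has to
pair `get_foo()` with `put_foo()`; on the Rust side, `ARef` is the type
that carries that obligation.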

> > +/// mechanism, because it decouples the operations on the object itself (usually via a `&Foo`) from the
> > +/// operations on a pointer to the object (usually via an `ARef<Foo>`). For example, given a `struct
> 
> Not exactly sure I understand your point here, what exactly is the
> advantage of decoupling the operations?
> In my mind the following points are the advantages of using `ARef`:
> (1) prevents having to implement multiple abstractions for a single C
>     object: say there is a `struct foo` that is both used via reference
>     counting and by-value on the stack. Without `ARef`, we would have to
>     write two abstractions, one for each use-case. With `ARef`, we can
>     have one `Foo` that can be wrapped with `ARef` to represent a
>     reference-counted object.
> (2) `ARef<T>` always represents a reference counted object, so it helps
>     with understanding the code. If you read `Foo`, you cannot be sure
>     if it is heap or stack allocated.
> (3) generalizes common code of reference-counted objects (i.e. avoids
>     code duplication) and concentrates `unsafe` code.
> 
> In my opinion (1) is the most important, then (2). And (3) is a nice
> bonus. If you agree with the list above (maybe you also have additional
> advantages of `ARef`?) then it would be great if you could also add them
> somewhere here.
> 

To me, the advantages are mainly (1) and (2) in your list; thank you for
the list. I did try to explain these with an example (below), because I
felt that showing the bad case is the most straightforward way.

I will add your list here, because although an example may be easier to
read, a list of advantages is better for reference. Again, thanks a lot!

> > +/// foo` defined in C, which has its own refcounting operations `get_foo()` and `put_foo()`. Without
> > +/// "[`ARef`] pattern", i.e. **bad case**:
> 
> Instead of "bad case" I would have written "i.e. you want to avoid this:".
> 

I'm OK with your version, but out of personal interest, why? ;-)

> > +///
> > +/// ```ignore
> > +/// pub struct Foo(NonNull<foo>);
> > +///
> > +/// impl Foo {
> > +///     // An operation on the pointer.
> > +///     pub unsafe fn from_ptr(ptr: *mut foo) -> Self {
> > +///         // Note that whether `get_foo()` is needed here depends on the exact semantics of
> > +///         // `from_ptr()`: is it creating a new reference, or does it continue using the caller's
> > +///         // reference?
> > +///         unsafe { get_foo(ptr); }
> > +///
> > +///         unsafe { Foo(NonNull::new_unchecked(ptr)) }
> > +///     }
> > +///
> > +///     // An operation on the object.
> > +///     pub fn get_bar(&self) -> Bar {
> > +///         unsafe { (*self.0.as_ptr()).bar }
> > +///     }
> > +/// }
> > +///
> > +/// // `impl Clone` and `impl Drop` also have to be implemented manually.
> > +/// impl Clone for Foo {
> > +///     fn clone(&self) -> Self {
> > +///         unsafe { get_foo(self.0.as_ptr()); }
> > +///
> > +///         Foo(self.0)
> > +///     }
> > +/// }
> > +///
> > +/// impl Drop for Foo {
> > +///     fn drop(&mut self) {
> > +///         unsafe { put_foo(self.0.as_ptr()); }
> > +///     }
> > +/// }
> > +/// ```
> > +///
> > +/// In this case, it's hard to tell whether `Foo` represents an object of `foo` or a pointer to
> > +/// `foo`.
> > +///
> > +/// However, with the [`ARef`] pattern, `foo` can be wrapped as follows:
> > +///
> > +/// ```ignore
> > +/// /// Note: `Opaque` is needed in most cases since there usually exist C operations on
> 
> I disagree with the stated reason for why `Opaque` is needed. You need
> it if `foo` e.g. contains a bool, since C might just write a nonsense
> integer, which would then result in immediate UB in Rust.
> Other reasons might be that certain bytes of `foo` are written to by
> other threads, even though on the Rust side we have `&mut Foo` (e.g. a
> `mutex`).
> 

Hmm.. doesn't "since there usually exist C operations on ..." include
the two cases you mentioned? Plus, the reference counters themselves are
not marked as atomic at the moment, so without `Opaque` we would also
have UB because of the reference counters. I was trying to summarize all
of these as "C operations on ...", so maybe I should say "concurrent C
operations on ..."? I am trying to be concise here since it's a comment
inside a comment ;-)
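
For illustration, a minimal userspace sketch of why interior mutability
matters here, assuming a simplified local `Opaque<T>` (an
`UnsafeCell<MaybeUninit<T>>` wrapper that only loosely mirrors the kernel
type) and a hypothetical C-side writer `c_set_bar()`. Because the data
sits behind an `UnsafeCell`, the "C side" may write through its raw
pointer while Rust holds a `&Foo`; with a plain `foo` field that write
would be UB. `#[repr(transparent)]` is what lets `as_ref()` cast a
`*mut foo` to a `*mut Foo`:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;

/// Hypothetical C `struct foo`.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct foo {
    pub bar: u32,
}

/// Simplified stand-in for the kernel's `Opaque<T>`: the wrapped value may be
/// mutated (by "C") even while Rust holds a shared reference to it.
#[repr(transparent)]
pub struct Opaque<T>(UnsafeCell<MaybeUninit<T>>);

impl<T> Opaque<T> {
    /// Raw pointer to the wrapped value.
    pub fn get(&self) -> *mut T {
        self.0.get().cast::<T>()
    }
}

/// `#[repr(transparent)]`: `Foo` has exactly the layout of `foo`.
#[repr(transparent)]
pub struct Foo(Opaque<foo>);

impl Foo {
    /// # Safety
    ///
    /// `ptr` must point to a valid `foo` for the entire lifetime `'a`.
    pub unsafe fn as_ref<'a>(ptr: *mut foo) -> &'a Self {
        // SAFETY: per the safety requirement; the cast relies on `repr(transparent)`.
        unsafe { &*ptr.cast::<Foo>() }
    }

    pub fn get_bar(&self) -> u32 {
        // SAFETY: `self.0.get()` is valid; this sketch has no concurrent writers.
        unsafe { (*self.0.get()).bar }
    }
}

/// Hypothetical C-side function writing through a raw pointer.
///
/// # Safety
///
/// `ptr` must point to a valid `foo` with no concurrent access.
pub unsafe fn c_set_bar(ptr: *mut foo, val: u32) {
    // SAFETY: per the safety requirement. Any live `&Foo` to this object only
    // covers an `UnsafeCell`, so this write does not violate its guarantees.
    unsafe { (*ptr).bar = val }
}
```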

> > +/// /// `struct foo *`, and `#[repr(transparent)]` is needed for the safety of converting a `*mut
> > +/// /// foo` to a `*mut Foo`.
> > +/// #[repr(transparent)]
> > +/// pub struct Foo(Opaque<foo>);
> > +///
> > +/// impl Foo {
> > +///     pub fn get_bar(&self) -> Bar {
> > +///         // SAFETY: `self.0.get()` is a valid pointer.
> > +///         //
> > +///         // Note: Usually extra safety comments are needed here to explain why accessing `.bar`
> > +///         // doesn't race with C side. Most cases are either calling a C function, which has its
> > +///         // own concurrent access protection, or holding a lock.
> > +///         unsafe { (*self.0.get()).bar }
> > +///     }
> > +/// }
> > +/// ```
> > +///
> > +/// ## Avoid `impl AlwaysRefCounted` if unnecessary
> 
> I would move this section below the next one.
> 
> > +///
> > +/// If Rust code doesn't touch the part where the object lifetimes of `foo` are maintained, `impl
> > +/// AlwaysRefCounted` can be temporarily avoided: it can always be added later as an extension of
> > +/// the functionality of the Rust code. This is usually the case for callbacks where the object
> > +/// lifetimes are already maintained by a framework. In such a case, an `unsafe` `fn(*mut foo) ->
> > +/// &Foo` function usually suffices:
> > +///
> > +/// ```ignore
> > +/// impl Foo {
> > +///     /// # Safety
> > +///     ///
> > +///     /// `ptr` has to be a valid pointer to `foo` for the entire lifetime `'a`.
> > +///     pub unsafe fn as_ref<'a>(ptr: *mut foo) -> &'a Self {
> > +///         // SAFETY: Per function safety requirement, reborrow is valid.
> > +///         unsafe { &*ptr.cast() }
> > +///     }
> > +/// }
> > +/// ```
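
For the callback case, a rough userspace sketch under assumed names:
`framework_run()` is a hypothetical C-like framework that owns `foo`,
and `UnsafeCell` stands in for `Opaque`. The callback only borrows a
`&Foo` while the framework keeps the object alive, so neither
`get_foo()`/`put_foo()` nor `impl AlwaysRefCounted` is needed:

```rust
use std::cell::UnsafeCell;

/// Hypothetical C `struct foo`, owned by the framework below.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct foo {
    pub bar: u32,
}

/// Rust wrapper; `UnsafeCell` stands in for `Opaque` in this sketch.
#[repr(transparent)]
pub struct Foo(UnsafeCell<foo>);

impl Foo {
    /// # Safety
    ///
    /// `ptr` has to be a valid pointer to `foo` for the entire lifetime `'a`.
    pub unsafe fn as_ref<'a>(ptr: *mut foo) -> &'a Self {
        // SAFETY: Per function safety requirement, reborrow is valid; the cast
        // is OK because `Foo` is `#[repr(transparent)]` over `foo`.
        unsafe { &*ptr.cast::<Foo>() }
    }

    pub fn get_bar(&self) -> u32 {
        // SAFETY: `self.0.get()` is valid and nothing writes concurrently here.
        unsafe { (*self.0.get()).bar }
    }
}

/// Hypothetical framework: it allocates `foo`, invokes the callback with a raw
/// pointer, and frees `foo` afterwards, so the object outlives the callback.
pub fn framework_run(bar: u32, cb: extern "C" fn(*mut foo) -> u32) -> u32 {
    let ptr = Box::into_raw(Box::new(foo { bar }));
    let ret = cb(ptr);
    // SAFETY: `ptr` came from `Box::into_raw` above and is freed exactly once.
    drop(unsafe { Box::from_raw(ptr) });
    ret
}

/// The Rust callback: `f` is only used within this call, so no refcounting.
pub extern "C" fn my_callback(ptr: *mut foo) -> u32 {
    // SAFETY: the framework keeps `ptr` valid for the duration of the callback.
    let f = unsafe { Foo::as_ref(ptr) };
    f.get_bar() * 2
}
```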
> > +///
> > +/// ## Type invariants of `impl AlwaysRefCounted`
> 
> I think you should first show what the example looks like with `ARef`
> and then talk about type invariants.
> 

These two sound good to me.

Regards,
Boqun

> > +///
> > +/// Types that `impl AlwaysRefCounted` usually need an invariant to describe why the type can meet
> > +/// the safety requirement of `AlwaysRefCounted`, e.g.
> > +///
> > +/// ```ignore
> > +/// /// # Invariants
> > +/// ///
> > +/// /// Instances of this type are always refcounted, that is, a call to `get_foo` ensures that the
> > +/// /// allocation remains valid at least until the matching call to `put_foo`.
> > +/// #[repr(transparent)]
> > +/// pub struct Foo(Opaque<foo>);
> > +///
> > +/// // SAFETY: `Foo` is always ref-counted per type invariants.
> > +/// unsafe impl AlwaysRefCounted for Foo {
> > +///     fn inc_ref(&self) {
> > +///         // SAFETY: `self.0.get()` is a valid pointer and per type invariants, the existence of
> > +///         // `&self` means it has a non-zero reference count.
> > +///         unsafe { get_foo(self.0.get()); }
> > +///     }
> > +///
> > +///     unsafe fn dec_ref(obj: NonNull<Self>) {
> > +///         // SAFETY: The refcount of `obj` is non-zero per function safety requirement, and the
> > +///         // cast is OK since `foo` is transparent to `Foo`.
> > +///         unsafe { put_foo(obj.cast::<foo>().as_ptr()); }
> > +///     }
> > +/// }
> > +/// ```
> > +///
> > +/// After `impl AlwaysRefCounted for foo`, `clone()` (`get_foo()`) and `drop()` (`put_foo()`)  are
> 
> Typo: it should be `impl AlwaysRefCounted for Foo`.
> 
> ---
> Cheers,
> Benno
> 
> > +/// available to `ARef<Foo>` thanks to the generic implementation.
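
A minimal userspace sketch of how the generic implementation supplies
`clone()` and `drop()`. `AlwaysRefCounted` and `ARef` below are
simplified local re-creations (not the kernel definitions), and
`get_foo()`, `put_foo()` and `Foo::new()` are mocks; note that `Foo`
itself has neither `impl Clone` nor `impl Drop`:

```rust
use std::cell::{Cell, UnsafeCell};
use std::ops::Deref;
use std::ptr::NonNull;

/// Simplified local re-creation of the trait (not the kernel definition).
pub unsafe trait AlwaysRefCounted {
    fn inc_ref(&self);
    /// Safety: the caller gives up one reference count owned through `obj`.
    unsafe fn dec_ref(obj: NonNull<Self>);
}

/// Simplified `ARef<T>`: owns exactly one reference count on a `T`.
pub struct ARef<T: AlwaysRefCounted> {
    ptr: NonNull<T>,
}

impl<T: AlwaysRefCounted> Deref for ARef<T> {
    type Target = T;
    fn deref(&self) -> &T {
        // SAFETY: the owned reference count keeps the object alive.
        unsafe { self.ptr.as_ref() }
    }
}

// Generic `clone()`: take one more reference (`get_foo()` for `Foo`).
impl<T: AlwaysRefCounted> Clone for ARef<T> {
    fn clone(&self) -> Self {
        self.inc_ref();
        ARef { ptr: self.ptr }
    }
}

// Generic `drop()`: release the owned reference (`put_foo()` for `Foo`).
impl<T: AlwaysRefCounted> Drop for ARef<T> {
    fn drop(&mut self) {
        // SAFETY: this `ARef` owns one reference count and gives it up here.
        unsafe { T::dec_ref(self.ptr) }
    }
}

/// Hypothetical C object with an embedded refcount.
#[allow(non_camel_case_types)]
#[repr(C)]
pub struct foo {
    refcount: Cell<usize>,
}

/// Mock `get_foo()`.
unsafe fn get_foo(p: *mut foo) {
    let rc = unsafe { &(*p).refcount };
    rc.set(rc.get() + 1);
}

/// Mock `put_foo()`: frees the object when the last reference goes away.
unsafe fn put_foo(p: *mut foo) {
    let n = unsafe { (*p).refcount.get() } - 1;
    unsafe { (*p).refcount.set(n) };
    if n == 0 {
        // SAFETY: `p` was allocated by `Box::into_raw` in `Foo::new`.
        drop(unsafe { Box::from_raw(p) });
    }
}

/// # Invariants
///
/// Instances of this type are always refcounted: a call to `get_foo` keeps the
/// allocation valid at least until the matching call to `put_foo`.
#[repr(transparent)]
pub struct Foo(UnsafeCell<foo>);

// SAFETY: `Foo` is always ref-counted per type invariants.
unsafe impl AlwaysRefCounted for Foo {
    fn inc_ref(&self) {
        // SAFETY: per type invariants, `&self` implies a non-zero refcount.
        unsafe { get_foo(self.0.get()) }
    }

    unsafe fn dec_ref(obj: NonNull<Self>) {
        // SAFETY: the caller gives up one refcount; `Foo` is transparent over `foo`.
        unsafe { put_foo(obj.cast::<foo>().as_ptr()) }
    }
}

impl Foo {
    /// Hypothetical constructor, standing in for a C allocator returning refcount 1.
    pub fn new() -> ARef<Foo> {
        let p = Box::into_raw(Box::new(foo { refcount: Cell::new(1) }));
        // The initial reference count moves into the returned `ARef`.
        ARef { ptr: NonNull::new(p.cast::<Foo>()).unwrap() }
    }

    pub fn refcount(&self) -> usize {
        // SAFETY: `self.0.get()` is valid while `&self` exists (single-threaded sketch).
        unsafe { (*self.0.get()).refcount.get() }
    }
}
```

Dropping the last `ARef<Foo>` reaches `put_foo()` with the count going
to zero, which frees the object; `Foo` needs no `Drop` of its own.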
> > +///
> > +/// ## `ARef<Self>` vs `&Self`
> > +///
> > +/// For an `impl AlwaysRefCounted` type, `ARef<Self>` represents an owner of one reference count,
> > +/// e.g.
> > +///
> > +/// ```ignore
> > +/// impl Foo {
> > +///     /// Gets a ref-counted reference of [`Self`].
> > +///     ///
> > +///     /// # Safety
> > +///     ///
> > +///     /// - `ptr` must be a valid pointer to `foo` with at least one reference count.
> > +///     pub unsafe fn from_ptr(ptr: *mut foo) -> ARef<Self> {
> > +///         // SAFETY: `ptr` is a valid pointer per function safety requirement. The cast is OK
> > +///         // since `foo` is transparent to `Foo`.
> > +///         //
> > +///         // Note: `.into()` here increases the reference count, so the returned value has its own
> > +///         // reference count.
> > +///         unsafe { &*(ptr.cast::<Foo>()) }.into()
> > +///     }
> > +/// }
> > +/// ```
> > +///
> > +/// Another function that returns an `ARef<Self>`, but with different semantics, is
> > +/// [`ARef::from_raw`]: it takes over the refcount owned by the input pointer, i.e. it does not
> > +/// increment the refcount.
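
The two semantics have a close analogue in std's `Arc`, which may serve
as a mental model (an analogy only, not the kernel API; `Obj`,
`from_ptr()` and `from_raw()` below are hypothetical helpers). Turning
a pointer whose count stays with the caller into an owned handle needs
an increment first, like `Foo::from_ptr()` above, while
`Arc::from_raw()` alone adopts a count the pointer already owns, like
`ARef::from_raw`:

```rust
use std::sync::Arc;

/// Hypothetical payload type.
pub struct Obj {
    pub val: u32,
}

/// Like `Foo::from_ptr()`: the caller keeps its own count, so take a new one.
///
/// # Safety
///
/// `ptr` must come from `Arc::into_raw` and still own at least one strong count.
pub unsafe fn from_ptr(ptr: *const Obj) -> Arc<Obj> {
    unsafe {
        Arc::increment_strong_count(ptr);
        Arc::from_raw(ptr)
    }
}

/// Like `ARef::from_raw()`: adopt the count that `ptr` already owns, no increment.
///
/// # Safety
///
/// `ptr` must come from `Arc::into_raw`, and the caller gives up that count.
pub unsafe fn from_raw(ptr: *const Obj) -> Arc<Obj> {
    unsafe { Arc::from_raw(ptr) }
}
```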
> > +///
> > +/// In contrast, `&Self` represents a reference to the object, and the lifetime of the
> > +/// **reference** is known at compile time, e.g. the `Foo::as_ref()` above.
> > +///
> > +/// ## `impl Drop` of an `impl AlwaysRefCounted` should not touch the refcount
> > +///
> > +/// [`ARef`] decreases the refcount automatically (in [`ARef::drop`]) when it goes out of scope,
> > +/// so there's no need to `impl Drop` for the object type (e.g. `Foo`) to decrease the refcount.
> >  pub struct ARef<T: AlwaysRefCounted> {
> >      ptr: NonNull<T>,
> >      _p: PhantomData<T>,
> > --
> > 2.45.2
> > 
> 



