Re: [PATCH v6 4/4] rust: add abstraction for `struct page`

Boqun Feng <boqun.feng@xxxxxxxxx> · Thu, 18 Apr 2024 11:52:56 -0700

On Thu, Apr 18, 2024 at 08:59:20AM +0000, Alice Ryhl wrote:
> Adds a new struct called `Page` that wraps a pointer to `struct page`.
> This struct is assumed to hold ownership over the page, so that Rust
> code can allocate and manage pages directly.
> 
> The page type has various methods for reading and writing into the page.
> These methods will temporarily map the page to allow the operation. All
> of these methods use a helper that takes an offset and length, performs
> bounds checks, and returns a pointer to the given offset in the page.
> 
> This patch only adds support for pages of order zero, as that is all
> Rust Binder needs. However, it is written to make it easy to add support
> for higher-order pages in the future. To do that, you would add a const
> generic parameter to `Page` that specifies the order. Most of the
> methods do not need to be adjusted, as the logic for dealing with
> mapping multiple pages at once can be isolated to just the
> `with_pointer_into_page` method.
> 

Thank you for doing this and breaking the chicken-and-egg cycle ;-)
The full page API will certainly need more time to design, implement
and review, but this patch looks good enough to me.

> Rust Binder needs to manage pages directly as that is how transactions
> are delivered: Each process has an mmap'd region for incoming
> transactions. When an incoming transaction arrives, the Binder driver
> will choose a region in the mmap, allocate and map the relevant pages
> manually, and copy the incoming transaction directly into the page. This
> architecture allows the driver to copy transactions directly from the
> address space of one process to another, without an intermediate copy
> to a kernel buffer.
> 
> This code is based on Wedson's page abstractions from the old rust
> branch, but it has been modified by Alice by removing the incomplete
> support for higher-order pages, by introducing the `with_*` helpers
> to consolidate the bounds checking logic into a single place, and
> various other changes.
> 
> Co-developed-by: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> Signed-off-by: Wedson Almeida Filho <wedsonaf@xxxxxxxxx>
> Reviewed-by: Andreas Hindborg <a.hindborg@xxxxxxxxxxx>
> Reviewed-by: Trevor Gross <tmgross@xxxxxxxxx>
> Reviewed-by: Benno Lossin <benno.lossin@xxxxxxxxx>
> Signed-off-by: Alice Ryhl <aliceryhl@xxxxxxxxxx>

Reviewed-by: Boqun Feng <boqun.feng@xxxxxxxxx>

Something I want to bring up for discussion below:

[...]

> +    /// Runs a piece of code with a raw pointer to a slice of this page, with bounds checking.
> +    ///
> +    /// If `f` is called, then it will be called with a pointer that points at `off` bytes into the
> +    /// page, and the pointer will be valid for at least `len` bytes. The pointer is only valid on
> +    /// this task, as this method uses a local mapping.
> +    ///
> +    /// If `off` and `len` refers to a region outside of this page, then this method returns
> +    /// `EINVAL` and does not call `f`.
> +    ///
> +    /// # Using the raw pointer
> +    ///
> +    /// It is up to the caller to use the provided raw pointer correctly. The pointer is valid for
> +    /// `len` bytes and for the duration in which the closure is called. The pointer might only be
> +    /// mapped on the current thread, and when that is the case, dereferencing it on other threads
> +    /// is UB. Other than that, the usual rules for dereferencing a raw pointer apply: don't cause
> +    /// data races, the memory may be uninitialized, and so on.
> +    ///
> +    /// If multiple threads map the same page at the same time, then they may reference with
> +    /// different addresses. However, even if the addresses are different, the underlying memory is
> +    /// still the same for these purposes (e.g., it's still a data race if they both write to the
> +    /// same underlying byte at the same time).
> +    fn with_pointer_into_page<T>(
> +        &self,
> +        off: usize,
> +        len: usize,
> +        f: impl FnOnce(*mut u8) -> Result<T>,

I wonder whether the way to go here is to give this function the
following signature:

    fn with_slice_in_page<T>(
        &self,
        off: usize,
        len: usize,
        f: impl FnOnce(&UnsafeCell<[u8]>) -> Result<T>,
    ) -> Result<T>

That makes it a bit clearer which memory `f` can access; in other
words, users are less likely to use the pointer in a wrong way.
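
To make the idea concrete, here is a rough userspace sketch of how such a
method could look. Nothing here is kernel code: `MockPage`, `Einval` and the
4096-byte `PAGE_SIZE` are hypothetical stand-ins, the page is just a plain
buffer with no mapping involved, and reading the length through the raw slice
pointer assumes a reasonably recent toolchain.

```rust
use std::cell::UnsafeCell;
use std::ptr;

/// Stand-in for the kernel's `PAGE_SIZE`; 4096 is an assumption of this sketch.
const PAGE_SIZE: usize = 4096;

/// Hypothetical stand-in for the kernel's `EINVAL` error.
#[derive(Debug, PartialEq)]
struct Einval;

/// Hypothetical userspace mock of `Page`: a plain buffer, no mapping involved.
struct MockPage {
    data: UnsafeCell<[u8; PAGE_SIZE]>,
}

impl MockPage {
    fn new() -> Self {
        MockPage { data: UnsafeCell::new([0; PAGE_SIZE]) }
    }

    /// Calls `f` with a view of `len` bytes starting `off` bytes into the page.
    fn with_slice_in_page<T>(
        &self,
        off: usize,
        len: usize,
        f: impl FnOnce(&UnsafeCell<[u8]>) -> Result<T, Einval>,
    ) -> Result<T, Einval> {
        // Same bounds check as in the patch.
        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
        if !bounds_ok {
            return Err(Einval);
        }

        let base = self.data.get() as *mut u8;
        // SAFETY: `off <= PAGE_SIZE`, so the result is in bounds or one past the end.
        let start = unsafe { base.add(off) };
        let raw = ptr::slice_from_raw_parts_mut(start, len);
        // SAFETY: `off + len <= PAGE_SIZE`, so the range lies inside `self.data`,
        // which outlives this call. `UnsafeCell<[u8]>` has the same layout as `[u8]`.
        let slice = unsafe { &*(raw as *const UnsafeCell<[u8]>) };
        f(slice)
    }
}
```

With this shape, the closure receives the length together with the pointer,
e.g. `page.with_slice_in_page(16, 32, |s| Ok(s.get().len()))` sees a 32-byte
region, while an out-of-range request fails before `f` is ever called.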

But that depends on whether `&UnsafeCell<[u8]>` is the correct
abstraction, and on the ecosystem around it. For example, I feel like
these two functions:

    fn len(slice: &UnsafeCell<[u8]>) -> usize
    fn as_ptr(slice: &UnsafeCell<[u8]>) -> *mut u8

should be trivially safe, but I might be wrong. Again this is just for
future discussion.
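
As a rough illustration of why I think they could be safe, here is a minimal
sketch using only the standard `UnsafeCell::get`. The function bodies are my
assumption of how this might be written, not an existing API, and calling
`len()` on a raw slice pointer needs a reasonably recent Rust toolchain.

```rust
use std::cell::UnsafeCell;

/// Returns the number of bytes in the slice without touching its contents.
fn len(slice: &UnsafeCell<[u8]>) -> usize {
    // `get()` yields a `*mut [u8]`; its length is stored in the pointer
    // metadata, so nothing behind the pointer is read.
    slice.get().len()
}

/// Returns a raw pointer to the first byte of the slice.
fn as_ptr(slice: &UnsafeCell<[u8]>) -> *mut u8 {
    // Casting the fat pointer to a thin one drops the length and keeps the
    // address. Dereferencing the result remains the caller's responsibility.
    slice.get() as *mut u8
}
```

Neither function dereferences anything, so no `unsafe` block is needed; the
unsafe part stays with whoever reads or writes through the returned pointer.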

Regards,
Boqun

> +    ) -> Result<T> {
> +        let bounds_ok = off <= PAGE_SIZE && len <= PAGE_SIZE && (off + len) <= PAGE_SIZE;
> +
> +        if bounds_ok {
> +            self.with_page_mapped(move |page_addr| {
> +                // SAFETY: The `off` integer is at most `PAGE_SIZE`, so this pointer offset will
> +                // result in a pointer that is in bounds or one off the end of the page.
> +                f(unsafe { page_addr.add(off) })
> +            })
> +        } else {
> +            Err(EINVAL)
> +        }
> +    }
> +
[...]
> 
> -- 
> 2.44.0.683.g7961c838ac-goog
>