On Thu, Nov 21, 2024 at 11:23:39AM +0100, Alice Ryhl wrote: > On Wed, Nov 20, 2024 at 8:07 PM Lorenzo Stoakes > <lorenzo.stoakes@xxxxxxxxxx> wrote: > > > > On Wed, Nov 20, 2024 at 02:49:56PM +0000, Alice Ryhl wrote: > > > This adds a type called VmAreaRef which is used when referencing a vma > > > that you have read access to. Here, read access means that you hold > > > either the mmap read lock or the vma read lock (or stronger). > > > > This is pure nit and not important but... > > > > I know we screwed up naming in mm, with the god awful vm_area_struct (we > > abbreviate, for 2 letters, get bored, stop abbreviating, but throw in a > > struct for a laugh) or 'VMA', but I wonder if this would be better as > > VmaRef? Then again that's kinda terrible too. > > > > Sorry about that. I mean this isn't hugely important + ultimately _our > > fault_. > > > > VirtualMemoryAreaRef is worse... VirtMemAreaRef? OK. Maybe VMAreaRef is the > > least bad... > > Happy to use whichever name you prefer. :) Let's leave it as-is, this is kinda our fault* in mm :) * pun intended > > > > Additionally, a vma_lookup method is added to the mmap read guard, which > > > enables you to obtain a &VmAreaRef in safe Rust code. > > > > > > Signed-off-by: Alice Ryhl <aliceryhl@xxxxxxxxxx> Other than trivial doc/comment tweaks this looks fine to me from mm perspective so: Acked-by: Lorenzo Stoakes <lorenzo.stoakes@xxxxxxxxxx> (for mm bits) > > > --- > > > rust/helpers/mm.c | 6 ++ > > > rust/kernel/mm.rs | 21 ++++++ > > > rust/kernel/mm/virt.rs | 171 +++++++++++++++++++++++++++++++++++++++++++++++++ > > > 3 files changed, 198 insertions(+) > > > > > > diff --git a/rust/helpers/mm.c b/rust/helpers/mm.c > > > index 7201747a5d31..7b72eb065a3e 100644 > > > --- a/rust/helpers/mm.c > > > +++ b/rust/helpers/mm.c > > > @@ -37,3 +37,9 @@ void rust_helper_mmap_read_unlock(struct mm_struct *mm) > > > { > > > mmap_read_unlock(mm); > > > } > > > + > > > +struct vm_area_struct *rust_helper_vma_lookup(struct mm_struct *mm, > > > + unsigned long addr) > > > +{ > > > + return vma_lookup(mm, addr); > > > +} > > > > I wonder whether we want to abstract some of the VMA iterator stuff, > > because it's inefficient to look up VMAs by doing this each time if you are > > looking at, for instance, adjacent VMAs. > > > > But perhaps reasonable to defer doing that to later work? > > Yeah, that should probably be deferred. Binder only has one vma per > mm, so iteration is not useful. Sure, no problem! > > > > diff --git a/rust/kernel/mm.rs b/rust/kernel/mm.rs > > > index 84cba581edaa..ace8e7d57afe 100644 > > > --- a/rust/kernel/mm.rs > > > +++ b/rust/kernel/mm.rs > > > @@ -12,6 +12,8 @@ > > > }; > > > use core::{ops::Deref, ptr::NonNull}; > > > > > > +pub mod virt; > > > + > > > /// A wrapper for the kernel's `struct mm_struct`. > > > /// > > > /// Since `mm_users` may be zero, the associated address space may not exist anymore. You can use > > > @@ -210,6 +212,25 @@ pub struct MmapReadGuard<'a> { > > > _nts: NotThreadSafe, > > > } > > > > > > +impl<'a> MmapReadGuard<'a> { > > > + /// Look up a vma at the given address. > > > + #[inline] > > > + pub fn vma_lookup(&self, vma_addr: usize) -> Option<&virt::VmAreaRef> { > > > > Kind of lovely to have functions under the read guard and then knowing that > > hey - we must hold the read lock to be able to do this. > > > > Imagine, a programming language with actual types... :P > > :) > > > > + // SAFETY: We hold a reference to the mm, so the pointer must be valid. Any value is okay > > > + // for `vma_addr`. > > > + let vma = unsafe { bindings::vma_lookup(self.mm.as_raw(), vma_addr as _) }; > > > > Why do we say 'as _' here? > > This is an integer cast where the target type of the cast is inferred > by the compiler. It will go away once this lands: > https://lore.kernel.org/rust-for-linux/20240913213041.395655-1-gary@xxxxxxxxxxx/ OK cool. > > > > + > > > + if vma.is_null() { > > > + None > > > + } else { > > > + // SAFETY: We just checked that a vma was found, so the pointer is valid. Furthermore, > > > + // the returned area will borrow from this read lock guard, so it can only be used > > > + // while the mmap read lock is still held. > > > > Lovely! > > > > > + unsafe { Some(virt::VmAreaRef::from_raw(vma)) } > > > + } > > > + } > > > +} > > > + > > > impl Drop for MmapReadGuard<'_> { > > > #[inline] > > > fn drop(&mut self) { > > > diff --git a/rust/kernel/mm/virt.rs b/rust/kernel/mm/virt.rs > > > new file mode 100644 > > > index 000000000000..1e755dca46dd > > > --- /dev/null > > > +++ b/rust/kernel/mm/virt.rs > > > @@ -0,0 +1,171 @@ > > > +// SPDX-License-Identifier: GPL-2.0 > > > + > > > +// Copyright (C) 2024 Google LLC. > > > + > > > +//! Virtual memory. > > > > Trivial rust q again but does this result in a heading in generated docs or > > something? > > Yes, this becomes module documentation. Check out: > https://rust.docs.kernel.org/kernel/ > or try out `make LLVM=1 rustdoc` Cool thanks, thought this might be the case! > > > > +use crate::{bindings, types::Opaque}; > > > + > > > +/// A wrapper for the kernel's `struct vm_area_struct` with read access. > > > +/// > > > +/// It represents an area of virtual memory. > > > +/// > > > +/// # Invariants > > > +/// > > > +/// The caller must hold the mmap read lock or the vma read lock. > > > > Might be worth mentioning here that you have yet to abstract the vma lock? > > I do that in the next patch. Yeah saw, one of those 'comment in patch X, addressed in patch X+1' scenarios :) > > > > +#[repr(transparent)] > > > +pub struct VmAreaRef { > > > + vma: Opaque<bindings::vm_area_struct>, > > > +} > > > + > > > +// Methods you can call when holding the mmap or vma read lock (or strong). They must be usable no > > > +// matter what the vma flags are. > > > > typo: 'or strong' -> 'or stronger'. > > > > Nitty, but perhaps say 'Methods must be usable' rather than 'they' to be > > totally clear. > > Will reword. Thanks! > > > > +impl VmAreaRef { > > > + /// Access a virtual memory area given a raw pointer. > > > + /// > > > + /// # Safety > > > + /// > > > + /// Callers must ensure that `vma` is valid for the duration of 'a, and that the mmap read lock > > > + /// (or stronger) is held for at least the duration of 'a. > > > > Well, VMA locks make this more complicated, in that we can just hold a VMA > > lock. But again, perhaps worth doing this incrementally and just talk about > > mmap locks to start with. > > > > However since we reference VMA locks elsewhere, we should reference them > > here (and probably worth mentioning that they are not yet abstracted). > > Oh I forgot to update this, it should say "mmap or vma read lock". Cool, one to tweak on respin also then. > > > > + #[inline] > > > + pub unsafe fn from_raw<'a>(vma: *const bindings::vm_area_struct) -> &'a Self { > > > + // SAFETY: The caller ensures that the invariants are satisfied for the duration of 'a. > > > + unsafe { &*vma.cast() } > > > + } > > > + > > > + /// Returns a raw pointer to this area. > > > + #[inline] > > > + pub fn as_ptr(&self) -> *mut bindings::vm_area_struct { > > > + self.vma.get() > > > + } > > > + > > > + /// Returns the flags associated with the virtual memory area. > > > + /// > > > + /// The possible flags are a combination of the constants in [`flags`]. > > > + #[inline] > > > + pub fn flags(&self) -> vm_flags_t { > > > + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this > > > + // access is not a data race. > > > + unsafe { (*self.as_ptr()).__bindgen_anon_2.vm_flags as _ } > > > > Hm what's up with this __bindgen_anon_N stuff? > > Whenever you have a `struct { ... }` or `union { ... }` in the middle > of a struct, bindgen generates additional types for them because Rust > doesn't have a direct equivalent. OK I see. > > > > + } > > > + > > > + /// Returns the start address of the virtual memory area. > > > + #[inline] > > > + pub fn start(&self) -> usize { > > > > Is usize guranteed to be equivalent to unsigned long? > > They are guaranteed to have the same size, yes. > > But again, right now `unsigned long` is being translated to whichever > one of u32 or u64 has the same size as usize, instead of just being > translated to usize. Thus the annoying casts. We can get rid of them > as soon as > https://lore.kernel.org/rust-for-linux/20240913213041.395655-1-gary@xxxxxxxxxxx/ > lands. Ack cool. > > > > + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this > > > + // access is not a data race. > > > + unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_start as _ } > > > + } > > > + > > > + /// Returns the end address of the virtual memory area. > > > > Worth mentioning that it's an _exclusive_ end. > > Sure, I'll add that. Thanks > > > > + #[inline] > > > + pub fn end(&self) -> usize { > > > + // SAFETY: By the type invariants, the caller holds at least the mmap read lock, so this > > > + // access is not a data race. > > > + unsafe { (*self.as_ptr()).__bindgen_anon_1.__bindgen_anon_1.vm_end as _ } > > > + } > > > + > > > + /// Unmap pages in the given page range. > > > > This needs some more description, as 'unmapping' pages is unfortunately an > > overloaded term in the kernel and this very much might confuse people as > > opposed to e.g. munmap()'ing a range. > > > > I'd say something like 'clear page table mappings for the range at the leaf > > level, leaving all other page tables intact, freeing any memory referenced > > by the VMA in this range (anonymous memory is completely freed, file-backed > > memory has its reference count on page cache folio's dropped, any dirty > > data will still be written back to disk as usual)'. > > Sure, will add that to the docs. Thanks, I assume you mean this comment, which will form part of the docs? As here we should at least replace the 'unmap' with 'zap' to avoid confusion vs. munmap(). > > > > + #[inline] > > > + pub fn zap_page_range_single(&self, address: usize, size: usize) { > > > + // SAFETY: By the type invariants, the caller has read access to this VMA, which is > > > + // sufficient for this method call. This method has no requirements on the vma flags. Any > > > + // value of `address` and `size` is allowed. > > > + unsafe { > > > + bindings::zap_page_range_single( > > > > Hm weirdly I see this in rust/bindings/bindings_generated.rs but not in > > rust/helpers/mm.c is this intended? > > > > Is this meant to be generated _from_ somewhere? Is something missing for > > that? > > The bindings_generated.rs file is generated at built-time from C > headers. The zap_page_range_single is a real function, not a fake > static inline header-only function, so bindgen is able to generate it > without anything in rust/helpers. > > > > + self.as_ptr(), > > > + address as _, > > > + size as _, > > > + core::ptr::null_mut(), > > > + ) > > > + }; > > > + } > > > +} > > > + > > > +/// The integer type used for vma flags. > > > +#[doc(inline)] > > > +pub use bindings::vm_flags_t; > > > > Where do you declare this type? > > It's declared in include/linux/mm_types.h I meant from a rust perspective, but I guess bindgen handles this? > > > > + > > > +/// All possible flags for [`VmAreaRef`]. > > > > Well you've not specified all as they're some speciailist ones and ones > > that depend on config flags, so maybe worth just saying 'core' flags? > > Sure. Thanks. > > > > +pub mod flags { > > > > Pure rust noobie question (again...) but what is a 'mod'? > > It's a module. Ack. > > > > + use super::vm_flags_t; > > > + use crate::bindings; > > > + > > > + /// No flags are set. > > > + pub const NONE: vm_flags_t = bindings::VM_NONE as _; > > > > This is strictly not a flag, as is the 0 value (and is used 'cleverly' in > > kernel code when we have flags that are defined under some circumstances > > but not others, where we can then assign these values to VM_NONE if not > > available, ensuring all |= ... operations are no-ops and all & > > ... expressions evaluate to false) but I guess it doesn't matter all too > > much right? > > Ultimately it's just a bunch of integer constants. We can include or > exclude whichever ones we prefer. Yeah not a big deal. > > > > + /// Mapping allows reads. > > > + pub const READ: vm_flags_t = bindings::VM_READ as _; > > > + > > > + /// Mapping allows writes. > > > + pub const WRITE: vm_flags_t = bindings::VM_WRITE as _; > > > + > > > + /// Mapping allows execution. > > > + pub const EXEC: vm_flags_t = bindings::VM_EXEC as _; > > > + > > > + /// Mapping is shared. > > > + pub const SHARED: vm_flags_t = bindings::VM_SHARED as _; > > > + > > > + /// Mapping may be updated to allow reads. > > > + pub const MAYREAD: vm_flags_t = bindings::VM_MAYREAD as _; > > > + > > > + /// Mapping may be updated to allow writes. > > > + pub const MAYWRITE: vm_flags_t = bindings::VM_MAYWRITE as _; > > > + > > > + /// Mapping may be updated to allow execution. > > > + pub const MAYEXEC: vm_flags_t = bindings::VM_MAYEXEC as _; > > > + > > > + /// Mapping may be updated to be shared. > > > + pub const MAYSHARE: vm_flags_t = bindings::VM_MAYSHARE as _; > > > + > > > + /// Page-ranges managed without `struct page`, just pure PFN. > > > + pub const PFNMAP: vm_flags_t = bindings::VM_PFNMAP as _; > > > + > > > + /// Memory mapped I/O or similar. > > > + pub const IO: vm_flags_t = bindings::VM_IO as _; > > > + > > > + /// Do not copy this vma on fork. > > > + pub const DONTCOPY: vm_flags_t = bindings::VM_DONTCOPY as _; > > > + > > > + /// Cannot expand with mremap(). > > > + pub const DONTEXPAND: vm_flags_t = bindings::VM_DONTEXPAND as _; > > > + > > > + /// Lock the pages covered when they are faulted in. > > > + pub const LOCKONFAULT: vm_flags_t = bindings::VM_LOCKONFAULT as _; > > > + > > > + /// Is a VM accounted object. > > > + pub const ACCOUNT: vm_flags_t = bindings::VM_ACCOUNT as _; > > > + > > > + /// should the VM suppress accounting. > > > + pub const NORESERVE: vm_flags_t = bindings::VM_NORESERVE as _; > > > + > > > + /// Huge TLB Page VM. > > > + pub const HUGETLB: vm_flags_t = bindings::VM_HUGETLB as _; > > > + > > > + /// Synchronous page faults. > > > > Might be worth mentioning that this is DAX-specific only. > > Will add. Ack. > > > > + pub const SYNC: vm_flags_t = bindings::VM_SYNC as _; > > > + > > > + /// Architecture-specific flag. > > > + pub const ARCH_1: vm_flags_t = bindings::VM_ARCH_1 as _; > > > + > > > + /// Wipe VMA contents in child.. > > > > Typo '..' - also probably worth saying 'wipe VMA contents in child on > > fork'. > > Will add. Ack, thanks. > > > > + pub const WIPEONFORK: vm_flags_t = bindings::VM_WIPEONFORK as _; > > > + > > > + /// Do not include in the core dump. > > > + pub const DONTDUMP: vm_flags_t = bindings::VM_DONTDUMP as _; > > > + > > > + /// Not soft dirty clean area. > > > + pub const SOFTDIRTY: vm_flags_t = bindings::VM_SOFTDIRTY as _; > > > + > > > + /// Can contain `struct page` and pure PFN pages. > > > + pub const MIXEDMAP: vm_flags_t = bindings::VM_MIXEDMAP as _; > > > + > > > + /// MADV_HUGEPAGE marked this vma. > > > + pub const HUGEPAGE: vm_flags_t = bindings::VM_HUGEPAGE as _; > > > + > > > + /// MADV_NOHUGEPAGE marked this vma. > > > + pub const NOHUGEPAGE: vm_flags_t = bindings::VM_NOHUGEPAGE as _; > > > + > > > + /// KSM may merge identical pages. > > > + pub const MERGEABLE: vm_flags_t = bindings::VM_MERGEABLE as _; > > > +} > > > > > > > I'd say these comments are a bit sparse and need more detail, but they are > > literally comments from include/linux/mm.h and therefore, again, our fault > > so this is fine :) > > Happy to add more if you tell me what you'd like to see. ;) Sure, this is fine for now. > > Alice