On 2/4/25 7:26 PM, David Hildenbrand wrote: > On 03.02.25 22:05, Zi Yan wrote: >> On 3 Feb 2025, at 9:32, Asahi Lina wrote: >> >>> On 2/3/25 6:58 PM, Simona Vetter wrote: >>>> On Sun, Feb 02, 2025 at 10:05:42PM +0900, Asahi Lina wrote: >>>>> This series refactors the existing Page wrapper to support borrowing >>>>> `struct page` objects without ownership on the Rust side, and >>>>> converting >>>>> page references to/from physical memory addresses. >>>>> >>>>> The series overlaps with the earlier submission in [1] and follows a >>>>> different approach, based on the discussion that happened there. >>>>> >>>>> The primary use case for this is implementing IOMMU-style page table >>>>> management in Rust. This allows drivers for IOMMUs and MMU-containing >>>>> SoC devices to be written in Rust (such as embedded GPUs). The >>>>> intended >>>>> logic is similar to how ARM SMMU page tables are managed in the >>>>> drivers/iommu tree. >>>>> >>>>> First, introduce a concept of Owned<T> and an Ownable trait. These are >>>>> similar to ARef<T> and AlwaysRefCounted, but are used for types which >>>>> are not ref counted but rather have a single intended owner. >>>>> >>>>> Then, refactor the existing Page support to use the new mechanism. >>>>> Pages >>>>> returned from the page allocator are not intended to be ref counted by >>>>> consumers (see previous discussion in [1]), so this keeps Rust's >>>>> view of >>>>> page ownership as a simple "owned or not". Of course, this is still >>>>> composable as Arc<Owned<Page>> if Rust code needs to reference >>>>> count its >>>>> own Page allocations for whatever reason. >>>> >>>> I think there's a bit a potential mess here because the conversion to >>>> folios isn't far enough yet that we can entirely ignore page >>>> refcounts and >>>> just use folio refcounts. But I guess we can deal with that oddity >>>> if we >>>> hit it (maybe folio conversion moves fast enough), since this only >>>> really >>>> starts to become relevant for hmm/svm gpu stuff. >>>> >>>> iow I think anticipating the future where struct page really doesn't >>>> have >>>> a refcount is the right move. Aside from that it's really not a >>>> refcount >>>> that works in the rust ARef sense, since struct page cannot >>>> disappear for >>>> system memory, and for dev_pagemap memory it's an entirely different >>>> reference you need (and then there's a few more special cases). >>> >>> Right, as far as this abstraction is concerned, all that needs to hold >>> is that: >>> >>> - alloc_pages() and __free_pages() work as intended, however that may >>> be, to reserve and return one page (for now, though I think extending >>> the Rust abstraction to handle higher-order folios is pretty easy, but >>> that can happen later). >>> - Whatever borrows pages knows what it's doing. In this case there's >>> only support for borrowing pages by physaddr, and it's only going to be >>> used in a driver for a platform without memory hot remove (so far) and >>> only for pages which have known usage (in principle) and are either >>> explicitly allocated or known pinned or reserved, so it's not a problem >>> right now. Future abstractions that return borrowed pages can do their >>> own locking/bookkeeping/whatever is necessary to keep it safe. >>> >>> I would like to hear how memory hot-remove is supposed to work though, >>> to see if we should be doing something to make the abstraction safer >>> (though it's still unsafe and always will be). Is there a chance a >>> `struct page` could vanish out from under us under some conditions? >> >> Add DavidH and OscarS for memory hot-remove questions. >> >> IIUC, struct page could be freed if a chunk of memory is hot-removed. > > Right, but only after there are no users anymore (IOW, memory was freed > back to the buddy). PFN walkers might still stumble over them, but I > would not expect (or recommend) rust to do that. The physaddr to page function does look up pages by pfn, but it's intended to be used by drivers that know what they're doing. There are two variants of the API, one that is completely unchecked (a fast path for cases where the driver definitely allocated these pages itself, for example just grabbing the `struct page` back from a decoded PTE it wrote), and one that has this check: pfn_valid(pfn) && page_is_ram(pfn) Which is intended as a safety net to allow drivers to look up firmware-reserved pages too, and fail gracefully if the kernel doesn't know about them (because they weren't declared in the bootloader/firmware memory map correctly) or doesn't have them mapped in the direct map (because they were declared no-map). Is there anything else that can reasonably be done here to make the API safer to call on an arbitrary pfn? If the answer is "no" then that's fine. It's still an unsafe function and we need to document in the safety section that it should only be used for memory that is either known to be allocated and pinned and will not be freed while the `struct page` is borrowed, or memory that is reserved and not owned by the buddy allocator, so in practice correct use would not be racy with memory hot-remove anyway. This is already the case for the drm/asahi use case, where the pfns looked up will only ever be one of: - GEM objects that are mapped to the GPU and whose physical pages are therefore pinned (and the VM is locked while this happens so the objects cannot become unpinned out from under the running code), - Raw pages allocated from the page allocator for use as GPU page tables, - System memory that is marked reserved by the firmware/bootloader, - (Potentially) invalid PFNs that aren't part of the System RAM region at all and don't have a struct page to begin with, which we check for, so the API returns an error. This would only happen if the bootloader didn't declare some used firmware ranges at all, so Linux doesn't know about them. > >> >> Another case struct page can be freed is when hugetlb vmemmap >> optimization >> is used. Muchun (cc'd) is the maintainer of hugetlbfs. > > Here, the "struct page" remains valid though; it can still be accessed, > although we disallow writes (which would be wrong). > > If you only allocate a page and free it later, there is no need to worry > about either on the rust side. This is what the safe API does. (Also the unsafe physaddr APIs if all you ever do is convert an allocated page to a physaddr and back, which is the only thing the GPU page table code does during normal use. The walking leaf PFNs story is only for GPU device coredumps when the firmware crashes.) ~~ Lina