On Thu, Mar 21, 2024 at 3:11 PM Alice Ryhl <aliceryhl@xxxxxxxxxx> wrote: > > On Thu, Mar 21, 2024 at 2:56 PM Benno Lossin <benno.lossin@xxxxxxxxx> wrote: > > > > On 3/21/24 14:42, Alice Ryhl wrote: > > > On Thu, Mar 21, 2024 at 2:16 PM Benno Lossin <benno.lossin@xxxxxxxxx> wrote: > > >> > > >> On 3/20/24 09:46, Alice Ryhl wrote: > > >>>> On 3/11/24 11:47, Alice Ryhl wrote: > > >>>>> +/// A pointer to a page that owns the page allocation. > > >>>>> +/// > > >>>>> +/// # Invariants > > >>>>> +/// > > >>>>> +/// The pointer points at a page, and has ownership over the page. > > >>>> > > >>>> Why not "`page` is valid"? > > >>>> Do you mean by ownership of the page that `page` has ownership of the > > >>>> allocation, or does that entail any other property/privilege? > > >>> > > >>> I can add "at a valid page". > > >> > > >> I don't think that helps, what you need as an invariant is that the > > >> pointer is valid. > > > > > > To me "points at a page" implies that the pointer is valid. I mean, if > > > it was dangling, it would not point at a page? > > > > > > But I can reword to something else if you have a preferred phrasing. > > > > I would just say "`page` is valid" or "`self.page` is valid". > > > > >>>>> + /// Runs a piece of code with this page mapped to an address. > > >>>>> + /// > > >>>>> + /// The page is unmapped when this call returns. > > >>>>> + /// > > >>>>> + /// It is up to the caller to use the provided raw pointer correctly. > > >>>> > > >>>> This says nothing about what 'correctly' means. What I gathered from the > > >>>> implementation is that the supplied pointer is valid for the execution > > >>>> of `f` for `PAGE_SIZE` bytes. > > >>>> What other things are you allowed to rely upon? > > >>>> > > >>>> Is it really OK for this function to be called from multiple threads? > > >>>> Could that not result in the same page being mapped multiple times? If > > >>>> that is fine, what about potential data races when two threads write to > > >>>> the pointer given to `f`? > > >>>> > > >>>>> + pub fn with_page_mapped<T>(&self, f: impl FnOnce(*mut u8) -> T) -> T { > > >>> > > >>> I will say: > > >>> > > >>> /// It is up to the caller to use the provided raw pointer correctly. > > >>> /// The pointer is valid for `PAGE_SIZE` bytes and for the duration in > > >>> /// which the closure is called. Depending on the gfp flags and kernel > > >>> /// configuration, the pointer may only be mapped on the current thread, > > >>> /// and in those cases, dereferencing it on other threads is UB. Other > > >>> /// than that, the usual rules for dereferencing a raw pointer apply. > > >>> /// (E.g., don't cause data races, the memory may be uninitialized, and > > >>> /// so on.) > > >> > > >> I would simplify and drop "depending on the gfp flags and kernel..." and > > >> just say that the pointer is only valid on the current thread. > > > > > > Sure, that works for me. > > > > > >> Also would it make sense to make the pointer type *mut [u8; PAGE_SIZE]? > > > > > > I think it's a trade-off. That makes the code more error-prone, since > > > `pointer::add` now doesn't move by a number of bytes, but a number of > > > pages. > > > > Yeah. As long as you document that the pointer is valid for r/w with > > offsets in `0..PAGE_SIZE` bytes, leaving the type as is, is fine by me. > > > > > > >>> It's okay to map it multiple times from different threads. > > >> > > >> Do you still need to take care of data races? > > >> So would it be fine to execute this code on two threads in parallel? > > >> > > >> static PAGE: Page = ...; // assume we have a page accessible by both threads > > >> > > >> PAGE.with_page_mapped(|ptr| { > > >> loop { > > >> unsafe { ptr.write(0) }; > > >> pr_info!("{}", unsafe { ptr.read() }); > > >> } > > >> }); > > > > > > Like I said, the usual pointer rules apply. Two threads can access it > > > in parallel as long as one of the following are satisfied: > > > > > > * Both accesses are reads. > > > * Both accesses are atomic. > > > * They access disjoint byte ranges. > > > > > > Other than the fact that it uses a thread-local mapping on machines > > > that can't address all of their memory at the same time, it's > > > completely normal memory. It's literally just a PAGE_SIZE-aligned > > > allocation of PAGE_SIZE bytes. > > > > Thanks for the info, what do you think of this?: > > > > /// It is up to the caller to use the provided raw pointer correctly. The pointer is valid for reads > > /// and writes for `PAGE_SIZE` bytes and for the duration in which the closure is called. The > > /// pointer must only be used on the current thread. The caller must also ensure that no data races > > /// occur: when mapping the same page on two threads accesses to memory with the same offset must be > > /// synchronized. > > I would much rather phrase it in terms of "the usual pointer" rules. I > mean, the memory could also be uninitialized if you don't pass > __GFP_ZERO when you create it, so you also have to make sure to follow > the rules about uninitialized memory. I don't want to be in the > business of listing all requirements for accessing memory here. > > > >> If this is not allowed, I don't really like the API. As a raw version it > > >> would be fine, but I think we should have a safer version (eg by taking > > >> `&mut self`). > > > > > > I don't understand what you mean. It is the *most* raw API that `Page` > > > has. I can make them private if you want me to. The API cannot take > > > `&mut self` because I need to be able to unsafely perform concurrent > > > writes to disjoint byte ranges. > > > > If you don't need these functions to be public, I think we should > > definitely make them private. > > Also we could add a `raw` suffix to the functions to make it clear that > > it is a primitive API. If you think that it is highly unlikely that we > > get a safer version, then I don't think there is value in adding the > > suffix. > > The old code on the Rust branch didn't have these functions, but > that's because the old `read_raw` and `write_raw` methods did all of > these things directly in their implementation: > > * Map the memory so we can get a pointer. > * Get a pointer to a subslice (with bounds checks!) > * Do the actual read/write. > > I thought that doing this many things in a single function was > convoluted, so I decided to refactor the code by extracting the "get a > pointer to the page" logic into `with_page_mapped` and the "point to > subslice with bounds check" logic into `with_pointer_into_page`. That > way, each function has only one responsibility, instead of mixing > three responsibilities into one. > > So even if we get a safer version, I would not want to get rid of this > method. I don't want to inline its implementation into more > complicated functions. The safer method would call the raw method, and > then do whatever additional logic it wants to do on top of that. Adding to this: To me, we *do* already have safer versions of this method. Those are the read_raw and write_raw and fill_zero and copy_from_user_slice methods. Alice