On Fri, 20 Oct 2023 at 12:17, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote: > > On Fri, Oct 20, 2023 at 05:11:38AM +0100, Matthew Wilcox wrote: > > > Sure, it's safe to map a folio in general, but Rust has stricter rules > > > about aliasing and mutability that are part of how memory safety is > > > achieved. In particular, it requires that we never have mutable and > > > immutable pointers to the same memory at once (modulo interior > > > mutability). > > > > > > So we need to avoid something like: > > > > > > let a = folio.map(); // `a` is a shared pointer to the contents of the folio. > > > > > > // While we have a shared (and therefore immutable) pointer, we're > > > changing the contents of the folio. > > > sb.sread(sector_number, sector_count, folio); > > > > > > This violates Rust rules. `UniqueFolio` helps us address this for our > > > use case; if we try the above with a UniqueFolio, the compiler will > > > error out saying that `a` has a shared reference to the folio, so we > > > can't call `sread` on it (because sread requires a mutable, and > > > therefore not shareable, reference to the folio). > > > > This is going to be quite the impedance mismatch. Still, I imagine > > you're used to dealing with those by now and have a toolbox of ideas. > > > > We don't have that rule for the pagecache as it is. We do have rules that > > prevent data corruption! For example, if the folio is !uptodate then you > > must have the lock to write to the folio in order to bring it uptodate > > (so we have a single writer rule in that regard). But once the folio is > > uptodate, all bets are off in terms of who can be writing to it / reading > > it at the same time. And that's going to have to continue to be true; > > multiple processes can have the same page mmaped writable and write to > > it at the same time. There's no possible synchronisation between them. > > > > But I think your concern is really more limited. You're concerned > > with filesystem metadata obeying Rust's rules. And for a read-write > > filesystem, you're going to have to have ... something ... which gets a > > folio from the page cache, and establishes that this is the only thread > > which can modify that folio (maybe it's an interior node of a Btree, > > maybe it's a free space bitmap, ...). We could probably use the folio > > lock bit for that purpose, For the read-only filesystems, you only need > > be concerned about freshly-allocated folios, but you need something more > > when it comes to doing an ext2 clone. > > > > There's general concern about the overuse of the folio lock bit, but > > this is a reasonable use -- preventing two threads from modifying the > > same folio at the same time. > > Sorry, I didn't quite finish this thought; that'll teach me to write > complicated emails last thing at night. > > The page cache has no single-writer vs multiple-reader exclusion on folios > found in the page cache. We expect filesystems to implement whatever > exclusion they need at a higher level. For example, ext2 has no higher > level lock on its block allocator. Once the buffer is uptodate (ie has > been read from storage), it uses atomic bit operations in order to track > which blocks are freed. It does use a spinlock to control access to > "how many blocks are currently free". > > I'm not suggesting ext2 is an optimal strategy. I know XFS and btrfs > use rwsems, although I'm not familiar enough with either to describe > exactly how it works. Thanks for this explanation, it's good to see this kind of high-level decisions/directions spelled out. When we need writable file systems, we'll encode this in the type system.