Re: [RFC PATCH 09/19] rust: folio: introduce basic support for folios

On Fri, 20 Oct 2023 at 12:17, Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
> On Fri, Oct 20, 2023 at 05:11:38AM +0100, Matthew Wilcox wrote:
> > > Sure, it's safe to map a folio in general, but Rust has stricter rules
> > > about aliasing and mutability that are part of how memory safety is
> > > achieved. In particular, it requires that we never have mutable and
> > > immutable pointers to the same memory at once (modulo interior
> > > mutability).
> > >
> > > So we need to avoid something like:
> > >
> > > let a = folio.map(); // `a` is a shared pointer to the contents of the folio.
> > >
> > > // While we have a shared (and therefore immutable) pointer, we're
> > > changing the contents of the folio.
> > > sb.sread(sector_number, sector_count, folio);
> > >
> > > This violates Rust rules. `UniqueFolio` helps us address this for our
> > > use case; if we try the above with a UniqueFolio, the compiler will
> > > error out saying that  `a` has a shared reference to the folio, so we
> > > can't call `sread` on it (because sread requires a mutable, and
> > > therefore not shareable, reference to the folio).
> >
> > This is going to be quite the impedance mismatch.  Still, I imagine
> > you're used to dealing with those by now and have a toolbox of ideas.
> >
> > We don't have that rule for the pagecache as it is.  We do have rules that
> > prevent data corruption!  For example, if the folio is !uptodate then you
> > must have the lock to write to the folio in order to bring it uptodate
> > (so we have a single writer rule in that regard).  But once the folio is
> > uptodate, all bets are off in terms of who can be writing to it / reading
> > it at the same time.  And that's going to have to continue to be true;
> > multiple processes can have the same page mmaped writable and write to
> > it at the same time.  There's no possible synchronisation between them.
> >
> > But I think your concern is really more limited.  You're concerned
> > with filesystem metadata obeying Rust's rules.  And for a read-write
> > filesystem, you're going to have to have ... something ... which gets a
> > folio from the page cache, and establishes that this is the only thread
> > which can modify that folio (maybe it's an interior node of a Btree,
> > maybe it's a free space bitmap, ...).  We could probably use the folio
> > lock bit for that purpose.  For the read-only filesystems, you only need
> > be concerned about freshly-allocated folios, but you need something more
> > when it comes to doing an ext2 clone.
> >
> > There's general concern about the overuse of the folio lock bit, but
> > this is a reasonable use -- preventing two threads from modifying the
> > same folio at the same time.
>
> Sorry, I didn't quite finish this thought; that'll teach me to write
> complicated emails last thing at night.
>
> The page cache has no single-writer vs multiple-reader exclusion on folios
> found in the page cache.  We expect filesystems to implement whatever
> exclusion they need at a higher level.  For example, ext2 has no higher
> level lock on its block allocator.  Once the buffer is uptodate (ie has
> been read from storage), it uses atomic bit operations in order to track
> which blocks are freed.  It does use a spinlock to control access to
> "how many blocks are currently free".
>
> I'm not suggesting ext2 is an optimal strategy.  I know XFS and btrfs
> use rwsems, although I'm not familiar enough with either to describe
> exactly how it works.
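
[The ext2 scheme described above (atomic bit operations for per-block state once the buffer is uptodate, plus a spinlock around the running free count) can be sketched in userspace Rust. Everything below is illustrative: the type and method names are invented, and a std `Mutex` stands in for the kernel spinlock.]

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

// Hypothetical userspace analogue of an ext2-style block allocator:
// lock-free bit operations decide who owns each block, while a
// separate lock protects only the aggregate "blocks currently free".
struct BlockBitmap {
    words: Vec<AtomicU64>,   // one bit per block; 1 = in use
    free_count: Mutex<u64>,  // "how many blocks are currently free"
}

impl BlockBitmap {
    fn new(nr_blocks: usize) -> Self {
        let nr_words = (nr_blocks + 63) / 64;
        BlockBitmap {
            words: (0..nr_words).map(|_| AtomicU64::new(0)).collect(),
            free_count: Mutex::new(nr_blocks as u64),
        }
    }

    /// Try to claim `block`; returns true if this thread won the race.
    fn alloc(&self, block: usize) -> bool {
        let bit = 1u64 << (block % 64);
        // fetch_or is the atomic test-and-set: whoever saw the bit
        // clear in the old value owns the block, no lock required.
        let old = self.words[block / 64].fetch_or(bit, Ordering::AcqRel);
        if old & bit == 0 {
            *self.free_count.lock().unwrap() -= 1;
            true
        } else {
            false // another thread already owns this block
        }
    }

    fn free(&self, block: usize) {
        let bit = 1u64 << (block % 64);
        self.words[block / 64].fetch_and(!bit, Ordering::AcqRel);
        *self.free_count.lock().unwrap() += 1;
    }

    fn free_blocks(&self) -> u64 {
        *self.free_count.lock().unwrap()
    }
}
```

[Two racing callers of `alloc` on the same block are serialised by the atomic op alone; the mutex only keeps the summary count consistent, mirroring the split Willy describes.]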

Thanks for this explanation, it's good to see these kinds of high-level
decisions/directions spelled out. When we need writable file systems,
we'll encode this in the type system.
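
[As a sketch of what "encode this in the type system" can look like, here is a hypothetical `UniqueFolio` whose `sread` takes `&mut self`, so the borrow checker rejects calling it while a shared mapping from `map()` is still alive. None of these names or signatures are the actual patch's API.]

```rust
// Hypothetical stand-in for an exclusively-owned folio: because
// `sread` needs `&mut self`, it cannot run while any `&self` borrow
// returned by `map()` is outstanding.
struct UniqueFolio {
    data: Vec<u8>, // stand-in for the folio's mapped contents
}

impl UniqueFolio {
    /// Shared, read-only view of the contents.
    fn map(&self) -> &[u8] {
        &self.data
    }

    /// Fill the folio from "storage"; requires exclusive access.
    fn sread(&mut self, sector: u64) {
        self.data.fill(sector as u8); // dummy read for illustration
    }
}
```

[With this shape, `let a = folio.map(); folio.sread(0);` followed by a later use of `a` is rejected at compile time ("cannot borrow `folio` as mutable because it is also borrowed as immutable"), which is exactly the violation described earlier in the thread.]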
