Re: [RFC v2 00/10] bdev: LBS devices support to coexist with buffer-heads

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 18 Sep 2023 10:59:07 +1000

On Mon, Sep 18, 2023 at 12:14:48AM +0100, Matthew Wilcox wrote:
> On Mon, Sep 18, 2023 at 08:38:37AM +1000, Dave Chinner wrote:
> > On Fri, Sep 15, 2023 at 02:32:44PM -0700, Luis Chamberlain wrote:
> > > LBS devices. This in turn allows filesystems which support bs > 4k to be
> > > enabled on a 4k PAGE_SIZE world on LBS block devices. This alows LBS
> > > device then to take advantage of the recenlty posted work today to enable
> > > LBS support for filesystems [0].
> > 
> > Why do we need LBS devices to support bs > ps in XFS?
> 
> It's the other way round -- we need the support in the page cache to
> reject sub-block-size folios (which is in the other patches) before we
> can sensibly talk about enabling any filesystems on top of LBS devices.
>
> Even XFS, or for that matter ext2 which support 16k block sizes on
> CONFIG_PAGE_SIZE_16K (or 64K) kernels need that support first.

Well, yes, I know that. But the statement above implies that we
can't use bs > ps filesytems without LBS support on 4kB PAGE_SIZE
systems. If it's meant to mean the exact opposite, then it is
extremely poorly worded....

> > > There might be a better way to do this than do deal with the switching
> > > of the aops dynamically, ideas welcomed!
> > 
> > Is it even safe to switch aops dynamically? We know there are
> > inherent race conditions in doing this w.r.t. mmap and page faults,
> > as the write fault part of the processing is directly dependent
> > on the page being correctly initialised during the initial
> > population of the page data (the "read fault" side of the write
> > fault).
> > 
> > Hence it's not generally considered safe to change aops from one
> > mechanism to another dynamically. Block devices can be mmap()d, but
> > I don't see anything in this patch set that ensures there are no
> > other users of the block device when the swaps are done. What am I
> > missing?
> 
> We need to evict all pages from the page cache before switching aops to
> prevent misinterpretation of folio->private. 

Yes, but if the device is mapped, even after an invalidation, we can
still race with a new fault instantiating a page whilst the aops are
being swapped, right? That was the problem that sunk dynamic
swapping of the aops when turning DAX on and off on an inode, right?

> If switching aops is even
> the right thing to do.  I don't see the problem with allowing buffer heads
> on block devices, but I haven't been involved with the discussion here.

iomap supports bufferheads as a transitional thing (e.g. for gfs2).

Hence I suspect that a better solution is to always use iomap and
the same aops, but just switch from iomap page state to buffer heads
in the bdev mapping interface via a synchronised invalidation +
setting/clearing IOMAP_F_BUFFER_HEAD in all new mapping requests...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx