On Mon, Sep 18, 2023 at 12:14:48AM +0100, Matthew Wilcox wrote:
> On Mon, Sep 18, 2023 at 08:38:37AM +1000, Dave Chinner wrote:
> > On Fri, Sep 15, 2023 at 02:32:44PM -0700, Luis Chamberlain wrote:
> > > LBS devices. This in turn allows filesystems which support bs > 4k to be
> > > enabled on a 4k PAGE_SIZE world on LBS block devices. This allows LBS
> > > device then to take advantage of the recently posted work today to enable
> > > LBS support for filesystems [0].
> >
> > Why do we need LBS devices to support bs > ps in XFS?
>
> It's the other way round -- we need the support in the page cache to
> reject sub-block-size folios (which is in the other patches) before we
> can sensibly talk about enabling any filesystems on top of LBS devices.
>
> Even XFS, or for that matter ext2 which support 16k block sizes on
> CONFIG_PAGE_SIZE_16K (or 64K) kernels need that support first.

Well, yes, I know that. But the statement above implies that we
can't use bs > ps filesystems without LBS support on 4kB PAGE_SIZE
systems. If it's meant to mean the exact opposite, then it is
extremely poorly worded....

> > > There might be a better way to do this than to deal with the switching
> > > of the aops dynamically, ideas welcomed!
> >
> > Is it even safe to switch aops dynamically? We know there are
> > inherent race conditions in doing this w.r.t. mmap and page faults,
> > as the write fault part of the processing is directly dependent
> > on the page being correctly initialised during the initial
> > population of the page data (the "read fault" side of the write
> > fault).
> >
> > Hence it's not generally considered safe to change aops from one
> > mechanism to another dynamically. Block devices can be mmap()d, but
> > I don't see anything in this patch set that ensures there are no
> > other users of the block device when the swaps are done. What am I
> > missing?
>
> We need to evict all pages from the page cache before switching aops to
> prevent misinterpretation of folio->private.

Yes, but if the device is mapped, even after an invalidation, we can
still race with a new fault instantiating a page whilst the aops are
being swapped, right? That was the problem that sunk dynamic
swapping of the aops when turning DAX on and off on an inode, right?

> If switching aops is even
> the right thing to do. I don't see the problem with allowing buffer heads
> on block devices, but I haven't been involved with the discussion here.

iomap supports bufferheads as a transitional thing (e.g. for gfs2).

Hence I suspect that a better solution is to always use iomap and
the same aops, but just switch from iomap page state to buffer heads
in the bdev mapping interface via a synchronised invalidation +
setting/clearing IOMAP_F_BUFFER_HEAD in all new mapping requests...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
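
The suggestion in the last paragraph (keep one set of aops, and change
only the flag handed out by mapping requests, with the invalidation
synchronised against them) might be sketched in userspace roughly as
below. This is a toy model under stated assumptions, not kernel code:
struct toy_bdev, struct toy_iomap, toy_map_request(),
toy_set_buffer_heads() and toy_demo() are all hypothetical names, and
TOY_IOMAP_F_BUFFER_HEAD is an arbitrary stand-in for the real
IOMAP_F_BUFFER_HEAD. A single mutex stands in for whatever would
exclude faults and mapping requests during the switch; how that
exclusion would really be achieved for mmap()d block devices is the
open question in this thread and is not modelled here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Toy stand-in for the kernel's IOMAP_F_BUFFER_HEAD; the value is arbitrary. */
#define TOY_IOMAP_F_BUFFER_HEAD (1u << 0)

/* Hypothetical per-device state; not a real kernel structure. */
struct toy_bdev {
    pthread_mutex_t lock;    /* excludes mapping requests during a mode switch */
    bool use_buffer_heads;   /* which per-page state new mappings should use */
    unsigned cached_pages;   /* pages instantiated under the current mode */
};

struct toy_iomap {
    unsigned flags;
};

/*
 * Mapping request: the address space operations never change. Only the
 * flag reported in the returned mapping depends on the device mode.
 */
static void toy_map_request(struct toy_bdev *b, struct toy_iomap *m)
{
    pthread_mutex_lock(&b->lock);
    m->flags = b->use_buffer_heads ? TOY_IOMAP_F_BUFFER_HEAD : 0;
    b->cached_pages++;       /* the page is instantiated under this mode */
    pthread_mutex_unlock(&b->lock);
}

/*
 * Mode switch: drop cached pages and flip the flag in one critical
 * section, so no page can be instantiated between the two steps.
 */
static void toy_set_buffer_heads(struct toy_bdev *b, bool enable)
{
    pthread_mutex_lock(&b->lock);
    b->cached_pages = 0;     /* stands in for invalidating the page cache */
    b->use_buffer_heads = enable;
    pthread_mutex_unlock(&b->lock);
}

/* Usage: pages mapped after a switch always carry the new mode's flag. */
static int toy_demo(void)
{
    struct toy_bdev b = { .lock = PTHREAD_MUTEX_INITIALIZER };
    struct toy_iomap m;

    toy_map_request(&b, &m);
    assert(m.flags == 0);

    toy_set_buffer_heads(&b, true);
    assert(b.cached_pages == 0);

    toy_map_request(&b, &m);
    assert(m.flags & TOY_IOMAP_F_BUFFER_HEAD);
    return 0;
}
```

The point of the model is only that no page can be instantiated with
the old per-page state once the switch has completed, because the
invalidation and the flag change happen under the same lock that every
mapping request takes.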