On Fri, Sep 15, 2023 at 02:32:44PM -0700, Luis Chamberlain wrote:
> Christoph added CONFIG_BUFFER_HEAD on v6.1, enabling a world where we can
> live without buffer-heads. When we opt into that world we also end up
> using the address space operations of the block device cache via
> iomap. Since iomap supports higher order folios, it means that block
> devices which do use the iomap aops can end up having a logical block
> size or physical block size greater than PAGE_SIZE. We refer to these as
> LBS devices. This in turn allows filesystems which support bs > 4k to be
> enabled on a 4k PAGE_SIZE world on LBS block devices. This allows LBS
> devices to take advantage of the recently posted work to enable
> LBS support for filesystems [0].

Why do we need LBS devices to support bs > ps in XFS? As long as the
filesystem block size is >= the logical sector size of the device, XFS
just doesn't care what the "block size" of the underlying block device
is. i.e. XFS will allow minimum IO sizes of the logical sector size of
the device for select metadata and sub-block direct IO, but otherwise
all other IO will be aligned to the filesystem block size, and so the
underlying device block sizes are completely irrelevant...

> To experiment with larger LBA formats you can also use kdevops and enable
> CONFIG_QEMU_ENABLE_EXTRA_DRIVE_LARGEIO. That enables a ton of drives with
> logical and physical block sizes >= 4k up to a desirable max target for
> experimentation. Since filesystems today only support up to 32k sector
> sizes, in practice you may only want to experiment up to 32k physical /
> logical.
>
> Support for 64k sector sizes requires an XFS format change, which is
> something Daniel Gomez has experimental patches for, in case folks are
> interested in messing with them.

Please don't do this without first talking to the upstream XFS
developers about the intended use cases and design of the new format.
Especially when the problem involves requiring a whole new journal
header and indexing format to be implemented...

As a general rule, nobody should *ever* be writing patches to change
the on-disk format of -any- filesystem without first engaging the
people who maintain that filesystem. Architecture and design come
first, not implementation. The last thing we want is to have someone
spend weeks or months on something that takes the experts half a
minute to NACK because it is so obviously flawed....

> Patch 6 could probably be squashed with patch 5, but I wanted to be
> explicit about this, as this should be decided with the community.
>
> There might be a better way to do this than to deal with the switching
> of the aops dynamically, ideas welcomed!

Is it even safe to switch aops dynamically? We know there are inherent
race conditions in doing this w.r.t. mmap and page faults, as the
write fault part of the processing is directly dependent on the page
being correctly initialised during the initial population of the page
data (the "read fault" side of the write fault). Hence it's not
generally considered safe to change aops from one mechanism to another
dynamically.

Block devices can be mmap()d, but I don't see anything in this patch
set that ensures there are no other users of the block device when the
swaps are done. What am I missing?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
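
For anyone who wants to see that last point concretely: a block device
can be mmap()d from userspace with nothing more than open() + mmap(),
and the mapping stays live while something else reconfigures the
device. A minimal sketch of such a "other user" of the bdev page cache
follows; the device path and the 4096-byte mapping length are
placeholder assumptions for illustration, not anything taken from the
patch set. Run it only against a scratch device.

/*
 * Minimal illustration: hold a live mapping of a block device's page
 * cache. The device path below is a hypothetical scratch device.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	const char *dev = "/dev/sdX";	/* hypothetical scratch device */
	int fd = open(dev, O_RDONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Map the first page of the block device's page cache. */
	unsigned char *p = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/*
	 * Fault the page in through the current aops, then sit on the
	 * mapping. If the kernel were to swap the device's aops now, a
	 * later fault on this mapping would run against state set up by
	 * the old fault path - the race described above.
	 */
	printf("first byte: %u\n", p[0]);
	pause();
	return 0;
}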