On Mon, Sep 18, 2023 at 12:49:36PM +1000, Dave Chinner wrote: > On Sun, Sep 17, 2023 at 06:13:40PM -0700, Luis Chamberlain wrote: > > On Mon, Sep 18, 2023 at 10:59:07AM +1000, Dave Chinner wrote: > > > On Mon, Sep 18, 2023 at 12:14:48AM +0100, Matthew Wilcox wrote: > > > > On Mon, Sep 18, 2023 at 08:38:37AM +1000, Dave Chinner wrote: > > > > > On Fri, Sep 15, 2023 at 02:32:44PM -0700, Luis Chamberlain wrote: > > > > > > LBS devices. This in turn allows filesystems which support bs > 4k to be > > > > > > enabled on a 4k PAGE_SIZE world on LBS block devices. This alows LBS > > > > > > device then to take advantage of the recenlty posted work today to enable > > > > > > LBS support for filesystems [0]. > > > > > > > > > > Why do we need LBS devices to support bs > ps in XFS? > > > > > > > > It's the other way round -- we need the support in the page cache to > > > > reject sub-block-size folios (which is in the other patches) before we > > > > can sensibly talk about enabling any filesystems on top of LBS devices. > > > > > > > > Even XFS, or for that matter ext2 which support 16k block sizes on > > > > CONFIG_PAGE_SIZE_16K (or 64K) kernels need that support first. > > > > > > Well, yes, I know that. But the statement above implies that we > > > can't use bs > ps filesytems without LBS support on 4kB PAGE_SIZE > > > systems. If it's meant to mean the exact opposite, then it is > > > extremely poorly worded.... > > > > Let's ignore the above statement of a second just to clarify the goal here. > > The patches posted by Pankaj enable bs > ps even on 4k sector drives. > > Yes. They also enable XFS to support bs > ps on devices up to 32kB > sector sizes, too. All the sector size does is define the minimum > filesystem block size that can be supported by the filesystem on > that device and apart from that we just don't care what the sector > size on the underlying device is. Ah yes, but on a system with 4k page size, my point is that you still can't use a 32k sector size drive. > > This patch series by definition is suggesting that an LBS device is one > > where the minimum sector size is > ps, in practice I'm talking about > > devices where the logical block size is > 4k, or where the physical > > block size can be > 4k. > > Which XFS with bs > ps just doesn't care about. As long as the > logical sector size is a power of 2 between 512 bytes and 32kB, it > will just work. > > > There are two situations in the NVMe world where > > this can happen. One is where the LBA format used is > 4k, the other is > > where the npwg & awupf | nawupf is > 4k. The first makes the logical > > FWIW, I have no idea what these acronyms actually mean.... Sorry, I've described them them, here now: https://kernelnewbies.org/KernelProjects/large-block-size#LBS_NVMe_support What matters though is in practice it is a combination of two values which today also allows the nvme driver to have a higher than 4k physical block size. > > block size > 4k, the later allows the physical block size to be > 4k. > > The first *needs* an aops which groks > ps on the block device cache. > > The second let's you remain backward compatible with 4k sector size, but > > if you want to use a larger sector size you can too, but that also > > requires a block device cache which groks > ps. When using > ps for > > logical block size of physical block size is what I am suggesting we > > should call LBS devices. > > Sure. LBS means sector size > page size for the block device. That > much has always been clear. > > But telling me that again doesn't explain what LBS support has to do > with the filesystem implementation. mkfs.xfs doesn't require LBS > support to make a new XFS filesystem with a 32kB sector size and > 64kB filessytem block size. For the mounted filesystem that > supports bs > ps, it also doesn't care about the device sector size > is smaller than what mkfs told it to us. e.g. this is how run 4kB > sector size filesystem testing on 512 byte sector devices today.... > > What I'm asking is why LBS support even mentions filesystems? It's just that without this patchset you can mkfs an filesystem with larger bs > ps, but will only be able to mount them if the sector size matches the page size. This patchset allows them to be mounted and used where sector size > ps. The main issue there is just the block device cache aops and the current size limitaiton on set_blocksize(). > It's > not necessary for filesystems to support bs > ps for LBS to be > implemented, and it's not necessary for LBS to be supported for > filesytsems to implement bs > ps. Conflating them seems a recipe > for confusion.... I see, so we should just limit the scope to enabling LBS devices on the block device cache? > > > iomap supports bufferheads as a transitional thing (e.g. for gfs2). > > > > But not for the block device cache. > > That's why I'm suggesting that you implement support for bufferheads > through the existing iomap infrastructure instead of trying to > dynamically switch aops structures.... > > > > Hence I suspect that a better solution is to always use iomap and > > > the same aops, but just switch from iomap page state to buffer heads > > > in the bdev mapping > > > > Not sure this means, any chance I can trouble you to clarify a bit more? > > bdev_use_buggerheads() > { > /* > * set the bufferhead bit atomically with invalidation emptying the > * page cache to prevent repopulation races. > */ > filemap_invalidate_lock() > invalidate_bdev() > if (turn_on_bufferheads) > set_bit(bdev->state, BDEV_USE_BUFFERHEADS); > else > clear_bit(bdev->state, BDEV_USE_BUFFERHEADS); > filemap_invalidate_unlock() > } > > bdev_iomap_begin() > { > ..... > if (test_bit(bdev->state, BDEV_USE_BUFFERHEADS)) > iomap->flags |= IOMAP_F_BUFFER_HEAD; > } > > /* > * If an indexing switch happened while processing the iomap, make > * sure to get the iomap marked stale to force a new mapping to be > * looked up. > */ > bdev_iomap_valid() > { > bool need_bhs = iomap->flags & IOMAP_F_BUFFER_HEAD; > bool use_bhs = test_bit(bdev->state, BDEV_USE_BUGGERHEADS); > > return need_bhs == use_bhs; > } Will give this a shot, thanks! Luis