On Wed, Feb 28, 2024 at 01:38:44PM +0200, Amir Goldstein wrote:
> > Seems a duplicate of this topic proposed by Luis?
> > https://lore.kernel.org/linux-fsdevel/ZdfDxN26VOFaT_Tv@xxxxxxxxxxxxxxxxxxxxxx/

Maybe.  I did see Luis's topic, but it seemed to me to be largely
orthogonal to what I was interested in talking about.  Maybe I'm
missing something, but my observations were largely similar to Dave
Chinner's comments here:

https://lore.kernel.org/r/ZdvXAn1Q%2F+QX5sPQ@xxxxxxxxxxxxxxxxxxx/

To wit, there are two cases here: either the desired untorn write
granularity is smaller than the large block size, in which case there
is really nothing that needs to be done from an API perspective; or
the desired untorn granularity is *larger* than the large block size,
in which case the API considerations are the same with or without LBS
support.

From the implementation perspective, yes, there is a certain amount of
commonality, but that to me is relatively trivial --- or at least, it
isn't a particularly subtle design.  That is, the writeback code needs
to know what the desired write granularity is, whether it is required
by the device because the logical sector size is larger than the page
size, or because an untorn write granularity was requested by the
userspace process doing the writing (in practice, pretty much always
16k for databases).  In terms of what the writeback code needs to do,
it needs to make sure that it gathers up pages respecting the
alignment and required size, and if a page is locked, it has to wait
until the page is available, instead of skipping it as it would for a
non-data-integrity writeback.

As far as tooling/testing is concerned, again, it appears to me that
the requirements of LBS and the desire for untorn writes in units of
granularity larger than the block size are quite orthogonal.  For LBS,
all you need is some kind of synthetic/debug device which has a
logical block size larger than the page size.  This could be done a
number of ways:

* via the VMM --- e.g., a QEMU block device that has a 64k logical
  sector size,

* via a loop device that exports a larger logical sector size, or

* via blktrace (or its ebpf or ftrace equivalent), making sure that
  the size of every write request is the right multiple of 512-byte
  sectors.

For testing untorn writes, life is a bit trickier, because not all
writes will be larger than the page size.  For example, we might have
an ext4 file system with a 4k block size, so metadata writes to the
inode table, etc., will be 4k writes.  However, when writing to the
database file, *those* writes need to be in multiples of 16k, with 16k
alignment required, and if a write needs to be broken up, it must be
split at a 16k boundary.

The tooling for this, which is untorn-write specific and completely
irrelevant for the LBS case, needs to know which parts of the storage
device are assigned to the database file --- and which are not.  If
the database file is not getting deleted or truncated, it's relatively
easy to take a blktrace (or its ebpf or ftrace equivalent) and
validate all of the I/O's after the fact.  The tooling to do this
isn't terribly complicated; it would involve using filefrag -v if the
file system is already mounted, and a file-system-specific tool (e.g.,
debugfs for ext4, or xfs_db for XFS) if the file system is not
mounted.

Cheers,

						- Ted
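
P.S.  Since I claimed the tooling "isn't terribly complicated", here
is a rough, untested sketch of the mounted-file-system case, just to
show the basic shape.  It's in Python, the script name is made up, and
the 16k granularity / 4k file system block size / 512-byte trace
sectors are hard-coded assumptions: it pulls the file's extent map
with filefrag -v, then scans blkparse output for issued writes that
overlap the file but aren't 16k-aligned (and 16k-sized) relative to
the start of the file.

#!/usr/bin/env python3
#
# validate-untorn.py --- a rough sketch, not a real tool.
#
# Given a database file on a *mounted* file system and the text output
# of blkparse for the underlying device, flag any issued write which
# overlaps the file's extents but is not aligned to (and a multiple
# of) the untorn write granularity, measured from the start of the
# file.  Assumes the file is not deleted, truncated, or moved while
# the trace is taken.

import re
import subprocess
import sys

GRANULARITY = 16384     # required untorn write granularity, in bytes
FS_BLOCK = 4096         # file system block size, in bytes
SECTOR = 512            # blktrace sector size, in bytes

def file_extents(path):
    """Return the file's extent map as (logical, physical, length)
    tuples, all in bytes, using filefrag -v."""
    out = subprocess.run(["filefrag", "-v", path], capture_output=True,
                         text=True, check=True).stdout
    extents = []
    # filefrag -v extent lines look like:
    #    0:        0..      63:      34816..     34879:     64: flags
    for m in re.finditer(r"^\s*\d+:\s+(\d+)\.\.\s*\d+:"
                         r"\s+(\d+)\.\.\s*\d+:\s+(\d+):", out, re.M):
        logical, physical, length = (int(g) for g in m.groups())
        extents.append((logical * FS_BLOCK, physical * FS_BLOCK,
                        length * FS_BLOCK))
    return extents

def check_trace(extents, trace):
    ok = True
    # Match issued writes ("D" actions) in blkparse's default output,
    # e.g.:  259,0  3  1  0.000000000  1234  D  WS 34816 + 32 [fio]
    pat = re.compile(r"\sD\s+W\S*\s+(\d+)\s+\+\s+(\d+)")
    for line in trace:
        m = pat.search(line)
        if not m:
            continue                    # not an issued write
        start = int(m.group(1)) * SECTOR
        length = int(m.group(2)) * SECTOR
        for logical, physical, ext_len in extents:
            lo = max(start, physical)   # overlap with this extent
            hi = min(start + length, physical + ext_len)
            if lo >= hi:
                continue                # write doesn't touch the file
            file_off = logical + (lo - physical)
            if file_off % GRANULARITY or (hi - lo) % GRANULARITY:
                print("TORN? file offset %d, length %d: %s"
                      % (file_off, hi - lo, line.strip()))
                ok = False
    return ok

if __name__ == "__main__":
    sys.exit(0 if check_trace(file_extents(sys.argv[1]),
                              sys.stdin) else 1)

You'd run it as something like:

    blkparse -i tracefile | python3 validate-untorn.py /path/to/db.file

A real tool would want to merge logically contiguous extents (so a
write spanning an extent boundary isn't flagged spuriously) and handle
the unmounted case via debugfs or xfs_db, but that's the basic shape.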