On Wed, Apr 10, 2024 at 09:34:36AM +0100, John Garry wrote:
> On 08/04/2024 18:50, Luis Chamberlain wrote:
> > I agree that when you don't set the sector size to 16k you are not forcing the
> > filesystem to use 16k IOs, the metadata can still be 4k. But when you
> > use a 16k sector size, the 16k IOs should be respected by the
> > filesystem.
> >
> > Do we break BIOs to below a min order if the sector size is also set to
> > 16k? I haven't seen that and it's unclear when or how that could happen.
>
> AFAICS, the only guarantee is to not split below LBS.

It would be odd to split a BIO when the inode spells out a minimum size
requirement, but indeed I don't recall verifying this guarantee.

> > At least for NVMe we don't need to yell to a device to inform it we want
> > a 16k IO issued to it to be atomic, if we read that it has the
> > capability for it, it just does it. The IO verification can be done with
> > blkalgn [0].
> >
> > Does SCSI *require* a 16k atomic prep work, or can it be done implicitly?
> > Does it need WRITE_ATOMIC_16?
>
> physical block size is what we can implicitly write atomically.

Yes, and also on flash to avoid read modify writes.

> So if you
> have a 4K PBS and 512B LBS, then WRITE_ATOMIC_16 would be required to write
> 16KB atomically.

Ugh. Why does SCSI require a special command for this?

Now we know what would be needed to bump the physical block size. It is
certainly a different feature, however I think it would be good to
evaluate that world too. For NVMe we don't have such special write
requirements.

I put together this kludge with the last patch series of LBS + the
bdev cache aops stuff (which as I said before needs an alternative
solution) and just the scsi atomics topology + physical block size
change, to easily experiment and see what would break:

https://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git/log/?h=20240408-lbs-scsi-kludge

Using a larger sector size works but it does not use the special scsi
atomic write.

> > > To me, O_ATOMIC would be required for buffered atomic writes IO, as we want
> > > a fixed-sized IO, so that would mean no mixing of atomic and non-atomic IO.
> > Would using the same min and max order for the inode work instead?
>
> Maybe, I would need to check further.

I'd be happy to help review too.

  Luis
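
To illustrate the rule John describes above (the physical block size is
what can be written atomically without a special command), here is a
minimal userspace sketch in C. It is not taken from the patches or the
kludge branch. The helper names read_queue_attr() and
implicitly_atomic() are hypothetical, and the condition that the write
must stay inside a single physical block is an assumption made for the
example. The only real interface used is the standard sysfs queue
attribute path /sys/block/<disk>/queue/physical_block_size.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Read one unsigned value from /sys/block/<disk>/queue/<attr>,
 * e.g. attr = "physical_block_size" or "logical_block_size".
 * Returns 0 if the attribute cannot be opened or parsed.
 */
static unsigned int read_queue_attr(const char *disk, const char *attr)
{
	char path[256];
	unsigned int val = 0;
	FILE *f;

	snprintf(path, sizeof(path), "/sys/block/%s/queue/%s", disk, attr);
	f = fopen(path, "r");
	if (!f)
		return 0;
	if (fscanf(f, "%u", &val) != 1)
		val = 0;
	fclose(f);
	return val;
}

/*
 * Hypothetical helper: true if a write of len bytes at byte offset
 * off fits entirely inside one physical block of size pbs.
 * Assumption: only such writes are atomic without an explicit
 * atomic write command.
 */
static bool implicitly_atomic(unsigned int pbs, unsigned long long off,
			      unsigned int len)
{
	if (pbs == 0 || len == 0 || len > pbs)
		return false;
	/* First and last byte must land in the same physical block. */
	return off / pbs == (off + len - 1) / pbs;
}
```

With a 4K PBS, implicitly_atomic(4096, 0, 16384) is false, which is the
case where John says WRITE_ATOMIC_16 would be required. With a 16K PBS
the same 16K write at offset 0 passes the check.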