On Tue, Feb 28, 2023 at 10:52:15PM -0500, Theodore Ts'o wrote:
> Emulated block devices offered by cloud VM’s can provide functionality
> to guest kernels and applications that traditionally have not been
> available to users of consumer-grade HDD and SSD’s.  For example,
> today it’s possible to create a block device in Google’s Persistent
> Disk with a 16k physical sector size, which promises that aligned 16k
> writes will be atomically.  With NVMe, it is possible for a storage
> device to promise this without requiring read-modify-write updates for
> sub-16k writes.

I'm not sure it does. The NVMe spec doesn't say AWUN writes are never a
RMW operation. NVMe suggests aligning to NPWA is the best way to avoid
RMW, but doesn't guarantee that, nor does it require that this limit
align with atomic boundaries. NVMe provides a lot of hints, but stops
short of promises. Vendors can promise whatever they want, but that's
outside the spec.

> All that is necessary are some changes in the block
> layer so that the kernel does not inadvertently tear a write request
> when splitting a bio because it is too large (perhaps because it got
> merged with some other request, and then it gets split at an
> inconvenient boundary).

All the limits needed to optimally split on physical boundaries exist,
so I hope we're using them correctly via get_max_io_size().

That said, I was hoping you were going to suggest supporting 16k logical
block sizes. Not a problem on some arches, but still problematic when
PAGE_SIZE is 4k. :)
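
To make the "split on physical boundaries" point concrete, here is a rough
userspace sketch of the arithmetic involved. It is only loosely modeled on
the idea behind get_max_io_size() and is not the kernel's code: the helper
name first_piece_sectors() and its parameters are made up for illustration,
sizes are in 512-byte sectors, and both block sizes are assumed to be powers
of two.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userspace helper, not a kernel function. All sizes are in
 * 512-byte sectors; pbs (physical block size) and lbs (logical block size)
 * are assumed to be powers of two.
 *
 * Returns how many sectors the first piece of an I/O starting at
 * start_sector may carry, given a cap of max_sectors, so that the piece
 * ends on a physical block boundary whenever that is possible.
 */
static unsigned int first_piece_sectors(uint64_t start_sector,
					unsigned int max_sectors,
					unsigned int pbs, unsigned int lbs)
{
	unsigned int start, end;

	assert(pbs && (pbs & (pbs - 1)) == 0);
	assert(lbs && (lbs & (lbs - 1)) == 0);

	/* Offset of the I/O start within its physical block. */
	start = (unsigned int)(start_sector & (pbs - 1));

	/*
	 * Furthest allowed end, measured from the start of that physical
	 * block and rounded down to a physical block boundary.
	 */
	end = (start + max_sectors) & ~(pbs - 1);
	if (end > start)
		return end - start;

	/* Cap too small to reach a boundary: settle for logical alignment. */
	return max_sectors & ~(lbs - 1);
}
```

With pbs = 32 (16k) and lbs = 1, an I/O starting at sector 8 with a
256-sector cap gets a first piece of 248 sectors, so the split lands on
sector 256. None of this makes the writes atomic; it only keeps split
points out of the middle of a physical block.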