> On Dec 22, 2023, at 6:10 PM, Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> <skipped>
>
> Other applications, though, still need 4k writes. Turning those to RMW
> on the host to modify 4k in the middle of a 16k block is obviously a bad
> fit.

So, if an application doesn’t work with the raw device directly or doesn’t use O_DIRECT, then we always have the file system’s page cache in the middle. It sounds like a 4K write operation dirties the whole 16K logical block from the file system’s point of view. Eventually, the file system will need to flush the whole 16K logical block, even if the 4K modification was only in the middle of the 16K. Potentially, this could increase write amplification.

However, metadata may require smaller granularity (like 4K). But metadata is a frequently updated type of data. So, there is a significant probability that, on average, a 16K logical block with metadata will be evenly updated by 4K write operations before the flush operation.

If we have cold user data, then the logical block size doesn’t matter because write operations can be aligned. I assume that frequently updated user data could be localized in some area(s) of a file. It means that a 16K logical block could gather several frequently updated 4K areas. Theoretically, it is possible to imagine a really nasty even distribution of 4K updates through the whole file with holes in between, but that looks like stress testing or benchmarking, not a real-life use case or workload.

Let’s imagine that an application writes directly to the raw device with 4K I/O operations. If the block device has a 16K physical sector size, then can we write with 4K I/O operations? From another point of view, if I know that my application updates by 4K I/O, then what’s the point of using a device with a 16K physical sector size? I hope we will have the opportunity to choose between devices that support 4K and 16K physical sector sizes.

But, technically speaking, a storage device usually receives multiple I/O requests at the same time.
Even if it receives 4K updates for different LBAs, it is possible to combine several 4K updates into one 16K NAND flash page. The question here is how to map the updates to LBAs efficiently, because the FTL’s main responsibility is mapping (LBAs into erase blocks, for example).

Thanks,
Slava.
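
P.S. To make the last point a bit more concrete, here is a rough sketch of how a device could pack 4K updates for unrelated LBAs into one 16K NAND page. It is only a toy model under my own assumptions, not how any real FTL is implemented: the class name ToyFTL, the l2p/pending/pages structures, and the flush-when-full policy are all hypothetical, and erase blocks and garbage collection are ignored completely.

```python
SECTOR = 4096                  # host write granularity (4K)
NAND_PAGE = 16384              # assumed NAND page size (16K)
SLOTS = NAND_PAGE // SECTOR    # number of 4K slots in one NAND page


class ToyFTL:
    """Hypothetical log-structured FTL that packs 4K writes into 16K pages."""

    def __init__(self):
        self.l2p = {}      # LBA -> (nand_page, slot) for programmed data
        self.pending = {}  # LBA -> data buffered in device RAM, not yet programmed
        self.pages = []    # programmed NAND pages, each a list of (lba, data)

    def write(self, lba, data):
        assert len(data) == SECTOR
        # A rewrite of an LBA that is still pending is absorbed in the buffer.
        self.pending[lba] = data
        if len(self.pending) == SLOTS:
            self.flush()

    def flush(self):
        """Program all buffered 4K updates as one NAND page."""
        if not self.pending:
            return
        page_no = len(self.pages)
        page = list(self.pending.items())
        for slot, (lba, _) in enumerate(page):
            # Remap at 4K granularity; any older copy of this LBA becomes stale.
            self.l2p[lba] = (page_no, slot)
        self.pages.append(page)
        self.pending = {}

    def read(self, lba):
        if lba in self.pending:
            return self.pending[lba]
        page_no, slot = self.l2p[lba]
        return self.pages[page_no][slot][1]


# Five 4K writes to scattered LBAs (LBA 7 is written twice).
ftl = ToyFTL()
for lba in (7, 1000, 42, 7, 99):
    ftl.write(lba, bytes([lba % 256]) * SECTOR)
```

In this sketch the five host writes end up as a single 16K page program, because the second write to LBA 7 is absorbed while the data is still buffered. The price is that the mapping table is kept per 4K LBA, so it needs four times as many entries as a table kept per 16K unit, which is exactly the mapping efficiency question above.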