On Thu, Oct 17, 2024 at 08:05:38PM +0530, Kanchan Joshi wrote:
> Seems per-I/O hints are not getting the love they deserve.
> Apart from the block device, the usecase is when all I/Os of VM (or
> container) are to be grouped together or placed differently.

But that assumes the file system could actually support it. Which is
hard unless you assume the file system is simply a passthrough entity,
and that will not give you great results.

> > 2) A per-I/O interface to set these temperature hint conflicts badly
> > with how placement works in file systems. If we have an urgent need
> > for it on the block device it needs to be opt-in by the file operations
> > so it can be enabled on block device, but not on file systems by
> > default. This way you can implement it for block device, but not
> > provide it on file systems by default. If a given file system finds
> > a way to implement it it can still opt into implementing it of course.
>
> Why do you see this as something that is so different across filesystems
> that they would need to "find a way to implement"?

If you want to do useful stream separation you need to write data
sequentially into the stream. Now with streams or FDP that does not
actually imply sequentially in LBA space, but if you want the file
system to not actually deal with fragmentation from hell, and to be able
to easily track what is grouped together, you really want it sequential
in the LBA space as well. In other words, any kind of write placement
needs to be intimately tied to the file system block allocator.

> Both per-file and per-io hints are supplied by userspace. Inode and
> kiocb only happen to be the mean to receive the hint information.
> FS is free to use this information (iff it wants) or simply forward this
> down.

As mentioned above just passing it down is not actually very useful.
It might give you nice benchmark numbers when you basically reimplement
space management in userspace on a fully preallocated file, but for that
you're better off just using the block device. If you actually want to
treat the files as files you need full file system involvement.

> Per-file hint just gets stored (within inode) without individual FS
> involvement. Per-io hint follows the same model (i.e., it is set by
> upper layer like io_uring/aio) and uses kiocb to store the hint. It does
> not alter the stored inode hint value!

Yes, and now you'll get complaints that the file system ignores it when
it can't properly support it. This is why we need a per-fop opt-in.

> The generic code (like fs/direct-io.c, fs/iomap/direct-io.c etc.,)
> already forwards the incoming hints, without any intelligence.

Yes, and that is a problem. We stopped doing that, but Samsung sneaked
some of this back in recently, as I noticed.

> Overall, I do not see the conflict. It's all user-driven. No?

I have the gut feeling that you've just run benchmarks on image files
emulating block devices and not actually tried real file system
workloads based on this, unfortunately.
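
To make the per-fop opt-in concrete, here is a rough userspace model of
the idea, not kernel code. Every name in it (the model_* structures,
MODEL_FOP_PER_IO_HINT, the hint enum) is hypothetical and made up for
illustration. It also assumes that an unsupported per-I/O hint is
rejected with -EOPNOTSUPP instead of being silently dropped, which is
one possible policy and not something settled in this thread.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical temperature hints; HINT_NOT_SET means "no per-I/O hint". */
enum model_hint { HINT_NOT_SET = 0, HINT_SHORT, HINT_MEDIUM, HINT_LONG };

/* Hypothetical opt-in flag carried by the file operations. */
#define MODEL_FOP_PER_IO_HINT (1u << 0)

struct model_fops  { unsigned int flags; };
struct model_inode { enum model_hint write_hint; };   /* per-file hint */
struct model_file  { const struct model_fops *f_op; struct model_inode *inode; };
struct model_kiocb { struct model_file *file; enum model_hint io_hint; };

/* The block device opts in; a file system does not unless it chooses to. */
const struct model_fops model_blkdev_fops = { .flags = MODEL_FOP_PER_IO_HINT };
const struct model_fops model_fs_fops     = { .flags = 0 };

/*
 * Attach a per-I/O hint to the request. Assumed policy: fail loudly when
 * the file operations did not opt in, so the caller knows the hint would
 * not be honoured.
 */
int model_set_io_hint(struct model_kiocb *iocb, enum model_hint hint)
{
	assert(iocb != NULL && iocb->file != NULL && iocb->file->f_op != NULL);

	if (hint != HINT_NOT_SET &&
	    !(iocb->file->f_op->flags & MODEL_FOP_PER_IO_HINT))
		return -EOPNOTSUPP;

	iocb->io_hint = hint;
	return 0;
}

/* The per-I/O hint wins when present; otherwise use the per-file hint. */
enum model_hint model_effective_hint(const struct model_kiocb *iocb)
{
	if (iocb->io_hint != HINT_NOT_SET)
		return iocb->io_hint;
	return iocb->file->inode->write_hint;
}
```

With this model a request against the block device carries its own hint,
while a request against a file system that has not opted in keeps using
the per-file hint stored in the inode, and the caller is told so.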