On Thu, Oct 03, 2024 at 04:14:57PM -0600, Jens Axboe wrote:
> On 10/3/24 6:54 AM, Christoph Hellwig wrote:
> > For file: yes. The problem is when you have more files than buckets on
> > the device or file systems. Typical enterprise SSDs support somewhere
> > between 8 and 16 write streams, and there typically is more data than
> > that. So trying to group it somehow is good idea as not all files can
> > have their own bucket.
> >
> > Allowing this inside a file like done in this patch set on the other
> > hand is pretty crazy.
>
> I do agree that per-file hints are not ideal. In the spirit of making
> some progress, how about we just retain per-io hints initially? We can
> certainly make that work over dio. Yes buffered IO won't work initially,
> but at least we're getting somewhere.

Huh? Per-I/O hints at the syscall level are the problem (see also the
reply from Martin). Per-file hints make total sense, but we need the
file system in control.

The real problem is further down the stack. For the SCSI temperature
hints, just passing them on makes sense. But when you map to some kind
of stream separation in the device, no matter if that is streams, FDP,
or various kinds of streams we don't even support in things like CF
and SD cards, the driver is not the right place to map temperature
hints to streams. That requires some kind of intelligence. It could be
dirt simple and just do a best-effort 1:1 mapping of the temperature
hints to separate write streams, or do a little remapping if there are
not enough of them, which should work fine for a raw block device.

But once we have a file system, things get more complicated:

- the file system will want its own streams for metadata and GC
- even with that, on beefy enough hardware you can have more streams
  than temperature levels, and the file system can and should do
  intelligent placement (usually based on files)

Or to summarize: the per-file temperature hints make sense as a user
interface.
Per-I/O hints tend to be really messy, at least if a file system is
involved. Mapping the temperatures to separate write streams in the
driver does not scale even to the most trivial write-stream-aware file
system implementations.

And for anyone who followed the previous discussions of the patches,
none of this should be new; each point has been made at least three
times before.
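For illustration only, here is a minimal sketch of the "dirt simple" best-effort mapping described above, written as plain userspace C rather than real driver code. The enum names, NR_TEMP_LEVELS and map_hint_to_stream() are hypothetical; only the numeric enum values mirror the RWH_WRITE_LIFE_* constants used with F_SET_RW_HINT. The sketch assumes unhinted data goes to stream 0 and the four temperature levels are spread over the remaining streams, folding adjacent levels together when there are too few.

```c
#include <assert.h>

/*
 * Hypothetical hint values; the numbers mirror RWH_WRITE_LIFE_NOT_SET
 * through RWH_WRITE_LIFE_EXTREME, the names are shortened for the sketch.
 */
enum write_hint {
	HINT_NOT_SET = 0,
	HINT_NONE    = 1,
	HINT_SHORT   = 2,	/* hottest: shortest expected lifetime */
	HINT_MEDIUM  = 3,
	HINT_LONG    = 4,
	HINT_EXTREME = 5,	/* coldest: longest expected lifetime */
};

/* Number of real temperature levels (HINT_SHORT..HINT_EXTREME). */
#define NR_TEMP_LEVELS 4u

/*
 * Hypothetical best-effort mapping of a hint to a write stream index in
 * [0, nr_streams).  Returns -1 if the device exposes no write streams.
 * Stream 0 takes unhinted data; streams 1..nr_streams-1 take the
 * temperature levels, 1:1 if there are enough of them, otherwise with
 * adjacent levels folded together in hot-to-cold order.
 */
int map_hint_to_stream(enum write_hint hint, unsigned int nr_streams)
{
	unsigned int level, avail;

	assert(hint <= HINT_EXTREME);

	if (nr_streams == 0)
		return -1;	/* no stream separation at all */
	if (hint <= HINT_NONE || nr_streams == 1)
		return 0;	/* unhinted, or nowhere else to put it */

	level = (unsigned int)(hint - HINT_SHORT);	/* 0 (hot) .. 3 (cold) */
	avail = nr_streams - 1;				/* streams left for data */

	if (avail >= NR_TEMP_LEVELS)
		return (int)(1 + level);		/* 1:1 mapping */

	/* Fewer streams than levels: fold neighbouring temperatures. */
	return (int)(1 + level * avail / NR_TEMP_LEVELS);
}
```

With 8 streams, the lower end of the range mentioned above, this only ever uses streams 0 to 4 and leaves three streams idle, and it has nothing to offer for metadata or GC separation; that is exactly the kind of placement decision a file system has to make itself.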