On Mar 6, 2016, at 6:03 AM, Martin K. Petersen <martin.petersen@xxxxxxxxxx> wrote:
>
>> "Andreas" == Andreas Dilger <adilger@xxxxxxxxx> writes:
>
> Andreas,
>
> Andreas> What are your thoughts on reserving a small number of the
> Andreas> stream ID values for filesystem metadata (e.g. the first 31
> Andreas> since 0 == unused)?
>
> The stream ID address space might be 16-bit but the devices currently
> being discussed can only handle a few concurrently open streams (4, 8,
> 16).
>
> Discussions are ongoing about whether the devices should be able to
> implicitly close a stream based on LRU or something similar when the hw
> max is reached. But as it stands you have to explicitly close an
> existing stream with a separate command when the max is reached.
> Otherwise a write with a previously unused stream ID will fail.
>
> I.e. there is a significant cost to switching between stream IDs and
> thus I am afraid the current stuff isn't well suited to having a fine
> grained tagging approach like the one you are proposing.

It makes sense to isolate userspace from the number of streams that are
available on a device, otherwise applications would have to grub around in
the details of different kinds of hardware (SCSI mode pages, SATA, the added
complexity of RAID, etc.) that are best kept within the kernel.

I think everyone agrees that there isn't going to be a 1:1 mapping from the
stream ID given by userspace to the actual device, but there are many
different uses for data/metadata labels at the block layer beyond just SSD
write aggregation.  If the device can't handle that many streams, we can
always merge them at the point they are sent to the device, but you can't
invent the data you want if you don't have it in the first place.

It doesn't cost anything to reserve the first 32 values for filesystem
metadata, and they can be aggregated more or less depending on the hardware
capabilities (a rough sketch of how that collapsing could work is appended
below).  Even if current devices only support 4 or 8 streams, I'm sure this
will improve in the future, so it doesn't make sense to limit ourselves
based on the very first devices on the market.

Also, this opens up interesting possibilities for blktrace, and for DM
layers like dm-thinp, bcache, etc. that currently lack any kind of
information about how they should allocate blocks.  Ted described the
contortions he goes through to map block offsets in blktrace back to
filesystem metadata using debugfs output and scripts; not everyone is as
knowledgeable about filesystem internals as he is, but they still want to
be able to diagnose filesystem IO latency issues.

Cheers, Andreas
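[A minimal, hypothetical sketch of the stream-ID collapsing discussed above.
The names, constants, and the fold-all-metadata-into-one-stream policy are
illustrative assumptions only, not existing kernel API.]

/*
 * Illustrative only: map the 16-bit stream ID space, with IDs 1..31
 * reserved for filesystem metadata (0 == unused), onto however many
 * streams the device actually supports.
 */
#define STREAM_ID_UNUSED	0
#define STREAM_ID_META_MAX	31	/* 1..31 reserved for fs metadata */

static unsigned int stream_id_to_hw(unsigned int stream_id,
				    unsigned int nr_hw_streams)
{
	if (stream_id == STREAM_ID_UNUSED || nr_hw_streams == 0)
		return 0;	/* no stream tagging at all */

	if (nr_hw_streams == 1)
		return 1;	/* everything shares the single stream */

	if (stream_id <= STREAM_ID_META_MAX)
		return 1;	/* all metadata folded into one stream */

	/* spread data stream IDs over the remaining hardware streams */
	return 2 + (stream_id - STREAM_ID_META_MAX - 1) %
		   (nr_hw_streams - 1);
}

The point is just that the aggregation policy lives in one place in the
kernel, so filesystems can keep using fine-grained IDs regardless of what
the particular device underneath supports.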