On Mar 6, 2016, at 6:03 AM, Martin K. Petersen <martin.petersen@xxxxxxxxxx> wrote:
>
>> "Andreas" == Andreas Dilger <adilger@xxxxxxxxx> writes:
>
> Andreas,
>
> Andreas> What are your thoughts on reserving a small number of the
> Andreas> stream ID values for filesystem metadata (e.g. the first 31
> Andreas> since 0 == unused)?
>
> The stream ID address space might be 16-bit but the devices currently
> being discussed can only handle a few concurrently open streams (4, 8,
> 16).
>
> Discussions are ongoing about whether the devices should be able to
> implicitly close a stream based on LRU or something similar when the hw
> max is reached. But as it stands you have to explicitly close an
> existing stream with a separate command when the max is reached.
> Otherwise a write with a previously unused stream ID will fail.
>
> I.e. there is a significant cost to switching between stream IDs and
> thus I am afraid the current stuff isn't well suited to having a fine
> grained tagging approach like the one you are proposing.

It makes sense to isolate userspace from the number of streams that are
available on a device, otherwise applications would have to grub around in
the details of different kinds of hardware (SCSI mode pages, SATA, the added
complexity of RAID, etc.) that are best kept within the kernel.

I think everyone agrees that there isn't going to be a 1:1 mapping from the
stream ID given by userspace to the actual device, but there are many
different uses for data/metadata labels at the block layer beyond just SSD
write aggregation.  If the device can't handle that many streams, we can
always merge them at the point they are sent to the device, but you can't
invent the data you want if you don't have it in the first place.

It doesn't cost anything to reserve the first 32 values for filesystem
metadata, and they can be aggregated more or less depending on the hardware
capabilities (a rough sketch of how that collapsing could work is appended
below).  Even if current devices only support 4 or 8 streams, I'm sure this
will improve in the future, so it doesn't make sense to limit ourselves
based on the very first devices on the market.

Also, this opens up interesting possibilities for blktrace, and for DM
layers like dm-thinp, bcache, etc. that currently lack any kind of
information about how they should allocate blocks.  Ted described the
contortions he goes through to map block offsets in blktrace back to
filesystem metadata using debugfs output and scripts; not everyone is as
knowledgeable about filesystem internals as he is, but they still want to
be able to diagnose filesystem IO latency issues.

Cheers, Andreas
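[A minimal, hypothetical sketch of the stream-ID collapsing discussed above.
The names, constants, and the fold-all-metadata-into-one-stream policy are
illustrative assumptions only, not existing kernel API.]

/*
 * Illustrative only: map the 16-bit stream ID space, with IDs 1..31
 * reserved for filesystem metadata (0 == unused), onto however many
 * streams the device actually supports.
 */
#define STREAM_ID_UNUSED	0
#define STREAM_ID_META_MAX	31	/* 1..31 reserved for fs metadata */

static unsigned int stream_id_to_hw(unsigned int stream_id,
				    unsigned int nr_hw_streams)
{
	if (stream_id == STREAM_ID_UNUSED || nr_hw_streams == 0)
		return 0;	/* no stream tagging at all */

	if (nr_hw_streams == 1)
		return 1;	/* everything shares the single stream */

	if (stream_id <= STREAM_ID_META_MAX)
		return 1;	/* all metadata folded into one stream */

	/* spread data stream IDs over the remaining hardware streams */
	return 2 + (stream_id - STREAM_ID_META_MAX - 1) %
		   (nr_hw_streams - 1);
}

The point is just that the aggregation policy lives in one place in the
kernel, so filesystems can keep using fine-grained IDs regardless of what
the particular device underneath supports.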