On Tue, Mar 8, 2016 at 1:56 PM, Jens Axboe <axboe@xxxxxx> wrote:
> On 03/05/2016 01:48 PM, Martin K. Petersen wrote:
>>>>>>> "Jens" == Jens Axboe <axboe@xxxxxx> writes:
>>
>> Jens,
>>
>>>> OK. I'm still of the opinion that we should try to make this
>>>> transparent. I could be swayed by workload descriptions and numbers
>>>> comparing approaches, though.
>>
>> Jens> You can't just wave that flag and not have a solution. Any
>> Jens> solution in that space would imply having policy in the kernel. A
>> Jens> "just use a stream per file" is never going to work.
>>
>> I totally understand the desire to have explicit, long-lived
>> "from-file-open to file-close" streams for things like database journals
>> and whatnot.
>
> That is an appealing use case.
>
>> However, I think that you are dismissing the benefits of being able to
>> group I/Os to disjoint LBA ranges within a brief period of time as
>> belonging to a single file. It's something that we know works well on
>> other types of storage. And it's also a much better heuristic for data
>> placement on SSDs than just picking the next available bucket. It does
>> require some pipelining on the drive, but they will need some front-end
>> logic to handle the proposed stream ID separation in any case.
>
> I'm not a huge fan of heuristics based exclusively on temporal and
> spatial locality. Using that as a hint for the case where no stream ID (or
> write tag) is given would be an improvement, though. And perhaps part of
> the space should be reserved for just that.
>
> But I don't think that should exclude doing this in a much more managed
> fashion; personally, I find that a lot saner than adding this sort of state
> tracking in the kernel.
>
>> Also, in our experiments we essentially got the explicit stream ID for
>> free by virtue of the journal being written often enough that it was
>> rarely, if ever, evicted as an active stream by the device. With no
>> changes whatsoever to any application.
>
> The journal would be an easy one to guess, for sure.
>
>> My gripe with the current stuff is the same as before: the protocol is
>> squarely aimed at papering over issues with current flash technology. It
>> kinda-sorta works for other types of devices, but it is very limiting. I
>> appreciate that it is a great fit for the "handful of apps sharing a
>> COTS NVMe drive on a cloud server" use case. But I think it is horrible
>> for NVMe over Fabrics and pretty much everything else. That wouldn't be
>> a big deal if the traditional storage models were going away. But I
>> don't think they are...
>
> I don't think erase blocks are going to go away in the near future. We're
> going to have better media as well, that's a given, but cheaper TLC flash is
> just going to make the current problem much worse. The patchset is really
> about tagging the writes with a stream ID, nothing else. That could
> potentially be any type of hinting; it's not tied to NVMe write directives
> at all.

Maybe I'm misunderstanding, but why does a stream ID imply anything more
than "an opaque tag set at the top of the stack that makes it down to a
driver"? Sure, NVMe can interpret it as an NVMe stream, but any other driver
can apply its own transport-specific translation of what the hint means. I
think the minute the opaque number requires specific driver behavior, we'll
fall into a rat hole of how to translate intent across usages.
In other words, I think the hint will always carry application plus
transport/driver meaning, but otherwise the kernel is just a conduit.
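
To make the "opaque tag, kernel as conduit" idea concrete, here is a minimal
userspace sketch. It assumes the per-file write-lifetime hint interface that
later landed upstream (fcntl F_SET_RW_HINT with RWH_WRITE_LIFE_* values),
which may differ from the interface in the patchset under discussion, and the
file name "journal.db" is just a hypothetical stand-in for a database
journal.

/*
 * Sketch: attach a write hint to a file from userspace. The kernel carries
 * the value down as an opaque tag; an NVMe driver may map it to a stream or
 * write directive, other transports may translate it differently or ignore
 * it entirely.
 */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Fall back to the <linux/fcntl.h> values if the libc headers lack them. */
#ifndef F_SET_RW_HINT
#define F_LINUX_SPECIFIC_BASE	1024
#define F_SET_RW_HINT		(F_LINUX_SPECIFIC_BASE + 12)
#endif
#ifndef RWH_WRITE_LIFE_SHORT
#define RWH_WRITE_LIFE_SHORT	2	/* short-lived data, e.g. journal records */
#endif

int main(void)
{
	/* Hypothetical journal file used only for illustration. */
	int fd = open("journal.db", O_WRONLY | O_CREAT | O_APPEND, 0644);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Tag subsequent writes on this inode with a lifetime hint. */
	uint64_t hint = RWH_WRITE_LIFE_SHORT;
	if (fcntl(fd, F_SET_RW_HINT, &hint) < 0)
		perror("fcntl(F_SET_RW_HINT)");

	const char buf[] = "journal record\n";
	if (write(fd, buf, strlen(buf)) < 0)
		perror("write");

	close(fd);
	return 0;
}

Note the application only states intent ("these writes are short-lived"); how
that intent is expressed on the wire is left entirely to the driver, which is
the division of labor argued for above.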