Re: [PATCH 0/11] Update version of write stream ID patchset

Shaun Tancheff <shaun.tancheff@xxxxxxxxxxx> · Sun, 6 Mar 2016 14:51:53 -0600

On Sun, Mar 6, 2016 at 10:08 AM, Boaz Harrosh <boaz@xxxxxxxxxxxxx> wrote:
> On 03/06/2016 03:03 PM, Martin K. Petersen wrote:
>>>>>>> "Andreas" == Andreas Dilger <adilger@xxxxxxxxx> writes:
>>
>> Andreas,
>>
>> Andreas> What are your thoughts on reserving a small number of the
>> Andreas> stream ID values for filesystem metadata (e.g. the first 31
>> Andreas> since 0 == unused)?
>>
>> The stream ID address space might be 16-bit but the devices currently
>> being discussed can only handle a few concurrently open streams (4, 8,
>> 16).
>>
>
> So I can't for the life of me understand what is the use of all this.
>
> If what Jens said at beginning for data grouping than what is it at all
> got to do with open and close of anything? Really?
>
> Does it make any sense to you? I mean with multy-queue with HW channels
> for every CPU, parallel IO, and poling because even interrupts are too
> slow, you want me to cram everything and serialize it on 4 open/close
> streams?
>
> Either I'm completely missing something. Or please please explain
> what is going on. How does 4 stream make any sense in today's NvME
> HW? How does open/close anything make any sense?
>
> On the surface it looks like someone is smoking something really
> bad.
>
>> Discussions are ongoing about whether the devices should be able to
>> implicitly close a stream based on LRU or something similar when the hw
>> max is reached. But as it stands you have to explicitly close an
>> existing stream with a separate command when the max is reached.
>> Otherwise a write with a previously unused stream ID will fail.
>>
>
> ?? (see smoking above ;-) )

As I see it, we can slice this up three ways: management of the limited
stream resource can be pushed to user space, kernel space, or the
device. I think there are reasonable gains and choices at any one of
these logical partitions.

In user space the application has (or should have) a good idea
of what data is related. However, each application is essentially
independent of the other parts of user space trying to make use of
the same limited resource.

This takes us to the next logical step: stream ID management in
kernel space, which can better manage the limited resource and
group application stream requests together. The kernel *could*
handle all the LRU semantics of opening and closing streams, or
simply rotate through the stream IDs and reuse the oldest one.
This presupposes that the device is attached to a single
machine and the kernel is in full control of the device.
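
To make the kernel-side LRU idea concrete, here is a rough user-space
sketch, not code from the patchset. It assumes a device with 4 stream
slots; the names stream_table, stream_lookup and app_tag are made up
for illustration. A miss on a full table evicts the least recently
used slot, which is where a real implementation would have to issue
the explicit close Martin describes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HW_STREAMS 4	/* assumed device limit; the thread mentions 4, 8, 16 */

struct stream_slot {
	uint32_t app_tag;	/* hypothetical application tag, 0 == slot free */
	uint64_t last_used;	/* logical clock value of the most recent use */
};

struct stream_table {
	struct stream_slot slot[HW_STREAMS];
	uint64_t clock;		/* incremented on every lookup */
	unsigned evictions;	/* how many times a live stream was replaced */
};

static void stream_table_init(struct stream_table *t)
{
	memset(t, 0, sizeof(*t));
}

/*
 * Map an application tag to a device stream ID in 1..HW_STREAMS
 * (ID 0 is left to mean "no stream"). On a miss the least recently
 * used slot is reused; free slots have last_used == 0, so they are
 * always picked before any live slot.
 */
static unsigned stream_lookup(struct stream_table *t, uint32_t app_tag)
{
	unsigned i, victim = 0;

	assert(app_tag != 0);
	t->clock++;

	for (i = 0; i < HW_STREAMS; i++) {
		if (t->slot[i].app_tag == app_tag) {
			t->slot[i].last_used = t->clock;	/* hit: refresh */
			return i + 1;
		}
		if (t->slot[i].last_used < t->slot[victim].last_used)
			victim = i;
	}

	if (t->slot[victim].app_tag != 0)
		t->evictions++;	/* a real driver would close the old stream here */

	t->slot[victim].app_tag = app_tag;
	t->slot[victim].last_used = t->clock;
	return victim + 1;
}
```

The linear scan is only reasonable because the table is tiny; with a
few hundred streams a list or heap would be the obvious replacement.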

Finally, we could push back and demand that the device handle the
LRU semantics and never fail a write to, or an open of, a stream.

I think we should be doing all of the above.

In kernel space we know which writes are volatile file system
metadata, and it's probably a good thing to keep a stream for that.
Optionally it may be useful to have a second stream for the journal.

The rest of the hinting is best seeded from the application, with
some sort of fallback if the application fails to set an affinity.
How the applications' hints are muxed together once the device's
stream IDs are exhausted probably doesn't matter a great deal, so
long as it is consistent. Be it 4 or 256 underlying streams available,
it is still in effect a spectrum from volatile to stable at different
levels of granularity. Larger devices will presumably have
more streams.
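
One way to get that consistency, sketched under assumptions of my own
(nothing here is in the patchset): treat the application hint as a
point on a 0..255 volatility scale, reserve stream 1 for metadata and
stream 2 for the journal, and split the scale evenly over whatever
data streams the device has left. The names and the 256-level scale
are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define STREAM_NONE	0	/* no stream / unused */
#define STREAM_META	1	/* assumed reservation for fs metadata */
#define STREAM_JOURNAL	2	/* assumed reservation for the journal */
#define STREAM_DATA_MIN	3	/* first stream available to applications */
#define HINT_LEVELS	256	/* hypothetical hint scale: 0..255 */

/*
 * Fold an application hint onto the data streams of a device that
 * exposes nr_streams streams in total. The mapping is monotonic, so
 * hints that are close on the scale share a stream when streams are
 * scarce and spread out when they are plentiful.
 */
static unsigned hint_to_stream(uint8_t hint, unsigned nr_streams)
{
	unsigned data_streams;

	if (nr_streams < STREAM_DATA_MIN)
		return STREAM_NONE;	/* nothing left after the reservations */

	data_streams = nr_streams - (STREAM_DATA_MIN - 1);
	return STREAM_DATA_MIN + (hint * data_streams) / HINT_LEVELS;
}
```

With 4 streams the whole scale collapses into two buckets; with 256
it uses nearly one stream per hint value. Either way the same hint
always lands on the same stream, which is the part that matters.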

The only thing we really don't want to see (IMO) is the device
commingling streams that really shouldn't be together because
the firmware does a bad job with its LRU and mixes volatile data
with stable data.

>> I.e. there is a significant cost to switching between stream IDs and
>> thus I am afraid the current stuff isn't well suited to having a fine
>> grained tagging approach like the one you are proposing.
>>
>
> So again who cares for anything that hurts performance? Why would
> I want to use anything that has "significant cost" at all?
>
> Thanks
> Boaz
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

-- 
Shaun Tancheff
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html