Re: [PATCH 0/11] Add support for write life time hints

On Jun 13, 2017, at 2:13 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
> 
> On 06/13/2017 01:21 PM, Andreas Dilger wrote:
>> On Jun 13, 2017, at 12:26 PM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>> 
>>> On 06/13/2017 12:04 PM, Andreas Dilger wrote:
>>>> On Jun 13, 2017, at 11:15 AM, Jens Axboe <axboe@xxxxxxxxx> wrote:
>>>>> 
>>>>> A new iteration of this patchset, previously known as write streams.
>>>>> Instead of exposing numeric values for streams, I've previously
>>>>> advocated for just doing a set of hints that makes sense instead. See
>>>>> the coverage from the LSFMM summit this year:
>>>>> 
>>>>> https://lwn.net/Articles/717755/
>>>>> 
>>>>> This patchset attempts to do that. We define 4 flags for the pwritev2
>>>>> system call:
>>>>> 
>>>>> RWF_WRITE_LIFE_SHORT	Data written with this flag is expected to have
>>>>> 			a high overwrite rate, i.e. a short life time.
>>>>> 
>>>>> RWF_WRITE_LIFE_MEDIUM	Longer life time than SHORT
>>>>> 
>>>>> RWF_WRITE_LIFE_LONG	Longer life time than MEDIUM
>>>>> 
>>>>> RWF_WRITE_LIFE_EXTREME	Longer life time than LONG
>>>>> 
>>>>> The idea is that these are relative values, so an application can
>>>>> use them as they see fit. The underlying device can then place
>>>>> data appropriately, or be free to ignore the hint. It's just a hint.
>>>>> 
>>>>> Comments appreciated.
>>>> 
>>>> I thought that one of the major attractions of numeric stream IDs was
>>>> that they had no semantic meanings, just "N is similar to N" and "M is
>>>> similar to M", and it is up to userspace to define what these mean?
>>>> 
>>>> That allows userspace to use the IDs for lifetimes (as above), but
>>>> also/instead use them for allocation sizes, different applications,
>>>> different users, etc.
>>> 
>>> Right, that is indeed the intent. But we have to attach some naming
>>> to them. Userspace could in theory use these totally randomly, and
>>> things like NVMe would not care. But the semantic meaning of "short"
>>> vs "long" is important on caching infrastructure where you might
>>> want to use the hint for data placement.
>>> 
>>> I think the important part here is that no absolute meaning is
>>> attached to them, only relative.
>> 
>> In both IOCB_WRITE_LIFE_* and RWF_WRITE_LIFE_* this is consuming 4 bits of
>> space (which is itself fine) for only 4 different stream IDs.  Why not just
>> shift a 4-bit arbitrary stream ID to the appropriate offset in those fields,
>> rather than treating them as 4 individual bits and allowing only one of
>> them to be passed down the stack at a time?
> 
> I did think about that, and I'm a bit split on it. It turns a bit mask
> into a hybrid beast, with bits and sets of bits for values.

I don't think that is too confusing for anyone.

> For utilization of the space, yes, we could just use 2 bits instead of
> the 4. Or use the 4 bits and potentially have the app pass in up to 16
> values. For the latter, I'm still very much in favor of keeping the app
> interface super simple and just retaining the 4 life time types.

I don't think anyone cares too much about 4 bits.  Strictly speaking,
the current implementation can't fit into 2 bits because it has 5 values
if one includes "no ID".
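
A minimal sketch of that counting, assuming hypothetical names (the enum
below is illustrative only and is not the set of definitions from the
patchset): with "no hint" included there are five states, so a packed
encoding needs three bits rather than two.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not the patchset's definitions. */
enum life_hint {
	LIFE_NONE = 0,	/* no hint supplied */
	LIFE_SHORT,
	LIFE_MEDIUM,
	LIFE_LONG,
	LIFE_EXTREME,
	LIFE_NR		/* number of distinct states, including "none" */
};

/* Smallest number of bits able to represent nr_states distinct values. */
static unsigned int bits_needed(unsigned int nr_states)
{
	unsigned int bits = 0;

	while ((1u << bits) < nr_states)
		bits++;
	return bits;
}
```

Four states would fit in two bits; the fifth ("none") is what pushes a
packed field to three.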

It is a bit of overhead to use 32 bits in most of the structs where this
field is actually stored.  I suspect that using only 8 or 16 bits in the
structs is better, and even if it doesn't find an existing hole ("pahole"
is your friend here) it will allow something else to use the remaining
space in the future.
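
As a rough illustration of the hole argument, here is a sketch using
made-up structs (none of them are from the kernel, and the exact layout
depends on the ABI). On common 32-bit and 64-bit ABIs, the 8-bit field
fits into existing padding while the 32-bit field grows the struct.

```c
#include <stdint.h>

/* Hypothetical structs for illustration; layout is ABI dependent. */
struct base {
	void *p;
	uint32_t a;
	uint16_t b;
	/* typically a 2-byte hole here before the next pointer */
	void *q;
};

struct with_u32_hint {
	void *p;
	uint32_t a;
	uint16_t b;
	uint32_t hint;	/* needs 4-byte alignment, cannot use the hole */
	void *q;
};

struct with_u8_hint {
	void *p;
	uint32_t a;
	uint16_t b;
	uint8_t hint;	/* small enough to sit in the hole */
	void *q;
};
```

Comparing sizeof() on the three structs, or running pahole on the
compiled object, shows whether the smaller field is free on a given
architecture.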

For userspace, I think it is fine to define the WRITE_LIFE values as they
are today, and most users will use them, but IMHO it makes sense to also
allow applications to pass arbitrary IDs as they see fit, especially if the
underlying hardware doesn't care much about the actual values.
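
One way the "shift a 4-bit ID into the flags field" idea could look is
sketched below. The shift value, mask, and helper names are assumptions
made up for this example, not the actual RWF_* or IOCB_* layout. ID 0 is
treated as "no ID", and any named lifetime values would simply be small
IDs within the same field.

```c
#include <stdint.h>

/* Hypothetical layout: a 4-bit ID field at an assumed bit offset. */
#define STREAM_ID_SHIFT	8u
#define STREAM_ID_BITS	4u
#define STREAM_ID_MASK	(((1u << STREAM_ID_BITS) - 1u) << STREAM_ID_SHIFT)

/* Store id in the field, leaving all other flag bits untouched.
 * Bits of id that do not fit in the field are dropped by the mask. */
static inline uint32_t flags_set_stream_id(uint32_t flags, uint32_t id)
{
	flags &= ~STREAM_ID_MASK;
	return flags | ((id << STREAM_ID_SHIFT) & STREAM_ID_MASK);
}

/* Extract the ID from the field; 0 means no ID was set. */
static inline uint32_t flags_get_stream_id(uint32_t flags)
{
	return (flags & STREAM_ID_MASK) >> STREAM_ID_SHIFT;
}
```

The other flag bits stay ordinary single-bit flags, which is the
"hybrid" bit mask that the quoted reply was hesitant about.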

> If folks feel strongly about the wasted space, I can definitely
> revisit and just pack it.

Cheers, Andreas
