> On Jan 19, 2016, at 17:38, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Tue, Jan 19, 2016 at 9:27 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>> On Tue, Jan 19, 2016 at 8:12 AM, Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>>> On Thu, Jan 7, 2016 at 3:05 PM, Yan, Zheng <zyan@xxxxxxxxxx> wrote:
>>>> This series adds scattered page writeback, which uses one OSD request
>>>> to write back nonconsecutive dirty pages within a single strip.
>>>> Scattered page writeback can increase the performance of buffered
>>>> random writes.
>>>
>>> Hi Zheng,
>>>
>>> I took a cursory look at this and I really wonder if turning an OSD
>>> request with its two messages into an ad-hoc dynamic array and using it
>>> like that is a good idea. ceph_writepages_start() is one easy-to-follow
>>> function - is there no way to pre-calculate the number of dirty
>>> pages we'd want to stuff into a single request, or at least estimate
>>> it? With the geometric expansion you've got in there, you'd go 3 ->
>>> 6 -> 12 -> 16 for a 16 page strip? That seems suboptimal...
>>>
>>> A couple of implementation details in libceph that stood out:
>>>
>>> - Why use ceph_kvmalloc() for the r_ops array? It's at most ~2k, so
>>>   you'd only be using vmalloc() as a fallback, and we don't want to
>>>   do that.
>>>
>>> - Moving reply_op_{len,result} into ceph_osd_req_op to save memory is
>>>   the right thing, but, continuing on that, why waste r_inline_ops in
>>>   the >CEPH_OSD_INITIAL_OP case? That's almost a quarter of r_ops max
>>>   capacity - ~500 bytes for each "fat" request.
>>>
>>> - Why is it that the request message's front is enlarged proportionally
>>>   to max_ops, while the reply message's front is always enlarged by
>>>   (MAX-INITIAL)*foo? Shouldn't it be proportional to max_ops as well?
>>>   It definitely deserves a comment, if not.
>>>
>>> - The "4" in (sizeof(struct ceph_osd_op) + 4) also deserves a comment.
>>>   I had to go and look to figure out that it's for reply_op_result.
>>>
>>> Thanks,
>>>
>>> Ilya
>>
>> Hi,
>>
>> I have sent V2 patches which address most of your comments. For
>> simplicity, the code still does not use r_inline_ops when num_ops >
>> CEPH_OSD_INITIAL_OP. I assume the kernel slab allocator is efficient
>> at allocating memory < 4k, so the difference between allocating 2k
>> and 1.5k of memory should be negligible.
>
> I'm not sure I buy that argument - that's ~500 bytes for each fat OSD
> request, of which there can be many.
>
> I see in your v2 you pre-calculate the number of dirty pages, but still
> start with a small OSD request and then expand in one go. Expanding
> that way is better, but what I was getting at is: can we pre-calculate
> before allocating an OSD request? Then, try to allocate a fat request;
> if that fails with -ENOMEM, get a small request from the mempool. That
> would be a lot simpler and we won't have to argue about r_inline_ops ;)
>

Sent v3 for patch 6, please take a look.

Regards
Yan, Zheng

> Thanks,
>
> Ilya
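
[Editor's note: the sketch below is a self-contained userspace model of the allocation strategy Ilya suggests above - count the dirty pages first, try one "fat" allocation sized for exactly that many ops, and fall back to a fixed-size request from a reserve pool only if the allocation fails. All names (sketch_request, SKETCH_INITIAL_OPS, SKETCH_MAX_OPS, etc.) are hypothetical; this is not the libceph implementation.]

```c
/*
 * Illustrative model only: pre-calculate the op count, try a single
 * right-sized allocation, fall back to a small reserve request.
 * Names are made up for this sketch; not actual kernel/libceph code.
 */
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_INITIAL_OPS 3   /* ops a small (reserve) request can hold */
#define SKETCH_MAX_OPS     16  /* cap, e.g. pages per strip */

struct sketch_op {
	unsigned int page_index;
	int result;                /* per-op reply result kept with the op */
};

struct sketch_request {
	unsigned int max_ops;      /* capacity fixed before allocation */
	unsigned int num_ops;
	struct sketch_op ops[];    /* flexible array, no separate ops buffer */
};

/* Stands in for a mempool-backed reserve request. */
static struct sketch_request *reserve_req;

static struct sketch_request *sketch_alloc_request(unsigned int num_ops)
{
	struct sketch_request *req;

	if (num_ops > SKETCH_MAX_OPS)
		num_ops = SKETCH_MAX_OPS;

	/* Try to allocate exactly what the pre-calculated count needs. */
	req = malloc(sizeof(*req) + num_ops * sizeof(struct sketch_op));
	if (req) {
		req->max_ops = num_ops;
	} else {
		/* Allocation failed: fall back to the small reserve request. */
		req = reserve_req;
		req->max_ops = SKETCH_INITIAL_OPS;
	}
	req->num_ops = 0;
	return req;
}

int main(void)
{
	unsigned int dirty_pages = 11;   /* pre-calculated before allocating */
	struct sketch_request *req;

	reserve_req = malloc(sizeof(*reserve_req) +
			     SKETCH_INITIAL_OPS * sizeof(struct sketch_op));

	req = sketch_alloc_request(dirty_pages);
	printf("request capacity: %u ops for %u dirty pages\n",
	       req->max_ops, dirty_pages);

	if (req != reserve_req)
		free(req);
	free(reserve_req);
	return 0;
}
```

Because the capacity is decided before the request exists, the ops array never has to grow after the fact, which is what makes the question of an unused inline-ops array moot in the fat case.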