Re: why not add (offset,len) to pglog


 



Great! Based on Sage's suggestion, we can just add a flag,
can_recover_partial, to indicate whether an object can be partially
recovered. I have opened a new PR for this:
https://github.com/ceph/ceph/pull/7325
Please review and comment.
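
For illustration, here is a minimal standalone sketch of the idea
(hypothetical names and types, not the code in the PR): a log entry
carries an optional (offset, len) extent plus a flag, and anything we
are unsure about simply falls back to whole-object recovery.

// Hypothetical, simplified sketch -- not actual Ceph code.
#include <cstdint>
#include <iostream>

struct log_entry_sketch {
  uint64_t version = 0;
  uint64_t offset = 0;               // modified extent; only meaningful
  uint64_t len = 0;                  // when can_recover_partial is true
  bool can_recover_partial = false;  // false => recover the whole object
};

int main() {
  log_entry_sketch e;
  e.version = 42;
  e.offset = 4096;                   // e.g. a 4k overwrite in a 4MB object
  e.len = 4096;
  e.can_recover_partial = true;      // we know exactly what changed

  if (e.can_recover_partial)
    std::cout << "recover bytes [" << e.offset << ", "
              << e.offset + e.len << ") only" << std::endl;
  else
    std::cout << "recover the whole object" << std::endl;
  return 0;
}

The point of the flag is exactly the fallback Sage describes below:
truncate, omap, clones, etc. can simply leave it false and behave as
they do today.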
Regards
Ning Yao


2015-12-25 22:27 GMT+08:00 Sage Weil <sage@xxxxxxxxxxxx>:
> On Fri, 25 Dec 2015, Ning Yao wrote:
>> Hi, Dong Wu,
>>
>> 1. As I am currently working on other things, this proposal has been
>> abandoned for a long time.
>> 2. This is a complicated task, as there is a lot to consider (not just
>> writeOp, but also truncate and delete), including the different effects
>> on different backends (Replicated, EC).
>> 3. I don't think now is a good time to redo this patch, since BlueStore
>> and KStore are in progress, and I'm afraid it may bring some side
>> effects. We may prepare and propose the whole design at the next CDS.
>> 4. Currently, we already have some tricks to deal with recovery (like
>> throttling the max recovery ops, setting the priority for recovery, and
>> so on). So this kind of patch may not solve a critical problem, just
>> make things better, and I am not quite sure that it will really
>> bring a big improvement. Based on my previous tests, it works
>> excellently on slow disks (say, HDDs), and also for short-time
>> maintenance; otherwise the backfill process is triggered anyway. So
>> let's wait for Sage's opinion @sage
>>
>> If you are interested in this, we may cooperate on it.
>
> I think it's a great idea.  We didn't do it before only because it is
> complicated.  The good news is that if we can't conclusively infer exactly
> which parts of the object need to be recovered from the log entry we can
> always just fall back to recovering the whole thing.  Also, the place
> where this is currently most visible is RBD small writes:
>
>  - osd goes down
>  - client sends a 4k overwrite and modifies an object
>  - osd comes back up
>  - client sends another 4k overwrite
>  - client io blocks while osd recovers 4mb
>
> So even if we initially ignore truncate and omap and EC and clones and
> anything else complicated I suspect we'll get a nice benefit.
>
> I haven't thought about this too much, but my guess is that the hard part
> is making the primary's missing set representation include a partial delta
> (say, an interval_set<> indicating which ranges of the file have changed)
> in a way that gracefully degrades to recovering the whole object if we're
> not sure.
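
(A rough, self-contained sketch of that partial-delta idea -- everything
here is hypothetical, and the std::map is only a stand-in for an
interval_set<>: each missing entry carries a set of dirty byte ranges,
and anything we cannot model cleanly degrades it to whole-object
recovery.)

// Hypothetical sketch, not Ceph code: a missing-set entry with an
// optional partial delta and a pessimistic whole-object fallback.
#include <algorithm>
#include <cstdint>
#include <iostream>
#include <map>

// Stand-in for interval_set<uint64_t>: start offset -> length.
using extent_map = std::map<uint64_t, uint64_t>;

struct missing_entry_sketch {
  bool whole_object = true;  // default: not sure, copy everything
  extent_map dirty;          // meaningful only when whole_object is false

  void note_write(uint64_t off, uint64_t len) {
    if (whole_object)
      return;                          // already pessimistic
    uint64_t &l = dirty[off];
    l = std::max(l, len);              // naive merge, no coalescing
  }
  void note_unknown() {                // truncate, omap, clone, ...
    whole_object = true;
    dirty.clear();
  }
};

int main() {
  missing_entry_sketch m;
  m.whole_object = false;              // the log told us exactly what changed
  m.note_write(0, 4096);
  m.note_write(1 << 20, 4096);
  std::cout << (m.whole_object ? "recover whole object"
                               : "recover dirty extents only") << std::endl;

  m.note_unknown();                    // something the log can't describe
  std::cout << (m.whole_object ? "recover whole object"
                               : "recover dirty extents only") << std::endl;
  return 0;
}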
>
> In any case, we should definitely have the design conversation!
>
> sage
>
>>
>> Regards
>> Ning Yao
>>
>>
>> 2015-12-25 14:23 GMT+08:00 Dong Wu <archer.wudong@xxxxxxxxx>:
>> > Thanks. From this pull request I learned that this work was not
>> > completed. Is there any new progress on this issue?
>> >
>> > 2015-12-25 12:30 GMT+08:00 Xinze Chi (??) <xmdxcxz@xxxxxxxxx>:
>> >> Yeah, this is a good idea for recovery, but not for backfill.
>> >> @YaoNing opened a pull request about this earlier this year:
>> >> https://github.com/ceph/ceph/pull/3837
>> >>
>> >> 2015-12-25 11:16 GMT+08:00 Dong Wu <archer.wudong@xxxxxxxxx>:
>> >>> Hi,
>> >>> I have a doubt about the pglog. The pglog contains (op, object,
>> >>> version), etc. When peering, the pglog is used to construct the
>> >>> missing list, and then each object in the missing list is recovered
>> >>> in full, even if the data that differs among replicas is much less
>> >>> than a whole object (e.g., 4MB).
>> >>> Why not add (offset, len) to the pglog? If so, the missing list could
>> >>> contain (object, offset, len), and then we could reduce the amount of
>> >>> data recovered.
>> >>
>> >>
>> >>
>> >> --
>> >> Regards,
>> >> Xinze Chi
>>
>>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


