Re: ec overwrite issue

I just wonder why we set backfill in ECSubWrite based on the should_send_op func. :-)
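
For anyone following along, a minimal self-contained sketch of the two policies being
contrasted here. Nothing below uses Ceph's real types or signatures (hobject_key,
peer_state, primary_view and the two backfill_flag_* helpers are all made up for
illustration); it only captures the difference between deriving the flag from a
should_send_op-style check and deriving it from backfill-target membership alone:

  #include <map>
  #include <set>
  #include <string>

  using hobject_key = std::string;   // stand-in for hobject_t (hypothetical)
  using shard_id    = int;           // stand-in for pg_shard_t (hypothetical)

  struct peer_state {
    hobject_key last_backfill;       // everything <= this has been backfilled to the peer
  };

  struct primary_view {
    std::map<shard_id, peer_state> peers;
    std::set<shard_id> backfill_targets;

    // Roughly the should_send_op idea: a backfill target only needs the full
    // op if the object has already been backfilled to it.
    bool should_send_op(shard_id peer, const hobject_key &hoid) const {
      if (!backfill_targets.count(peer))
        return true;                 // not a backfill target: always send
      return hoid <= peers.at(peer).last_backfill;
    }

    // Current behaviour as described: mark the sub-write "backfill" only when
    // the object lies beyond the peer's last_backfill.
    bool backfill_flag_current(shard_id peer, const hobject_key &hoid) const {
      return !should_send_op(peer, hoid);
    }

    // The alternative being asked about: mark it whenever the peer is a
    // backfill target, regardless of where the object sits.
    bool backfill_flag_proposed(shard_id peer, const hobject_key &) const {
      return backfill_targets.count(peer) > 0;
    }
  };

  int main() {
    primary_view v;
    v.backfill_targets.insert(1);
    v.peers[1].last_backfill = "m";  // peer 1 has been backfilled up to "m"
    // For an object past last_backfill ("z") the current policy flags it,
    // and so would the proposed one; for "a" only the proposed policy would.
    return (v.backfill_flag_current(1, "z") &&
            !v.backfill_flag_current(1, "a") &&
            v.backfill_flag_proposed(1, "a")) ? 0 : 1;
  }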

2017-10-06 1:35 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
> I...think so? Did you have a specific purpose in mind, though? I might
> have missed something when I was going through it. ;)
> -Greg
>
> On Thu, Oct 5, 2017 at 7:11 AM, Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>> So we could roll forward no matter whether the object is > last_backfill
>> or < last_backfill, as long as it is a backfill target?
>> If so, could we set backfill in ECSubWrite to true if it is a backfill target?
>>
>> 2017-10-05 2:11 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>>> On Fri, Sep 29, 2017 at 5:19 PM, Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>>>> For example, transaction a modifies object a < last_backfill, so
>>>> transaction_applied would be true. Before transaction a is completed,
>>>> transaction b modifies object b > last_backfill, so
>>>> transaction_applied would be false. The current logic would then
>>>> roll_forward both object a and object b? Is that right?
>>>
>>> Yes, I believe that's the case. It's just that we don't care very much
>>> — if we copied the data while backfilling, we know that our source
>>> peer has the rollback state. Keep in mind that we only have rollback
>>> so that we can avoid the "RAID write hole": e.g., if we manage to write
>>> down an update on only 4 nodes in a 5+3 erasure code, we can recover
>>> neither the old nor the new data if it was written in-place. So we keep
>>> rollback data in that case and everybody goes back to the previous
>>> state.
>>>
>>> I *think* that if we manage to backfill an object for a particular
>>> shard, then we know that we can roll forward on it anyway or the read
>>> would have failed and the OSDs would have already rolled back. But I
>>> didn't check that. Certainly doing something other than this automatic
>>> roll forward would require a lot more bookkeeping that would make
>>> everything else going on more difficult.
>>> -Greg
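
To make the shard arithmetic above concrete, here is a toy check of the 5+3
example. The only assumption is the standard erasure-code property that
reconstructing the object requires k shards that agree on a version; the
counts are the ones Greg gives:

  #include <cassert>

  int main() {
    const int k = 5, m = 3;
    const int total = k + m;                   // 8 shards in all
    const int wrote_new = 4;                   // in-place overwrite reached only 4 shards
    const int still_old = total - wrote_new;   // the other 4 still hold the old data

    // Reconstruction needs k shards that agree on a version.  After the
    // partial in-place overwrite, neither version has enough:
    assert(wrote_new < k);   // cannot rebuild the new object
    assert(still_old < k);   // cannot rebuild the old one either
    return 0;
  }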
>>>
>>>>
>>>>
>>>> 2017-09-30 2:27 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>>>>> On Fri, Sep 29, 2017 at 3:02 AM, Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>>>>>> Hi all,
>>>>>>
>>>>>>     I am confused by the roll_forward logic in PG::append_log. The
>>>>>> pg_log.roll_forward func may roll forward all the in-flight
>>>>>> transactions, which may not yet have been completed by all shards.
>>>>>>
>>>>>>     The comment also makes me wonder, so could anyone explain it in
>>>>>> detail? Thanks.
>>>>>>
>>>>>>
>>>>>>   if (!transaction_applied) {
>>>>>>      /* We must be a backfill peer, so it's ok if we apply
>>>>>>       * out-of-turn since we won't be considered when
>>>>>>       * determining a min possible last_update.
>>>>>>       */
>>>>>>     pg_log.roll_forward(&handler);
>>>>>>   }
>>>>>>
>>>>>>     /* We don't want to leave the rollforward artifacts around
>>>>>>      * here past last_backfill.  It's ok for the same reason as
>>>>>>      * above */
>>>>>>     if (transaction_applied &&
>>>>>>        p->soid > info.last_backfill) {
>>>>>>       pg_log.roll_forward(&handler);
>>>>>>     }
>>>>>
>>>>> transaction_applied can only be false if we are being backfilled. If
>>>>> we are being backfilled, we may not *have* the older data that we
>>>>> would roll back to, and our peers don't rely on us having that data. So
>>>>> there's no point in our trying to keep rollback data around, and
>>>>> keeping it around would mean finding a way to clean it up later. Thus,
>>>>> delete it now.
>>>>> -Greg
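
Putting the quoted append_log snippet and this explanation together, the
per-entry decision reduces to the following self-contained paraphrase (not
the real PG::append_log: "applied" stands for transaction_applied,
"past_backfill" for p->soid > info.last_backfill, and returning true
corresponds to calling pg_log.roll_forward(&handler)):

  #include <cassert>

  // Should we roll forward (i.e. drop rollback artifacts) for this log entry?
  bool should_roll_forward_now(bool applied, bool past_backfill) {
    if (!applied)
      return true;   // backfill peer that skipped the txn: nobody needs our rollback data
    if (past_backfill)
      return true;   // applied, but the object is beyond last_backfill: same reasoning
    return false;    // normal case: keep rollback artifacts to cover the EC write hole
  }

  int main() {
    assert(should_roll_forward_now(false, false));   // backfill peer, skipped txn
    assert(should_roll_forward_now(true,  true));    // applied, past last_backfill
    assert(!should_roll_forward_now(true, false));   // applied, within last_backfill
    return 0;
  }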
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Xinze Chi
>>
>>
>>
>> --
>> Regards,
>> Xinze Chi



-- 
Regards,
Xinze Chi