Re: ec overwrite issue

Oh, if you look at the patch adding the "backfill" member there,
you'll see it was for a specific thing that used to be implied by
other state in the message. It's much newer than most of these code
checks and states, so I think Sam just didn't think about switching
other stuff to rely on it at the time, and nobody else has gone
through and audited which things it might make sense to switch out.
(And we do need to be very careful about these things, unfortunately.)
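
Roughly, the relationship as I read it (a simplified sketch with
made-up structure, not the actual ECBackend code):

  // Primary side, per target shard, when building an EC sub-write.
  // should_send_op() is false when the object is past that peer's
  // last_backfill, i.e. backfill hasn't copied it there yet.
  ECSubWrite op;
  op.backfill = !should_send_op(shard, hoid);
  // The peer still gets the log entry; the backfill flag tells it
  // not to count the transaction as applied locally, so it rolls
  // forward instead of keeping rollback state it can't use.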

On Fri, Oct 6, 2017 at 5:58 AM, Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
> I was just wondering why we set backfill in ECSubWrite based on the should_send_op func. :-)
>
> 2017-10-06 1:35 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>> I...think so? Did you have a specific purpose in mind, though? I might
>> have missed something when I was going through it. ;)
>> -Greg
>>
>> On Thu, Oct 5, 2017 at 7:11 AM, Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>>> So we could roll forward regardless of whether the object is >
>>> last_backfill or < last_backfill, as long as it is a backfill target?
>>> If so, could we set backfill in ECSubWrite to true whenever it is a
>>> backfill target?
>>>
>>> 2017-10-05 2:11 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>>>> On Fri, Sep 29, 2017 at 5:19 PM, Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>>>>> For example, transaction a modifies object a < last_backfill, so
>>>>> transaction_applied would be true. Before transaction a completes,
>>>>> transaction b modifies object b > last_backfill, so
>>>>> transaction_applied would be false, and the current logic would
>>>>> roll_forward, including both object a and object b? Is that right?
>>>>
>>>> Yes, I believe that's the case. It's just that we don't care very much
>>>> — if we copied the data while backfilling, we know that our source
>>>> peer has the rollback state. Keep in mind that we only have rollback
>>>> so that we can avoid the "RAID write hole" — e.g., if we manage to
>>>> write an update down on only 4 of the 8 nodes in a 5+3 erasure code,
>>>> we can recover neither the old nor the new data if it was written
>>>> in-place. So we keep
>>>> rollback data in that case and everybody goes back to the previous
>>>> state.
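>>>>
>>>> To make the arithmetic concrete (a toy illustration, not real code):
>>>>
>>>>   const int k = 5, m = 3;            // 5+3 EC: 8 shards, any 5 suffice
>>>>   const int updated = 4;             // the write landed on only 4 shards
>>>>   bool new_readable = updated >= k;            // 4 >= 5: false
>>>>   bool old_readable = (k + m - updated) >= k;  // 8 - 4 = 4 >= 5: false
>>>>   // Written in-place, neither version has k readable shards, so we
>>>>   // keep rollback data until the write is known to be fully durable.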
>>>>
>>>> I *think* that if we manage to backfill an object for a particular
>>>> shard, then we know that we can roll forward on it anyway or the read
>>>> would have failed and the OSDs would have already rolled back. But I
>>>> didn't check that. Certainly doing something other than this automatic
>>>> roll forward would require a lot more bookkeeping that would make
>>>> everything else going on more difficult.
>>>> -Greg
>>>>
>>>>>
>>>>>
>>>>> 2017-09-30 2:27 GMT+08:00 Gregory Farnum <gfarnum@xxxxxxxxxx>:
>>>>>> On Fri, Sep 29, 2017 at 3:02 AM, Xinze Chi (信泽) <xmdxcxz@xxxxxxxxx> wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>>     I'm confused by the roll_forward logic in PG::append_log. The
>>>>>>> pg_log.roll_forward func may roll forward all of the in-flight
>>>>>>> transactions, which may not yet have been completed by all shards.
>>>>>>>
>>>>>>>     The comment below also puzzles me, so could anyone explain it in
>>>>>>> detail? Thanks.
>>>>>>>
>>>>>>>
>>>>>>>   if (!transaction_applied) {
>>>>>>>      /* We must be a backfill peer, so it's ok if we apply
>>>>>>>       * out-of-turn since we won't be considered when
>>>>>>>       * determining a min possible last_update.
>>>>>>>       */
>>>>>>>     pg_log.roll_forward(&handler);
>>>>>>>   }
>>>>>>>
>>>>>>>     /* We don't want to leave the rollforward artifacts around
>>>>>>>      * here past last_backfill.  It's ok for the same reason as
>>>>>>>      * above */
>>>>>>>     if (transaction_applied &&
>>>>>>>        p->soid > info.last_backfill) {
>>>>>>>       pg_log.roll_forward(&handler);
>>>>>>>     }
>>>>>>
>>>>>> transaction_applied can only be false if we are being backfilled. If
>>>>>> we are being backfilled, we may not *have* the older data that we
>>>>>> would rollback to, and our peers don't rely on us having that data. So
>>>>>> there's no point in our trying to keep rollback data around, and
>>>>>> keeping it around would mean finding a way to clean it up later. Thus,
>>>>>> delete it now.
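>>>>>>
>>>>>> In sketch form (simplified, not the exact code path):
>>>>>>
>>>>>>   // Replica side, on receiving an EC sub-write marked backfill:
>>>>>>   bool transaction_applied = !op.backfill;  // we skip the data write
>>>>>>   if (!transaction_applied) {
>>>>>>     // Nobody relies on us for rollback of data we never had, so
>>>>>>     // drop the rollback artifacts immediately rather than building
>>>>>>     // a separate cleanup path for them later.
>>>>>>     pg_log.roll_forward(&handler);
>>>>>>   }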
>>>>>> -Greg
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Xinze Chi
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Xinze Chi
>
>
>
> --
> Regards,
> Xinze Chi