Re: PG: all requests stuck when acting set < min_size

It's actually worse for an EC pool.  EC pools need to be able to roll
back divergent entries in order to keep the durability guarantees
implied by min_size.  If you have 10 data blocks and 4 parity blocks
with min_size set to 12, then one would expect that as long as any 12
shards can be recovered, the PG can always be recovered regardless of
intermediate PG degradation states.  However, if we accept a read with
only 10 shards, the most recent log entries can never be considered
divergent (a client has already seen them), and we might become unable
to recover without exactly those 10 OSDs if the other 4 happened not to
have committed those updates.
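A rough way to check this argument is to count failure patterns. The sketch below is a toy model, not Ceph code, and rests on stated assumptions: shards 0 to 9 committed a new version v2 while shards 10 to 13 still hold only v1; every shard keeps enough rollback information to reproduce v1; and a version can be rebuilt if and only if at least k = 10 surviving shards hold it. The helper names `recoverable` and `count_failures` are hypothetical.

```python
from itertools import combinations

K, M = 10, 4      # data and parity shards, as in the example above
N = K + M         # 14 shards in total
MIN_SIZE = 12


def recoverable(holders: set, surviving: set, k: int = K) -> bool:
    # An EC object version can be rebuilt iff at least k surviving shards hold it.
    return len(holders & surviving) >= k


def count_failures(exposed: bool, survivors: int = MIN_SIZE) -> int:
    """Count the ways to keep `survivors` of the N shards that lose the data.

    Assumption: v2 reached only shards 0..9, and every shard can still
    produce v1 through its rollback information.
    """
    v2_holders = set(range(K))     # shards that committed the newest entry
    v1_holders = set(range(N))     # every shard can roll back to v1
    failures = 0
    for kept in map(set, combinations(range(N), survivors)):
        if recoverable(v2_holders, kept):
            continue               # v2 itself survives
        if not exposed and recoverable(v1_holders, kept):
            continue               # v2 is still divergent: roll back to v1
        failures += 1              # v2 was exposed by a read and cannot be rebuilt
    return failures
```

While v2 can still be treated as divergent, every one of the 91 ways to keep 12 of the 14 shards is recoverable. Once a read has exposed v2, it can no longer be rolled back, and 85 of those 91 cases lose it: only the 6 cases that keep all of shards 0 to 9 can rebuild v2.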
-Sam

On Tue, Oct 27, 2015 at 5:19 PM, Brad Hubbard <bhubbard@xxxxxxxxxx> wrote:
> ----- Original Message -----
>> From: "Samuel Just" <sjust@xxxxxxxxxx>
>> To: "Gregory Farnum" <gfarnum@xxxxxxxxxx>
>> Cc: "GuangYang" <yguang11@xxxxxxxxxxx>, ceph-devel@xxxxxxxxxxxxxxx
>> Sent: Wednesday, 28 October, 2015 7:05:42 AM
>> Subject: Re: PG: all requests stuck when acting set < min_size
>>
>> Actually, we really can't accept reads below min_size and still keep
>> the properties we want min_size to provide.  Suppose we have 3 OSDs (a,
>> b, and c) which see writes 0...1000.  min_size is 2.  If a and b are
>> then powered off having committed only up to 900 (so the client could
>> only have seen commits up to 900), then c would be able to serve reads
>> based on updates up to 1000 with a and b stopped (c has no way to know
>> that a and b only committed up to 900).  If c then stops and a and b
>> are restarted, they would begin serving reads and writes based only on
>> commits up to 900, even though we would have exposed the writes up to
>> 1000 to the client.
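The a/b/c scenario can be replayed in a few lines. This is a deliberately simplified, hypothetical model: `Osd`, `peer`, `serve_read` and `run` are illustrative names, not Ceph internals. Each OSD is reduced to the index of its last committed write, and "peering" just takes the newest log among the OSDs that are up.

```python
from dataclasses import dataclass

MIN_SIZE = 2


@dataclass
class Osd:
    name: str
    last_committed: int   # highest write index durably stored on this OSD
    up: bool = True


def peer(osds):
    # Toy peering: the up OSDs agree on the newest log any of them has.
    up = [o for o in osds if o.up]
    return max(o.last_committed for o in up) if up else None


def serve_read(osds, enforce_min_size):
    # Returns the newest write a client could observe, or None if the read blocks.
    up = [o for o in osds if o.up]
    if enforce_min_size and len(up) < MIN_SIZE:
        return None       # PG is not active, so the read waits
    return peer(osds)


def run(enforce_min_size):
    a, b, c = Osd("a", 900), Osd("b", 900), Osd("c", 1000)
    osds = [a, b, c]
    exposed = 900                       # the client saw commits up to 900
    a.up = b.up = False                 # a and b lose power
    seen = serve_read(osds, enforce_min_size)
    if seen is not None:
        exposed = max(exposed, seen)    # c alone may expose writes up to 1000
    c.up = False                        # c stops
    a.up = b.up = True                  # a and b come back
    return exposed, serve_read(osds, enforce_min_size)
```

`run(False)` returns `(1000, 900)`: the client has seen write 1000, but after a and b restart the PG has gone back to 900. `run(True)` returns `(900, 900)` because c alone is below min_size and never serves the read.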
>
> If a and/or b then accept a write, you have a recipe for split-brain, and no one
> wants to see that in Ceph.
>
> Cheers,
> Brad
>
>> -Sam
>>
>> On Tue, Oct 27, 2015 at 12:47 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> > On Tue, Oct 27, 2015 at 11:47 AM, GuangYang <yguang11@xxxxxxxxxxx> wrote:
>> >> Hi there,
>> >> Is there any reason we also block read-only requests for a PG when the
>> >> acting set size is less than min_size?
>> >
>> > A few.
>> > The most important reason: PGs don't have any concept of a read-only
>> > mode in the code. They are "active" or not, and an active PG handles
>> > writes. (The full flags and other things which block writes but allow
>> > reads are at the OSD level, not the PG level, and are handled when ops
>> > come in before they reach the PG.) Allowing read requests against a PG
>> > to complete even when we aren't taking writes on a per-PG level would
>> > take some doing.
>> > Also: it would be weird on several different levels. We'd need to
>> > keep track of client streams because we wouldn't want to let through a
>> > read that is ordered after a write. How would we handle the memory
>> > pressure implied by that? While I can imagine it being useful for some
>> > stuff like RGW reads, in general making data available for read but
>> > not write is a pretty complicated thing to explain to users; how do
>> > we expose that in a useful way?
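To make the ordering concern concrete, here is a hypothetical per-PG gate (nothing like it exists in Ceph) that lets reads through while writes are blocked. It tracks ordering per client stream, as described above, and simplifies by ignoring which object each op targets. A read may pass only if nothing earlier from the same client is still held; everything held sits in memory until the PG becomes active again.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass
class Op:
    client: str
    kind: str    # "read" or "write"


class ReadOnlyGate:
    """Hypothetical gate: serve reads while writes are blocked, per client stream."""

    def __init__(self):
        # client -> ops held back, kept in submission order
        self.waiting = defaultdict(deque)

    def submit(self, op: Op) -> bool:
        """Return True if the op may proceed now, False if it is held."""
        q = self.waiting[op.client]
        if op.kind == "write" or q:
            # Writes must wait; any op ordered after a held op must wait too.
            q.append(op)
            return False
        return True  # a read with nothing from this client ahead of it

    def held(self) -> int:
        # Number of buffered ops; nothing in this sketch bounds it.
        return sum(len(q) for q in self.waiting.values())
```

For example, client c1's read after its own blocked write is held back, while client c2's read passes. `held()` counts what would have to be buffered, and the unbounded growth of that number is the memory-pressure question raised above.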
>> > -Greg
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>> > the body of a message to majordomo@xxxxxxxxxxxxxxx
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html