Re: need your help on a ceph pr

On Mon, Feb 21, 2022 at 12:12 PM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
>
> Hello Dryomov,
> How is your life going?
> I have some questions about the following PR:
>  https://github.com/ceph/ceph/pull/35326
> Objecter: don't attempt to read from non-primary on EC pools by idryomov · Pull Request #35326 · ceph/ceph
> With BALANCE_READS or LOCALIZE_READS set, the client will hang if the non-primary OSD is picked because the OSD will most likely drop the op (or start waiting for peering that won't actually happen...
>
>
> 1) Could you explain in more detail why "the OSD will most likely drop
> the op (or start waiting for peering that won't actually happen)"?

Hi guoming,

In the EC pool case, each OSD in the PG stores different data.  If
a read is directed at a non-primary OSD, that OSD may simply not have
the requested data.  And even if it happens to hold the required EC
chunk, servicing a read from it would most likely be unsafe.
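
To illustrate the difference in what each OSD holds, here is a toy
sketch.  It assumes a made-up k=2, m=1 XOR code, which is not the
erasure code plugin Ceph actually uses, and the function names are
hypothetical.  It only shows that every replica equals the object,
while no single EC shard does.

```python
from typing import List, Optional


def ec_encode(data: bytes) -> List[bytes]:
    """Toy k=2, m=1 code (assumption): two data chunks plus one XOR parity chunk."""
    if len(data) % 2:
        data += b"\0"  # pad so both halves have the same length
    half = len(data) // 2
    d0, d1 = data[:half], data[half:]
    parity = bytes(a ^ b for a, b in zip(d0, d1))
    return [d0, d1, parity]


def ec_decode(shards: List[Optional[bytes]], size: int) -> bytes:
    """Rebuild the object from any two of the three shards."""
    if sum(s is None for s in shards) > 1:
        raise ValueError("need at least two shards to reconstruct")
    d0, d1, parity = shards
    if d0 is None:
        d0 = bytes(a ^ b for a, b in zip(d1, parity))
    elif d1 is None:
        d1 = bytes(a ^ b for a, b in zip(d0, parity))
    return (d0 + d1)[:size]


obj = b"hello ceph"

# Replicated pool: every OSD in the PG holds a full copy.
replicas = [obj, obj, obj]

# EC pool: every OSD in the PG holds a different shard.
shards = ec_encode(obj)
```

In this toy model any element of `replicas` could answer a read on
its own, whereas answering from `shards` requires combining at least
two of them.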

>
> 2) Why do only EC pools have this problem, and not replicated pools?

In the replicated pool case, all OSDs in the PG have the same data, so
in theory any OSD can service any read.  But even in the replicated
case there are safety issues.  Prior to this change [1] by Sam, the
BALANCE_READS and LOCALIZE_READS flags were unsafe for general use.
After it, there are still a couple of cases where the OSD either drops
the op or returns EAGAIN, expecting the client to resend it to the
primary, because otherwise wrong data could be returned to the client.
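
The client-side behaviour described in this thread can be sketched
roughly as follows.  This is a simplified model, not the Objecter
code: `ReadFlag`, `pick_target`, `read`, `send` and `pick_replica`
are hypothetical names, the flag values are not the real wire values,
and only the EAGAIN case is modelled (a dropped op is not).

```python
import errno
from enum import Flag, auto


class ReadFlag(Flag):
    # Illustrative stand-ins for the real flags; values are arbitrary.
    NONE = 0
    BALANCE_READS = auto()
    LOCALIZE_READS = auto()


def pick_target(primary, acting, pool_is_replicated, flags, pick_replica):
    """Choose the OSD to send a read to.

    A non-primary OSD is considered only for replicated pools; on EC
    pools the read always goes to the primary.
    """
    wants_replica = bool(
        flags & (ReadFlag.BALANCE_READS | ReadFlag.LOCALIZE_READS)
    )
    if wants_replica and pool_is_replicated and len(acting) > 1:
        return pick_replica(acting)
    return primary


def read(primary, acting, pool_is_replicated, flags, send, pick_replica):
    """Send the read, falling back to the primary on -EAGAIN."""
    target = pick_target(primary, acting, pool_is_replicated, flags, pick_replica)
    result = send(target)
    if result == -errno.EAGAIN and target != primary:
        # The replica declined to serve the read; resend to the primary.
        result = send(primary)
    return result
```

Here `send` stands for whatever delivers the op to an OSD and returns
either data or a negative errno, and `pick_replica` stands for the
balancing or locality policy.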

I'm adding Sam for more details.

>
> 3) Is this the only solution? Could the problem be solved by modifying the OSD instead?

If you are asking whether the OSD can be modified to support
BALANCE_READS or LOCALIZE_READS on EC pools, the answer is no
because each OSD stores its own EC chunk.

[1] https://github.com/ceph/ceph/pull/32381

Thanks,

                Ilya
_______________________________________________
Dev mailing list -- dev@xxxxxxx
To unsubscribe send an email to dev-leave@xxxxxxx