On Mon, Feb 21, 2022 at 12:12 PM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
>
> Hello Dryomov,
> How is your life going?
> I have some questions about the following PR:
> https://github.com/ceph/ceph/pull/35326
> Objecter: don't attempt to read from non-primary on EC pools by idryomov · Pull Request #35326 · ceph/ceph
> With BALANCE_READS or LOCALIZE_READS set, the client will hang if the non-primary OSD is picked because the OSD will most likely drop the op (or start waiting for peering that won't actually happen...
>
> 1) Could you give me more details on why "the OSD will most likely drop
> the op (or start waiting for peering that won't actually happen)"?

Hi guoming,

In the EC pool case, each OSD in the PG holds different data. If a read
is directed at a non-primary OSD, that OSD may simply not have the
requested data. And even if it happens to hold the required EC chunk,
servicing a read from that chunk would most likely be unsafe.

>
> 2) Why does only the EC pool have this problem, and not the replicated pool?

In the replicated pool case, all OSDs in the PG have the same data, so
theoretically any OSD can service any read. But even in the replicated
case there are safety issues. Prior to this [1] change by Sam, the
BALANCE_READS and LOCALIZE_READS flags were unsafe for general use.
After it, there are still a couple of cases where the OSD either drops
the op or returns EAGAIN, expecting the client to resend the op to the
primary, because otherwise wrong data could be returned to the client.
I'm adding Sam for more details.

>
> 3) Is this solution unique? Can this problem be solved by modifying the OSD module?

If you are asking whether the OSD can be modified to support
BALANCE_READS or LOCALIZE_READS on EC pools, the answer is no, because
each OSD stores its own EC chunk.
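
To make the difference concrete, here is a minimal Python sketch. It is
not Ceph code: the flag values, the Pool class and all function names
are hypothetical, the XOR parity is only a toy stand-in for a real
erasure code, and picking a random OSD is a simplification of what
BALANCE_READS or LOCALIZE_READS would actually do. It only illustrates
that every replica holds the whole object while each EC shard holds a
different piece, and roughly what the client-side choice looks like
once reads on EC pools are always sent to the primary.

```python
import random
from dataclasses import dataclass
from functools import reduce
from operator import xor
from typing import List

# Hypothetical flag values for this sketch; not Ceph's real constants.
BALANCE_READS = 0x1
LOCALIZE_READS = 0x2


@dataclass
class Pool:
    erasure_coded: bool


def replicated_shards(obj: bytes, n: int) -> List[bytes]:
    """Replicated pool: every OSD in the PG stores the full object."""
    return [obj] * n


def ec_shards(obj: bytes, k: int) -> List[bytes]:
    """Toy k+1 erasure code: k data chunks plus one XOR parity chunk.

    Real EC plugins use stronger codes; XOR is an assumption made
    here only to keep the example short.
    """
    size = -(-len(obj) // k)  # ceiling division
    data = [obj[i * size:(i + 1) * size].ljust(size, b"\0") for i in range(k)]
    parity = bytes(reduce(xor, column) for column in zip(*data))
    return data + [parity]


def pick_read_target(pool: Pool, acting: List[int], flags: int) -> int:
    """Return the OSD id a read is sent to; acting[0] is the primary."""
    primary = acting[0]
    wants_any_osd = bool(flags & (BALANCE_READS | LOCALIZE_READS))
    if pool.erasure_coded or not wants_any_osd:
        # On an EC pool no single non-primary OSD can answer the read.
        return primary
    # Simplification: any replica is a candidate on a replicated pool.
    return random.choice(acting)
```

With this sketch, ec_shards(b"abcdef", 2) gives the chunks b"abc",
b"def" and a parity chunk, none of which is the object itself, whereas
replicated_shards(b"abcdef", 3) gives three identical full copies.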

[1] https://github.com/ceph/ceph/pull/32381

Thanks,

Ilya