I strongly recommend you begin by coming up with a design and posting
it to this list (cc me) prior to doing any real development work. I'm
not convinced we want to accept a PR introducing EC read-from-replica
in the first place: it's not clear to me that removing a network hop
justifies the substantial maintenance and testing overhead going
forward, even ignoring the implementation effort. That is probably a
conversation you'd want to have before investing a lot of effort in
this.

Can you give us some insight into the workload and improvements you
are looking for?
-Sam

On Wed, Feb 23, 2022 at 4:09 AM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
>
> Hello everyone,
> Thank you for your attention and valuable insights on this issue.
> I will try to implement my ideas. If there are any follow-up questions, I will keep in touch with you.
>
> Sincerely
> guoming
> ________________________________
> From: Sam Just <sjust@xxxxxxxxxx>
> Sent: February 22, 2022 19:19
> To: Ilya Dryomov <idryomov@xxxxxxxxx>
> Cc: 郭 明 <guoracle@xxxxxxxxxxx>; dev <dev@xxxxxxx>
> Subject: Re: need your help on a ceph pr
>
> It would be quite a large amount of work. I don't think it's worthwhile.
> -Sam
>
> On Tue, Feb 22, 2022 at 11:07 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Tue, Feb 22, 2022 at 1:19 PM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
> > >
> > > Hi Ilya,
> > >
> > > Thanks for your reply and valuable insights.
> > >
> > > If the data in the EC pool is never modified after being written and is only read, could a replica OSD support BALANCE_READS or LOCALIZE_READS by acting as the primary OSD (that is, reading multiple data chunks just as the primary does, simply sharing the primary's load)?
> >
> > In theory, yes -- something like that could be implemented but the
> > "not modified after being written" part is a pretty big assumption
> > and I don't know how complicated the implementation would be.
> >
> > Thanks,
> >
> >                 Ilya
> >
> > >
> > > All the best,
> > > guoming
> > > ________________________________
> > > From: Ilya Dryomov <idryomov@xxxxxxxxx>
> > > Sent: February 21, 2022 17:21
> > > To: 郭 明 <guoracle@xxxxxxxxxxx>
> > > Cc: dev <dev@xxxxxxx>; Samuel Just <sjust@xxxxxxxxxx>
> > > Subject: Re: need your help on a ceph pr
> > >
> > > On Mon, Feb 21, 2022 at 12:12 PM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
> > > >
> > > > Hello Dryomov,
> > > > How is your life going?
> > > > I have some questions about the following PR:
> > > > https://github.com/ceph/ceph/pull/35326
> > > > Objecter: don't attempt to read from non-primary on EC pools by idryomov · Pull Request #35326 · ceph/ceph
> > > > With BALANCE_READS or LOCALIZE_READS set, the client will hang if the non-primary OSD is picked because the OSD will most likely drop the op (or start waiting for peering that won't actually happen...
> > > >
> > > > 1) Could you explain in more detail why "the OSD will most likely drop
> > > > the op (or start waiting for peering that won't actually happen)"?
> > >
> > > Hi guoming,
> > >
> > > In the EC pool case, all OSDs in the PG have different data. If
> > > a read is directed at a non-primary OSD, it simply may not have the
> > > data available. And if it turns out to have the required EC chunk,
> > > servicing a read with that data would most likely be unsafe.
> > >
> > > >
> > > > 2) Why do only EC pools have this problem, while replicated pools do not?
> > >
> > > In the replicated pool case, all OSDs in the PG have the same data, so
> > > theoretically any OSD has the ability to service any read. But even in
> > > the replicated case there are safety issues.
> > > Prior to this [1] change by Sam, the BALANCE_READS and
> > > LOCALIZE_READS flags were unsafe for general use, and after it
> > > there are still a couple of cases when the OSD either
> > > drops the op or returns EAGAIN, expecting the client to resend it to
> > > the primary because otherwise wrong data could be returned to the
> > > client.
> > >
> > > I'm adding Sam for more details.
> > >
> > > >
> > > > 3) Is this the only solution? Can this problem be solved by modifying the OSD module?
> > >
> > > If you are asking whether the OSD can be modified to support
> > > BALANCE_READS or LOCALIZE_READS on EC pools, the answer is no
> > > because each OSD stores its own EC chunk.
> > >
> > > [1] https://github.com/ceph/ceph/pull/32381
> > >
> > > Thanks,
> > >
> > >                 Ilya
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx