Re: need your help on a ceph pr

I strongly recommend you begin by coming up with a design and posting
it to this list (cc me) prior to doing any real development work.  I'm
not convinced we want to accept a PR introducing EC read-from-replica
in the first place because it's not clear to me that the removal of a
network hop justifies the substantial maintenance and testing overhead
going forward even ignoring the implementation effort.  That is
probably a conversation you'd want to have before investing a lot of
effort in this.  Can you give us some insight into the workload and
improvements you are looking for?
-Sam

On Wed, Feb 23, 2022 at 4:09 AM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
>
> Hello everyone,
> Thank you for your attention and valuable insights on this issue.
> I will try to implement my ideas. If there are any follow-up questions, I will keep in touch with you.
>
>
>
>
> Sincerely
>       guoming
> ________________________________
> From: Sam Just <sjust@xxxxxxxxxx>
> Sent: February 22, 2022 19:19
> To: Ilya Dryomov <idryomov@xxxxxxxxx>
> Cc: 郭 明 <guoracle@xxxxxxxxxxx>; dev <dev@xxxxxxx>
> Subject: Re: need your help on a ceph pr
>
> It would be quite a large amount of work.  I don't think it's worthwhile.
> -Sam
>
> On Tue, Feb 22, 2022 at 11:07 AM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
> >
> > On Tue, Feb 22, 2022 at 1:19 PM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
> > >
> > > Hi Ilya,
> > >
> > > Thanks for your reply and valuable insights.
> > >
> > > > If the data in the EC pool is never modified after being written and is only read, could a replica OSD support BALANCE_READS or LOCALIZE_READS by acting as the primary for that read (that is, reading multiple data chunks just as the primary OSD would, purely to share the primary's load)?
> >
> > In theory, yes -- something like that could be implemented but the
> > "not modified after being written" part is a pretty big assumption
> > and I don't know how complicated the implementation would be.
> >
> > Thanks,
> >
> >                 Ilya
> >
> > >
> > >
> > > All the best,
> > >      guoming
> > > ________________________________
> > > > From: Ilya Dryomov <idryomov@xxxxxxxxx>
> > > > Sent: February 21, 2022 17:21
> > > > To: 郭 明 <guoracle@xxxxxxxxxxx>
> > > > Cc: dev <dev@xxxxxxx>; Samuel Just <sjust@xxxxxxxxxx>
> > > > Subject: Re: need your help on a ceph pr
> > >
> > > On Mon, Feb 21, 2022 at 12:12 PM 郭 明 <guoracle@xxxxxxxxxxx> wrote:
> > > >
> > > > Hello Dryomov,
> > > > How is your life going?
> > > > > I have some questions about the following PR:
> > > >  https://github.com/ceph/ceph/pull/35326
> > > > Objecter: don't attempt to read from non-primary on EC pools by idryomov · Pull Request #35326 · ceph/ceph
> > > > With BALANCE_READS or LOCALIZE_READS set, the client will hang if the non-primary OSD is picked because the OSD will most likely drop the op (or start waiting for peering that won't actually happen...
> > > > github.com
> > > > 
> > > >
> > > > > 1) Could you explain in more detail why "the OSD will most likely drop
> > > > > the op (or start waiting for peering that won't actually happen)"?
> > >
> > > Hi guoming,
> > >
> > > In the EC pool case, all OSDs in the PG have different data.  If
> > > a read is directed at a non-primary OSD, it simply may not have the
> > > data available.  And if it turns out to have the required EC chunk,
> > > servicing a read with that data would most likely be unsafe.
> > >
> > > >
> > > > > 2) Why does only the EC pool have this problem, while the replicated pool does not?
> > >
> > > In the replicated pool case, all OSDs in the PG have the same data, so
> > > theoretically any OSD has the ability to service any read.  But even in
> > > the replicated case there are safety issues.  Prior to this [1] change
> > > by Sam, BALANCE_READS and LOCALIZE_READS flags were unsafe for general
> > > use and after it there are still a couple of cases when the OSD either
> > > drops the op or returns EAGAIN, expecting the client to resend it to
> > > the primary because otherwise wrong data could be returned to the
> > > client.
> > >
> > > I'm adding Sam for more details.
> > >
> > > >
> > > > > 3) Is this the only solution? Could the problem instead be solved by modifying the OSD module?
> > >
> > > If you are asking whether the OSD can be modified to support
> > > BALANCE_READS or LOCALIZE_READS on EC pools, the answer is no
> > > because each OSD stores its own EC chunk.
> > >
> > > [1] https://github.com/ceph/ceph/pull/32381
> > >
> > > Thanks,
> > >
> > >                 Ilya
> > _______________________________________________
> > Dev mailing list -- dev@xxxxxxx
> > To unsubscribe send an email to dev-leave@xxxxxxx
>
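
For readers following the thread, Ilya's point that replicated OSDs all hold the same data while each EC OSD holds only its own chunk can be illustrated with a small sketch. This is a toy model, not Ceph code: it assumes a single XOR parity chunk (k data chunks plus m=1), whereas Ceph's real EC plugins and OSD read path work differently, and the names ec_encode and ec_read are hypothetical.

```python
from __future__ import annotations
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def ec_encode(obj: bytes, k: int) -> list[bytes]:
    """Toy EC (assumption: XOR parity): k data chunks plus 1 parity chunk."""
    chunk_len = -(-len(obj) // k)  # ceiling division
    padded = obj.ljust(chunk_len * k, b"\0")
    data = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return data + [reduce(xor_bytes, data)]


def ec_read(shards: dict[int, bytes], k: int, size: int) -> bytes:
    """Reassemble the object from the shards gathered from several OSDs.

    Shard ids 0..k-1 are data chunks and id k is the parity chunk.
    At most one missing data chunk can be rebuilt from the parity.
    """
    missing = [i for i in range(k) if i not in shards]
    if len(missing) > 1 or (missing and k not in shards):
        raise ValueError("not enough chunks to serve the read")
    chunks = dict(shards)
    if missing:
        others = [chunks[i] for i in range(k + 1) if i != missing[0]]
        chunks[missing[0]] = reduce(xor_bytes, others)
    return b"".join(chunks[i] for i in range(k))[:size]


obj = b"hello ceph object"
k = 2

# Replicated pool: every OSD in the PG stores the whole object.
replicas = [obj, obj, obj]

# EC pool: every OSD in the PG stores a different chunk.
shards = ec_encode(obj, k)

# Any replica alone can answer the read; no single EC shard can.
any_replica_ok = all(r == obj for r in replicas)
any_shard_ok = any(s == obj for s in shards)

# Serving an EC read means gathering at least k chunks from other OSDs.
full = ec_read({0: shards[0], 1: shards[1]}, k, len(obj))
degraded = ec_read({1: shards[1], 2: shards[2]}, k, len(obj))
```

In this model, whichever OSD serves an EC read has to collect chunks from its peers, which is the role the primary plays in the discussion above. The sketch says nothing about the safety concerns (ordering against in-flight writes, peering state) that Ilya and Sam raise.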




