Re: Weighted Priority Queue testing

We have run into this same scenario, where the long tail of recovery takes much longer than the initial phase.

This happens whenever we are adding an OSD or an OSD gets taken down. At first we have max-backfill set to 1 so recovery doesn't kill the cluster with I/O. As time passes, only a single OSD is left performing the backfill, so we gradually increase max-backfill up to 10 to reduce the time it needs to recover fully. I know there are a few other factors at play here, but we tend to follow this procedure every time.
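For reference, the ramp described above looks roughly like the following sketch (it assumes a live cluster where `ceph tell` can reach the OSDs; the step values and wait interval are our own choices, not tuned recommendations):

```shell
# Sketch of the gradual max-backfill ramp; values and interval are
# illustrative, not recommendations. Requires a running Ceph cluster.
for n in 1 2 4 6 8 10; do
    echo "raising osd_max_backfills to $n"
    # Apply the setting at runtime to all OSDs (no restart needed).
    ceph tell osd.* injectargs "--osd-max-backfills $n"
    sleep 1800   # watch client latency before raising the limit further
done
```

We watch client I/O latency between steps and back off if the cluster starts to struggle.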

On Wed, May 11, 2016 at 6:29 PM Christian Balzer <chibi@xxxxxxx> wrote:
On Wed, 11 May 2016 16:10:06 +0000 Somnath Roy wrote:

> I bumped up the backfill/recovery settings to match Hammer. It is
> probably unlikely that the long-tail latency is a parallelism issue; if
> it were, the entire recovery would be suffering, not the tail alone.
> It's probably a prioritization issue. I will start looking and update my
> findings. I couldn't post to ceph-devel because of the table, but needed
> to reach the community, which is why ceph-users :-).. Also, I wanted to
> know from Ceph users whether they are facing similar issues.
>

What I meant by lack of parallelism is that at the start of a rebuild
there are likely to be many candidate PGs for recovery and backfilling, so
many things happen at the same time, up to the configured limits
(max-backfills etc.).

From looking at my test cluster, it starts with 8-10 backfills and
recoveries (out of 140 affected PGs), but later on there are fewer and
fewer PGs (and OSDs/nodes) to choose from, so by around 60 remaining PGs
things slow down to just 3-4 backfills.
And by around 20 PGs it's down to 1-2 backfills, so the parallelism is
clearly gone at that point and recovery speed is down to what a single
PG/OSD can handle.
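One way to watch this tail shrink in practice (assuming a live cluster; the exact `ceph -s` output format varies somewhat between releases) is:

```shell
# Poll the cluster every 10 seconds; near the end of a rebuild the
# count of backfilling/recovering PGs drops toward 1-2.
watch -n 10 "ceph pg stat; ceph -s | grep -E 'backfill|recover'"
```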

Christian

> Thanks & Regards
> Somnath
>
> -----Original Message-----
> From: Christian Balzer [mailto:chibi@xxxxxxx]
> Sent: Wednesday, May 11, 2016 12:31 AM
> To: Somnath Roy
> Cc: Mark Nelson; Nick Fisk; ceph-users@xxxxxxxxxxxxxx
> Subject: Re: Weighted Priority Queue testing
>
>
>
> Hello,
>
> not sure if the Cc: to the users ML was intentional or not, but either
> way.
>
> The issue seen in the tracker:
> http://tracker.ceph.com/issues/15763
> and what you have seen (and I as well) feels a lot like the lack of
> parallelism towards the end of rebuilds.
>
> This becomes even more obvious when backfills and recovery settings are
> lowered.
>
> Regards,
>
> Christian
> --
> Christian Balzer        Network/Systems Engineer
> chibi@xxxxxxx   Global OnLine Japan/Rakuten Communications
> http://www.gol.com/
>


--
Christian Balzer        Network/Systems Engineer
chibi@xxxxxxx           Global OnLine Japan/Rakuten Communications
http://www.gol.com/
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
