Re: Increase the recovery throughput

Ceph 17.2.5, dockerized, Ubuntu 20.04, OSD on HDD with WAL/DB on SSD.

Hi all,

This is an old topic, but the problem still exists. I have tested it
extensively, with osd_op_queue set either to mclock_scheduler (and the
profile set to high_recovery_ops) or to wpq with the well-known options
(osd_recovery_sleep_*, osd_max_backfills) from
https://docs.ceph.com/en/quincy/rados/configuration/osd-config-ref/
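
Roughly, the two variants can be applied like this (a sketch using the
standard Quincy option names, not a verbatim command history):

  # mclock variant: let the scheduler favour recovery over client I/O
  ceph config set osd osd_mclock_profile high_recovery_ops

  # wpq variant: switch the scheduler (needs an OSD restart) and relax the limits
  ceph config set osd osd_op_queue wpq
  ceph config set osd osd_max_backfills 4
  ceph config set osd osd_recovery_max_active 4
  ceph config set osd osd_recovery_sleep_hdd 0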

When removing an OSD with `ceph orch osd rm X`, the backfill always ends
with a large number of misplaced objects draining at a very low recovery
rate (right now "120979/336643536 objects misplaced (0.036%); 10 KiB/s, 2
objects/s recovering"). The rate drops significantly once only very few PGs
are still involved. I wonder whether anyone with a similar installation to
ours (see above) does not experience this problem.
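
To see how few PGs are actually still moving in that tail, and on which
OSDs they sit, something like this helps (a sketch; any equivalent status
query works):

  ceph status
  ceph pg dump pgs_brief 2>/dev/null | grep -E 'backfill|recover'

Once only one or two PGs remain, the per-OSD limits hardly matter any
more, which is consistent with the rate dropping at the end.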

Thanks, Erich


On Mon, 12 Dec 2022 at 12:28, Frank Schilder <frans@xxxxxx> wrote:

> Hi Monish,
>
> you are probably on the mclock scheduler, which ignores these settings. You
> might want to set them back to the defaults, change the scheduler to wpq and
> then try again to see whether anything needs adjusting. There were several
> threads about "broken" recovery op scheduling with mclock in the latest
> versions.
>
> So, back to Eugen's answer: go through this list and try solutions of
> earlier cases.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Monish Selvaraj <monish@xxxxxxxxxxxxxxx>
> Sent: 12 December 2022 11:32:26
> To: Eugen Block
> Cc: ceph-users@xxxxxxx
> Subject:  Re: Increase the recovery throughput
>
> Hi Eugen,
>
> We tried that already: osd_max_backfills is set to 24 and
> osd_recovery_max_active is set to 20.
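>
> A quick way to check what an OSD actually uses (a sketch, any OSD id
> works):
>
>   ceph config show osd.0 osd_op_queue
>   ceph config show osd.0 osd_max_backfills
>   ceph config show osd.0 osd_recovery_max_active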
>
> On Mon, Dec 12, 2022 at 3:47 PM Eugen Block <eblock@xxxxxx> wrote:
>
> > Hi,
> >
> > there are many threads discussing recovery throughput, have you tried
> > any of the solutions? The first thing to try is to increase
> > osd_recovery_max_active and osd_max_backfills. What are the current
> > values in your cluster?
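> >
> > A sketch of checking and raising them cluster-wide at the osd level
> > (the exact numbers depend on what the disks tolerate):
> >
> >   ceph config get osd osd_max_backfills
> >   ceph config get osd osd_recovery_max_active
> >   ceph config set osd osd_max_backfills 4
> >   ceph config set osd osd_recovery_max_active 8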
> >
> >
> > Quoting Monish Selvaraj <monish@xxxxxxxxxxxxxxx>:
> >
> > > Hi,
> > >
> > > Our Ceph cluster consists of 20 hosts and 240 OSDs.
> > >
> > > We use an erasure-coded data pool with a cache tier in front of it.
> > >
> > > Some time back, 2 hosts went down and some PGs went into a degraded
> > > state. We got the 2 hosts back up after a while, and the PGs then
> > > started recovering, but it is taking a very long time (months). While
> > > this was happening the cluster held 664.4 M objects and 987 TB of
> > > data. The recovery status has not changed; it remains at 88 pgs
> > > degraded.
> > >
> > > During this period, we increased pg_num from 256 to 512 for the
> > > data pool (the erasure-coded pool).
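> > >
> > > (For reference, that change is typically made with something like
> > >
> > >   ceph osd pool set <data-pool> pg_num 512
> > >
> > > where <data-pool> stands for the actual pool name; on Nautilus and
> > > later, pgp_num is raised gradually to follow it.)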
> > >
> > > We have also observed, for about a week now, that recovery is very
> > > slow; the current recovery rate is around 750 MiB/s.
> > >
> > > Is there any way to increase this recovery throughput?
> > >
> > > *Ceph version: Quincy*
> > >
> > > [screenshot attached: image.png]
> >
> >
> >
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



