Hi Götz,

Please see my response below.

On Tue, Apr 30, 2024 at 7:39 PM Pierre Riteau <pierre@xxxxxxxxxxxx> wrote:

> Hi Götz,
>
> You can change the value of osd_max_backfills (for all OSDs or specific
> ones) using `ceph config`, but you need to enable
> osd_mclock_override_recovery_settings. See
> https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#steps-to-modify-mclock-max-backfills-recovery-limits
> for more information.

Did the suggestion from Pierre help improve the backfilling rate? With the
mClock scheduler, this is the correct way of modifying the values of
osd_max_backfills and osd_recovery_max_active.

As to the observation of slower backfills, this is expected with the
'balanced' and 'high_client_ops' mClock profiles (see the allocations here
<https://docs.ceph.com/en/latest/rados/configuration/mclock-config-ref/#built-in-profiles>).
This is because backfill operations are classified as background
best-effort service and are assigned a lower priority than degraded
recoveries. Degraded recovery (the background recovery service) is given
higher priority because there is a higher risk of data unavailability if
other OSDs in the cluster go down. Backfill operations are assigned a
lower priority since they only involve data movement.

If the 'high_recovery_ops' profile coupled with increasing the above
config parameters is still not enough to improve the backfilling rate,
then the cluster must be examined to see if there are other competing
services, such as degraded recoveries or client ops, that could affect
the backfilling rate. The ceph status output should give an idea about
this.

-Sridhar
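P.S. In case a concrete sequence helps, below is a minimal sketch of the
steps described above, following the linked documentation. The values and
the osd.5 target are illustrative, not recommendations; adjust them to
your cluster.

    # Allow osd_max_backfills/osd_recovery_max_active to be overridden
    # while the mClock scheduler is active
    ceph config set osd osd_mclock_override_recovery_settings true

    # Raise the limits for all OSDs (example values)
    ceph config set osd osd_max_backfills 3
    ceph config set osd osd_recovery_max_active 5

    # ...or for a single OSD only, e.g. osd.5
    ceph config set osd.5 osd_max_backfills 3

    # Optionally favour recoveries/backfills over client I/O
    ceph config set osd osd_mclock_profile high_recovery_ops

    # Verify the value an OSD is actually using
    ceph config show osd.5 osd_max_backfills

    # Once backfilling has caught up, revert to the defaults
    ceph config rm osd osd_max_backfills
    ceph config rm osd osd_recovery_max_active
    ceph config set osd osd_mclock_override_recovery_settings false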