We've had success using osd_async_recovery_min_cost=0 to drastically reduce slow ops during index recovery.

Josh

On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:
>
> I am fighting an issue on an 18.2.0 cluster where a restart of an OSD that
> supports the RGW index pool causes crippling slow ops. If the OSD is marked
> with a primary-affinity of 0 prior to the restart, no slow ops are
> observed. If the OSD has a primary affinity of 1, slow ops occur. The slow
> ops only occur during the recovery period of the OMAP data, and further
> only occur when client activity is allowed to pass to the cluster. Luckily
> I am able to test this during periods when I can disable all client
> activity at the upstream proxy.
>
> Given that changing the primary affinity prevents the slow ops, I think
> this may be a case of recovery being more detrimental than backfill. I am
> thinking that forcing a pg_temp acting set, and thereby forcing backfill,
> may be the right method to mitigate the issue. [1]
>
> I believe that reducing the PG log entries for these OSDs would accomplish
> that, but I am also thinking that tuning osd_async_recovery_min_cost [2]
> may accomplish something similar. I'm not sure of the appropriate value
> for that config at this point, or whether there is a better approach.
> Seeking any input here.
>
> Further, if this issue sounds familiar, or sounds like another condition
> within the OSD may be at hand, I would be interested in hearing your input
> or thoughts. Thanks!
>
> [1] https://docs.ceph.com/en/latest/dev/peering/#concepts
> [2]
> https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost
>
> Respectfully,
>
> *Wes Dillingham*
> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
> wes@xxxxxxxxxxxxxxxxx
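
For readers following along, the knobs discussed in this thread map onto standard Ceph CLI commands. This is only a sketch: the OSD id (osd.12) is a placeholder, and whether to apply the change cluster-wide or per-OSD is an illustrative choice, not a recommendation from the thread:

    # Josh's suggestion: make async recovery apply to effectively all recovery
    # work by setting the cost threshold to 0, cluster-wide via the config db.
    ceph config set osd osd_async_recovery_min_cost 0

    # Or test it at runtime on a single OSD before committing it cluster-wide.
    ceph tell osd.12 config set osd_async_recovery_min_cost 0

    # Wes's workaround: demote the OSD from primary before restarting it,
    # then restore the affinity once recovery has finished.
    ceph osd primary-affinity osd.12 0
    # ... restart osd.12 and wait for recovery ...
    ceph osd primary-affinity osd.12 1

As I understand it, lowering osd_async_recovery_min_cost pushes more PGs toward asynchronous recovery, where the recovering OSD is kept out of the acting set and catches up in the background instead of blocking client I/O, which would be consistent with why a primary affinity of 0 avoids the slow ops here.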