Thanks. I'll PR up some doc updates reflecting this and run them by the RGW / RADOS folks.

> On Apr 3, 2024, at 16:34, Joshua Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
>
> Hey Anthony,
>
> Like with many other options in Ceph, I think what's missing is the
> user-visible effect of what's being altered. I believe the reason why
> synchronous recovery is still used is that, assuming that per-object
> recovery is quick, it's faster to complete than asynchronous recovery,
> which has extra steps on either end of the recovery process. Of
> course, as you know, synchronous recovery blocks I/O, so when
> per-object recovery isn't quick, as with RGW index omap shards,
> particularly large shards, IMO we're better off always doing async
> recovery.
>
> I don't know enough about the overheads involved here to evaluate
> whether it's worth keeping synchronous recovery at all, but IMO RGW
> index/usage(/log/gc?) pools are always better off using asynchronous
> recovery.
>
> Josh
>
> On Wed, Apr 3, 2024 at 1:48 PM Anthony D'Atri <anthony.datri@xxxxxxxxx> wrote:
>>
>> We currently have in src/common/options/global.yaml.in:
>>
>> - name: osd_async_recovery_min_cost
>>   type: uint
>>   level: advanced
>>   desc: A mixture measure of number of current log entries difference and historical
>>     missing objects, above which we switch to use asynchronous recovery when appropriate
>>   default: 100
>>   flags:
>>   - runtime
>>
>> I'd like to rephrase that description in a PR. Might you be able to share
>> your insight into the dynamics so I can craft a better description? And do
>> you have any thoughts on the default value? Might appropriate values vary
>> by pool type and/or media?
>>
>>> On Apr 3, 2024, at 13:38, Joshua Baergen <jbaergen@xxxxxxxxxxxxxxxx> wrote:
>>>
>>> We've had success using osd_async_recovery_min_cost=0 to drastically
>>> reduce slow ops during index recovery.
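[A sketch of the decision the option description implies, for readers following along. This is my simplification, not the OSD's actual peering code: the per-PG "cost" mixes PG-log divergence with historical missing objects, and the OSD prefers asynchronous recovery when that cost exceeds the threshold. All numbers below are illustrative.]

```shell
# Simplified illustration of osd_async_recovery_min_cost (not the real
# PeeringState logic): cost = log-entry difference + missing objects.
log_entries_behind=30
missing_objects=20
min_cost=100          # the shipped default
cost=$((log_entries_behind + missing_objects))
if [ "$cost" -gt "$min_cost" ]; then
    echo "async recovery"
else
    echo "sync recovery"    # 50 <= 100, so the default takes this branch
fi
```

Under this reading, Josh's osd_async_recovery_min_cost=0 makes any nonzero cost take the async path; the option is flagged runtime, so `ceph config set osd osd_async_recovery_min_cost 0` applies without restarting OSDs.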
>>>
>>> Josh
>>>
>>> On Wed, Apr 3, 2024 at 11:29 AM Wesley Dillingham <wes@xxxxxxxxxxxxxxxxx> wrote:
>>>>
>>>> I am fighting an issue on an 18.2.0 cluster where a restart of an OSD which
>>>> supports the RGW index pool causes crippling slow ops. If the OSD is marked
>>>> with primary-affinity of 0 prior to the OSD restart, no slow ops are
>>>> observed. If the OSD has a primary affinity of 1, slow ops occur. The slow
>>>> ops only occur during the recovery period of the OMAP data, and further only
>>>> occur when client activity is allowed to pass to the cluster. Luckily I am
>>>> able to test this during periods when I can disable all client activity at
>>>> the upstream proxy.
>>>>
>>>> Given that the primary-affinity change prevents the slow ops, I think this
>>>> may be a case of recovery being more detrimental than backfill. I am
>>>> thinking that causing a pg_temp acting set by forcing backfill may be the
>>>> right method to mitigate the issue. [1]
>>>>
>>>> I believe that reducing the PG log entries for these OSDs would accomplish
>>>> that, but I am also thinking that tuning osd_async_recovery_min_cost [2] may
>>>> accomplish something similar. I'm not sure of the appropriate tuning for that
>>>> config at this point, or whether there may be a better approach. Seeking any
>>>> input here.
>>>>
>>>> Further, if this issue sounds familiar or sounds like another condition
>>>> within the OSD may be at hand, I would be interested in hearing your input
>>>> or thoughts. Thanks!
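[Wes's primary-affinity workaround can be scripted for planned restarts. A sketch, assuming a bare-metal systemd deployment; `12` is a stand-in OSD id, and the `grep`-based wait is a crude check, not an official readiness signal.]

```shell
#!/bin/sh
# Drain the primary role from the OSD before restarting it, so clients
# are served by other replicas while its omap data recovers.
OSD_ID=12   # hypothetical id; substitute your own

ceph osd primary-affinity "${OSD_ID}" 0
systemctl restart "ceph-osd@${OSD_ID}"   # cephadm: ceph orch daemon restart osd.${OSD_ID}

# Crudely wait until no PGs report recovery before restoring the role.
while ceph pg stat | grep -q recover; do sleep 10; done
ceph osd primary-affinity "${OSD_ID}" 1
```

This only sidesteps the problem for planned restarts; an unplanned OSD crash would still recover with whatever primary affinity was in place.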
>>>>
>>>> [1] https://docs.ceph.com/en/latest/dev/peering/#concepts
>>>> [2] https://docs.ceph.com/en/latest/rados/configuration/osd-config-ref/#confval-osd_async_recovery_min_cost
>>>>
>>>> Respectfully,
>>>>
>>>> *Wes Dillingham*
>>>> LinkedIn <http://www.linkedin.com/in/wesleydillingham>
>>>> wes@xxxxxxxxxxxxxxxxx
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
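[For the PG-log-shortening approach Wes floats above, the relevant knobs are the PG log length bounds. A hedged sketch: the values are illustrative only, not recommendations, and a shorter PG log narrows the window in which log-based recovery is possible, pushing restarted peers toward backfill instead.]

```shell
# Illustrative values only; shrinking the PG log trades recovery for
# backfill after longer outages. Apply cluster-wide at runtime:
ceph config set osd osd_min_pg_log_entries 500
ceph config set osd osd_max_pg_log_entries 500
```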