Re: PG_SLOW_SNAP_TRIMMING and possible storage leakage on 16.2.5

Hello,

  As an update: we were able to clear the queues by repeering all PGs
which had outstanding entries in their snaptrim queues. After this
process completed and we had confirmed that no PGs remained with
non-zero-length queues, we re-enabled our snapshot schedule. Several
days have now passed, and we again see 109 PGs with a cumulative
snaptrim queue length of 33,806. It does not appear that the system is
automatically processing these items. I have added our experience to
the outstanding bug Dan mentioned; however, I was hoping others here
might have some further thoughts:

1. Is there a reason these PGs' queues do not seem to be processed
automatically?

2. We can script a periodic check for PGs with outstanding items in
their snaptrim queues and programmatically repeer them to ensure the
trimming process proceeds, but is there any danger in this approach? (A
sketch of what we have in mind follows.)
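
A minimal sketch of that script (this assumes the Pacific-era
"ceph pg dump -f json" layout and that jq is installed; please check
the JSON field paths against your own cluster before running anything
like it):

#!/bin/bash
# Repeer any PG that still has entries in its snaptrim queue.
ceph pg dump -f json 2>/dev/null |
  jq -r '.pg_map.pg_stats[] | select(.snaptrimq_len > 0) | .pgid' |
while read -r pgid; do
    echo "repeering ${pgid}"
    ceph pg repeer "${pgid}"
    sleep 5   # pace the repeers to keep peering load low
done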

Thank you,

-David

On 1/24/22 1:29 PM, Dan van der Ster wrote:
> Hi,
>
> Yes, restarting an OSD also works to re-peer and "kick" the
> snaptrimming process.
> (In the ticket we first noticed this because snap trimming restarted
> after an unrelated OSD crashed/restarted).
> Please feel free to add your experience to that ticket.
>
>> monitoring snaptrimq
> This is from our local monitoring probes, based on `ceph pg dump -f json`.
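>
> For example, something along these lines (a sketch rather than our
> exact probe; the JSON field paths can vary between releases, so check
> them against your own output):
>
>   ceph pg dump -f json 2>/dev/null | jq '[.pg_map.pg_stats[].snaptrimq_len] | add'
>
> That sums snaptrimq_len across all PGs into a single gauge suitable
> for graphing.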
>
> -- Dan
>
> On Mon, Jan 24, 2022 at 6:31 PM David Prude <david@xxxxxxxxxxxxxxxx> wrote:
>> Dan,
>>
>>   Thank you for replying. Since I posted, I did some more digging. It
>> really seemed as if snaptrim simply wasn't being processed. The output
>> of "ceph health detail" showed that PG 3.9b had the longest queue. I
>> examined this PG and saw that its primary was osd.8, so I manually
>> restarted that daemon. This seems to have kicked off snaptrim on some PGs:
>>
>> ----SNIP----
>> 1513 pgs: 1 active+clean+scrubbing, 1 active+clean+scrubbing+snaptrim,
>> 44 active+clean+snaptrim, 1 active+clean+scrubbing+deep+snaptrim_wait,
>> 1406 active+clean, 2 active+clean+scrubbing+deep, 58
>> active+clean+snaptrim_wait; 114 TiB data, 344 TiB used, 93 TiB / 437 TiB
>> avail; 2.0 KiB/s rd, 64 KiB/s wr, 5 op/s
>> ----SNIP----
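>>
>> (As an aside, "ceph pg map 3.9b" prints the PG's up and acting sets
>> with the primary OSD listed first, which is a quick way to map a PG
>> to its primary.)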
>>
>> I can see the "snaptrimq_len" value decreasing for that PG now. I will
>> look into the issue you posted as well as repeering the PGs. Does an OSD
>> restart causing snaptrim to proceed seem consistent with the behavior
>> you saw?
>>
>> I notice in the bug report you linked that you are somehow monitoring
>> snaptrimq with Grafana. Is this a global value that is readily available
>> for monitoring, or are you calculating it somehow? If there is an easy
>> way to access it, I would greatly appreciate instructions.
>>
>> Thank you,
>>
>> -David
>>
>> On 1/24/22 11:53 AM, Dan van der Ster wrote:
>>> Hi David,
>>>
>>> We observed the same here: https://tracker.ceph.com/issues/52026
>>> You can poke the trimming by repeering the PGs.
>>>
>>> Also, depending on your hardware, the defaults for osd_snap_trim_sleep
>>> might be far too conservative.
>>> We use osd_snap_trim_sleep = 0.1 on our mixed hdd block / ssd block.db OSDs.
>>>
>>> Cheers, Dan
>>>
>>> On Mon, Jan 24, 2022 at 4:54 PM David Prude <david@xxxxxxxxxxxxxxxx> wrote:
>>>> Hello,
>>>>
>>>>    We have a 5-node, 30 hdd (6 hdds/node) cluster running 16.2.5. We
>>>> utilize a snapshot scheme within cephfs that results in 24 hourly
>>>> snapshots, 7 daily snapshots, and 2 weekly snapshots. This has been
>>>> running without overt issues for several months. As of this weekend, we
>>>> started receiving a PG_SLOW_SNAP_TRIMMING warning on a single PG. Over
>>>> the last 24 hours we are now seeing that this warning is associated with
>>>> 123 of our 1513 PGs. As recommended by the output of "ceph health
>>>> detail" we have tried tuning the following from their default values:
>>>>
>>>> osd_pg_max_concurrent_snap_trims=4 (default 2)
>>>> osd_snap_trim_sleep_hdd=3 (default 5)
>>>> osd_snap_trim_sleep=0.5 (default 0; it was suggested somewhere in a
>>>> search that 0 actually disables trimming?)
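>>>>
>>>> For anyone tuning similarly: such values can be applied at runtime
>>>> with, e.g., "ceph config set osd osd_pg_max_concurrent_snap_trims 4",
>>>> rather than editing ceph.conf and restarting OSDs.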
>>>>
>>>> I am uncertain how best to measure whether the above is having an
>>>> effect on the trimming process, and I am unclear on how to monitor
>>>> the progress of the snaptrim process, or even the total queue depth.
>>>> Interestingly, "ceph pg stat" does not show any PGs in the snaptrim state:
>>>>
>>>> ----SNIP----
>>>> 1513 pgs: 2 active+clean+scrubbing+deep, 1511 active+clean; 114 TiB
>>>> data, 344 TiB used, 93 TiB / 437 TiB avail; 6.2 KiB/s rd, 2.2 MiB/s wr,
>>>> 118 op/s
>>>>
>>>> ----SNIP----
>>>>
>>>> We have, for the time being, disabled our snapshots in the hopes that
>>>> the cluster will catch up with the trimming process. Two potential
>>>> things of note:
>>>>
>>>> 1. We are unaware of any particular action which would be associated
>>>> with this happening now (there were no unusual deletions of either live
>>>> data or snapshots).
>>>> 2. For the past month or two there has appeared to be steady,
>>>> unchecked growth in storage utilization, as if snapshots were not
>>>> actually being trimmed.
>>>>
>>>> Any assistance in determining what exactly has prompted this behavior or
>>>> any guidance on how to evaluate the total snaptrim queue size to see if
>>>> we are making progress would be much appreciated.
>>>>
>>>> Thank you,
>>>>
>>>> -David
>>>>
>>>> --
>>>> David Prude
>>>> Systems Administrator
>>>> PGP Fingerprint: 1DAA 4418 7F7F B8AA F50C  6FDF C294 B58F A286 F847
>>>> Democracy Now!
>>>> www.democracynow.org
>>>>
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>> --
>> David Prude
>> Systems Administrator
>> PGP Fingerprint: 1DAA 4418 7F7F B8AA F50C  6FDF C294 B58F A286 F847
>> Democracy Now!
>> www.democracynow.org
>>
>>
-- 
David Prude
Systems Administrator
PGP Fingerprint: 1DAA 4418 7F7F B8AA F50C  6FDF C294 B58F A286 F847
Democracy Now!
www.democracynow.org


_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



