I checked and this is what our current trim settings are:

  "osd_snap_trim_sleep": "0",
  "osd_pg_max_concurrent_snap_trims": "2",
  "osd_max_trimming_pgs": "2",
  "osd_preserve_trimmed_log": "false",
  "osd_pg_log_trim_min": "100",
  "osd_snap_trim_priority": "5",
  "osd_snap_trim_cost": "1048576",

It's not clear to me how to tune these to minimize the impact of large
snapshot deletions on the cluster. Can you give some insight here - how
does changing something like "osd_max_trimming_pgs" affect OSD
operations? I did watch your presentation, but the impact of changing
these individual parameters is still not clear to me. (A sketch of
adjusting these at runtime is included after the quoted thread below.)

On Wed, Oct 11, 2017 at 2:23 PM, Wyllys Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> Probably not, I'll need to go look those up.
>
> On Wed, Oct 11, 2017 at 2:13 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> Have you adjusted any of the snapshot trimming tunables that were
>> added in the later Jewel releases and were explicitly designed to
>> throttle trimming and prevent these issues? They're discussed pretty
>> extensively in past threads on the list and in my presentation at the
>> latest OpenStack Boston Ceph Day.
>> -Greg
>>
>> On Tue, Oct 10, 2017 at 5:46 AM, Wyllys Ingersoll
>> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>>> The "rmdir" command takes seconds.
>>>
>>> However, the resulting storm of activity on the cluster AFTER the
>>> deletion is bringing our cluster down completely. The blocked
>>> requests count goes into the thousands. The individual OSD processes
>>> begin taking up all of the memory that they can grab, which causes the
>>> kernel to kill them off, which further throws the cluster into
>>> disarray due to down/out OSDs. It takes multiple DAYS to completely
>>> recover from deleting 1 snapshot, and constant monitoring to make sure
>>> OSDs come up and stay up after they get killed for eating too much
>>> memory. This is a serious issue that we have been fighting with for
>>> over a month now. The obvious solution is to destroy the cephfs
>>> entirely, but that would mean we would then have to recover about 40TB
>>> of data, which could take a very long time, and we'd prefer not to do
>>> that.
>>>
>>> For example:
>>> 2521055 ceph 20 0 16.908g 0.013t 29172 S 28.4 10.6  36:39.52 ceph-osd
>>> 2507582 ceph 20 0 22.919g 0.019t 42076 S 17.6 15.5  58:48.00 ceph-osd
>>> 2501393 ceph 20 0 22.024g 0.018t 39648 S 14.7 14.9  79:05.28 ceph-osd
>>> 2547090 ceph 20 0 21.316g 0.017t 26584 S  7.8 14.0  18:14.76 ceph-osd
>>> 2455703 ceph 20 0 20.872g 0.017t 19784 S  4.9 13.8 111:02.06 ceph-osd
>>>  246368 ceph 20 0 22.657g 0.018t 37416 S  3.9 14.5 462:31.79 ceph-osd
>>>
>>> On Tue, Oct 10, 2017 at 12:03 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>> On Tue, Oct 10, 2017 at 12:13 AM, Wyllys Ingersoll
>>>> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>>>>> We have a cluster (10.2.9 based) with a cephfs filesystem that has
>>>>> 4800+ snapshots. We want to delete most of the very old ones to get
>>>>> it down to a more manageable number (such as 0). However, deleting
>>>>> even 1 snapshot right now takes up to a full 24 hours due to their
>>>>> age and size. It would literally take 13 years to delete all of them
>>>>> at the current pace.
>>>>>
>>>>> Here are the statistics for one snapshot directory:
>>>>>
>>>>> # file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
>>>>> ceph.dir.entries="3"
>>>>> ceph.dir.files="0"
>>>>> ceph.dir.rbytes="30500769204664"
>>>>> ceph.dir.rctime="1504695439.09966088000"
>>>>> ceph.dir.rentries="7802785"
>>>>> ceph.dir.rfiles="7758691"
>>>>> ceph.dir.rsubdirs="44094"
>>>>> ceph.dir.subdirs="3"
>>>>>
>>>>> There is a bug filed with details here: http://tracker.ceph.com/issues/21412
>>>>>
>>>>> I'm wondering if there is a faster, undocumented, "backdoor" way to
>>>>> clean up our snapshot mess without destroying the entire filesystem
>>>>> and recreating it.
>>>>
>>>> Deleting a snapshot in cephfs is a simple operation; it should complete
>>>> in seconds. Something must be going wrong if 'rmdir .snap/xxx' takes
>>>> hours. Please set debug_mds to 10, retry deleting a snapshot, and send
>>>> us the log. (It's better to stop all other fs activity while deleting
>>>> the snapshot.)
>>>>
>>>> Regards
>>>> Yan, Zheng
>>>>
>>>>>
>>>>> -Wyllys Ingersoll
>>>>> Keeper Technology, LLC
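For anyone following the thread: below is a minimal sketch, assuming a
Jewel-era (10.2.x) cluster, of how these throttles can be adjusted on
running OSDs and persisted in ceph.conf. The specific values are
illustrative assumptions, not settings recommended by anyone in this
thread. Roughly: osd_snap_trim_sleep inserts a pause between trim
operations, osd_pg_max_concurrent_snap_trims caps concurrent trims
within a single PG, osd_max_trimming_pgs caps how many PGs an OSD trims
at once, and osd_snap_trim_priority weights trim work against client I/O
in the op queue (lower means trimming yields more readily to clients).

  # Check the current value on one OSD via its admin socket
  # (run on the host where osd.0 lives).
  ceph daemon osd.0 config get osd_snap_trim_sleep

  # Inject gentler trim settings into all running OSDs.
  # Example values only - tune for your own cluster.
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5'
  ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 1'
  ceph tell osd.* injectargs '--osd_max_trimming_pgs 1'
  ceph tell osd.* injectargs '--osd_snap_trim_priority 1'

  # To keep the settings across OSD restarts, mirror them in the
  # [osd] section of ceph.conf on each node:
  #   [osd]
  #   osd snap trim sleep = 0.5
  #   osd pg max concurrent snap trims = 1
  #   osd max trimming pgs = 1
  #   osd snap trim priority = 1

The trade-off is the usual one: slowing trimming down keeps the cluster
responsive during large snapshot deletions, at the cost of the trim work
(and the space reclamation) taking correspondingly longer to finish.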