Re: deleting snapshots in batches?

Probably not, I'll need to go look those up.

On Wed, Oct 11, 2017 at 2:13 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
> Have you adjusted any of the snapshot trimming tunables that were
> added in the later Jewel releases and are explicitly designed to
> throttle trimming and prevent these issues? They're discussed pretty
> extensively in past threads on the list and in my presentation at the
> latest OpenStack Boston Ceph Day.
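> For reference, and off the top of my head (please double-check the
> exact option names against your point release), the kind of thing to
> look at is:
>
>   # see which snap trim options your OSDs expose
>   ceph daemon osd.0 config show | grep snap_trim
>
>   # then throttle trimming at runtime, e.g.
>   ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.1'
>   ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 1'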
> -Greg
>
> On Tue, Oct 10, 2017 at 5:46 AM, Wyllys Ingersoll
> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>> The "rmdir" command takes seconds.
>>
>> However, the resulting storm of activity on the cluster AFTER the
>> deletion is bringing our cluster down completely.  The blocked
>> requests count goes into the thousands.  The individual OSD processes
>> begin taking up all of the memory they can grab, which causes the
>> kernel to kill them off, which in turn throws the cluster into
>> disarray due to down/out OSDs.  It takes multiple DAYS to completely
>> recover from deleting a single snapshot, plus constant monitoring to
>> make sure OSDs come back up and stay up after they get killed for
>> eating too much memory.  This is a serious issue we have been
>> fighting for over a month now.  The obvious solution is to destroy
>> the cephfs entirely, but that would mean recovering about 40TB of
>> data, which could take a very long time, so we'd prefer not to do
>> that.
>>
>> For example, from top (each ceph-osd here is at 13-19GB resident):
>> 2521055 ceph      20   0 16.908g 0.013t  29172 S  28.4 10.6  36:39.52 ceph-osd
>> 2507582 ceph      20   0 22.919g 0.019t  42076 S  17.6 15.5  58:48.00 ceph-osd
>> 2501393 ceph      20   0 22.024g 0.018t  39648 S  14.7 14.9  79:05.28 ceph-osd
>> 2547090 ceph      20   0 21.316g 0.017t  26584 S   7.8 14.0  18:14.76 ceph-osd
>> 2455703 ceph      20   0 20.872g 0.017t  19784 S   4.9 13.8 111:02.06 ceph-osd
>>  246368 ceph      20   0 22.657g 0.018t  37416 S   3.9 14.5 462:31.79 ceph-osd
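>>
>> For what it's worth, the monitoring during these episodes is basically
>> watching the following (rough sketch; the osd id varies):
>>
>>   ceph health detail                        # blocked requests, down/out OSDs
>>   ceph -s
>>   ceph daemon osd.<id> dump_ops_in_flight   # slow ops on a given OSD
>>   ceph daemon osd.<id> dump_historic_ops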
>>
>>
>>
>>
>> On Tue, Oct 10, 2017 at 12:03 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>> On Tue, Oct 10, 2017 at 12:13 AM, Wyllys Ingersoll
>>> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>>>> We have a cluster (10.2.9 based) with a cephfs filesystem that has
>>>> 4800+ snapshots. We want to delete most of the very old ones to get it
>>>> to a more manageable number (such as 0).  However, deleting even 1
>>>> snapshot right now takes up to a full 24 hours due to their age and
>>>> size. It would literally take 13 years to delete all of them at the
>>>> current pace.
>>>>
>>>> Here are the statistics for one snapshot directory:
>>>>
>>>> # file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
>>>> ceph.dir.entries="3"
>>>> ceph.dir.files="0"
>>>> ceph.dir.rbytes="30500769204664"
>>>> ceph.dir.rctime="1504695439.09966088000"
>>>> ceph.dir.rentries="7802785"
>>>> ceph.dir.rfiles="7758691"
>>>> ceph.dir.rsubdirs="44094"
>>>> ceph.dir.subdirs="3"
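>>>>
>>>> (Those values are just the recursive ceph.dir.* virtual xattrs,
>>>> pulled with something along the lines of:
>>>>
>>>>   getfattr -d -m 'ceph.dir' cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
>>>> )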
>>>>
>>>> There is a bug filed with details here: http://tracker.ceph.com/issues/21412
>>>>
>>>> I'm wondering if there is a faster, undocumented "backdoor" way to
>>>> clean up our snapshot mess without destroying the entire filesystem
>>>> and recreating it.
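>>>>
>>>> Re the subject line: by "batches" I mean roughly the following
>>>> (untested sketch; assumes the oldest snapshots share a date-stamped
>>>> prefix like the one above, and that we pace removals so the cluster
>>>> stays healthy in between):
>>>>
>>>>   cd cephfs/.snap
>>>>   for s in snapshot.2016-*; do
>>>>       rmdir "$s"
>>>>       # let the cluster settle before removing the next one
>>>>       until ceph health | grep -q HEALTH_OK; do sleep 60; done
>>>>   done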
>>>
>>> Deleting a snapshot in cephfs is a simple operation; it should
>>> complete in seconds. Something must be going wrong if 'rmdir
>>> .snap/xxx' takes hours. Please set debug_mds to 10, retry deleting a
>>> snapshot, and send us the log. (It's better to stop all other fs
>>> activity while deleting the snapshot.)
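>>>
>>> Something like this, run against the active mds, should be enough
>>> (adjust the daemon name and log path for your setup):
>>>
>>>   ceph daemon mds.<name> config set debug_mds 10
>>>   # rmdir the snapshot, then collect /var/log/ceph/ceph-mds.<name>.log
>>>   ceph daemon mds.<name> config set debug_mds 1/5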
>>>
>>> Regards
>>> Yan, Zheng
>>>
>>>>
>>>> -Wyllys Ingersoll
>>>>  Keeper Technology, LLC