Re: deleting snapshots in batches?

Have you adjusted any of the snapshot trimming tunables that were
added in the later Jewel releases and are explicitly designed to
throttle trimming and prevent these issues? They're discussed pretty
extensively in past threads on the list and in my presentation at the
latest OpenStack Boston Ceph Day.
-Greg
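
The tunables Greg is referring to are most likely osd_snap_trim_sleep
and osd_pg_max_concurrent_snap_trims (exact names and availability
vary by 10.2.x point release, so verify against your running OSDs).
A minimal sketch of checking and tightening them at runtime, with
purely illustrative values:

  # Confirm the options exist and see their current values (run on an OSD host)
  ceph daemon osd.0 config show | grep snap_trim

  # Throttle trimming cluster-wide without a restart
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5 --osd_pg_max_concurrent_snap_trims 1'

  # To persist across restarts, put the same values under [osd] in ceph.conf:
  #   osd snap trim sleep = 0.5
  #   osd pg max concurrent snap trims = 1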

On Tue, Oct 10, 2017 at 5:46 AM, Wyllys Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> The "rmdir" command takes seconds.
>
> However, the resulting storm of activity on the cluster AFTER the
> deletion is bringing our cluster down completely.  The blocked
> request count climbs into the thousands, and the individual OSD
> processes start grabbing all the memory they can, so the kernel
> kills them off, which throws the cluster further into disarray due
> to down/out OSDs.  It takes multiple DAYS to fully recover from
> deleting a single snapshot, plus constant monitoring to make sure
> the killed OSDs come back up and stay up.  This is a serious issue
> that we have been fighting for over a month now.  The obvious way
> out is to destroy the cephfs entirely, but that would mean
> recovering about 40TB of data afterwards, which could take a very
> long time, so we'd prefer not to do that.
>
> For example (top output):
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
> 2521055 ceph      20   0 16.908g 0.013t  29172 S  28.4 10.6  36:39.52 ceph-osd
> 2507582 ceph      20   0 22.919g 0.019t  42076 S  17.6 15.5  58:48.00 ceph-osd
> 2501393 ceph      20   0 22.024g 0.018t  39648 S  14.7 14.9  79:05.28 ceph-osd
> 2547090 ceph      20   0 21.316g 0.017t  26584 S   7.8 14.0  18:14.76 ceph-osd
> 2455703 ceph      20   0 20.872g 0.017t  19784 S   4.9 13.8 111:02.06 ceph-osd
>  246368 ceph      20   0 22.657g 0.018t  37416 S   3.9 14.5 462:31.79 ceph-osd
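
For anyone hitting a similar storm, a rough sketch of watching those
two symptoms (blocked requests and OSD memory) while trimming runs;
osd.12 is a placeholder for whichever OSD is reporting slow requests:

  # Cluster-wide summary of blocked/slow requests
  ceph health detail

  # Ops currently in flight on a suspect OSD (run on the host that owns osd.12)
  ceph daemon osd.12 dump_ops_in_flight

  # Resident memory of the OSD processes, as in the output above
  top -b -n 1 | grep ceph-osd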
>
>
>
>
> On Tue, Oct 10, 2017 at 12:03 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>> On Tue, Oct 10, 2017 at 12:13 AM, Wyllys Ingersoll
>> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>>> We have a cluster (10.2.9 based) with a cephfs filesystem that has
>>> 4800+ snapshots. We want to delete most of the very old ones to get it
>>> to a more manageable number (such as 0).  However, deleting even 1
>>> snapshot right now takes up to a full 24 hours due to their age and
>>> size. It would literally take 13 years to delete all of them at the
>>> current pace.
>>>
>>> Here are the statistics for one snapshot directory:
>>>
>>> # file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
>>> ceph.dir.entries="3"
>>> ceph.dir.files="0"
>>> ceph.dir.rbytes="30500769204664"
>>> ceph.dir.rctime="1504695439.09966088000"
>>> ceph.dir.rentries="7802785"
>>> ceph.dir.rfiles="7758691"
>>> ceph.dir.rsubdirs="44094"
>>> ceph.dir.subdirs="3"
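
Those statistics are CephFS virtual extended attributes; output in
that format typically comes from getfattr. A sketch, assuming the
filesystem is mounted at /cephfs (on releases where the virtual
xattrs are not listable, fetch them by name instead):

  # Dump the ceph.dir.* xattrs for the snapshot directory
  getfattr -d -m 'ceph.dir.*' /cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621

  # Fetch a single attribute by name if the pattern form returns nothing
  getfattr -n ceph.dir.rentries /cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621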
>>>
>>> There is a bug filed with details here: http://tracker.ceph.com/issues/21412
>>>
>>> I'm wondering if there is a faster, undocumented, "backdoor" way to
>>> clean up our snapshot mess without destroying the entire filesystem
>>> and recreating it.
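
On the "in batches" part of the subject line: nothing undocumented is
offered in this thread, but the client-side removals themselves can
at least be paced. A minimal sketch, assuming the filesystem is
mounted at /cephfs; the snapshot-name glob and sleep intervals are
placeholders to be tuned:

  #!/bin/sh
  # Remove old snapshots one at a time, letting the cluster settle in between.
  cd /cephfs/.snap || exit 1
  for snap in snapshot.2017-0*; do
      echo "removing $snap"
      rmdir "$snap"
      # Give snap trimming room to breathe, then wait for HEALTH_OK
      # before moving on to the next removal.
      sleep 300
      while ! ceph health | grep -q HEALTH_OK; do sleep 60; done
  done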
>>
>> Deleting a snapshot in cephfs is a simple operation; it should complete
>> in seconds. Something must be going wrong if 'rmdir .snap/xxx' takes
>> hours. Please set debug_mds to 10, retry deleting a snapshot, and send
>> us the log. (It's best to stop all other fs activity while deleting the
>> snapshot.)
>>
>> Regards
>> Yan, Zheng
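
A sketch of what Yan is asking for, assuming a single active MDS
whose id is 'a' (substitute your own) and default log locations:

  # Raise MDS debug verbosity at runtime
  ceph tell mds.a injectargs '--debug_mds 10'

  # ... retry the snapshot deletion: rmdir /cephfs/.snap/<name> ...

  # Grab /var/log/ceph/ceph-mds.a.log from the MDS host, then restore the default
  ceph tell mds.a injectargs '--debug_mds 1/5'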
>>
>>>
>>> -Wyllys Ingersoll
>>>  Keeper Technology, LLC


