I checked and this is what our current trim settings are:

  "osd_snap_trim_sleep": "0",
  "osd_pg_max_concurrent_snap_trims": "2",
  "osd_max_trimming_pgs": "2",
  "osd_preserve_trimmed_log": "false",
  "osd_pg_log_trim_min": "100",
  "osd_snap_trim_priority": "5",
  "osd_snap_trim_cost": "1048576",

It's not clear to me how to tune these to minimize the impact of large
snapshot deletions on the cluster. Can you give some insight here - how
does changing something like "osd_max_trimming_pgs" affect OSD
operations? I did watch your presentation, but the impact of changing
these individual parameters is still not clear to me. (A sketch of
adjusting these at runtime is included after the quoted thread below.)

On Wed, Oct 11, 2017 at 2:23 PM, Wyllys Ingersoll
<wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
> Probably not, I'll need to go look those up.
>
> On Wed, Oct 11, 2017 at 2:13 PM, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>> Have you adjusted any of the snapshot trimming tunables that were
>> added in the later Jewel releases and were explicitly designed to
>> throttle trimming and prevent these issues? They're discussed pretty
>> extensively in past threads on the list and in my presentation at the
>> latest OpenStack Boston Ceph Day.
>> -Greg
>>
>> On Tue, Oct 10, 2017 at 5:46 AM, Wyllys Ingersoll
>> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>>> The "rmdir" command takes seconds.
>>>
>>> However, the resulting storm of activity on the cluster AFTER the
>>> deletion is bringing our cluster down completely. The blocked
>>> requests count goes into the thousands. The individual OSD processes
>>> begin taking up all of the memory that they can grab, which causes the
>>> kernel to kill them off, which further throws the cluster into
>>> disarray due to down/out OSDs. It takes multiple DAYS to completely
>>> recover from deleting 1 snapshot, and constant monitoring to make sure
>>> OSDs come up and stay up after they get killed for eating too much
>>> memory. This is a serious issue that we have been fighting with for
>>> over a month now. The obvious solution is to destroy the cephfs
>>> entirely, but that would mean we would then have to recover about 40TB
>>> of data, which could take a very long time, and we'd prefer not to do
>>> that.
>>>
>>> For example:
>>> 2521055 ceph 20 0 16.908g 0.013t 29172 S 28.4 10.6  36:39.52 ceph-osd
>>> 2507582 ceph 20 0 22.919g 0.019t 42076 S 17.6 15.5  58:48.00 ceph-osd
>>> 2501393 ceph 20 0 22.024g 0.018t 39648 S 14.7 14.9  79:05.28 ceph-osd
>>> 2547090 ceph 20 0 21.316g 0.017t 26584 S  7.8 14.0  18:14.76 ceph-osd
>>> 2455703 ceph 20 0 20.872g 0.017t 19784 S  4.9 13.8 111:02.06 ceph-osd
>>>  246368 ceph 20 0 22.657g 0.018t 37416 S  3.9 14.5 462:31.79 ceph-osd
>>>
>>> On Tue, Oct 10, 2017 at 12:03 AM, Yan, Zheng <ukernel@xxxxxxxxx> wrote:
>>>> On Tue, Oct 10, 2017 at 12:13 AM, Wyllys Ingersoll
>>>> <wyllys.ingersoll@xxxxxxxxxxxxxx> wrote:
>>>>> We have a cluster (10.2.9 based) with a cephfs filesystem that has
>>>>> 4800+ snapshots. We want to delete most of the very old ones to get
>>>>> it down to a more manageable number (such as 0). However, deleting
>>>>> even 1 snapshot right now takes up to a full 24 hours due to their
>>>>> age and size. It would literally take 13 years to delete all of them
>>>>> at the current pace.
>>>>>
>>>>> Here are the statistics for one snapshot directory:
>>>>>
>>>>> # file: cephfs/.snap/snapshot.2017-02-24_22_17_01-1487992621
>>>>> ceph.dir.entries="3"
>>>>> ceph.dir.files="0"
>>>>> ceph.dir.rbytes="30500769204664"
>>>>> ceph.dir.rctime="1504695439.09966088000"
>>>>> ceph.dir.rentries="7802785"
>>>>> ceph.dir.rfiles="7758691"
>>>>> ceph.dir.rsubdirs="44094"
>>>>> ceph.dir.subdirs="3"
>>>>>
>>>>> There is a bug filed with details here: http://tracker.ceph.com/issues/21412
>>>>>
>>>>> I'm wondering if there is a faster, undocumented, "backdoor" way to
>>>>> clean up our snapshot mess without destroying the entire filesystem
>>>>> and recreating it.
>>>>
>>>> Deleting a snapshot in cephfs is a simple operation; it should complete
>>>> in seconds. Something must be going wrong if 'rmdir .snap/xxx' takes
>>>> hours. Please set debug_mds to 10, retry deleting a snapshot, and send
>>>> us the log. (It's better to stop all other fs activity while deleting
>>>> the snapshot.)
>>>>
>>>> Regards
>>>> Yan, Zheng
>>>>
>>>>>
>>>>> -Wyllys Ingersoll
>>>>> Keeper Technology, LLC
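For anyone following the thread: below is a minimal sketch, assuming a
Jewel-era (10.2.x) cluster, of how these throttles can be adjusted on
running OSDs and persisted in ceph.conf. The specific values are
illustrative assumptions, not settings recommended by anyone in this
thread. Roughly: osd_snap_trim_sleep inserts a pause between trim
operations, osd_pg_max_concurrent_snap_trims caps concurrent trims
within a single PG, osd_max_trimming_pgs caps how many PGs an OSD trims
at once, and osd_snap_trim_priority weights trim work against client I/O
in the op queue (lower means trimming yields more readily to clients).

  # Check the current value on one OSD via its admin socket
  # (run on the host where osd.0 lives).
  ceph daemon osd.0 config get osd_snap_trim_sleep

  # Inject gentler trim settings into all running OSDs.
  # Example values only - tune for your own cluster.
  ceph tell osd.* injectargs '--osd_snap_trim_sleep 0.5'
  ceph tell osd.* injectargs '--osd_pg_max_concurrent_snap_trims 1'
  ceph tell osd.* injectargs '--osd_max_trimming_pgs 1'
  ceph tell osd.* injectargs '--osd_snap_trim_priority 1'

  # To keep the settings across OSD restarts, mirror them in the
  # [osd] section of ceph.conf on each node:
  #   [osd]
  #   osd snap trim sleep = 0.5
  #   osd pg max concurrent snap trims = 1
  #   osd max trimming pgs = 1
  #   osd snap trim priority = 1

The trade-off is the usual one: slowing trimming down keeps the cluster
responsive during large snapshot deletions, at the cost of the trim work
(and the space reclamation) taking correspondingly longer to finish.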