Hello,

I have upgraded our Ceph cluster from Nautilus to Octopus (15.2.15) over the weekend. The upgrade went well as far as I can tell.

Earlier today, noticing that our CephFS data pool was approaching capacity, I removed some old CephFS snapshots (taken weekly at the root of the filesystem), keeping only the most recent one (created today, 2022-02-21). As expected, a good fraction of the PGs transitioned from active+clean to active+clean+snaptrim or active+clean+snaptrim_wait. On previous occasions when I removed a snapshot, snaptrimming took a few days to complete, ran without noticeably impacting other workloads, and freed up an appreciable amount of disk space.

This time, after a few hours of snaptrimming, users complained of high IO latency, and indeed Ceph reported "slow ops" on a number of OSDs and on the active MDS. I attributed this to the snaptrimming and decided to throttle it, initially by setting osd_pg_max_concurrent_snap_trims to 1, which didn't seem to help much, so I then set it to 0, which had the surprising effect of transitioning all PGs back to active+clean (is this intended?). I also restarted the MDS, which seemed to be struggling. IO latency went back to normal immediately.

Outside of users' working hours, I decided to resume snaptrimming by setting osd_pg_max_concurrent_snap_trims back to 1. Much to my surprise, nothing happened. All PGs remained (and still remain at the time of writing) in the state active+clean, even after restarting some of the OSDs. This definitely seems abnormal: as I mentioned earlier, snaptrimming this FS previously took on the order of multiple days. Moreover, if snaptrim were truly complete, I would expect pool usage to have dropped by an appreciable amount (at least a dozen terabytes), but that doesn't seem to be the case.

A du on the CephFS root gives:

# du -sh /mnt/pve/cephfs
31T     /mnt/pve/cephfs

But:

# ceph df
<snip>
--- POOLS ---
POOL             ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
cephfs_data       7  512   43 TiB  190.83M  147 TiB  93.22    3.6 TiB
cephfs_metadata   8   32   89 GiB  694.60k  266 GiB   1.32    6.4 TiB
<snip>

ceph pg dump reports a SNAPTRIMQ_LEN of 0 on all PGs.

Did CephFS just leak a massive 12 TiB worth of objects...? It seems to me that the snaptrim operation did not complete at all.

Perhaps relatedly:

# ceph daemon mds.choi dump snaps
{
    "last_created": 93,
    "last_destroyed": 94,
    "snaps": [
        {
            "snapid": 93,
            "ino": 1,
            "stamp": "2022-02-21T00:00:01.245459+0800",
            "name": "2022-02-21"
        }
    ]
}

How can last_destroyed be greater than last_created? The last snapshot taken on this FS is indeed #93, and the removed snapshots were all created in previous weeks.

Could someone shed some light please? Assuming that snaptrim didn't run to completion, how can I manually delete objects belonging to now-removed snapshots? I believe this is what the Ceph documentation calls a "backwards scrub", but I didn't find anything in the Ceph suite that can run such a scrub.

This pool is filling up fast. I'll throw in some more OSDs for the moment to buy some time, but I would certainly appreciate your help! Happy to attach any logs or info you deem necessary.

Regards,
LRT
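P.S. In case the exact invocations matter, this is roughly what I ran (reconstructed from shell history, so please treat the details as approximate rather than verbatim). Throttling, pausing and later resuming snap trimming:

# ceph config set osd osd_pg_max_concurrent_snap_trims 1
# ceph config set osd osd_pg_max_concurrent_snap_trims 0
(later, outside working hours)
# ceph config set osd osd_pg_max_concurrent_snap_trims 1

And this is how I convinced myself that nothing is queued for trimming any more (SNAPTRIMQ_LEN is the last column of the plain-text pg dump on this release, if I'm reading it right; pool 7 is cephfs_data):

# ceph pg stat
# ceph pg dump 2>/dev/null | awk '$1 ~ /^7\./ {print $NF}' | sort | uniq -c

I can also attach the output of "ceph osd pool ls detail" for the data pool if the removed-snapshot intervals recorded there would help with the diagnosis.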