This might be due to tombstone accumulation in RocksDB. You can try issuing a compaction on all of your OSDs and see if that helps (ceph tell osd.XXX compact). I usually prefer to do this one host at a time just in case it causes issues, though on a reasonably fast RBD cluster you can often get away with compacting everything at once (a host-by-host sketch follows below the quoted message).

Josh

On Fri, Jan 27, 2023 at 6:52 AM Victor Rodriguez <vrodriguez@xxxxxxxxxxxxx> wrote:
>
> Hello,
>
> Asking for help with an issue. Maybe someone has a clue about what's
> going on.
>
> We are using Ceph 15.2.17 on Proxmox 7.3. A big VM had a snapshot and I
> removed it. A bit later, nearly half of the PGs of the pool entered the
> snaptrim and snaptrim_wait states, as expected. The problem is that these
> operations ran extremely slowly and client I/O dropped to nearly nothing,
> so all VMs in the cluster got stuck because they could not do I/O to the
> storage. Taking and removing big snapshots is a normal operation that we
> do often, and this is the first time I have seen this issue in any of my
> clusters.
>
> The disks are all Samsung PM1733 and the network is 25G. This gives us
> plenty of performance for the use case, and we have never had an issue
> with the hardware.
>
> Both disk I/O and network I/O were very low. Still, client I/O seemed to
> get queued forever. Disabling snaptrim (ceph osd set nosnaptrim) stops
> any active snaptrim operations and client I/O returns to normal.
> Enabling snaptrim again makes client I/O almost halt again.
>
> I've been playing with some settings:
>
> ceph tell 'osd.*' injectargs '--osd-max-trimming-pgs 1'
> ceph tell 'osd.*' injectargs '--osd-snap-trim-sleep 30'
> ceph tell 'osd.*' injectargs '--osd-snap-trim-sleep-ssd 30'
> ceph tell 'osd.*' injectargs '--osd-pg-max-concurrent-snap-trims 1'
>
> None of them really seemed to help. I also tried restarting the OSD
> services.
>
> This cluster was upgraded from 14.2.x to 15.2.17 a couple of months ago.
> Is there any setting that needs to be changed which might be causing
> this problem?
>
> I have scheduled a maintenance window. What should I look for to
> diagnose this problem?
>
> Any help is very much appreciated. Thanks in advance.
>
> Victor
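
For reference, a minimal sketch of the host-at-a-time compaction suggested above, assuming a hypothetical host name "ceph-node1" and using "ceph osd ls-tree" to list the OSD ids under that host's CRUSH bucket; adjust the name to match your own topology:

    # HOST is a placeholder; substitute each of your hosts in turn.
    HOST=ceph-node1
    # ceph osd ls-tree prints the OSD ids under the given CRUSH bucket.
    for osd in $(ceph osd ls-tree "$HOST"); do
        echo "Compacting osd.$osd on $HOST"
        # ceph tell returns once the OSD reports the compaction finished.
        ceph tell "osd.$osd" compact
    done

Running this per host keeps only one host's OSDs busy with compaction at a time, which is the safer approach described above; on a fast all-flash cluster, "ceph tell 'osd.*' compact" would compact everything at once instead.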