Hi,

Quoting Stefan Kooman (stefan@xxxxxx):
> Hi,
>
> After the upgrade to 13.2.8, deep-scrub has a big impact on client IO:
> loads of SLOW_OPS and high latency. We hardly ever had SLOW_OPS, but
> since the upgrade the impact is so big that we even have OSDs marking
> each other out (OSD op thread timeout) multiple times during the scrub
> window. Plenty of CPU / RAM / IOPS left, hardly any load on these OSD
> servers. Has anything changed in this release that can explain
> this behaviour?
>
> Besides this, the impact of rebalancing is very severe as well. With only
> the balancer remapping a couple of PGs at a time there are loads of
> (MDS_)SLOW_OPS. This morning the cephfs metadata pool got rebalanced ...
> and that triggered a lot of SLOW_OPS. One particular OSD was pegged at
> 1000% CPU for more than half an hour (not doing that much IO): that's 10
> cores going full throttle! After a restart this issue was gone.

We can now also trigger SLOW_OPS on a bunch of OSDs when we do an
"rbd du -p $POOL", something that has never been an issue before. The
images in the rbd pools have the following features enabled: layering,
exclusive-lock, object-map, fast-diff, deep-flatten.

Has anything changed in 13.2.8 that affects these kinds of operations?

Gr. Stefan

-- 
| BIT BV  https://www.bit.nl/  Kamer van Koophandel 09090351
| GPG: 0xD14839C6  +31 318 648 688 / info@xxxxxx