deep-scrub / backfilling: large amount of SLOW_OPS after upgrade to 13.2.8

Hi,

After the upgrade to 13.2.8, deep-scrub has a big impact on client IO:
loads of SLOW_OPS and high latency. We hardly ever had SLOW_OPS before,
but since the upgrade the impact is so big that we even have OSDs
marking each other out (OSD op thread timeout) multiple times during
the scrub window. There is plenty of CPU / RAM / IOPS left and hardly
any load on these OSD servers. Has anything changed in this release
that can explain this behaviour?

Besides this, the impact of rebalancing is very severe as well. Even
with only the balancer remapping a couple of PGs at a time, there are
loads of (MDS_)SLOW_OPS. This morning the cephfs metadata pool got
rebalanced ... and that triggered a lot of SLOW_OPS. One particular OSD
was pegged at 1000% CPU for more than half an hour (while not doing
much IO): that's 10 cores going full throttle! After a restart of that
OSD the issue was gone.

Thanks,

Stefan



-- 
| BIT BV  https://www.bit.nl/        Kamer van Koophandel 09090351
| GPG: 0xD14839C6                   +31 318 648 688 / info@xxxxxx