Requests stuck for > 2 hours cannot be attributed to "IO load on the
cluster". It looks like some OSDs really are stuck. Things to try:

* run "ceph daemon osd.X dump_blocked_ops" on one of the affected OSDs
  to see what is stuck (a minimal command sketch follows below the
  quoted message)
* try restarting the affected OSDs to see if it clears up automatically

Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90

On Thu, Oct 31, 2019 at 2:27 PM Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>
> Hi,
>
> after enabling the ceph balancer (with the command "ceph balancer on"),
> the health status changed to error.
> This is the current output of ceph health detail:
>
> root@ld3955:~# ceph health detail
> HEALTH_ERR 1438 slow requests are blocked > 32 sec; 861 stuck requests
> are blocked > 4096 sec; mon ld5505 is low on available space
> REQUEST_SLOW 1438 slow requests are blocked > 32 sec
>     683 ops are blocked > 2097.15 sec
>     436 ops are blocked > 1048.58 sec
>     191 ops are blocked > 524.288 sec
>     78 ops are blocked > 262.144 sec
>     35 ops are blocked > 131.072 sec
>     11 ops are blocked > 65.536 sec
>     4 ops are blocked > 32.768 sec
>     osd.62 has blocked requests > 65.536 sec
>     osds 39,72 have blocked requests > 262.144 sec
>     osds 6,19,67,173,174,187,188,269,434 have blocked requests > 524.288 sec
>     osds 8,16,35,36,37,61,63,64,68,73,75,178,186,271,369,420,429,431,433,436
> have blocked requests > 1048.58 sec
>     osds 3,5,7,24,34,38,40,41,59,66,69,74,180,270,370,421,432,435 have
> blocked requests > 2097.15 sec
> REQUEST_STUCK 861 stuck requests are blocked > 4096 sec
>     25 ops are blocked > 8388.61 sec
>     836 ops are blocked > 4194.3 sec
>     osds 2,28,29,32,60,65,181,185,268,368,423,424,426 have stuck
> requests > 4194.3 sec
>     osds 0,30,70,71,184 have stuck requests > 8388.61 sec
>
> I understand that when the balancer starts shifting PGs to other OSDs,
> this causes IO load on the cluster.
> However, I don't understand why this affects the OSDs so heavily.
> And I don't understand why OSDs of a specific type (SSD, NVMe) suffer
> even though no balancing is occurring on them.
>
> Regards
> Thomas
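
A minimal sketch of both steps, using osd.62 from the health output above
as the example. The ceph-osd@<id> systemd unit name and the noout flag
assume a standard systemd-based deployment; adjust to your environment:

    # On the host where the affected OSD runs, dump its currently blocked
    # ops via the admin socket to see what the requests are waiting on:
    ceph daemon osd.62 dump_blocked_ops

    # If nothing obvious shows up, try restarting the OSD. Setting noout
    # first avoids data movement while the daemon is briefly down:
    ceph osd set noout
    systemctl restart ceph-osd@62
    ceph osd unset noout    # once the OSD is back up and in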