Hello,

We have observed that our cluster is often moving back and forth from HEALTH_OK to HEALTH_WARN states due to "blocked requests". We have also observed "blocked ops". For instance:

    # ceph status
        cluster 905a1185-b4f0-4664-b881-f0ad2d8be964
         health HEALTH_WARN
                1 requests are blocked > 32 sec
         monmap e5: 5 mons at {ceph-host-1=192.168.0.65:6789/0,ceph-host-2=192.168.0.66:6789/0,ceph-host-3=192.168.0.67:6789/0,ceph-host-4=192.168.0.68:6789/0,ceph-host-5=192.168.0.69:6789/0}
                election epoch 44, quorum 0,1,2,3,4 ceph-host-1,ceph-host-2,ceph-host-3,ceph-host-4,ceph-host-5
         osdmap e5091: 120 osds: 100 up, 100 in
          pgmap v473436: 2048 pgs, 2 pools, 4373 GB data, 1093 kobjects
                13164 GB used, 168 TB / 181 TB avail
                    2048 active+clean
      client io 10574 kB/s rd, 33883 kB/s wr, 655 op/s

    # ceph health detail
    HEALTH_WARN 1 requests are blocked > 32 sec; 1 osds have slow requests
    1 ops are blocked > 67108.9 sec
    1 ops are blocked > 67108.9 sec on osd.71
    1 osds have slow requests

My questions are:

(1) Is it normal to have "slow requests" in a cluster?
(2) Or is it a symptom that indicates that something is wrong? (for example, a disk is about to fail)
(3) How can we fix the "slow requests"?
(4) What's the meaning of "blocked ops", and how can they be blocked so long? (67000 seconds is more than 18 hours!)
(5) How can we fix the "blocked ops"?

Thank you very much for your help.

Best regards,
- Xavier Serrano
- LCAC, Laboratori de Càlcul
- Departament d'Arquitectura de Computadors, UPC