Hey, what replication factor are you using? If it is three, ~1.5k backend IOPS may be a little high for 36 disks. Also, your OSD ids look suspicious: the numbers below imply 36 OSDs, so there should be no osd.62 or osd.65.

On 11/28/2013 12:45 AM, Oliver Schulz wrote:
> Dear Ceph Experts,
>
> our Ceph cluster suddenly went into a state of OSDs constantly having
> blocked or slow requests, rendering the cluster unusable. This happened
> during normal use, there were no updates, etc.
>
> All disks seem to be healthy (smartctl, iostat, etc.). A complete
> hardware reboot including system update on all nodes has not helped.
> The network equipment also shows no trouble.
>
> We'd be glad for any advice on how to diagnose and solve this, as
> the cluster is basically at a standstill and we urgently need
> to get it back into operation.
>
> Cluster structure: 6 Nodes, 6x 3TB disks plus 1x System/Journal SSD
> per node, one OSD per disk. We're running ceph version 0.67.4-1precise
> on Ubuntu 12.04.3 with kernel 3.8.0-33-generic (x86_64).
>
> "ceph status" shows something like (it varies):
>
>   cluster 899509fe-afe4-42f4-a555-bb044ca0f52d
>   health HEALTH_WARN 77 requests are blocked > 32 sec
>   monmap e1: 3 mons at
> {a=134.107.24.179:6789/0,b=134.107.24.181:6789/0,c=134.107.24.183:6789/0},
> election epoch 312, quorum 0,1,2 a,b,c
>   osdmap e32600: 36 osds: 36 up, 36 in
>   pgmap v16404527: 14304 pgs: 14304 active+clean; 20153 GB data,
> 60630 GB used, 39923 GB / 100553 GB avail; 1506KB/s rd, 21246B/s wr,
> 545op/s
>   mdsmap e478: 1/1/1 up {0=c=up:active}, 1 up:standby-replay
>
> "ceph health detail" shows something like (it varies):
>
> HEALTH_WARN 363 requests are blocked > 32 sec; 22 osds have slow
> requests
> 363 ops are blocked > 32.768 sec
> 1 ops are blocked > 32.768 sec on osd.0
> 8 ops are blocked > 32.768 sec on osd.3
> 37 ops are blocked > 32.768 sec on osd.12
> [...]
> 11 ops are blocked > 32.768 sec on osd.62
> 45 ops are blocked > 32.768 sec on osd.65
> 22 osds have slow requests
>
> The number and identity of affected OSDs constantly changes
> (sometimes health even goes to OK for a moment).
>
>
> Cheers and thanks for any ideas,
>
> Oliver
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
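For what it's worth, here is the back-of-the-envelope arithmetic behind my questions, as a small sketch using only the numbers from the quoted "ceph status" output. The replication factor of 3 is an inference from the used/data ratio, not something your mail confirms:

```python
# All figures taken from the quoted pgmap line:
#   20153 GB data, 60630 GB used, 545 op/s, 36 OSDs up/in.
data_gb = 20153
used_gb = 60630
client_ops = 545
num_osds = 36

# used/data ratio suggests the effective replication factor.
replication = used_gb / data_gb               # ~3.0 -> size=3 pools likely

# With 3 replicas, each client op fans out to roughly 3 backend ops.
backend_ops = client_ops * round(replication)  # ~1635 ops/s cluster-wide
ops_per_osd = backend_ops / num_osds           # ~45 ops/s per disk

print(round(replication, 2), backend_ops, round(ops_per_osd, 1))
```

That ~1.6k backend ops/s is the "1.5k IOPS" figure I mentioned; ~45 ops/s per spinning disk is not extreme on its own, which is part of why the osd.62/osd.65 ids in your health output look worth explaining first.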