Hello list,

We've recently moved our 588TB Ceph cluster into production by migrating VMs onto it, but this morning we started receiving the following message:

    cluster xxxxxxxxetc
     health HEALTH_WARN 20 requests are blocked > 32 sec
     monmap e1: 3 mons at {ceph-mon-a=172.29.234.11:6789/0,ceph-mon-b=172.29.234.12:6789/0,ceph-mon-c=172.29.234.13:6789/0}, election epoch 306, quorum 0,1,2 ceph-mon-a,ceph-mon-b,ceph-mon-c
     osdmap e14763: 216 osds: 215 up, 215 in
     pgmap v6758511: 18072 pgs, 20 pools, 49884 GB data, 12451 kobjects
           99837 GB used, 488 TB / 586 TB avail
               18063 active+clean
                   9 active+clean+scrubbing+deep
     client io 3365 kB/s rd, 22576 kB/s wr, 1105 op/s

(We're down an OSD pending a disk replacement.)

These delays appear to have happened all over our cluster, on different hosts, at around the same time, so it doesn't appear to have been a single host that had a problem:

root@ceph-mon-a:~# grep 'slow request' /var/log/ceph/ceph.log | awk '{print $3}' | sort | uniq -c | sort -n
      2 osd.67
      3 osd.11
      4 osd.104
      4 osd.148
      4 osd.193
      5 osd.151
      6 osd.0
      6 osd.190
      9 osd.136
     10 osd.215
     15 osd.47
     18 osd.202
     44 osd.124
     62 osd.51
     74 osd.34
     84 osd.9
    186 osd.86

root@ceph-mon-a:~# for x in c1m1 c1m2 c2m1 c2m2 c3m1 c3m2; do echo ${x} && ssh ${x} mount | egrep "\-(67|11|104|148|193|151|0|190|136|215|47|202|124|51|34|9|86) "; done;
c1m1
/dev/sdaa1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime)
c1m2
/dev/sdm1 on /var/lib/ceph/osd/ceph-67 type xfs (rw,noatime)
/dev/sdai1 on /var/lib/ceph/osd/ceph-193 type xfs (rw,noatime)
/dev/sdal1 on /var/lib/ceph/osd/ceph-202 type xfs (rw,noatime)
c2m1
/dev/sds1 on /var/lib/ceph/osd/ceph-104 type xfs (rw,noatime)
/dev/sdp1 on /var/lib/ceph/osd/ceph-86 type xfs (rw,noatime)
/dev/sdaa1 on /var/lib/ceph/osd/ceph-151 type xfs (rw,noatime)
c2m2
/dev/sdg1 on /var/lib/ceph/osd/ceph-34 type xfs (rw,noatime)
/dev/sdag1 on /var/lib/ceph/osd/ceph-215 type xfs (rw,noatime)
c3m1
/dev/sdi1 on /var/lib/ceph/osd/ceph-47 type xfs (rw,noatime)
/dev/sdc1 on /var/lib/ceph/osd/ceph-11 type xfs (rw,noatime)
/dev/sdv1 on /var/lib/ceph/osd/ceph-136 type xfs (rw,noatime)
/dev/sdab1 on /var/lib/ceph/osd/ceph-124 type xfs (rw,noatime)
/dev/sdad1 on /var/lib/ceph/osd/ceph-148 type xfs (rw,noatime)
c3m2
/dev/sdd1 on /var/lib/ceph/osd/ceph-9 type xfs (rw,noatime)
/dev/sdm1 on /var/lib/ceph/osd/ceph-51 type xfs (rw,noatime)
/dev/sdal1 on /var/lib/ceph/osd/ceph-190 type xfs (rw,noatime)
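In case anyone wants to double-check the "around the same time" part, bucketing the same log entries by minute should show the window; this is just a sketch and assumes the standard ceph.log layout shown in the excerpt below (date in field 1, time in field 2):

grep 'slow request' /var/log/ceph/ceph.log | awk '{print $1, substr($2,1,5)}' | sort | uniq -c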
Log excerpt from around the middle of the time this was happening:

2014-02-19 08:42:41.246842 osd.151 172.29.234.23:6807/179269 6696 : [WRN] slow request 30.866614 seconds old, received at 2014-02-19 08:42:10.378934: osd_op(client.94103.1:136181794 rb.0.16fe8.238e1f29.00000005c3de [write 188416~4096] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
2014-02-19 08:42:41.247117 osd.151 172.29.234.23:6807/179269 6697 : [WRN] slow request 30.866401 seconds old, received at 2014-02-19 08:42:10.379147: osd_op(client.94103.1:136181795 rb.0.16fe8.238e1f29.00000005c3de [write 1077248~12288] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
2014-02-19 08:42:41.247125 osd.151 172.29.234.23:6807/179269 6698 : [WRN] slow request 30.865893 seconds old, received at 2014-02-19 08:42:10.379655: osd_op(client.94103.1:136181796 rb.0.16fe8.238e1f29.00000005c3de [write 1093632~12288] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
2014-02-19 08:42:41.247133 osd.151 172.29.234.23:6807/179269 6699 : [WRN] slow request 30.865366 seconds old, received at 2014-02-19 08:42:10.380182: osd_op(client.94103.1:136181797 rb.0.16fe8.238e1f29.00000005c3de [write 1126400~12288] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
2014-02-19 08:42:49.096313 osd.202 172.29.234.22:6842/6625 4937 : [WRN] 12 slow requests, 1 included below; oldest blocked for > 45.689579 secs
2014-02-19 08:42:49.096324 osd.202 172.29.234.22:6842/6625 4938 : [WRN] slow request 30.538718 seconds old, received at 2014-02-19 08:42:18.557537: osd_op(client.94103.1:136187133 rb.0.3df06.238e1f29.000000023481 [write 118784~4096] 5.a3a00d5a e14763) currently waiting for subops from [51]

We're running Ubuntu 12.04 with a 3.8 kernel (the 'linux-image-generic-lts-raring' package from the official repo) and Ceph 0.72-1precise. Storage hardware is SGI MIS [1] servers with 32-40 disks (and therefore OSDs) per host, 24 CPU cores and 64GB of RAM. FWIW, the three monitors are vastly overpowered Dell R710s with 16 cores and 32GB of RAM each.

Load on the OSD hosts fluctuates between 4.00 and 8.00, and iostat normally shows iowait around 10%, but we don't have those figures from when the issue was occurring.

I suspect there's not much I can do until the problem happens again and I can diagnose it while it's happening, but any help or advice towards diagnosing the root cause of this would be much appreciated!

[1] - https://www.sgi.com/products/storage/modular/server.html

Regards,
Dane
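PS: my rough plan for the next occurrence is to capture the in-flight ops from one of the affected OSDs (e.g. osd.86, which logged the most slow requests) together with per-disk stats on its host. This is only a sketch: it assumes the default admin socket path, and I'm not certain dump_historic_ops is present in 0.72.

# run on the host carrying osd.86, while the slow requests are active
ceph --admin-daemon /var/run/ceph/ceph-osd.86.asok dump_ops_in_flight
ceph --admin-daemon /var/run/ceph/ceph-osd.86.asok dump_historic_ops
iostat -x 5 3

dump_ops_in_flight should show what the blocked requests are waiting on, and the extended iostat output should show whether a particular disk is saturated at the time.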