On Wed, Feb 19, 2014 at 2:50 AM, Dane Elwell <dane.elwell@xxxxxxxxx> wrote:
> Hello list,
>
> We've recently moved our 588TB Ceph cluster into production by moving
> VMs on there, but this morning we started receiving the following
> message:
>
>   cluster xxxxxxxxetc
>    health HEALTH_WARN 20 requests are blocked > 32 sec
>    monmap e1: 3 mons at {ceph-mon-a=172.29.234.11:6789/0,ceph-mon-b=172.29.234.12:6789/0,ceph-mon-c=172.29.234.13:6789/0}, election epoch 306, quorum 0,1,2 ceph-mon-a,ceph-mon-b,ceph-mon-c
>    osdmap e14763: 216 osds: 215 up, 215 in
>    pgmap v6758511: 18072 pgs, 20 pools, 49884 GB data, 12451 kobjects
>          99837 GB used, 488 TB / 586 TB avail
>             18063 active+clean
>                 9 active+clean+scrubbing+deep
>   client io 3365 kB/s rd, 22576 kB/s wr, 1105 op/s
>
> (We're down an OSD pending a disk replacement)
>
> These delays appear to have happened all over our cluster, on
> different hosts, at around the same time, so it doesn't appear to have
> been a single host that had a problem:
>
> root@ceph-mon-a:~# grep 'slow request' /var/log/ceph/ceph.log | awk '{print $3}' | sort | uniq -c | sort -n
>       2 osd.67
>       3 osd.11
>       4 osd.104
>       4 osd.148
>       4 osd.193
>       5 osd.151
>       6 osd.0
>       6 osd.190
>       9 osd.136
>      10 osd.215
>      15 osd.47
>      18 osd.202
>      44 osd.124
>      62 osd.51
>      74 osd.34
>      84 osd.9
>     186 osd.86
>
> root@ceph-mon-a:~# for x in c1m1 c1m2 c2m1 c2m2 c3m1 c3m2; do echo ${x} && ssh ${x} mount | egrep "\-(67|11|104|148|193|151|0|190|136|215|47|202|124|51|34|9|86) "; done;
> c1m1
> /dev/sdaa1 on /var/lib/ceph/osd/ceph-0 type xfs (rw,noatime)
> c1m2
> /dev/sdm1 on /var/lib/ceph/osd/ceph-67 type xfs (rw,noatime)
> /dev/sdai1 on /var/lib/ceph/osd/ceph-193 type xfs (rw,noatime)
> /dev/sdal1 on /var/lib/ceph/osd/ceph-202 type xfs (rw,noatime)
> c2m1
> /dev/sds1 on /var/lib/ceph/osd/ceph-104 type xfs (rw,noatime)
> /dev/sdp1 on /var/lib/ceph/osd/ceph-86 type xfs (rw,noatime)
> /dev/sdaa1 on /var/lib/ceph/osd/ceph-151 type xfs (rw,noatime)
> c2m2
> /dev/sdg1 on /var/lib/ceph/osd/ceph-34 type xfs (rw,noatime)
> /dev/sdag1 on /var/lib/ceph/osd/ceph-215 type xfs (rw,noatime)
> c3m1
> /dev/sdi1 on /var/lib/ceph/osd/ceph-47 type xfs (rw,noatime)
> /dev/sdc1 on /var/lib/ceph/osd/ceph-11 type xfs (rw,noatime)
> /dev/sdv1 on /var/lib/ceph/osd/ceph-136 type xfs (rw,noatime)
> /dev/sdab1 on /var/lib/ceph/osd/ceph-124 type xfs (rw,noatime)
> /dev/sdad1 on /var/lib/ceph/osd/ceph-148 type xfs (rw,noatime)
> c3m2
> /dev/sdd1 on /var/lib/ceph/osd/ceph-9 type xfs (rw,noatime)
> /dev/sdm1 on /var/lib/ceph/osd/ceph-51 type xfs (rw,noatime)
> /dev/sdal1 on /var/lib/ceph/osd/ceph-190 type xfs (rw,noatime)
>
> Log excerpt from around the middle of the time this was happening:
>
> 2014-02-19 08:42:41.246842 osd.151 172.29.234.23:6807/179269 6696 : [WRN] slow request 30.866614 seconds old, received at 2014-02-19 08:42:10.378934: osd_op(client.94103.1:136181794 rb.0.16fe8.238e1f29.00000005c3de [write 188416~4096] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
> 2014-02-19 08:42:41.247117 osd.151 172.29.234.23:6807/179269 6697 : [WRN] slow request 30.866401 seconds old, received at 2014-02-19 08:42:10.379147: osd_op(client.94103.1:136181795 rb.0.16fe8.238e1f29.00000005c3de [write 1077248~12288] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
> 2014-02-19 08:42:41.247125 osd.151 172.29.234.23:6807/179269 6698 : [WRN] slow request 30.865893 seconds old, received at 2014-02-19 08:42:10.379655: osd_op(client.94103.1:136181796 rb.0.16fe8.238e1f29.00000005c3de [write 1093632~12288] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
> 2014-02-19 08:42:41.247133 osd.151 172.29.234.23:6807/179269 6699 : [WRN] slow request 30.865366 seconds old, received at 2014-02-19 08:42:10.380182: osd_op(client.94103.1:136181797 rb.0.16fe8.238e1f29.00000005c3de [write 1126400~12288] 5.7a1ac3a3 e14763) currently waiting for subops from [51]
> 2014-02-19 08:42:49.096313 osd.202 172.29.234.22:6842/6625 4937 : [WRN] 12 slow requests, 1 included below; oldest blocked for > 45.689579 secs
> 2014-02-19 08:42:49.096324 osd.202 172.29.234.22:6842/6625 4938 : [WRN] slow request 30.538718 seconds old, received at 2014-02-19 08:42:18.557537: osd_op(client.94103.1:136187133 rb.0.3df06.238e1f29.000000023481 [write 118784~4096] 5.a3a00d5a e14763) currently waiting for subops from [51]

You'll want to gather more data to be certain, but each of these says
"waiting for subops from [51]", meaning the reporting OSD sent off a
replication request to osd.51 and is waiting on the response before it
continues. Given that these are a couple of different OSDs, it looks
like maybe something happened to osd.51, so I'd look at what it says
has happened.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com

>
> We're running Ubuntu 12.04 with a 3.8 kernel (package
> 'linux-image-generic-lts-raring' from the official repo) and Ceph version
> 0.72-1precise. Storage hardware is SGI MIS[1] servers with 32-40 disks
> (and therefore OSDs) per host, 24 CPU cores and 64GB of RAM. FWIW, the
> three monitors are vastly overpowered Dell R710s with 16 cores and
> 32GB of RAM each. Load on the OSD hosts fluctuates between 4.00 and 8.00,
> and iostat shows iowait around 10% normally, but we don't have those
> figures for when the issue was occurring.
>
> I suspect there's not much I can do until the problem happens again
> and I can diagnose it during the issue, but any help or advice towards
> diagnosing the root cause of this would be much appreciated!
>
> [1] - https://www.sgi.com/products/storage/modular/server.html
>
> Regards
>
> Dane
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
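
A small illustration of Greg's point above; this is not from the thread, just a
hedged sketch that assumes the ceph.log lines look exactly like the excerpt.
Rather than counting which OSD *reported* each slow request (as the grep/awk
tally earlier does), tally the OSD named in "waiting for subops from [...]",
since that is the replica actually being waited on. The sed expression is
illustrative only:

grep 'waiting for subops from' /var/log/ceph/ceph.log \
  | sed -e 's/.*waiting for subops from \[\([0-9,]*\)\].*/\1/' \
  | tr ',' '\n' | sort -n | uniq -c | sort -n

If osd.51 dominates that tally the way it dominates the excerpt, that supports
the idea that it is the common element behind slow requests reported by many
different primaries.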
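
For the follow-up Greg suggests (seeing what osd.51 itself says has happened),
a rough starting point might look like the commands below. These assume the
default Ubuntu log and admin-socket paths, and that osd.51 is still backed by
/dev/sdm1 on host c3m2 as shown in the mount listing above; adjust to taste.

# On c3m2, the host carrying osd.51:
tail -n 200 /var/log/ceph/ceph-osd.51.log        # anything the OSD itself complained about
ceph --admin-daemon /var/run/ceph/ceph-osd.51.asok dump_ops_in_flight   # what it is stuck on right now
iostat -x sdm 5                                  # is the underlying disk saturated?
dmesg | grep -i sdm | tail                       # kernel-level I/O errors on that drive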