Potential OSD deadlock?

We have had two situations today where I/O just seems to be blocked
indefinitely on our production cluster (0.94.3). In the case this
morning it was just normal I/O traffic, with no recovery or backfill
going on. In the case this evening we were backfilling to some new
OSDs. I would have loved to bump up the debugging to get an idea of
what was going on, but time was exhausted. During the evening incident
I was able to do some additional troubleshooting, but I got real
anxious after I/O had been blocked for 10 minutes and Ops was getting
hot under the collar.

Here are the important parts of the logs:
[osd.30]
2015-09-18 23:05:36.188251 7efed0ef0700  0 log_channel(cluster) log
[WRN] : slow request 30.662958 seconds old,
 received at 2015-09-18 23:05:05.525220: osd_op(client.3117179.0:18654441
 rbd_data.1099d2f67aaea.0000000000000f62 [set-alloc-hint object_size
8388608 write_size 8388608,write 1048576~643072] 4.5ba1672c
ack+ondisk+write+known_if_redirected e55919)
 currently waiting for subops from 32,70,72

[osd.72]
2015-09-18 23:05:19.302985 7f3fa19f8700  0 log_channel(cluster) log
[WRN] : slow request 30.200408 seconds old,
 received at 2015-09-18 23:04:49.102519: osd_op(client.4267090.0:3510311
 rbd_data.3f41d41bd65b28.0000000000009e2b [set-alloc-hint object_size
4194304 write_size 4194304,write 1048576~421888] 17.40adcada
ack+ondisk+write+known_if_redirected e55919)
 currently waiting for subops from 2,30,90
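
If this happens again, the next thing I will try is dumping the
in-flight ops on the OSDs that are complaining, which should show
exactly which step each op is stuck at. These have to be run on the
host that carries the OSD and assume the default admin socket path:

  ceph daemon osd.72 dump_ops_in_flight
  # or, pointing at the socket directly:
  ceph --admin-daemon /var/run/ceph/ceph-osd.72.asok dump_ops_in_flight
  # recently completed slow ops, with their event timelines:
  ceph daemon osd.72 dump_historic_ops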

The other OSDs listed (32,70,2,90) did not have any errors in their
logs about blocked I/O. It seems that osd.30 was waiting for osd.72
and vice versa. I looked at top and iostat on those two hosts and the
OSD processes and disk I/O were pretty idle.
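
If the mutual wait theory is right, comparing what each of the two
OSDs thinks it is waiting on should confirm it (same admin socket
dump as above, run on each OSD's host; the grep is only there to cut
the JSON down to the interesting ops):

  ceph daemon osd.30 dump_ops_in_flight | grep -i -B2 -A2 waiting
  ceph daemon osd.72 dump_ops_in_flight | grep -i -B2 -A2 waiting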

I know that this isn't a lot to go on. Our cluster is under very heavy
load and we get several blocked I/Os every hour, but they usually
clear up within 15 seconds. We seem to get blocked I/O when the op
latency of the cluster goes above 1 second (averaged across all OSDs
as seen in Graphite).
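
For what it's worth, when these show up we can also see them from the
monitor side with the standard commands; health detail should break
the blocked requests down per OSD:

  ceph -s
  ceph health detail | grep -i blocked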

Has anyone else seen I/O blocked indefinitely like this? Bouncing
osd.72 immediately cleared all the blocked I/O and it was fine after
rejoining the cluster. Which logs should we turn up, and to what
level, to be most useful for troubleshooting this case?
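
Unless someone has better suggestions, what I was thinking for the
next occurrence is to raise the messenger and OSD debugging on the
OSDs involved without restarting them, then drop it back down
afterwards so the logs don't fill the disk:

  ceph tell osd.30 injectargs '--debug-ms 1 --debug-osd 20'
  ceph tell osd.72 injectargs '--debug-ms 1 --debug-osd 20'
  # ... capture the OSD logs while the I/O is blocked ...
  # back to (what I believe are) the defaults:
  ceph tell osd.30 injectargs '--debug-ms 0/5 --debug-osd 0/5'
  ceph tell osd.72 injectargs '--debug-ms 0/5 --debug-osd 0/5'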

I hope this makes sense; it has been a long day.

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904  C70E E654 3BB2 FA62 B9F1