We turned the logging up on an OSD that had been hitting this problem pretty frequently; this occurrence cleared up in less than 30 seconds. The logs are at http://162.144.87.113/files/ceph-osd.112.log.xz and are 8.6 GB uncompressed. Some of the messages we were seeing in ceph -w are:

2015-09-20 20:55:44.029041 osd.112 [WRN] 10 slow requests, 10 included below; oldest blocked for > 30.132696 secs
2015-09-20 20:55:44.029047 osd.112 [WRN] slow request 30.132696 seconds old, received at 2015-09-20 20:55:13.896286: osd_op(client.3289538.0:62497509 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 2588672~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently reached_pg
2015-09-20 20:55:44.029051 osd.112 [WRN] slow request 30.132619 seconds old, received at 2015-09-20 20:55:13.896363: osd_op(client.3289538.0:62497510 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 2908160~12288] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:44.029054 osd.112 [WRN] slow request 30.132520 seconds old, received at 2015-09-20 20:55:13.896462: osd_op(client.3289538.0:62497511 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 2949120~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:44.029058 osd.112 [WRN] slow request 30.132415 seconds old, received at 2015-09-20 20:55:13.896567: osd_op(client.3289538.0:62497512 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 2957312~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:44.029061 osd.112 [WRN] slow request 30.132302 seconds old, received at 2015-09-20 20:55:13.896680: osd_op(client.3289538.0:62497513 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 2998272~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:45.029290 osd.112 [WRN] 9 slow requests, 5 included below; oldest blocked for > 31.132843 secs
2015-09-20 20:55:45.029298 osd.112 [WRN] slow request 31.132447 seconds old, received at 2015-09-20 20:55:13.896759: osd_op(client.3289538.0:62497514 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 3035136~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:45.029303 osd.112 [WRN] slow request 31.132362 seconds old, received at 2015-09-20 20:55:13.896845: osd_op(client.3289538.0:62497515 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 3047424~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:45.029309 osd.112 [WRN] slow request 31.132276 seconds old, received at 2015-09-20 20:55:13.896931: osd_op(client.3289538.0:62497516 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 3072000~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:45.029315 osd.112 [WRN] slow request 31.132199 seconds old, received at 2015-09-20 20:55:13.897008: osd_op(client.3289538.0:62497517 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 3211264~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks
2015-09-20 20:55:45.029326 osd.112 [WRN] slow request 31.132127 seconds old, received at 2015-09-20 20:55:13.897079: osd_op(client.3289538.0:62497518 rbd_data.29b9ae3f960770.0000000000000200 [stat,set-alloc-hint object_size 8388608 write_size 8388608,write 3235840~4096] 17.118f0c67 ack+ondisk+write+known_if_redirected e57590) currently waiting for rw locks

----------------
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1


On Sun, Sep 20, 2015 at 7:02 PM, Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> I was able to catch the tail end of one of these and increased the logging on it. I had to kill it a minute or two after the logging was increased because of the time of day.
>
> I've put the logs at https://robert.leblancnet.us/ceph-osd.8.log.xz .
>
> Thanks,
> ----------------
> Robert LeBlanc
> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>
>
> On Sun, Sep 20, 2015 at 9:03 AM, Robert LeBlanc wrote:
>>
>> We had another incident of about 100 long-blocked I/Os this morning, but I didn't get to it in time. It wound up clearing itself after almost 1,000 seconds. One interesting note is that the blocked I/O kept creeping up until I saw a bunch of entries in the log like:
>>
>> 2015-09-20 08:20:01.870141 7f5fbe05c700  0 -- 10.217.72.12:6812/1468 >> 10.217.72.35:0/4027675 pipe(0x2f7a3000 sd=363 :6812 s=0 pgs=0 cs=0 l=1 c=0x1bd40840).accept replacing existing (lossy) channel (new one lossy=1)
>> 2015-09-20 08:20:02.061539 7f5f43de3700  0 -- 10.217.72.12:6812/1468 >> 10.217.72.33:0/2012691 pipe(0x28857000 sd=408 :6812 s=0 pgs=0 cs=0 l=1 c=0x1bd43020).accept replacing existing (lossy) channel (new one lossy=1)
>> 2015-09-20 08:20:02.817884 7f5fb2ca9700  0 -- 10.217.72.12:6812/1468 >> 10.217.72.33:0/2040360 pipe(0x283ff000 sd=605 :6812 s=0 pgs=0 cs=0 l=1 c=0x1bd402c0).accept replacing existing (lossy) channel (new one lossy=1)
>>
>> After almost 100 of these in about a two-second period, the I/O started draining over the next 3-4 seconds. What does this message mean? 1468 appears to be the PID of one of the OSD processes, but the number in the same position on the destination side is not a PID. What would cause a channel to be dropped and recreated?
>>
>> There are 10 OSDs on this host, but only two other OSDs show anything, and only two messages each (and those appear many seconds AFTER the problem clears up).
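In case anyone wants to poke at the same thing: turning the debug logging up and back down on a running OSD, and counting those "replacing existing (lossy) channel" accepts per second, can be done roughly like this (just a sketch; the OSD id and log path are examples, not necessarily the exact ones we used):

    # raise messenger and OSD debug on a live daemon, then put them back to the defaults
    ceph tell osd.112 injectargs '--debug-ms 1 --debug-osd 20'
    ceph tell osd.112 injectargs '--debug-ms 0/5 --debug-osd 0/5'

    # count the lossy-channel reconnects per second in an OSD log
    grep 'accept replacing existing (lossy) channel' /var/log/ceph/ceph-osd.112.log \
      | awk '{print $1, substr($2,1,8)}' | sort | uniq -c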
>>
>> OSD.128
>>
>> 2015-09-20 08:25:32.331268 7fe92e7e0700  0 -- 10.217.72.12:6816/7060 submit_message osd_op_reply(5553069 rbd_data.3e30b0275d493f.000000000000d800 [set-alloc-hint object_size 4194304 write_size 4194304,write 143360~4096] v57084'16557680 uv16557680 ack = 0) v6 remote, 10.217.72.33:0/4005620, failed lossy con, dropping message 0xa099180
>> 2015-09-20 08:25:32.331401 7fe92e7e0700  0 -- 10.217.72.12:6816/7060 submit_message osd_op_reply(5553069 rbd_data.3e30b0275d493f.000000000000d800 [set-alloc-hint object_size 4194304 write_size 4194304,write 143360~4096] v57084'16557680 uv16557680 ondisk = 0) v6 remote, 10.217.72.33:0/4005620, failed lossy con, dropping message 0x16217600
>>
>> OSD.121
>>
>> 2015-09-20 08:29:15.192055 7f51c07e2700  0 -- 10.217.72.12:6802/25568 submit_message osd_op_reply(1483497 rbd_data.3fc0847f32a8f4.0000000000003200 [set-alloc-hint object_size 4194304 write_size 4194304,write 135168~4096] v57086'14949171 uv14949171 ack = 0) v6 remote, 10.217.72.31:0/5002159, failed lossy con, dropping message 0x1dd8a840
>> 2015-09-20 08:29:15.192612 7f51c07e2700  0 -- 10.217.72.12:6802/25568 submit_message osd_op_reply(1483497 rbd_data.3fc0847f32a8f4.0000000000003200 [set-alloc-hint object_size 4194304 write_size 4194304,write 135168~4096] v57086'14949171 uv14949171 ondisk = 0) v6 remote, 10.217.72.31:0/5002159, failed lossy con, dropping message 0x1476ec00
>>
>> Our Ceph processes start with fairly high limits:
>>
>> root      1465  0.0  0.0  127700   3492 ?  Ss  Jun11     0:00 /bin/bash -c ulimit -n 32768; /usr/bin/ceph-osd -i 126 --pid-file /var/run/ceph/osd.126.pid -c /etc/ceph/ceph.conf --cluster ceph -f
>> root      1468  9.7  1.4 4849560 923536 ?  Sl  Jun11 14190:59 /usr/bin/ceph-osd -i 126 --pid-file /var/run/ceph/osd.126.pid -c /etc/ceph/ceph.conf --cluster ceph -f
>>
>> [root@ceph2 1468]# cat /etc/security/limits.d/90-local.conf
>> *       -       nofile  16384
>>
>> We have 130 OSDs on 13 hosts. We also have ~50 KVM VMs running on this storage.
>>
>>     cluster 48de182b-5488-42bb-a6d2-62e8e47b435c
>>      health HEALTH_WARN
>>             198 pgs backfill
>>             4 pgs backfilling
>>             169 pgs degraded
>>             150 pgs recovery_wait
>>             169 pgs stuck degraded
>>             352 pgs stuck unclean
>>             12 pgs stuck undersized
>>             12 pgs undersized
>>             recovery 161065/41285858 objects degraded (0.390%)
>>             recovery 2871014/41285858 objects misplaced (6.954%)
>>             noscrub,nodeep-scrub flag(s) set
>>      monmap e2: 3 mons at {mon1=10.217.72.27:6789/0,mon2=10.217.72.28:6789/0,mon3=10.217.72.29:6789/0}
>>             election epoch 180, quorum 0,1,2 mon1,mon2,mon3
>>      osdmap e57086: 130 osds: 130 up, 130 in; 270 remapped pgs
>>             flags noscrub,nodeep-scrub
>>       pgmap v10921036: 2308 pgs, 3 pools, 39046 GB data, 9735 kobjects
>>             151 TB used, 320 TB / 472 TB avail
>>             161065/41285858 objects degraded (0.390%)
>>             2871014/41285858 objects misplaced (6.954%)
>>                 1956 active+clean
>>                  183 active+remapped+wait_backfill
>>                   82 active+recovery_wait+degraded
>>                   67 active+recovery_wait+degraded+remapped
>>                    9 active+undersized+degraded+remapped+wait_backfill
>>                    6 active+degraded+remapped+wait_backfill
>>                    2 active+undersized+degraded+remapped+backfilling
>>                    2 active+degraded+remapped+backfilling
>>                    1 active+recovery_wait+undersized+degraded+remapped
>>   recovery io 25770 kB/s, 6 objects/s
>>   client io 78274 kB/s rd, 119 MB/s wr, 5949 op/s
>>
>> Any ideas would be helpful.
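On the file-descriptor question above, the effective limit of the running OSD can be compared against the descriptors it actually has open (a sketch only, using osd.126's PID 1468 from the ps listing; run as root on the OSD host):

    # effective open-file limit the daemon is really running with
    grep 'open files' /proc/1468/limits

    # number of file descriptors it currently has open
    ls /proc/1468/fd | wc -l

Since the wrapper does 'ulimit -n 32768' but limits.d only grants 16384, /proc/<pid>/limits is the quickest way to see which value actually applies to the process.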
>>
>> ----------------
>> Robert LeBlanc
>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
>>
>>
>> On Sat, Sep 19, 2015 at 1:03 AM, Robert LeBlanc wrote:
>>>
>>> We have had two situations today where I/O just seems to be blocked indefinitely on our production cluster (0.94.3). In the case this morning it was just normal I/O traffic, with no recovery or backfill. In the case this evening we were backfilling to some new OSDs. I would have loved to bump up the debugging to get an idea of what was going on, but time was exhausted. During the incident this evening I was able to do some additional troubleshooting, but I got really anxious after I/O had been blocked for 10 minutes and Ops was getting hot under the collar.
>>>
>>> Here are the important parts of the logs:
>>>
>>> [osd.30]
>>> 2015-09-18 23:05:36.188251 7efed0ef0700  0 log_channel(cluster) log [WRN] : slow request 30.662958 seconds old, received at 2015-09-18 23:05:05.525220: osd_op(client.3117179.0:18654441 rbd_data.1099d2f67aaea.0000000000000f62 [set-alloc-hint object_size 8388608 write_size 8388608,write 1048576~643072] 4.5ba1672c ack+ondisk+write+known_if_redirected e55919) currently waiting for subops from 32,70,72
>>>
>>> [osd.72]
>>> 2015-09-18 23:05:19.302985 7f3fa19f8700  0 log_channel(cluster) log [WRN] : slow request 30.200408 seconds old, received at 2015-09-18 23:04:49.102519: osd_op(client.4267090.0:3510311 rbd_data.3f41d41bd65b28.0000000000009e2b [set-alloc-hint object_size 4194304 write_size 4194304,write 1048576~421888] 17.40adcada ack+ondisk+write+known_if_redirected e55919) currently waiting for subops from 2,30,90
>>>
>>> The other OSDs listed (32, 70, 2, 90) did not have any errors in their logs about blocked I/O. It seems that osd.30 was waiting for osd.72 and vice versa. I looked at top and iostat on these two hosts, and the OSD processes and disk I/O were pretty idle.
>>>
>>> I know that this isn't a lot to go on. Our cluster is under very heavy load and we get several blocked I/Os every hour, but they usually clear up within 15 seconds. We seem to get blocked I/O when the op latency of the cluster goes above 1 (average across all OSDs as seen by Graphite).
>>>
>>> Has anyone seen this kind of indefinitely blocked I/O? Bouncing osd.72 immediately cleared all the blocked I/O, and it was fine after rejoining the cluster. Which logs, and at what level, would be most useful to increase for troubleshooting this?
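Following up on that question: next time one of these hangs, it is probably worth capturing the stuck ops from the admin socket before bouncing the OSD. Something along these lines (a sketch; osd.72 is only the example from above, and the commands must be run on the host where that OSD lives):

    # what the OSD is currently working on / stuck on
    ceph daemon osd.72 dump_ops_in_flight

    # the slowest recently completed ops, for comparison
    ceph daemon osd.72 dump_historic_ops

Raising debug ms and debug osd on the involved OSDs, as in the sketch further up, is the other obvious thing to capture while the I/O is still blocked.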
>>>
>>> I hope this makes sense; it has been a long day.
>>>
>>> ----------------
>>> Robert LeBlanc
>>> PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1