OSD troubles on FS+Tiering

Hi,

I have some strange OSD problems. Before the weekend I started some
rsync tests over CephFS, on a cache pool in front of an erasure-coded (EC)
pool on the key/value (KV) OSD backend (the tiering setup is sketched
below, after the status output). Today the cluster is completely degraded:

[root@ceph003 ~]# ceph status
     cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d
      health HEALTH_WARN 19 pgs backfill_toofull; 403 pgs degraded;  
168 pgs down; 8 pgs incomplete; 168 pgs peering; 61 pgs stale; 403 pgs  
stuck degraded; 176 pgs stuck inactive; 61 pgs stuck stale; 589 pgs  
stuck unclean; 403 pgs stuck undersized; 403 pgs undersized; 300  
requests are blocked > 32 sec; recovery 15170/27902361 objects  
degraded (0.054%); 1922/27902361 objects misplaced (0.007%); 1 near  
full osd(s)
      monmap e1: 3 mons at  
{ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2  
ceph001,ceph002,ceph003
      mdsmap e5: 1/1/1 up {0=ceph003=up:active}, 2 up:standby
      osdmap e719: 48 osds: 18 up, 18 in
       pgmap v144887: 1344 pgs, 4 pools, 4139 GB data, 2624 kobjects
             2282 GB used, 31397 GB / 33680 GB avail
             15170/27902361 objects degraded (0.054%); 1922/27902361  
objects misplaced (0.007%)
                   68 down+remapped+peering
                    1 active
                  754 active+clean
                    1 stale+incomplete
                    1 stale+active+clean+scrubbing
                   14 active+undersized+degraded+remapped
                    7 incomplete
                  100 down+peering
                    9 active+remapped
                   59 stale+active+undersized+degraded
                   19 active+undersized+degraded+remapped+backfill_toofull
                  311 active+undersized+degraded
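
For reference, the cache tiering was set up in the usual way, roughly
like this (the pool names and PG count below are placeholders, not
necessarily the exact values I used):

ceph osd pool create cachepool 1024
ceph osd tier add ecpool cachepool
ceph osd tier cache-mode cachepool writeback
ceph osd tier set-overlay ecpool cachepool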

I tried to figure out what happened from the global cluster log:

2014-09-13 08:01:19.433313 mon.0 10.141.8.180:6789/0 66076 : [INF]  
pgmap v65892: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 4159 kB/s wr, 45 op/s
2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF]  
pgmap v65893: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 561 kB/s wr, 11 op/s
2014-09-13 08:01:20.777988 mon.0 10.141.8.180:6789/0 66081 : [INF]  
osd.19 10.141.8.181:6809/29664 failed (3 reports from 3 peers after  
20.000079 >= grace 20.000000)
2014-09-13 08:01:21.455887 mon.0 10.141.8.180:6789/0 66083 : [INF]  
osdmap e117: 48 osds: 47 up, 48 in
2014-09-13 08:01:21.462084 mon.0 10.141.8.180:6789/0 66084 : [INF]  
pgmap v65894: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used,  
126 TB / 129 TB avail; 1353 kB/s wr, 13 op/s
2014-09-13 08:01:21.477007 mon.0 10.141.8.180:6789/0 66085 : [INF]  
pgmap v65895: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 2300 kB/s wr, 21 op/s
2014-09-13 08:01:22.456055 mon.0 10.141.8.180:6789/0 66086 : [INF]  
osdmap e118: 48 osds: 47 up, 48 in
2014-09-13 08:01:22.462590 mon.0 10.141.8.180:6789/0 66087 : [INF]  
pgmap v65896: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 13686 kB/s wr, 5 op/s
2014-09-13 08:01:23.464302 mon.0 10.141.8.180:6789/0 66088 : [INF]  
pgmap v65897: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 11075 kB/s wr, 4 op/s
2014-09-13 08:01:24.477467 mon.0 10.141.8.180:6789/0 66089 : [INF]  
pgmap v65898: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 4932 kB/s wr, 38 op/s
2014-09-13 08:01:25.481027 mon.0 10.141.8.180:6789/0 66090 : [INF]  
pgmap v65899: 1344 pgs: 187 stale+active+clean, 1157 active+clean;  
2606 GB data, 3116 GB used, 126 TB / 129 TB avail; 5726 kB/s wr, 64 op/s
2014-09-13 08:01:19.336173 osd.1 10.141.8.180:6803/26712 54442 : [WRN]  
1 slow requests, 1 included below; oldest blocked for > 30.000137 secs
2014-09-13 08:01:19.336341 osd.1 10.141.8.180:6803/26712 54443 : [WRN]  
slow request 30.000137 seconds old, received at 2014-09-13  
08:00:49.335339: osd_op(client.7448.1:17751783 10000203eac.0000000e  
[write 0~319488 [1@-1],startsync 0~0] 1.b6c3a3a9 snapc 1=[] ondisk+write e116) currently reached pg
2014-09-13 08:01:20.337602 osd.1 10.141.8.180:6803/26712 54444 : [WRN]  
7 slow requests, 6 included below; oldest blocked for > 31.001947 secs
2014-09-13 08:01:20.337688 osd.1 10.141.8.180:6803/26712 54445 : [WRN]  
slow request 30.998110 seconds old, received at 2014-09-13  
08:00:49.339176: osd_op(client.7448.1:17751787 10000203eac.0000000e  
[write 319488~65536 [1@-1],startsync 0~0]


This is happening to one OSD after another...
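
To see the order in which the OSDs went down, the failure events can be
grepped out of the cluster log on a monitor host (assuming the default
location /var/log/ceph/ceph.log), for example:

[root@ceph003 ~]# grep -E 'osd\.[0-9]+ .* failed \(' /var/log/ceph/ceph.log

That at least gives a timeline of which OSD was marked down when, but
not why.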

I tried to check the individual logs of the OSDs, but they all stop
abruptly, even the logs of the OSDs that are still running:

2014-09-12 14:25:51.205276 7f3517209700  0 log [WRN] : 41 slow  
requests, 1 included below; oldest blocked for > 38.118088 secs
2014-09-12 14:25:51.205337 7f3517209700  0 log [WRN] : slow request  
36.558286 seconds old, received at 2014-09-12 14:25:14.646836:  
osd_op(client.7448.1:2458392 1000006328f.0000000b [write  
3989504~204800 [1@-1],startsync 0~0] 1.9337bf4b snapc 1=[]
ondisk+write e116) currently reached pg
2014-09-12 14:25:53.205586 7f3517209700  0 log [WRN] : 30 slow  
requests, 1 included below; oldest blocked for > 40.118530 secs
2014-09-12 14:25:53.205679 7f3517209700  0 log [WRN] : slow request  
30.541026 seconds old, received at 2014-09-12 14:25:22.664538:  
osd_op(client.7448.1:2460291 100000632b7.00000000 [write 0~691  
[1@-1],startsync 0~0] 1.994248a8 snapc 1=[] ondisk+write e116)
currently reached pg
2014-09-12 17:52:40.503917 7f34e8ed2700  0 -- 10.141.8.181:6809/29664  
 >> 10.141.8.181:6847/62389 pipe(0x247ce040 sd=327 :6809 s=0 pgs=0  
cs=0 l=1 c=0x1bc8b9c0).accept replacing existing (lossy) channel (new  
one lossy=1)

I *think* the missing logs are related to another issue I just found
(http://tracker.ceph.com/issues/9470).

So I can't find the original problem from the log files...
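
On the OSDs that are still running I can at least query the admin
socket (assuming the default socket path and the usual admin socket
commands), e.g. for osd.1 on ceph001:

[root@ceph001 ~]# ceph daemon osd.1 dump_ops_in_flight
[root@ceph001 ~]# ceph daemon osd.1 dump_historic_ops
[root@ceph001 ~]# ceph daemon osd.1 log dump

But that only shows the current slow requests, not what originally
brought the other OSDs down.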

Is there any other way I can find out what caused these 30 OSDs to start crashing?
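
In the meantime I was planning to check for external causes (e.g. the
OOM killer) and to capture more detail if it happens again, roughly
like this (the debug levels are just a guess at something reasonable):

[root@ceph003 ~]# dmesg | grep -iE 'out of memory|killed process'
[root@ceph003 ~]# ceph tell osd.* injectargs '--debug-osd 20 --debug-ms 1'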

Thanks!!

Kenneth


