----- Message from Gregory Farnum <greg at inktank.com> --------- Date: Mon, 15 Sep 2014 10:37:07 -0700 From: Gregory Farnum <greg at inktank.com> Subject: Re: OSD troubles on FS+Tiering To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> Cc: ceph-users <ceph-users at lists.ceph.com> > The pidfile bug is already fixed in master/giant branches. > > As for the crashing, I'd try killing all the osd processes and turning > them back on again. It might just be some daemon restart failed, or > your cluster could be sufficiently overloaded that the node disks are > going unresponsive and they're suiciding, or... I restarted them that way, and they eventually got clean again. 'ceph status' printed that 'ecdata' pool had too few pgs, so I changed the amount of pgs from 128 to 256 (with EC k+m=11) After a few minutes I checked the cluster state again: [root at ceph001 ~]# ceph status cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d health HEALTH_WARN 100 pgs down; 155 pgs peering; 81 pgs stale; 240 pgs stuck inactive; 81 pgs stuck stale; 240 pgs stuck unclean; 746 requests are blocked > 32 sec; 'cache' at/near target max; pool ecdata pg_num 256 > pgp_num 128 monmap e1: 3 mons at {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003 mdsmap e6993: 1/1/1 up {0=ceph003=up:active}, 2 up:standby osdmap e11023: 48 osds: 14 up, 14 in pgmap v160466: 1472 pgs, 4 pools, 3899 GB data, 2374 kobjects 624 GB used, 7615 GB / 8240 GB avail 75 creating 1215 active+clean 100 down+peering 1 active+clean+scrubbing 10 stale 16 stale+active+clean Again 34 OSDS are down.. This time I have the error log, I checked a few osd logs : I checked the first host that was marked down: -17> 2014-09-16 13:27:49.962938 7f5dfe6a3700 5 osd.7 pg_epoch: 8912 pg[2.b0s3(unlocked)] enter Initial -16> 2014-09-16 13:27:50.008842 7f5e02eac700 1 -- 10.143.8.180:6833/53810 <== osd.30 10.141.8.181:0/37396 2524 ==== osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2 ==== 47+0+0 (3868888299 0 0) 0x18ef7080 con 0x6961600 -15> 2014-09-16 13:27:50.008892 7f5e02eac700 1 -- 10.143.8.180:6833/53810 --> 10.141.8.181:0/37396 -- osd_ping(ping_reply e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0 0x7326900 con 0x6961600 -14> 2014-09-16 13:27:50.009159 7f5e046af700 1 -- 10.141.8.180:6847/53810 <== osd.30 10.141.8.181:0/37396 2524 ==== osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2 ==== 47+0+0 (3868888299 0 0) 0x2210a760 con 0xadd0420 -13> 2014-09-16 13:27:50.009202 7f5e046af700 1 -- 10.141.8.180:6847/53810 --> 10.141.8.181:0/37396 -- osd_ping(ping_reply e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0 0x14e35a00 con 0xadd0420 -12> 2014-09-16 13:27:50.034378 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] exit Reset 0.127612 1 0.000123 -11> 2014-09-16 13:27:50.034432 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Started -10> 2014-09-16 13:27:50.034452 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Start -9> 2014-09-16 13:27:50.034469 7f5dfeea4700 1 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] state<Start>: transitioning to Stray -8> 2014-09-16 13:27:50.034491 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] exit Start 0.000038 0 0.000000 -7> 2014-09-16 13:27:50.034521 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Started/Stray -6> 2014-09-16 13:27:50.034664 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started/ReplicaActive/RepNotRecovering 7944.878905 22472 0.038180 -5> 2014-09-16 13:27:50.034689 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started/ReplicaActive 7944.878946 0 0.000000 -4> 2014-09-16 13:27:50.034711 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started 7945.924923 0 0.000000 -3> 2014-09-16 13:27:50.034732 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 pi=104-813/36 luod=0'0 crt=8885'35261 active] enter Reset -2> 2014-09-16 13:27:50.034869 7f5dfeea4700 5 osd.7 pg_epoch: 8912 pg[2.87s10(unlocked)] enter Initial -1> 2014-09-16 13:27:50.042055 7f5e11981700 5 osd.7 8912 tick 0> 2014-09-16 13:27:50.045856 7f5e1015f700 -1 *** Caught signal (Aborted) ** in thread 7f5e1015f700 ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187) 1: /usr/bin/ceph-osd() [0xa72096] 2: (()+0xf130) [0x7f5e193d7130] 3: (gsignal()+0x39) [0x7f5e17dd5989] 4: (abort()+0x148) [0x7f5e17dd7098] 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f5e186e89d5] 6: (()+0x5e946) [0x7f5e186e6946] 7: (()+0x5e973) [0x7f5e186e6973] 8: (()+0x5eb9f) [0x7f5e186e6b9f] 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x1ef) [0xb5e58f] 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x45e) [0xa3b1ee] 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x9042d4] 12: (KeyValueStore::_split_collection(coll_t, unsigned int, unsigned int, coll_t, KeyValueStore::BufferTransaction&)+0x421) [0x91f091] 13: (KeyValueStore::_do_transaction(ObjectStore::Transaction&, KeyValueStore::BufferTransaction&, ThreadPool::TPHandle*)+0xa4c) [0x920f2c] 14: (KeyValueStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x13f) [0x92385f] 15: (KeyValueStore::_do_op(KeyValueStore::OpSequencer*, ThreadPool::TPHandle&)+0xac) [0x923a7c] 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xb4ef50] 17: (ThreadPool::WorkThread::entry()+0x10) [0xb50040] 18: (()+0x7df3) [0x7f5e193cfdf3] 19: (clone()+0x6d) [0x7f5e17e963dd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. On some hosts that followed the crashing : =0 cs=0 l=1 c=0x139ed280).connect error 10.141.8.181:6822/36408, (111) Connection refused -7> 2014-09-16 13:28:36.028858 7f74eb007700 2 -- 10.141.8.180:0/52318 >> 10.141.8.181:6822/36408 pipe(0x18147bc0 sd=41 :0 s=1 pgs =0 cs=0 l=1 c=0x139ed280).fault (111) Connection refused -6> 2014-09-16 13:28:36.029423 7f74e4c96700 2 -- 10.141.8.180:0/52318 >> 10.143.8.181:6815/36408 pipe(0x18147640 sd=81 :0 s=1 pgs =0 cs=0 l=1 c=0x139e91e0).connect error 10.143.8.181:6815/36408, (111) Connection refused -5> 2014-09-16 13:28:36.029443 7f74e4c96700 2 -- 10.141.8.180:0/52318 >> 10.143.8.181:6815/36408 pipe(0x18147640 sd=81 :0 s=1 pgs =0 cs=0 l=1 c=0x139e91e0).fault (111) Connection refused -4> 2014-09-16 13:28:36.101914 7f7509534700 1 -- 10.143.8.180:6801/52318 <== osd.32 10.141.8.182:0/54784 2520 ==== osd_ping(ping e8964 stamp 2014-09-16 13:28:36.101604) v2 ==== 47+0+0 (411091961 0 0) 0x189b50a0 con 0x14a0f7a0 -3> 2014-09-16 13:28:36.101952 7f7509534700 1 -- 10.143.8.180:6801/52318 --> 10.141.8.182:0/54784 -- osd_ping(ping_reply e8941 st amp 2014-09-16 13:28:36.101604) v2 -- ?+0 0x1a0feea0 con 0x14a0f7a0 -2> 2014-09-16 13:28:36.101950 7f750ad37700 1 -- 10.141.8.180:6801/52318 <== osd.32 10.141.8.182:0/54784 2520 ==== osd_ping(ping e8964 stamp 2014-09-16 13:28:36.101604) v2 ==== 47+0+0 (411091961 0 0) 0x1178cce0 con 0x143944c0 -1> 2014-09-16 13:28:36.102005 7f750ad37700 1 -- 10.141.8.180:6801/52318 --> 10.141.8.182:0/54784 -- osd_ping(ping_reply e8941 st amp 2014-09-16 13:28:36.101604) v2 -- ?+0 0x14b0f440 con 0x143944c0 0> 2014-09-16 13:28:36.183818 7f751681f700 -1 os/GenericObjectMap.cc: In function 'int GenericObjectMap::list_objects(const coll_ t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread 7f751681f700 time 2014-09-16 13:28:36.181333 os/GenericObjectMap.cc: 1094: FAILED assert(start <= header.oid) ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187) 1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t* )+0x45e) [0xa3b1ee] 2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x274) [0x9042d4] 3: (KeyValueStore::_split_collection(coll_t, unsigned int, unsigned int, coll_t, KeyValueStore::BufferTransaction&)+0x421) [0x91f091] 4: (KeyValueStore::_do_transaction(ObjectStore::Transaction&, KeyValueStore::BufferTransaction&, ThreadPool::TPHandle*)+0xa4c) [0x920 f2c] 5: (KeyValueStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x13f) [0x92385f] 6: (KeyValueStore::_do_op(KeyValueStore::OpSequencer*, ThreadPool::TPHandle&)+0xac) [0x923a7c] 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xb4ef50] 8: (ThreadPool::WorkThread::entry()+0x10) [0xb50040] 9: (()+0x7df3) [0x7f7520317df3] 10: (clone()+0x6d) [0x7f751edde3dd] NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this. I tried to restart the crashed OSDS again, but it always fails instantly now with the above stacktrace .. Any ideas with this? Thanks a lot! Kenneth > -Greg > Software Engineer #42 @ http://inktank.com | http://ceph.com > > > On Mon, Sep 15, 2014 at 5:43 AM, Kenneth Waegeman > <Kenneth.Waegeman at ugent.be> wrote: >> Hi, >> >> I have some strange OSD problems. Before the weekend I started some rsync >> tests over CephFS, on a cache pool with underlying EC KV pool. Today the >> cluster is completely degraded: >> >> [root at ceph003 ~]# ceph status >> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >> health HEALTH_WARN 19 pgs backfill_toofull; 403 pgs degraded; 168 pgs >> down; 8 pgs incomplete; 168 pgs peering; 61 pgs stale; 403 pgs stuck >> degraded; 176 pgs stuck inactive; 61 pgs stuck stale; 589 pgs stuck unclean; >> 403 pgs stuck undersized; 403 pgs undersized; 300 requests are blocked > 32 >> sec; recovery 15170/27902361 objects degraded (0.054%); 1922/27902361 >> objects misplaced (0.007%); 1 near full osd(s) >> monmap e1: 3 mons at >> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, >> election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003 >> mdsmap e5: 1/1/1 up {0=ceph003=up:active}, 2 up:standby >> osdmap e719: 48 osds: 18 up, 18 in >> pgmap v144887: 1344 pgs, 4 pools, 4139 GB data, 2624 kobjects >> 2282 GB used, 31397 GB / 33680 GB avail >> 15170/27902361 objects degraded (0.054%); 1922/27902361 objects >> misplaced (0.007%) >> 68 down+remapped+peering >> 1 active >> 754 active+clean >> 1 stale+incomplete >> 1 stale+active+clean+scrubbing >> 14 active+undersized+degraded+remapped >> 7 incomplete >> 100 down+peering >> 9 active+remapped >> 59 stale+active+undersized+degraded >> 19 active+undersized+degraded+remapped+backfill_toofull >> 311 active+undersized+degraded >> >> I tried to figure out what happened in the global logs: >> >> 2014-09-13 08:01:19.433313 mon.0 10.141.8.180:6789/0 66076 : [INF] pgmap >> v65892: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB / >> 129 TB avail; 4159 kB/s wr, 45 op/s >> 2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF] pgmap >> v65893: 1344 pgs: 1344 >> 2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF] pgmap >> v65893: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB / >> 129 TB avail; 561 kB/s wr, 11 op/s >> 2014-09-13 08:01:20.777988 mon.0 10.141.8.180:6789/0 66081 : [INF] osd.19 >> 10.141.8.181:6809/29664 failed (3 reports from 3 peers after 20.000079 >= >> grace 20.000000) >> 2014-09-13 08:01:21.455887 mon.0 10.141.8.180:6789/0 66083 : [INF] osdmap >> e117: 48 osds: 47 up, 48 in >> 2014-09-13 08:01:21.462084 mon.0 10.141.8.180:6789/0 66084 : [INF] pgmap >> v65894: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB / >> 129 TB avail; 1353 kB/s wr, 13 op/s >> 2014-09-13 08:01:21.477007 mon.0 10.141.8.180:6789/0 66085 : [INF] pgmap >> v65895: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data, >> 3116 GB used, 126 TB / 129 TB avail; 2300 kB/s wr, 21 op/s >> 2014-09-13 08:01:22.456055 mon.0 10.141.8.180:6789/0 66086 : [INF] osdmap >> e118: 48 osds: 47 up, 48 in >> 2014-09-13 08:01:22.462590 mon.0 10.141.8.180:6789/0 66087 : [INF] pgmap >> v65896: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data, >> 3116 GB used, 126 TB / 129 TB avail; 13686 kB/s wr, 5 op/s >> 2014-09-13 08:01:23.464302 mon.0 10.141.8.180:6789/0 66088 : [INF] pgmap >> v65897: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data, >> 3116 GB used, 126 TB / 129 TB avail; 11075 kB/s wr, 4 op/s >> 2014-09-13 08:01:24.477467 mon.0 10.141.8.180:6789/0 66089 : [INF] pgmap >> v65898: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data, >> 3116 GB used, 126 TB / 129 TB avail; 4932 kB/s wr, 38 op/s >> 2014-09-13 08:01:25.481027 mon.0 10.141.8.180:6789/0 66090 : [INF] pgmap >> v65899: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB data, >> 3116 GB used, 126 TB / 129 TB avail; 5726 kB/s wr, 64 op/s >> 2014-09-13 08:01:19.336173 osd.1 10.141.8.180:6803/26712 54442 : [WRN] 1 >> slow requests, 1 included below; oldest blocked for > 30.000137 secs >> 2014-09-13 08:01:19.336341 osd.1 10.141.8.180:6803/26712 54443 : [WRN] slow >> request 30.000137 seconds old, received at 2014-09-13 08:00:49.335339: >> osd_op(client.7448.1:17751783 10000203eac.0000000e [write 0~319488 >> [1 at -1],startsync 0~0] 1.b >> 6c3a3a9 snapc 1=[] ondisk+write e116) currently reached pg >> 2014-09-13 08:01:20.337602 osd.1 10.141.8.180:6803/26712 54444 : [WRN] 7 >> slow requests, 6 included below; oldest blocked for > 31.001947 secs >> 2014-09-13 08:01:20.337688 osd.1 10.141.8.180:6803/26712 54445 : [WRN] slow >> request 30.998110 seconds old, received at 2014-09-13 08:00:49.339176: >> osd_op(client.7448.1:17751787 10000203eac.0000000e [write 319488~65536 >> [1 at -1],startsync 0~0] >> >> >> This is happening OSD after OSD.. >> >> I tried to check the individual log of the osds, but all the individual logs >> stop abruptly (also from the osds that are still running): >> >> 2014-09-12 14:25:51.205276 7f3517209700 0 log [WRN] : 41 slow requests, 1 >> included below; oldest blocked for > 38.118088 secs >> 2014-09-12 14:25:51.205337 7f3517209700 0 log [WRN] : slow request >> 36.558286 seconds old, received at 2014-09-12 14:25:14.646836: >> osd_op(client.7448.1:2458392 1000006328f.0000000b [write 3989504~204800 >> [1 at -1],startsync 0~0] 1.9337bf4b snapc 1=[] ondisk+write e116) currently >> reached pg >> 2014-09-12 14:25:53.205586 7f3517209700 0 log [WRN] : 30 slow requests, 1 >> included below; oldest blocked for > 40.118530 secs >> 2014-09-12 14:25:53.205679 7f3517209700 0 log [WRN] : slow request >> 30.541026 seconds old, received at 2014-09-12 14:25:22.664538: >> osd_op(client.7448.1:2460291 100000632b7.00000000 [write 0~691 >> [1 at -1],startsync 0~0] 1.994248a8 snapc 1=[] ondisk+write e116) currently >> reached pg >> 2014-09-12 17:52:40.503917 7f34e8ed2700 0 -- 10.141.8.181:6809/29664 >> >> 10.141.8.181:6847/62389 pipe(0x247ce040 sd=327 :6809 s=0 pgs=0 cs=0 l=1 >> c=0x1bc8b9c0).accept replacing existing (lossy) channel (new one lossy=1) >> >> I *think* the absence of the logs is some issue related to another issue I >> just found (http://tracker.ceph.com/issues/9470). >> >> So I can't found out the original problem with the log files.. >> >> Is there any other way I can find out what started the crashing of 30 osds ? >> >> Thanks!! >> >> Kenneth >> >> _______________________________________________ >> ceph-users mailing list >> ceph-users at lists.ceph.com >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com ----- End message from Gregory Farnum <greg at inktank.com> ----- -- Met vriendelijke groeten, Kenneth Waegeman