Heh, you'll have to talk to Haomai about issues with the KeyValueStore, but I know he's found a number of issues in the version of it that went to 0.85. In future please flag when you're running with experimental stuff; it helps direct attention to the right places! ;) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Tue, Sep 16, 2014 at 5:28 AM, Kenneth Waegeman <Kenneth.Waegeman at ugent.be> wrote: > > ----- Message from Gregory Farnum <greg at inktank.com> --------- > Date: Mon, 15 Sep 2014 10:37:07 -0700 > From: Gregory Farnum <greg at inktank.com> > Subject: Re: OSD troubles on FS+Tiering > To: Kenneth Waegeman <Kenneth.Waegeman at ugent.be> > Cc: ceph-users <ceph-users at lists.ceph.com> > > >> The pidfile bug is already fixed in master/giant branches. >> >> As for the crashing, I'd try killing all the osd processes and turning >> them back on again. It might just be some daemon restart failed, or >> your cluster could be sufficiently overloaded that the node disks are >> going unresponsive and they're suiciding, or... > > > I restarted them that way, and they eventually got clean again. > 'ceph status' printed that 'ecdata' pool had too few pgs, so I changed the > amount of pgs from 128 to 256 (with EC k+m=11) > After a few minutes I checked the cluster state again: > > [root at ceph001 ~]# ceph status > cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d > health HEALTH_WARN 100 pgs down; 155 pgs peering; 81 pgs stale; 240 pgs > stuck inactive; 81 pgs stuck stale; 240 pgs stuck unclean; 746 requests are > blocked > 32 sec; 'cache' at/near target max; pool ecdata pg_num 256 > > pgp_num 128 > monmap e1: 3 mons at > {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, > election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003 > mdsmap e6993: 1/1/1 up {0=ceph003=up:active}, 2 up:standby > osdmap e11023: 48 osds: 14 up, 14 in > pgmap v160466: 1472 pgs, 4 pools, 3899 GB data, 2374 kobjects > 624 GB used, 7615 GB / 8240 GB avail > 75 creating > 1215 active+clean > 100 down+peering > 1 active+clean+scrubbing > 10 stale > 16 stale+active+clean > > Again 34 OSDS are down.. This time I have the error log, I checked a few osd > logs : > > I checked the first host that was marked down: > > -17> 2014-09-16 13:27:49.962938 7f5dfe6a3700 5 osd.7 pg_epoch: 8912 > pg[2.b0s3(unlocked)] enter Initial > -16> 2014-09-16 13:27:50.008842 7f5e02eac700 1 -- > 10.143.8.180:6833/53810 <== osd.30 10.141.8.181:0/37396 2524 ==== > osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2 ==== 47+0+0 > (3868888299 0 0) 0x18ef7080 con 0x6961600 > -15> 2014-09-16 13:27:50.008892 7f5e02eac700 1 -- > 10.143.8.180:6833/53810 --> 10.141.8.181:0/37396 -- osd_ping(ping_reply > e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0 0x7326900 con 0x6961600 > -14> 2014-09-16 13:27:50.009159 7f5e046af700 1 -- > 10.141.8.180:6847/53810 <== osd.30 10.141.8.181:0/37396 2524 ==== > osd_ping(ping e8912 stamp 2014-09-16 13:27:50.008514) v2 ==== 47+0+0 > (3868888299 0 0) 0x2210a760 con 0xadd0420 > -13> 2014-09-16 13:27:50.009202 7f5e046af700 1 -- > 10.141.8.180:6847/53810 --> 10.141.8.181:0/37396 -- osd_ping(ping_reply > e8912 stamp 2014-09-16 13:27:50.008514) v2 -- ?+0 0x14e35a00 con 0xadd0420 > -12> 2014-09-16 13:27:50.034378 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 > les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 > pi=104-8911/54 crt=8864'33359 inactive NOTIFY] exit Reset 0.127612 1 > 0.000123 > -11> 2014-09-16 13:27:50.034432 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 > les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 > pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Started > -10> 2014-09-16 13:27:50.034452 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 > les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 > pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Start > -9> 2014-09-16 13:27:50.034469 7f5dfeea4700 1 osd.7 pg_epoch: 8912 > pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 > les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 > pi=104-8911/54 crt=8864'33359 inactive NOTIFY] state<Start>: transitioning > to Stray > -8> 2014-09-16 13:27:50.034491 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 > les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 > pi=104-8911/54 crt=8864'33359 inactive NOTIFY] exit Start 0.000038 0 > 0.000000 > -7> 2014-09-16 13:27:50.034521 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.71s3( v 8864'33363 (374'30362,8864'33363] local-les=813 n=16075 ec=104 > les/c 813/815 805/8912/791) [24,10,8,7,45,27,30,46,38,4,23] r=3 lpr=8912 > pi=104-8911/54 crt=8864'33359 inactive NOTIFY] enter Started/Stray > -6> 2014-09-16 13:27:50.034664 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 > les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 > pi=104-813/36 luod=0'0 crt=8885'35261 active] exit > Started/ReplicaActive/RepNotRecovering 7944.878905 22472 0.038180 > -5> 2014-09-16 13:27:50.034689 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 > les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 > pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started/ReplicaActive > 7944.878946 0 0.000000 > -4> 2014-09-16 13:27:50.034711 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 > les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 > pi=104-813/36 luod=0'0 crt=8885'35261 active] exit Started 7945.924923 0 > 0.000000 > -3> 2014-09-16 13:27:50.034732 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.7s10( v 8890'35265 (374'32264,8890'35265] local-les=816 n=32002 ec=104 > les/c 816/818 805/814/730) [6,30,22,13,39,15,12,5,11,42,7] r=10 lpr=814 > pi=104-813/36 luod=0'0 crt=8885'35261 active] enter Reset > -2> 2014-09-16 13:27:50.034869 7f5dfeea4700 5 osd.7 pg_epoch: 8912 > pg[2.87s10(unlocked)] enter Initial > -1> 2014-09-16 13:27:50.042055 7f5e11981700 5 osd.7 8912 tick > 0> 2014-09-16 13:27:50.045856 7f5e1015f700 -1 *** Caught signal > (Aborted) ** > in thread 7f5e1015f700 > > ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187) > 1: /usr/bin/ceph-osd() [0xa72096] > 2: (()+0xf130) [0x7f5e193d7130] > 3: (gsignal()+0x39) [0x7f5e17dd5989] > 4: (abort()+0x148) [0x7f5e17dd7098] > 5: (__gnu_cxx::__verbose_terminate_handler()+0x165) [0x7f5e186e89d5] > 6: (()+0x5e946) [0x7f5e186e6946] > 7: (()+0x5e973) [0x7f5e186e6973] > 8: (()+0x5eb9f) [0x7f5e186e6b9f] > 9: (ceph::__ceph_assert_fail(char const*, char const*, int, char > const*)+0x1ef) [0xb5e58f] > 10: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, > std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t*)+0x45e) > [0xa3b1ee] > 11: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, > snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> >*, > ghobject_t*)+0x274) [0x9042d4] > 12: (KeyValueStore::_split_collection(coll_t, unsigned int, unsigned int, > coll_t, KeyValueStore::BufferTransaction&)+0x421) [0x91f091] > 13: (KeyValueStore::_do_transaction(ObjectStore::Transaction&, > KeyValueStore::BufferTransaction&, ThreadPool::TPHandle*)+0xa4c) [0x920f2c] > 14: (KeyValueStore::_do_transactions(std::list<ObjectStore::Transaction*, > std::allocator<ObjectStore::Transaction*> >&, unsigned long, > ThreadPool::TPHandle*)+0x13f) [0x92385f] > 15: (KeyValueStore::_do_op(KeyValueStore::OpSequencer*, > ThreadPool::TPHandle&)+0xac) [0x923a7c] > 16: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xb4ef50] > 17: (ThreadPool::WorkThread::entry()+0x10) [0xb50040] > 18: (()+0x7df3) [0x7f5e193cfdf3] > 19: (clone()+0x6d) [0x7f5e17e963dd] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to > interpret this. > > > On some hosts that followed the crashing : > > =0 cs=0 l=1 c=0x139ed280).connect error 10.141.8.181:6822/36408, (111) > Connection refused > -7> 2014-09-16 13:28:36.028858 7f74eb007700 2 -- 10.141.8.180:0/52318 >>> 10.141.8.181:6822/36408 pipe(0x18147bc0 sd=41 :0 s=1 pgs > =0 cs=0 l=1 c=0x139ed280).fault (111) Connection refused > -6> 2014-09-16 13:28:36.029423 7f74e4c96700 2 -- 10.141.8.180:0/52318 >>> 10.143.8.181:6815/36408 pipe(0x18147640 sd=81 :0 s=1 pgs > =0 cs=0 l=1 c=0x139e91e0).connect error 10.143.8.181:6815/36408, (111) > Connection refused > -5> 2014-09-16 13:28:36.029443 7f74e4c96700 2 -- 10.141.8.180:0/52318 >>> 10.143.8.181:6815/36408 pipe(0x18147640 sd=81 :0 s=1 pgs > =0 cs=0 l=1 c=0x139e91e0).fault (111) Connection refused > -4> 2014-09-16 13:28:36.101914 7f7509534700 1 -- > 10.143.8.180:6801/52318 <== osd.32 10.141.8.182:0/54784 2520 ==== > osd_ping(ping > e8964 stamp 2014-09-16 13:28:36.101604) v2 ==== 47+0+0 (411091961 0 0) > 0x189b50a0 con 0x14a0f7a0 > -3> 2014-09-16 13:28:36.101952 7f7509534700 1 -- > 10.143.8.180:6801/52318 --> 10.141.8.182:0/54784 -- osd_ping(ping_reply > e8941 st > amp 2014-09-16 13:28:36.101604) v2 -- ?+0 0x1a0feea0 con 0x14a0f7a0 > -2> 2014-09-16 13:28:36.101950 7f750ad37700 1 -- > 10.141.8.180:6801/52318 <== osd.32 10.141.8.182:0/54784 2520 ==== > osd_ping(ping > e8964 stamp 2014-09-16 13:28:36.101604) v2 ==== 47+0+0 (411091961 0 0) > 0x1178cce0 con 0x143944c0 > -1> 2014-09-16 13:28:36.102005 7f750ad37700 1 -- > 10.141.8.180:6801/52318 --> 10.141.8.182:0/54784 -- osd_ping(ping_reply > e8941 st > amp 2014-09-16 13:28:36.101604) v2 -- ?+0 0x14b0f440 con 0x143944c0 > 0> 2014-09-16 13:28:36.183818 7f751681f700 -1 os/GenericObjectMap.cc: > In function 'int GenericObjectMap::list_objects(const coll_ > t&, ghobject_t, int, std::vector<ghobject_t>*, ghobject_t*)' thread > 7f751681f700 time 2014-09-16 13:28:36.181333 > os/GenericObjectMap.cc: 1094: FAILED assert(start <= header.oid) > > ceph version 0.85 (a0c22842db9eaee9840136784e94e50fabe77187) > 1: (GenericObjectMap::list_objects(coll_t const&, ghobject_t, int, > std::vector<ghobject_t, std::allocator<ghobject_t> >*, ghobject_t* > )+0x45e) [0xa3b1ee] > 2: (KeyValueStore::collection_list_partial(coll_t, ghobject_t, int, int, > snapid_t, std::vector<ghobject_t, std::allocator<ghobject_t> > >*, ghobject_t*)+0x274) [0x9042d4] > 3: (KeyValueStore::_split_collection(coll_t, unsigned int, unsigned int, > coll_t, KeyValueStore::BufferTransaction&)+0x421) [0x91f091] > 4: (KeyValueStore::_do_transaction(ObjectStore::Transaction&, > KeyValueStore::BufferTransaction&, ThreadPool::TPHandle*)+0xa4c) [0x920 > f2c] > 5: (KeyValueStore::_do_transactions(std::list<ObjectStore::Transaction*, > std::allocator<ObjectStore::Transaction*> >&, unsigned long, > ThreadPool::TPHandle*)+0x13f) [0x92385f] > 6: (KeyValueStore::_do_op(KeyValueStore::OpSequencer*, > ThreadPool::TPHandle&)+0xac) [0x923a7c] > 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xb10) [0xb4ef50] > 8: (ThreadPool::WorkThread::entry()+0x10) [0xb50040] > 9: (()+0x7df3) [0x7f7520317df3] > 10: (clone()+0x6d) [0x7f751edde3dd] > NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to > interpret this. > > > I tried to restart the crashed OSDS again, but it always fails instantly now > with the above stacktrace .. > > Any ideas with this? > Thanks a lot! > Kenneth > > >> -Greg >> Software Engineer #42 @ http://inktank.com | http://ceph.com >> >> >> On Mon, Sep 15, 2014 at 5:43 AM, Kenneth Waegeman >> <Kenneth.Waegeman at ugent.be> wrote: >>> >>> Hi, >>> >>> I have some strange OSD problems. Before the weekend I started some rsync >>> tests over CephFS, on a cache pool with underlying EC KV pool. Today the >>> cluster is completely degraded: >>> >>> [root at ceph003 ~]# ceph status >>> cluster 82766e04-585b-49a6-a0ac-c13d9ffd0a7d >>> health HEALTH_WARN 19 pgs backfill_toofull; 403 pgs degraded; 168 >>> pgs >>> down; 8 pgs incomplete; 168 pgs peering; 61 pgs stale; 403 pgs stuck >>> degraded; 176 pgs stuck inactive; 61 pgs stuck stale; 589 pgs stuck >>> unclean; >>> 403 pgs stuck undersized; 403 pgs undersized; 300 requests are blocked > >>> 32 >>> sec; recovery 15170/27902361 objects degraded (0.054%); 1922/27902361 >>> objects misplaced (0.007%); 1 near full osd(s) >>> monmap e1: 3 mons at >>> >>> {ceph001=10.141.8.180:6789/0,ceph002=10.141.8.181:6789/0,ceph003=10.141.8.182:6789/0}, >>> election epoch 8, quorum 0,1,2 ceph001,ceph002,ceph003 >>> mdsmap e5: 1/1/1 up {0=ceph003=up:active}, 2 up:standby >>> osdmap e719: 48 osds: 18 up, 18 in >>> pgmap v144887: 1344 pgs, 4 pools, 4139 GB data, 2624 kobjects >>> 2282 GB used, 31397 GB / 33680 GB avail >>> 15170/27902361 objects degraded (0.054%); 1922/27902361 >>> objects >>> misplaced (0.007%) >>> 68 down+remapped+peering >>> 1 active >>> 754 active+clean >>> 1 stale+incomplete >>> 1 stale+active+clean+scrubbing >>> 14 active+undersized+degraded+remapped >>> 7 incomplete >>> 100 down+peering >>> 9 active+remapped >>> 59 stale+active+undersized+degraded >>> 19 active+undersized+degraded+remapped+backfill_toofull >>> 311 active+undersized+degraded >>> >>> I tried to figure out what happened in the global logs: >>> >>> 2014-09-13 08:01:19.433313 mon.0 10.141.8.180:6789/0 66076 : [INF] pgmap >>> v65892: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB / >>> 129 TB avail; 4159 kB/s wr, 45 op/s >>> 2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF] pgmap >>> v65893: 1344 pgs: 1344 >>> 2014-09-13 08:01:20.443019 mon.0 10.141.8.180:6789/0 66078 : [INF] pgmap >>> v65893: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB / >>> 129 TB avail; 561 kB/s wr, 11 op/s >>> 2014-09-13 08:01:20.777988 mon.0 10.141.8.180:6789/0 66081 : [INF] osd.19 >>> 10.141.8.181:6809/29664 failed (3 reports from 3 peers after 20.000079 >= >>> grace 20.000000) >>> 2014-09-13 08:01:21.455887 mon.0 10.141.8.180:6789/0 66083 : [INF] osdmap >>> e117: 48 osds: 47 up, 48 in >>> 2014-09-13 08:01:21.462084 mon.0 10.141.8.180:6789/0 66084 : [INF] pgmap >>> v65894: 1344 pgs: 1344 active+clean; 2606 GB data, 3116 GB used, 126 TB / >>> 129 TB avail; 1353 kB/s wr, 13 op/s >>> 2014-09-13 08:01:21.477007 mon.0 10.141.8.180:6789/0 66085 : [INF] pgmap >>> v65895: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB >>> data, >>> 3116 GB used, 126 TB / 129 TB avail; 2300 kB/s wr, 21 op/s >>> 2014-09-13 08:01:22.456055 mon.0 10.141.8.180:6789/0 66086 : [INF] osdmap >>> e118: 48 osds: 47 up, 48 in >>> 2014-09-13 08:01:22.462590 mon.0 10.141.8.180:6789/0 66087 : [INF] pgmap >>> v65896: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB >>> data, >>> 3116 GB used, 126 TB / 129 TB avail; 13686 kB/s wr, 5 op/s >>> 2014-09-13 08:01:23.464302 mon.0 10.141.8.180:6789/0 66088 : [INF] pgmap >>> v65897: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB >>> data, >>> 3116 GB used, 126 TB / 129 TB avail; 11075 kB/s wr, 4 op/s >>> 2014-09-13 08:01:24.477467 mon.0 10.141.8.180:6789/0 66089 : [INF] pgmap >>> v65898: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB >>> data, >>> 3116 GB used, 126 TB / 129 TB avail; 4932 kB/s wr, 38 op/s >>> 2014-09-13 08:01:25.481027 mon.0 10.141.8.180:6789/0 66090 : [INF] pgmap >>> v65899: 1344 pgs: 187 stale+active+clean, 1157 active+clean; 2606 GB >>> data, >>> 3116 GB used, 126 TB / 129 TB avail; 5726 kB/s wr, 64 op/s >>> 2014-09-13 08:01:19.336173 osd.1 10.141.8.180:6803/26712 54442 : [WRN] 1 >>> slow requests, 1 included below; oldest blocked for > 30.000137 secs >>> 2014-09-13 08:01:19.336341 osd.1 10.141.8.180:6803/26712 54443 : [WRN] >>> slow >>> request 30.000137 seconds old, received at 2014-09-13 08:00:49.335339: >>> osd_op(client.7448.1:17751783 10000203eac.0000000e [write 0~319488 >>> [1 at -1],startsync 0~0] 1.b >>> 6c3a3a9 snapc 1=[] ondisk+write e116) currently reached pg >>> 2014-09-13 08:01:20.337602 osd.1 10.141.8.180:6803/26712 54444 : [WRN] 7 >>> slow requests, 6 included below; oldest blocked for > 31.001947 secs >>> 2014-09-13 08:01:20.337688 osd.1 10.141.8.180:6803/26712 54445 : [WRN] >>> slow >>> request 30.998110 seconds old, received at 2014-09-13 08:00:49.339176: >>> osd_op(client.7448.1:17751787 10000203eac.0000000e [write 319488~65536 >>> [1 at -1],startsync 0~0] >>> >>> >>> This is happening OSD after OSD.. >>> >>> I tried to check the individual log of the osds, but all the individual >>> logs >>> stop abruptly (also from the osds that are still running): >>> >>> 2014-09-12 14:25:51.205276 7f3517209700 0 log [WRN] : 41 slow requests, >>> 1 >>> included below; oldest blocked for > 38.118088 secs >>> 2014-09-12 14:25:51.205337 7f3517209700 0 log [WRN] : slow request >>> 36.558286 seconds old, received at 2014-09-12 14:25:14.646836: >>> osd_op(client.7448.1:2458392 1000006328f.0000000b [write 3989504~204800 >>> [1 at -1],startsync 0~0] 1.9337bf4b snapc 1=[] ondisk+write e116) currently >>> reached pg >>> 2014-09-12 14:25:53.205586 7f3517209700 0 log [WRN] : 30 slow requests, >>> 1 >>> included below; oldest blocked for > 40.118530 secs >>> 2014-09-12 14:25:53.205679 7f3517209700 0 log [WRN] : slow request >>> 30.541026 seconds old, received at 2014-09-12 14:25:22.664538: >>> osd_op(client.7448.1:2460291 100000632b7.00000000 [write 0~691 >>> [1 at -1],startsync 0~0] 1.994248a8 snapc 1=[] ondisk+write e116) currently >>> reached pg >>> 2014-09-12 17:52:40.503917 7f34e8ed2700 0 -- 10.141.8.181:6809/29664 >> >>> 10.141.8.181:6847/62389 pipe(0x247ce040 sd=327 :6809 s=0 pgs=0 cs=0 l=1 >>> c=0x1bc8b9c0).accept replacing existing (lossy) channel (new one lossy=1) >>> >>> I *think* the absence of the logs is some issue related to another issue >>> I >>> just found (http://tracker.ceph.com/issues/9470). >>> >>> So I can't found out the original problem with the log files.. >>> >>> Is there any other way I can find out what started the crashing of 30 >>> osds ? >>> >>> Thanks!! >>> >>> Kenneth >>> >>> _______________________________________________ >>> ceph-users mailing list >>> ceph-users at lists.ceph.com >>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com > > > > ----- End message from Gregory Farnum <greg at inktank.com> ----- > > -- > > Met vriendelijke groeten, > Kenneth Waegeman > >