Hello,

got new logs - if this snip is not sufficient, I can provide the full log:
https://pastebin.com/dKBzL9AW

br+thx
wolfgang

On 2018-09-05 01:55, Radoslaw Zarzynski wrote:
> In the log the following trace can be found:
>
>      0> 2018-08-30 13:11:01.014708 7ff2dd344700 -1 *** Caught signal (Segmentation fault) **
>  in thread 7ff2dd344700 thread_name:osd_srv_agent
>
>  ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
>  1: (()+0xa48ec1) [0x5652900ffec1]
>  2: (()+0xf6d0) [0x7ff2f7c206d0]
>  3: (BlueStore::_wctx_finish(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, BlueStore::WriteContext*, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0xb4) [0x56528ffe3954]
>  4: (BlueStore::_do_truncate(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>, unsigned long, std::set<BlueStore::SharedBlob*, std::less<BlueStore::SharedBlob*>, std::allocator<BlueStore::SharedBlob*> >*)+0x2c2) [0x56528fffd642]
>  5: (BlueStore::_do_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>)+0xc6) [0x56528fffdf86]
>  6: (BlueStore::_remove(BlueStore::TransContext*, boost::intrusive_ptr<BlueStore::Collection>&, boost::intrusive_ptr<BlueStore::Onode>&)+0x94) [0x56528ffff9f4]
>  7: (BlueStore::_txc_add_transaction(BlueStore::TransContext*, ObjectStore::Transaction*)+0x15af) [0x56529001280f]
>  8: ...
>
> This looks quite similar to #25001 [1]. The corruption *might* be caused by the racy SharedBlob::put() [2] that was fixed in 12.2.6. However, more logs (debug_bluestore=20, debug_bdev=20) would be useful. Also you might want to carefully use fsck -- please take a look at Igor's (CCed) post [3] and Troy's response.
>
> Best regards,
> Radoslaw Zarzynski
>
> [1] http://tracker.ceph.com/issues/25001
> [2] http://tracker.ceph.com/issues/24211
> [3] http://tracker.ceph.com/issues/25001#note-6
>
> On Tue, Sep 4, 2018 at 12:54 PM, Alfredo Deza <adeza@xxxxxxxxxx> wrote:
>> On Tue, Sep 4, 2018 at 3:59 AM, Wolfgang Lendl <wolfgang.lendl@xxxxxxxxxxxxxxxx> wrote:
>>> is downgrading from 12.2.7 to 12.2.5 an option? - I'm still suffering from frequent OSD crashes.
>>> my hopes are with 12.2.9 - but hope wasn't always my best strategy
>> 12.2.8 just went out. I think that Adam or Radoslaw might have some time to check those logs now.
>>
>>> br
>>> wolfgang
>>>
>>> On 2018-08-30 19:18, Alfredo Deza wrote:
>>>> On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl <wolfgang.lendl@xxxxxxxxxxxxxxxx> wrote:
>>>>> Hi Alfredo,
>>>>>
>>>>> caught some logs:
>>>>> https://pastebin.com/b3URiA7p
>>>> That looks like there is an issue with bluestore. Maybe Radoslaw or Adam might know a bit more.
>>>>
>>>>> br
>>>>> wolfgang
>>>>>
>>>>> On 2018-08-29 15:51, Alfredo Deza wrote:
>>>>>> On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl <wolfgang.lendl@xxxxxxxxxxxxxxxx> wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> after upgrading my ceph clusters from 12.2.5 to 12.2.7, I'm experiencing random crashes of SSD OSDs (bluestore) - it seems that HDD OSDs are not affected.
>>>>>>> I destroyed and recreated some of the SSD OSDs, which seemed to help.
>>>>>>>
>>>>>>> this happens on CentOS 7.5 (different kernels tested)
>>>>>>>
>>>>>>> /var/log/messages:
>>>>>>> Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) **
>>>>>>> Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final
>>>>>>> Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in libtcmalloc.so.4.4.5[7f8a997a8000+46000]
>>>>>>> Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited, code=killed, status=11/SEGV
>>>>>>> Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state.
>>>>>>> Aug 29 10:24:08 systemd: ceph-osd@2.service failed.
>>>>>>> Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, scheduling restart.
>>>>>>> Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2...
>>>>>>> Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2.
>>>>>>> Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
>>>>>>> Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) **
>>>>>>> Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
>>>>>>> Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in libtcmalloc.so.4.4.5[7f5f430cd000+46000]
>>>>>>> Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited, code=killed, status=11/SEGV
>>>>>>> Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state.
>>>>>>> Aug 29 10:24:35 systemd: ceph-osd@0.service failed.
>>>>>> These systemd messages aren't usually helpful; try poking around /var/log/ceph/ for the output of that one OSD.
>>>>>>
>>>>>> If those logs aren't useful either, try bumping up the verbosity (see http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time).
>>>>>>> did I hit a known issue?
>>>>>>> any suggestions are highly appreciated
>>>>>>>
>>>>>>> br
>>>>>>> wolfgang
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> ceph-users mailing list
>>>>>>> ceph-users@xxxxxxxxxxxxxx
>>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>>
>>>>> --
>>>>> Wolfgang Lendl
>>>>> IT Systems & Communications
>>>>> Medizinische Universität Wien
>>>>> Spitalgasse 23 / BT 88 / Ebene 00
>>>>> A-1090 Wien
>>>>> Tel: +43 1 40160-21231
>>>>> Fax: +43 1 40160-921200
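
[Editor's note for readers following this thread: the debug levels Radoslaw asks for (debug_bluestore=20, debug_bdev=20) can be raised temporarily in ceph.conf on the affected OSD host before restarting the daemon. A minimal sketch - section and option names per the Ceph Luminous docs; remove these lines again once logs are captured, as level 20 is extremely verbose:]

```ini
[osd]
# Temporary high-verbosity logging to capture the BlueStore crash context.
# Drop back to defaults after collecting logs - level 20 fills disks quickly.
debug bluestore = 20
debug bdev = 20
```

[After restarting, the crash context should land in /var/log/ceph/ceph-osd.<id>.log. The fsck mentioned above is run with the OSD stopped, e.g. `ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-2` (path taken from the osd.2 log earlier in this thread); as Radoslaw cautions, read Igor's post and Troy's response before trying it on production data.]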