On Tue, Sep 4, 2018 at 3:59 AM, Wolfgang Lendl
<wolfgang.lendl@xxxxxxxxxxxxxxxx> wrote:
> is downgrading from 12.2.7 to 12.2.5 an option? I'm still suffering
> from highly frequent OSD crashes.
> my hopes are with 12.2.9 - but hope wasn't always my best strategy

12.2.8 just went out. I think that Adam or Radoslaw might have some
time to check those logs now.

> br
> wolfgang
>
> On 2018-08-30 19:18, Alfredo Deza wrote:
>> On Thu, Aug 30, 2018 at 5:24 AM, Wolfgang Lendl
>> <wolfgang.lendl@xxxxxxxxxxxxxxxx> wrote:
>>> Hi Alfredo,
>>>
>>> caught some logs:
>>> https://pastebin.com/b3URiA7p
>>
>> That looks like there is an issue with bluestore. Maybe Radoslaw or
>> Adam might know a bit more.
>>
>>> br
>>> wolfgang
>>>
>>> On 2018-08-29 15:51, Alfredo Deza wrote:
>>>> On Wed, Aug 29, 2018 at 2:06 AM, Wolfgang Lendl
>>>> <wolfgang.lendl@xxxxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> after upgrading my ceph clusters from 12.2.5 to 12.2.7 I'm
>>>>> experiencing random crashes of SSD OSDs (bluestore) - it seems
>>>>> that HDD OSDs are not affected.
>>>>> I destroyed and recreated some of the SSD OSDs, which seemed to help.
>>>>>
>>>>> this happens on CentOS 7.5 (different kernels tested)
>>>>>
>>>>> /var/log/messages:
>>>>> Aug 29 10:24:08 ceph-osd: *** Caught signal (Segmentation fault) **
>>>>> Aug 29 10:24:08 ceph-osd: in thread 7f8a8e69e700 thread_name:bstore_kv_final
>>>>> Aug 29 10:24:08 kernel: traps: bstore_kv_final[187470] general protection ip:7f8a997cf42b sp:7f8a8e69abc0 error:0 in libtcmalloc.so.4.4.5[7f8a997a8000+46000]
>>>>> Aug 29 10:24:08 systemd: ceph-osd@2.service: main process exited, code=killed, status=11/SEGV
>>>>> Aug 29 10:24:08 systemd: Unit ceph-osd@2.service entered failed state.
>>>>> Aug 29 10:24:08 systemd: ceph-osd@2.service failed.
>>>>> Aug 29 10:24:28 systemd: ceph-osd@2.service holdoff time over, scheduling restart.
>>>>> Aug 29 10:24:28 systemd: Starting Ceph object storage daemon osd.2...
>>>>> Aug 29 10:24:28 systemd: Started Ceph object storage daemon osd.2.
>>>>> Aug 29 10:24:28 ceph-osd: starting osd.2 at - osd_data /var/lib/ceph/osd/ceph-2 /var/lib/ceph/osd/ceph-2/journal
>>>>> Aug 29 10:24:35 ceph-osd: *** Caught signal (Segmentation fault) **
>>>>> Aug 29 10:24:35 ceph-osd: in thread 7f5f1e790700 thread_name:tp_osd_tp
>>>>> Aug 29 10:24:35 kernel: traps: tp_osd_tp[186933] general protection ip:7f5f43103e63 sp:7f5f1e78a1c8 error:0 in libtcmalloc.so.4.4.5[7f5f430cd000+46000]
>>>>> Aug 29 10:24:35 systemd: ceph-osd@0.service: main process exited, code=killed, status=11/SEGV
>>>>> Aug 29 10:24:35 systemd: Unit ceph-osd@0.service entered failed state.
>>>>> Aug 29 10:24:35 systemd: ceph-osd@0.service failed
>>>>
>>>> These systemd messages aren't usually helpful; try poking around
>>>> /var/log/ceph/ for the output of that one OSD.
>>>>
>>>> If those logs aren't useful either, try bumping up the verbosity (see
>>>> http://docs.ceph.com/docs/master/rados/troubleshooting/log-and-debug/#boot-time
>>>> ).
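For reference, the boot-time approach from that troubleshooting page amounts
to raising the OSD debug levels in ceph.conf and then reading the per-OSD log
file. A minimal sketch, assuming osd.2 from the log above and the default log
location:

    # /etc/ceph/ceph.conf on the affected OSD host
    [osd]
    debug osd = 20
    debug bluestore = 20
    debug bdev = 20
    debug rocksdb = 10

    # restart the OSD so the levels apply from startup, then read its log
    systemctl restart ceph-osd@2
    less /var/log/ceph/ceph-osd.2.log

Debug level 20 produces very large logs, so drop the levels back down once
the crash has been captured.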
>>>>> did I hit a known issue?
>>>>> any suggestions are highly appreciated
>>>>>
>>>>> br
>>>>> wolfgang
>
> --
> Wolfgang Lendl
> IT Systems & Communications
> Medizinische Universität Wien
> Spitalgasse 23 / BT 88 / Ebene 00
> A-1090 Wien
> Tel: +43 1 40160-21231
> Fax: +43 1 40160-921200

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
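Wolfgang doesn't say which commands he used to destroy and recreate the
affected SSD OSDs, but on a 12.2.x cluster with ceph-volume the sequence
would look roughly like the sketch below (osd.2 and /dev/sdX are
placeholders for the crashing OSD and its device):

    # stop the crashing daemon and remove the OSD from the cluster
    systemctl stop ceph-osd@2
    ceph osd purge 2 --yes-i-really-mean-it

    # wipe the old bluestore data and create a fresh OSD on the device
    ceph-volume lvm zap /dev/sdX
    ceph-volume lvm create --bluestore --data /dev/sdX

When several OSDs are affected, let the cluster finish backfilling after
each recreation before taking down the next one.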