Re: Core dump blue store luminous 12.2.7

Hi,

The missing "ln -snf ..." is probably related to missing LV tags. When we had to migrate OSD journals to another SSD because of a failed SSD, we noticed the same difference compared to new (healthy) OSDs. Compare the tags of your Logical Volumes to their actual UUIDs, and check that all the other information matches your actual setup.

lvs --noheadings -o +lv_tags [YOUR_LV]

shows you the tags of one or all of your LVs. Probably there's a wrong UUID on the respective wal/db LV; the tag is something like ceph.wal_uuid (I don't have access to our cluster right now).
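For example (just a sketch; the VG name "inaugurator" and the osd path ceph-5 are taken from the paths in your log, and the exact tag names may differ on your cluster), you can list the real LV UUIDs next to the stored tags and compare them:

# list the LV name, its actual UUID and the ceph.* tags stored on it
lvs --noheadings -o lv_name,lv_uuid,lv_tags inaugurator

# the block/db/wal symlinks the OSD expects should point at those LVs
ls -l /var/lib/ceph/osd/ceph-5/block*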

If you find the wrong UUID, just delete the existing tag with

lvchange --deltag ...

and add the correct tag with

lvchange --addtag ...
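
As a sketch (the device path is taken from your log, the UUIDs are placeholders, and the exact tag name should be double-checked against the lvs output above):

# remove the stale tag from the wal LV
lvchange --deltag "ceph.wal_uuid=OLD-UUID" /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal

# add the tag with the UUID that matches the actual LV
lvchange --addtag "ceph.wal_uuid=CORRECT-UUID" /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal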

I hope this helps to resolve your issue.

Regards,
Eugen


Quoting Benoit Hudzia <benoit@xxxxxxxxxxxxxxx>:

Hi, I got another failure and this time I was able to investigate a bit.

1. If I delete the OSD and recreate it with the exact same setup, the OSD
boots up successfully.
2. However, diffing the log between the failed run and the successful one,
I noticed something odd: https://www.diffchecker.com/sSHrxwC9

In every successful OSD startup, the following lines are executed:

Running command: ln -snf
/dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal
/var/lib/ceph/osd/ceph-5/block.wal
Running command: chown -h ceph:ceph
/dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal


However, in every failed run these two lines are missing. Any idea why this
would occur?


Last but not least: I have set the log level to 20; however, it seems that
BlueStore crashes before even getting to the point where things are logged.

Regards
Benoit



On Mon, 6 Aug 2018 at 13:07, Benoit Hudzia <benoit@xxxxxxxxxxxxxxx> wrote:

Thanks, I'll try to check if I can reproduce it. It's really sporadic and
occurs every 20-30 runs. I might check whether it always occurs on the same
server; maybe it's a hardware issue.
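If it does turn out to be one server, something like the following is a quick sketch of what I'd check on it (the device name is a placeholder):

# look for I/O, memory or filesystem errors around the time of the crash
dmesg -T | grep -iE 'error|fault|i/o'

# check SMART data on the drive backing the failing OSD (adjust the device)
smartctl -a /dev/sdX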

On Mon, 6 Aug 2018 at 06:12, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

This isn't very complete as it just indicates that something went wrong
with a read. Since I presume it happens on every startup, it may help if
you set "debug bluestore = 20" in the OSD's config and provide that log
(perhaps with ceph-post-file if it's large).
I also went through my email and found
https://tracker.ceph.com/issues/24639, in case you have anything in common
with that deployment. (But you probably don't; an error on read generally
is about bad state on disk that was created somewhere else.)
-Greg
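
(For reference, a minimal sketch of that suggestion, assuming the failing OSD is osd.7 as in the log below and the default log location:)

# in ceph.conf on the OSD host, under the [osd] section, then restart the OSD
debug bluestore = 20

# upload the (potentially large) log for the developers
ceph-post-file /var/log/ceph/ceph-osd.7.log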

On Sun, Aug 5, 2018 at 3:19 PM Benoit Hudzia <benoit@xxxxxxxxxxxxxxx>
wrote:

Hi,

We have started to see core dumps occurring with luminous 12.2.7. Any idea
where this is coming from? We started having issues with BlueStore core
dumping when we moved to 12.2.6 and hoped that 12.2.7 would have fixed it.
We might need to revert to 12.2.5, as it seems a lot more stable.

Pastebin link for full log: https://pastebin.com/na4E3m3N


Core dump :

starting osd.7 at - osd_data /var/lib/ceph/osd/ceph-7 /var/lib/ceph/osd/ceph-7/journal
*** Caught signal (Segmentation fault) **
 in thread 7fa8830cfd80 thread_name:ceph-osd
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0xa48ec1) [0x55e010afcec1]
 2: (()+0xf6d0) [0x7fa8807966d0]
3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
 8: (OSD::init()+0x3bd) [0x55e0105c934d]
 9: (main()+0x2d07) [0x55e0104ce947]
 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
 11: (()+0x4b9003) [0x55e01056d003]
2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa8830cfd80 thread_name:ceph-osd

ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0xa48ec1) [0x55e010afcec1]
 2: (()+0xf6d0) [0x7fa8807966d0]
3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
 8: (OSD::init()+0x3bd) [0x55e0105c934d]
 9: (main()+0x2d07) [0x55e0104ce947]
 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
 11: (()+0x4b9003) [0x55e01056d003]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

0> 2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa8830cfd80 thread_name:ceph-osd

ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0xa48ec1) [0x55e010afcec1]
 2: (()+0xf6d0) [0x7fa8807966d0]
3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
 8: (OSD::init()+0x3bd) [0x55e0105c934d]
 9: (main()+0x2d07) [0x55e0104ce947]
 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
 11: (()+0x4b9003) [0x55e01056d003]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/osd_entrypoint: line 98: 119388 Segmentation fault (core dumped) /usr/bin/ceph-osd -f --cluster "${CEPH_CLUSTERNAME}" --id "${OSD_ID}" --setuser root --setgroup root




--
Dr. Benoit Hudzia

Mobile (UK): +44 (0) 75 346 78673
Mobile (IE):  +353 (0) 89 219 3675
Email: benoit@xxxxxxxxxxxxxxx



Web <http://www.stratoscale.com/> | Blog
<http://www.stratoscale.com/blog/> | Twitter
<https://twitter.com/Stratoscale> | Google+
<https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts>
 | Linkedin <https://www.linkedin.com/company/stratoscale>







_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


