Re: Core dump BlueStore luminous 12.2.7

Hi, I got another failure and this time I was able to investigate a bit.

1. If I delete the OSD and recreate it with the exact same setup, the OSD boots up successfully.
2. However, diffing the log between the failed run and the successful one, I noticed something odd: https://www.diffchecker.com/sSHrxwC9

Every successful OSD startup executes the following lines:

Running command: ln -snf /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal /var/lib/ceph/osd/ceph-5/block.wal
Running command: chown -h ceph:ceph /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal 


However, in every failed run these two lines are missing. Any idea why this would occur?
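As a possible workaround while the cause is unclear, it might be worth checking for the missing symlink on a failed OSD and recreating it by hand before retrying the start. This is only a sketch that reuses the exact device and OSD paths from the log above; adjust them for the affected OSD:

# check whether the block.wal symlink exists and points at the right device
ls -l /var/lib/ceph/osd/ceph-5/block.wal

# if it is missing, recreate it and fix ownership, exactly as the logged commands do
ln -snf /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal /var/lib/ceph/osd/ceph-5/block.wal
chown -h ceph:ceph /dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal

If the OSD starts cleanly after that, the problem is probably in the activation step rather than in BlueStore itself.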


Last but not least: I have set the log level to 20; however, it seems that BlueStore crashes before even getting to the point where things are logged.
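Since the log is empty at that point, the core dump itself may be the only useful artifact. A minimal sketch of pulling a backtrace out of it with gdb (the core file path is just a placeholder, and the matching ceph debuginfo packages need to be installed to get readable symbols):

gdb /usr/bin/ceph-osd /path/to/core    # core location depends on the node's core_pattern
(gdb) bt full                          # full backtrace of the crashing thread
(gdb) thread apply all bt              # backtraces of all threads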

Regards
Benoit



On Mon, 6 Aug 2018 at 13:07, Benoit Hudzia <benoit@xxxxxxxxxxxxxxx> wrote:
Thanks, I'll try to check if I can reproduce it. It's really sporadic and occurs every 20-30 runs. I might check if it always occurs on the same server; maybe it's a HW issue.

On Mon, 6 Aug 2018 at 06:12, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
This isn't very complete as it just indicates that something went wrong with a read. Since I presume it happens on every startup, it may help if you set "debug bluestore = 20" in the OSD's config and provide that log (perhaps with ceph-post-file if it's large).
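For reference, that would look something like this in the OSD section of ceph.conf (the section name and log path shown here are the usual defaults and may differ in your deployment):

[osd]
    debug bluestore = 20

# restart the affected OSD, reproduce the crash, then upload the log, e.g.:
# ceph-post-file /var/log/ceph/ceph-osd.7.log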
I also went through my email and see https://tracker.ceph.com/issues/24639, if you have anything in common with that deployment. (But you probably don't; an error on read generally is about bad state on disk that was created somewhere else.)
-Greg

On Sun, Aug 5, 2018 at 3:19 PM Benoit Hudzia <benoit@xxxxxxxxxxxxxxx> wrote:
Hi,

We have started to see core dumps occurring with luminous 12.2.7. Any idea where this is coming from? We started having issues with BlueStore core dumping when we moved to 12.2.6 and hoped that 12.2.7 would have fixed it. We might need to revert to 12.2.5 as it seems a lot more stable.

Pastebin link for full log: https://pastebin.com/na4E3m3N 


Core dump:
starting osd.7 at - osd_data /var/lib/ceph/osd/ceph-7 /var/lib/ceph/osd/ceph-7/journal
*** Caught signal (Segmentation fault) **
 in thread 7fa8830cfd80 thread_name:ceph-osd
 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0xa48ec1) [0x55e010afcec1]
 2: (()+0xf6d0) [0x7fa8807966d0]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
 8: (OSD::init()+0x3bd) [0x55e0105c934d]
 9: (main()+0x2d07) [0x55e0104ce947]
 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
 11: (()+0x4b9003) [0x55e01056d003]
2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa8830cfd80 thread_name:ceph-osd

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0xa48ec1) [0x55e010afcec1]
 2: (()+0xf6d0) [0x7fa8807966d0]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
 8: (OSD::init()+0x3bd) [0x55e0105c934d]
 9: (main()+0x2d07) [0x55e0104ce947]
 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
 11: (()+0x4b9003) [0x55e01056d003]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

     0> 2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught signal (Segmentation fault) **
 in thread 7fa8830cfd80 thread_name:ceph-osd

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0xa48ec1) [0x55e010afcec1]
 2: (()+0xf6d0) [0x7fa8807966d0]
 3: (BlueFS::_read(BlueFS::FileReader*, BlueFS::FileReaderBuffer*, unsigned long, unsigned long, ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
 4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
 5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
 6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
 7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
 8: (OSD::init()+0x3bd) [0x55e0105c934d]
 9: (main()+0x2d07) [0x55e0104ce947]
 10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
 11: (()+0x4b9003) [0x55e01056d003]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

/osd_entrypoint: line 98: 119388 Segmentation fault      (core dumped) /usr/bin/ceph-osd -f --cluster "${CEPH_CLUSTERNAME}" --id "${OSD_ID}" --setuser root --setgroup root



--
Dr. Benoit Hudzia

Mobile (UK): +44 (0) 75 346 78673
Mobile (IE):  +353 (0) 89 219 3675
Email: benoit@xxxxxxxxxxxxxxx



Web | Blog | Twitter | Google+ | Linkedin


_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
