Hi,
the missing "ln -snf ..." is probably related to missing LV tags. When
we had to migrate OSD journals to another SSD because of a failed SSD,
we noticed the same difference compared to new (healthy) OSDs. Compare
the tags of your Logical Volumes to their actual UUIDs and check that
all the other information matches the actual setup.
lvs --noheadings -o +lv_tags [YOUR_LV]
shows you the tags of one or of all your LVs. Most likely there's a
wrong UUID on the respective WAL/DB LV; the tag is something like
ceph.wal_uuid (I don't have access to our cluster right now).
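For example, something like this (the VG/LV name below is just a
placeholder, adapt it to your setup) puts the tag and the LV's actual
UUID side by side:
lvs -o lv_name,lv_uuid,lv_tags YOUR_VG/YOUR_WAL_LV
Depending on how the WAL device was deployed, the tag may refer to the
LV UUID or to the device UUID reported by blkid, so compare against
both if in doubt.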
If you find the false UUID, just delete the existing tag with
lvchange --deltag ...
and add the correct tag with
lvchange --addtag ...
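As a sketch (the tag value and LV path here are made-up placeholders,
use the values from your own setup):
lvchange --deltag "ceph.wal_uuid=WRONG_UUID" /dev/YOUR_VG/YOUR_WAL_LV
lvchange --addtag "ceph.wal_uuid=CORRECT_UUID" /dev/YOUR_VG/YOUR_WAL_LV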
I hope this helps to resolve your issue.
Regards,
Eugen
Quoting Benoit Hudzia <benoit@xxxxxxxxxxxxxxx>:
Hi, I got another failure and this time was able to investigate a bit.
1. If I delete the OSD and recreate it with the exact same setup, the
OSD boots up successfully.
2. However, diffing the logs between the failed run and the successful
one, I noticed something odd: https://www.diffchecker.com/sSHrxwC9
In every successful OSD startup, the following lines are executed:
Running command: ln -snf
/dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal
/var/lib/ceph/osd/ceph-5/block.wal
Running command: chown -h ceph:ceph
/dev/inaugurator/ed5a15e8-20b9-4312-991c-1a4d91b284bd-wal
However, in every failed run these two lines are missing. Any idea why
this would occur?
Last but not least: I have set the log level to 20; however, it seems
that BlueStore crashes before even getting to the point where things
are logged.
Regards
Benoit
On Mon, 6 Aug 2018 at 13:07, Benoit Hudzia <benoit@xxxxxxxxxxxxxxx> wrote:
Thanks, I'll try to check if I can reproduce it. It's really sporadic
and occurs every 20-30 runs. I might check if it always occurs on the
same server; maybe it's an HW issue.
On Mon, 6 Aug 2018 at 06:12, Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
This isn't very complete as it just indicates that something went wrong
with a read. Since I presume it happens on every startup, it may help if
you set "debug bluestore = 20" in the OSD's config and provide that log
(perhaps with ceph-post-file if it's large).
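For example (a sketch; the config section and log path below assume a
stock setup and may differ in your deployment):
# in ceph.conf on the OSD node, then restart the OSD
[osd]
debug bluestore = 20
# upload the resulting log if it is too large to attach
ceph-post-file /var/log/ceph/ceph-osd.7.log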
I also went through my email and found
https://tracker.ceph.com/issues/24639, in case you have anything in
common with that deployment. (But you probably don't; an error on read
is generally about bad state on disk that was created somewhere else.)
-Greg
On Sun, Aug 5, 2018 at 3:19 PM Benoit Hudzia <benoit@xxxxxxxxxxxxxxx>
wrote:
Hi,
We have started to see core dumps occurring with Luminous 12.2.7. Any
idea where this is coming from? We started having issues with BlueStore
core dumping when we moved to 12.2.6 and hoped that 12.2.7 would have
fixed it. We might need to revert back to 12.2.5, as it seems a lot
more stable.
Pastebin link for full log: https://pastebin.com/na4E3m3N
Core dump:
starting osd.7 at - osd_data /var/lib/ceph/osd/ceph-7
/var/lib/ceph/osd/ceph-7/journal
*** Caught signal (Segmentation fault) **
in thread 7fa8830cfd80 thread_name:ceph-osd
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
luminous (stable)
1: (()+0xa48ec1) [0x55e010afcec1]
2: (()+0xf6d0) [0x7fa8807966d0]
3: (BlueFS::_read(BlueFS::FileReader*,
BlueFS::FileReaderBuffer*, unsigned long, unsigned long,
ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
8: (OSD::init()+0x3bd) [0x55e0105c934d]
9: (main()+0x2d07) [0x55e0104ce947]
10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
11: (()+0x4b9003) [0x55e01056d003]
2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught signal
(Segmentation fault) **
in thread 7fa8830cfd80 thread_name:ceph-osd
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
luminous (stable)
1: (()+0xa48ec1) [0x55e010afcec1]
2: (()+0xf6d0) [0x7fa8807966d0]
3: (BlueFS::_read(BlueFS::FileReader*,
BlueFS::FileReaderBuffer*, unsigned long, unsigned long,
ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
8: (OSD::init()+0x3bd) [0x55e0105c934d]
9: (main()+0x2d07) [0x55e0104ce947]
10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
11: (()+0x4b9003) [0x55e01056d003]
NOTE: a copy of the executable, or `objdump -rdS <executable>`
is needed to interpret this.
0> 2018-08-03 21:58:12.248736 7fa8830cfd80 -1 *** Caught
signal (Segmentation fault) **
in thread 7fa8830cfd80 thread_name:ceph-osd
ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5)
luminous (stable)
1: (()+0xa48ec1) [0x55e010afcec1]
2: (()+0xf6d0) [0x7fa8807966d0]
3: (BlueFS::_read(BlueFS::FileReader*,
BlueFS::FileReaderBuffer*, unsigned long, unsigned long,
ceph::buffer::list*, char*)+0x452) [0x55e010ab1e72]
4: (BlueFS::_replay(bool)+0x2ef) [0x55e010ac526f]
5: (BlueFS::mount()+0x1d4) [0x55e010ac8fd4]
6: (BlueStore::_open_db(bool)+0x1847) [0x55e0109e2da7]
7: (BlueStore::_mount(bool)+0x40e) [0x55e010a1406e]
8: (OSD::init()+0x3bd) [0x55e0105c934d]
9: (main()+0x2d07) [0x55e0104ce947]
10: (__libc_start_main()+0xf5) [0x7fa87f7a3445]
11: (()+0x4b9003) [0x55e01056d003]
NOTE: a copy of the executable, or `objdump -rdS <executable>`
is needed to interpret this.
/osd_entrypoint: line 98: 119388 Segmentation fault (core
dumped) /usr/bin/ceph-osd -f --cluster "${CEPH_CLUSTERNAME}" --id
"${OSD_ID}" --setuser root --setgroup root
--
Dr. Benoit Hudzia
Mobile (UK): +44 (0) 75 346 78673
Mobile (IE): +353 (0) 89 219 3675
Email: benoit@xxxxxxxxxxxxxxx
Web <http://www.stratoscale.com/> | Blog <http://www.stratoscale.com/blog/>
| Twitter <https://twitter.com/Stratoscale> | Google+
<https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts>
| Linkedin <https://www.linkedin.com/company/stratoscale>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com