Jan,
In trying to recover my OSDs after the upgrade from Nautilus described
earlier, I eventually managed to make things worse to the point where I'm
going to scrub and fully reinstall. So I zapped all of the devices on one
of my nodes and reproduced the ceph-volume lvm create error I mentioned
earlier, using the procedure from
https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/ to
lay out the LVs and issue ceph-volume lvm create. As I was concerned that
it might be a size issue, I created only a 4TB block LV for my first
attempt, and used the full 12TB drive for my second attempt.
The output is:
root@ceph01:~# ceph-volume lvm create --bluestore --data
ceph-block-0/block-0 --block.db ceph-db-0/db-0
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
6441f236-8694-46b9-9c6a-bf82af89765d
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-24
--> Absolute path not found for executable: selinuxenabled
--> Ensure $PATH environment variable contains common executable locations
Running command: /bin/chown -h ceph:ceph /dev/ceph-block-0/block-0
Running command: /bin/chown -R ceph:ceph /dev/dm-0
Running command: /bin/ln -s /dev/ceph-block-0/block-0
/var/lib/ceph/osd/ceph-24/block
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
/var/lib/ceph/osd/ceph-24/activate.monmap
stderr: got monmap epoch 4
Running command: /usr/bin/ceph-authtool /var/lib/ceph/osd/ceph-24/keyring
--create-keyring --name osd.24 --add-key
AQAuMjJe5OGHBRAAP94+1E7CzV5Rv9HFj9WVqA==
stdout: creating /var/lib/ceph/osd/ceph-24/keyring
added entity osd.24 auth(key=AQAuMjJe5OGHBRAAP94+1E7CzV5Rv9HFj9WVqA==)
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-24/keyring
Running command: /bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-24/
Running command: /bin/chown -h ceph:ceph /dev/ceph-db-0/db-0
Running command: /bin/chown -R ceph:ceph /dev/dm-1
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore
bluestore --mkfs -i 24 --monmap /var/lib/ceph/osd/ceph-24/activate.monmap
--keyfile - --bluestore-block-db-path /dev/ceph-db-0/db-0 --osd-data
/var/lib/ceph/osd/ceph-24/ --osd-uuid 6441f236-8694-46b9-9c6a-bf82af89765d
--setuser ceph --setgroup ceph
stderr: 2020-01-29 20:32:33.054 7ff4c24abc80 -1
bluestore(/var/lib/ceph/osd/ceph-24/) _read_fsid unparsable uuid
stderr: terminate called after throwing an instance of
'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::bad_get>
'
stderr: what(): boost::bad_get: failed value get using boost::get
stderr: *** Caught signal (Aborted) **
stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
nautilus (stable)
stderr: 1: (()+0x12730) [0x7ff4c2f54730]
stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
stderr: 6: (()+0x92901) [0x7ff4c2df0901]
stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
stderr: 9: (Option::size_t const
md_config_t::get_val<Option::size_t>(ConfigValues const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
[0x564eed1e4bf5]
stderr: 14: (main()+0x1796) [0x564eed191366]
stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
stderr: 2020-01-29 20:32:33.062 7ff4c24abc80 -1 *** Caught signal
(Aborted) **
stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
nautilus (stable)
stderr: 1: (()+0x12730) [0x7ff4c2f54730]
stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
stderr: 6: (()+0x92901) [0x7ff4c2df0901]
stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
stderr: 9: (Option::size_t const
md_config_t::get_val<Option::size_t>(ConfigValues const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
[0x564eed1e4bf5]
stderr: 14: (main()+0x1796) [0x564eed191366]
stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
stderr: -5> 2020-01-29 20:32:33.054 7ff4c24abc80 -1
bluestore(/var/lib/ceph/osd/ceph-24/) _read_fsid unparsable uuid
stderr: 0> 2020-01-29 20:32:33.062 7ff4c24abc80 -1 *** Caught signal
(Aborted) **
stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
nautilus (stable)
stderr: 1: (()+0x12730) [0x7ff4c2f54730]
stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
stderr: 6: (()+0x92901) [0x7ff4c2df0901]
stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
stderr: 9: (Option::size_t const
md_config_t::get_val<Option::size_t>(ConfigValues const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
[0x564eed1e4bf5]
stderr: 14: (main()+0x1796) [0x564eed191366]
stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
stderr: -5> 2020-01-29 20:32:33.054 7ff4c24abc80 -1
bluestore(/var/lib/ceph/osd/ceph-24/) _read_fsid unparsable uuid
stderr: 0> 2020-01-29 20:32:33.062 7ff4c24abc80 -1 *** Caught signal
(Aborted) **
stderr: in thread 7ff4c24abc80 thread_name:ceph-osd
stderr: ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
nautilus (stable)
stderr: 1: (()+0x12730) [0x7ff4c2f54730]
stderr: 2: (gsignal()+0x10b) [0x7ff4c2a377bb]
stderr: 3: (abort()+0x121) [0x7ff4c2a22535]
stderr: 4: (()+0x8c983) [0x7ff4c2dea983]
stderr: 5: (()+0x928c6) [0x7ff4c2df08c6]
stderr: 6: (()+0x92901) [0x7ff4c2df0901]
stderr: 7: (()+0x92b34) [0x7ff4c2df0b34]
stderr: 8: (()+0x5a3f53) [0x564eed1c4f53]
stderr: 9: (Option::size_t const
md_config_t::get_val<Option::size_t>(ConfigValues const&,
std::__cxx11::basic_string<char, std::char_traits<char>,
std::allocator<char> > const&) const+0x81) [0x564eed1cac91]
stderr: 10: (BlueStore::_set_cache_sizes()+0x15a) [0x564eed645d8a]
stderr: 11: (BlueStore::_open_bdev(bool)+0x173) [0x564eed648b23]
stderr: 12: (BlueStore::mkfs()+0x42b) [0x564eed6adeab]
stderr: 13: (OSD::mkfs(CephContext*, ObjectStore*, uuid_d, int)+0xd5)
[0x564eed1e4bf5]
stderr: 14: (main()+0x1796) [0x564eed191366]
stderr: 15: (__libc_start_main()+0xeb) [0x7ff4c2a2409b]
stderr: 16: (_start()+0x2a) [0x564eed1c4c6a]
stderr: NOTE: a copy of the executable, or `objdump -rdS <executable>` is
needed to interpret this.
--> Was unable to complete a new OSD, will rollback changes
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd purge-new osd.24
--yes-i-really-mean-it
stderr: purged osd.24
--> RuntimeError: Command failed with exit code 250: /usr/bin/ceph-osd
--cluster ceph --osd-objectstore bluestore --mkfs -i 24 --monmap
/var/lib/ceph/osd/ceph-24/activate.monmap --keyfile -
--bluestore-block-db-path /dev/ceph-db-0/db-0 --osd-data
/var/lib/ceph/osd/ceph-24/ --osd-uuid 6441f236-8694-46b9-9c6a-bf82af89765d
--setuser ceph --setgroup ceph
root@ceph01:~#
Dave Hall
Binghamton University
kdhall@xxxxxxxxxxxxxx
607-760-2328 (Cell)
607-777-4641 (Office)
On 1/29/2020 3:15 AM, Jan Fajerski wrote:
On Tue, Jan 28, 2020 at 08:03:35PM +0100, bauen1 wrote:
Hi,
I've run into the same issue while testing:
ceph version 14.2.6 (f0aa067ac7a02ee46ea48aa26c6e298b5ea272e9)
nautilus (stable)
debian bullseye
Ceph was installed using ceph-ansible on a vm from the repo
http://download.ceph.com/debian-nautilus
The output of `sudo sh -c 'CEPH_VOLUME_DEBUG=true ceph-volume
--cluster test lvm batch --bluestore /dev/vdb'` has been attached.
Thx, I opened https://tracker.ceph.com/issues/43868.
This looks like a bluestore/osd issue to me, though it might end up being
ceph-volume's fault.
Also worth noting might be that '/var/lib/ceph/osd/test-0/fsid' is
empty (but I don't know too much about the internals)
- bauen1
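For readers following along, the observation about the empty fsid file can be illustrated with a small sketch. This is not Ceph's code (BlueStore is C++); `parse_fsid` is a hypothetical helper that only shows, under that assumption, why an empty file cannot yield a parsable UUID while a well-formed one (here the OSD UUID from the log above) can. Whether the empty file is related to the abort is not established in this thread.

```python
import uuid


def parse_fsid(text):
    """Return a uuid.UUID if text holds a well-formed UUID, else None.

    Hypothetical helper for illustration only; it does not mirror
    BlueStore's actual _read_fsid implementation.
    """
    try:
        # Strip the trailing newline a file read would normally include.
        return uuid.UUID(text.strip())
    except ValueError:
        # Empty or malformed content cannot be parsed as a UUID.
        return None


# An empty fsid file, as reported for /var/lib/ceph/osd/test-0/fsid.
empty_result = parse_fsid("")

# The OSD UUID that appears in the ceph-volume output above.
good_result = parse_fsid("6441f236-8694-46b9-9c6a-bf82af89765d\n")

print(empty_result)
print(good_result)
```

On a live host the same check could be applied to the contents of the fsid file under the OSD data directory.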
On 1/28/20 4:54 PM, Dave Hall wrote:
Jan,
Unfortunately I'm under immense pressure right now to get some form
of Ceph into production, so it's going to be Luminous for now, or
maybe a live upgrade to Nautilus without recreating the OSDs (if
that's possible).
The good news is that in the next couple months I expect to add more
hardware that should be nearly identical. I will gladly give it a
go at that time and see if I can recreate. (Or, if I manage to
thoroughly crash my current fledgling cluster, I'll give it another
go on one node while I'm up all night recovering.)
If you could tell me where to look I'd gladly read some code and see
if I can find anything that way. Or if there's any sort of design
document describing the deep internals I'd be glad to scan it to see
if I've hit a corner case of some sort. Actually, I'd be interested
in reading those documents anyway if I could.
Thanks.
-Dave
Dave Hall
On 1/28/2020 3:05 AM, Jan Fajerski wrote:
On Mon, Jan 27, 2020 at 03:23:55PM -0500, Dave Hall wrote:
All,
I've just spent a significant amount of time unsuccessfully chasing
the _read_fsid unparsable uuid error on Debian 10 / Nautilus 14.2.6.
Since this is a brand new cluster, last night I gave up and moved back
to Debian 9 / Luminous 12.2.11. In both cases I'm using the packages
from Debian Backports with ceph-ansible as my deployment tool.
Note that above I said 'the _read_fsid unparsable uuid' error. I've
searched around a bit and found some previously reported issues, but I
did not see any conclusive resolutions.
I would like to get to Nautilus as quickly as possible, so I'd gladly
provide additional information to help track down the cause of this
symptom. I can confirm that, looking at the ceph-volume.log on the
OSD host I see no difference between the ceph-volume lvm batch command
generated by the ceph-ansible versions associated with these two Ceph
releases:
ceph-volume --cluster ceph lvm batch --bluestore --yes
--block-db-size 133358734540 /dev/sdc /dev/sdd /dev/sde /dev/sdf
/dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/nvme0n1
Note that I'm using --block-db-size to divide my NVMe into 12 segments
as I have 4 empty drive bays on my OSD servers that I may eventually
be able to fill.
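The sizing arithmetic behind that --block-db-size value can be checked with a short sketch. It uses only numbers from this message: the NVMe capacity from the disk listing below, the 12 segments (8 installed drives plus 4 empty bays), and the value passed on the command line. The variable names are illustrative, and the reason the chosen value sits slightly below an exact twelfth is not stated in the thread.

```python
# Figures taken from this message; the names are illustrative only.
NVME_BYTES = 1600321314816    # /dev/nvme0n1 capacity in bytes
SEGMENTS = 12                 # 8 installed data drives + 4 empty bays
DB_SIZE_USED = 133358734540   # value passed to --block-db-size

# An exact twelfth of the device.
per_segment = NVME_BYTES // SEGMENTS

# Space left over if all 12 DB volumes are created at the chosen size.
headroom = NVME_BYTES - SEGMENTS * DB_SIZE_USED

print(f"exact twelfth:    {per_segment} bytes")
print(f"value used:       {DB_SIZE_USED} bytes")
print(f"total headroom:   {headroom} bytes ({headroom / 2**20:.1f} MiB)")
```

So twelve DB volumes of the chosen size fit on the device with roughly 16 MB to spare.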
My OSD hardware is:
Disk /dev/nvme0n1: 1.5 TiB, 1600321314816 bytes, 3125627568 sectors
Disk /dev/sdc: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdd: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sde: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdf: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdg: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdh: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdi: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
Disk /dev/sdj: 10.9 TiB, 12000138625024 bytes, 23437770752 sectors
I'd send the output of ceph-volume inventory on Luminous, but I'm
getting -->: KeyError: 'human_readable_size'.
Please let me know if I can provide any further information.
Would you mind re-running your ceph-volume command with debug output
enabled:
CEPH_VOLUME_DEBUG=true ceph-volume --cluster ceph lvm batch
--bluestore ...
Ideally you could also open a bug report here:
https://tracker.ceph.com/projects/ceph-volume/issues/new
Thanks!
Thanks.
-Dave