My clusters are self-rolled. My start command is as follows:
podman run -it --privileged --pid=host --cpuset-cpus 0,1 --memory 2g \
  --name ceph_osd0 --hostname ceph_osd0 \
  -v /dev:/dev \
  -v /etc/localtime:/etc/localtime:ro \
  -v /etc/ceph:/etc/ceph/ \
  -v /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 \
  -v /var/log/ceph:/var/log/ceph \
  -v /run/udev/:/run/udev/ \
  ceph/ceph:v16.2.7-20220201 \
  ceph-osd --id 0 -c /etc/ceph/ceph.conf --cluster ceph -f
I jumped from the Octopus image to the 16.2.7 image and had been running
well for a while with no issues. The cluster was clean, no backfills in
progress, etc. After this latest zypper up and reboot, I have OSDs that
don't start.
# podman image ls
REPOSITORY          TAG               IMAGE ID      CREATED     SIZE
quay.io/ceph/ceph   v16.2.7           231fd40524c4  9 days ago  1.39 GB
quay.io/ceph/ceph   v16.2.7-20220201  231fd40524c4  9 days ago  1.39 GB
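Both tags report the same image ID (231fd40524c4), so they are the same
bits. If you want to make that comparison explicit, something like this
works (a sketch; it assumes both images are already pulled locally, as
shown above):

```shell
#!/bin/sh
# Sketch: confirm the two tags resolve to the same local image ID.
# (Assumes both images are present locally, per `podman image ls`.)
a=$(podman image inspect --format '{{.Id}}' quay.io/ceph/ceph:v16.2.7)
b=$(podman image inspect --format '{{.Id}}' quay.io/ceph/ceph:v16.2.7-20220201)

if [ "$a" = "$b" ]; then
  echo "same image: $a"
else
  echo "different images: $a vs $b"
fi
```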
bluefs fails to mount, I guess? The labels are still readable via the
bluestore tool:
ceph-bluestore-tool show-label --dev /dev/mapper/ceph-0block
{
"/dev/mapper/ceph-0block": {
"osd_uuid": "1234abcd-1234-abcd-1234-1234 abcd1234",
"size": 6001171365888,
"btime": "2019-04-11T08:46:36.013428-0700",
"description": "main",
"bfm_blocks": "1465129728",
"bfm_blocks_per_key": "128",
"bfm_bytes_per_block": "4096",
"bfm_size": "6001171365888",
"bluefs": "1",
"ceph_fsid": "1234abcd-1234-abcd-1234-1234 abcd1234",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"ready": "ready",
"require_osd_release": "16",
"whoami": "0"
}
}
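Since the label at offset 0 decodes fine but BlueFS aborts in
_open_super(), one thing worth checking is whether the superblock region
of the device reads back as real data at all. A rough sketch (the 4 KiB
offset for the BlueFS superblock is an assumption here; adjust the
device path and offset for your layout):

```shell
#!/bin/sh
# Sketch: peek at the BlueFS superblock region of the OSD block device.
# DEV and the 4 KiB offset are assumptions; adjust for your setup.
DEV=/dev/mapper/ceph-0block

# Dump the 4 KiB region after the label and eyeball it; all zeros or
# obvious garbage would suggest the device is not returning real data.
dd if="$DEV" bs=4096 skip=1 count=1 2>/dev/null | hexdump -C | head -n 20

# Count non-zero bytes in that region; 0 means the region reads empty.
dd if="$DEV" bs=4096 skip=1 count=1 2>/dev/null | tr -d '\0' | wc -c
```

If the region reads as plausible data, `ceph-bluestore-tool fsck --path
/var/lib/ceph/osd/ceph-0` (with the OSD stopped) may give a more
specific error than the mount-time abort.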
On Fri, Feb 11, 2022 at 1:06 AM Eugen Block <eblock@xxxxxx> wrote:
Can you share some more information on how exactly you upgraded? It
looks like a cephadm-managed cluster. Did you install OS updates on all
nodes without waiting for the first one to recover? Maybe I'm
misreading, so please clarify what your update process looked like.
Zitat von Mazzystr <mazzystr@xxxxxxxxx>:
> I applied the latest OS updates and rebooted my hosts. Now all my OSDs
> fail to start.
>
> # cat /etc/os-release
> NAME="openSUSE Tumbleweed"
> # VERSION="20220207"
> ID="opensuse-tumbleweed"
> ID_LIKE="opensuse suse"
> VERSION_ID="20220207"
>
> # uname -a
> Linux cube 5.16.5-1-default #1 SMP PREEMPT Thu Feb 3 05:26:48 UTC 2022
> (1af4009) x86_64 x86_64 x86_64 GNU/Linux
>
> container image: v16.2.7 / v16.2.7-20220201
>
> osd debug log shows the following:
> -11> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 bluefs add_block_device
> bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 50 GiB
> -10> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> max_total_wal_size = 1073741824
> -9> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> compaction_readahead_size = 2097152
> -8> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> max_write_buffer_number = 4
> -7> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> max_background_compactions = 2
> -6> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> compression = kNoCompression
> -5> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> writable_file_max_buffer_size = 0
> -4> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> min_write_buffer_number_to_merge = 1
> -3> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> recycle_log_file_num = 4
> -2> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> write_buffer_size = 268435456
> -1> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 bluefs mount
> 0> 2022-02-10T19:14:48.387-0800 7ff1be4c3080 -1 *** Caught signal
> (Aborted) **
> in thread 7ff1be4c3080 thread_name:ceph-osd
>
> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)
> 1: /lib64/libpthread.so.0(+0x12c20) [0x7ff1bc465c20]
> 2: gsignal()
> 3: abort()
> 4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff1bba7c09b]
> 5: /lib64/libstdc++.so.6(+0x9653c) [0x7ff1bba8253c]
> 6: /lib64/libstdc++.so.6(+0x96597) [0x7ff1bba82597]
> 7: /lib64/libstdc++.so.6(+0x967f8) [0x7ff1bba827f8]
> 8: ceph-osd(+0x56301f) [0x559ff6d6301f]
> 9: (BlueFS::_open_super()+0x18c) [0x559ff745f08c]
> 10: (BlueFS::mount()+0xeb) [0x559ff748085b]
> 11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x559ff735e464]
> 12: (BlueStore::_prepare_db_environment(bool, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >*, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x559ff735f5b9]
> 13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x559ff73608b5]
> 14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x559ff73cba33]
> 15: (BlueStore::_mount()+0x204) [0x559ff73ce974]
> 16: (OSD::init()+0x380) [0x559ff6ea2400]
> 17: main()
> 18: __libc_start_main()
> 19: _start()
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>
> The process log shows the following:
> 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> terminate called after throwing an instance of
> 'ceph::buffer::v15_2_0::malformed_input'
> what(): void
> bluefs_super_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)
> no longer understand old encoding version 2 < 143: Malformed input
> *** Caught signal (Aborted) **
> in thread 7f22869e8080 thread_name:ceph-osd
> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)
> 1: /lib64/libpthread.so.0(+0x12c20) [0x7f228498ac20]
> 2: gsignal()
> 3: abort()
> 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f2283fa109b]
> 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f2283fa753c]
> 6: /lib64/libstdc++.so.6(+0x96597) [0x7f2283fa7597]
> 7: /lib64/libstdc++.so.6(+0x967f8) [0x7f2283fa77f8]
> 8: ceph-osd(+0x56301f) [0x55e6faf6301f]
> 9: (BlueFS::_open_super()+0x18c) [0x55e6fb65f08c]
> 10: (BlueFS::mount()+0xeb) [0x55e6fb68085b]
> 11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x55e6fb55e464]
> 12: (BlueStore::_prepare_db_environment(bool, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >*, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55e6fb55f5b9]
> 13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x55e6fb5608b5]
> 14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x55e6fb5cba33]
> 15: (BlueStore::_mount()+0x204) [0x55e6fb5ce974]
> 16: (OSD::init()+0x380) [0x55e6fb0a2400]
> 17: main()
> 18: __libc_start_main()
> 19: _start()
> 2022-02-10T19:33:34.620-0800 7f22869e8080 -1 *** Caught signal
> (Aborted) **
> in thread 7f22869e8080 thread_name:ceph-osd
>
>
> Does anyone have any ideas what could be going on here?
>
> Thanks,
> /Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx