My clusters are self-rolled. My start command is as follows:
podman run -it --privileged --pid=host --cpuset-cpus 0,1 --memory 2g \
  --name ceph_osd0 --hostname ceph_osd0 \
  -v /dev:/dev \
  -v /etc/localtime:/etc/localtime:ro \
  -v /etc/ceph:/etc/ceph/ \
  -v /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 \
  -v /var/log/ceph:/var/log/ceph \
  -v /run/udev/:/run/udev/ \
  ceph/ceph:v16.2.7-20220201 \
  ceph-osd --id 0 -c /etc/ceph/ceph.conf --cluster ceph -f
I jumped from the Octopus image to the 16.2.7 image and had been running
well for a while with no issues. The cluster was clean, no backfills in
progress, etc. After this latest zypper up and reboot, I have OSDs that
don't start.
# podman image ls
REPOSITORY          TAG               IMAGE ID      CREATED     SIZE
quay.io/ceph/ceph   v16.2.7           231fd40524c4  9 days ago  1.39 GB
quay.io/ceph/ceph   v16.2.7-20220201  231fd40524c4  9 days ago  1.39 GB
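Both tags report the same image ID (231fd40524c4), so they are the same
bits. If you want to make that comparison explicit, something like this
works (a sketch; it assumes both images are already pulled locally, as
shown above):

```shell
#!/bin/sh
# Sketch: confirm the two tags resolve to the same local image ID.
# (Assumes both images are present locally, per `podman image ls`.)
a=$(podman image inspect --format '{{.Id}}' quay.io/ceph/ceph:v16.2.7)
b=$(podman image inspect --format '{{.Id}}' quay.io/ceph/ceph:v16.2.7-20220201)

if [ "$a" = "$b" ]; then
  echo "same image: $a"
else
  echo "different images: $a vs $b"
fi
```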
bluefs fails to mount, I guess? The labels are still readable via the
bluestore tool:
ceph-bluestore-tool show-label --dev /dev/mapper/ceph-0block
{
"/dev/mapper/ceph-0block": {
"osd_uuid": "1234abcd-1234-abcd-1234-1234 abcd1234",
"size": 6001171365888,
"btime": "2019-04-11T08:46:36.013428-0700",
"description": "main",
"bfm_blocks": "1465129728",
"bfm_blocks_per_key": "128",
"bfm_bytes_per_block": "4096",
"bfm_size": "6001171365888",
"bluefs": "1",
"ceph_fsid": "1234abcd-1234-abcd-1234-1234 abcd1234",
"kv_backend": "rocksdb",
"magic": "ceph osd volume v026",
"mkfs_done": "yes",
"ready": "ready",
"require_osd_release": "16",
"whoami": "0"
}
}
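Since the label at offset 0 decodes fine but BlueFS aborts in
_open_super(), one thing worth checking is whether the superblock region
of the device reads back as real data at all. A rough sketch (the 4 KiB
offset for the BlueFS superblock is an assumption here; adjust the
device path and offset for your layout):

```shell
#!/bin/sh
# Sketch: peek at the BlueFS superblock region of the OSD block device.
# DEV and the 4 KiB offset are assumptions; adjust for your setup.
DEV=/dev/mapper/ceph-0block

# Dump the 4 KiB region after the label and eyeball it; all zeros or
# obvious garbage would suggest the device is not returning real data.
dd if="$DEV" bs=4096 skip=1 count=1 2>/dev/null | hexdump -C | head -n 20

# Count non-zero bytes in that region; 0 means the region reads empty.
dd if="$DEV" bs=4096 skip=1 count=1 2>/dev/null | tr -d '\0' | wc -c
```

If the region reads as plausible data, `ceph-bluestore-tool fsck --path
/var/lib/ceph/osd/ceph-0` (with the OSD stopped) may give a more
specific error than the mount-time abort.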
On Fri, Feb 11, 2022 at 1:06 AM Eugen Block <eblock@xxxxxx> wrote:
Can you share some more information on how exactly you upgraded? It
looks like a cephadm-managed cluster. Did you install OS updates on all
nodes without waiting for the first one to recover? Maybe I'm
misreading, so please clarify what your update process looked like.
Zitat von Mazzystr <mazzystr@xxxxxxxxx>:
> I applied the latest OS updates and rebooted my hosts. Now all my OSDs
> fail to start.
>
> # cat /etc/os-release
> NAME="openSUSE Tumbleweed"
> # VERSION="20220207"
> ID="opensuse-tumbleweed"
> ID_LIKE="opensuse suse"
> VERSION_ID="20220207"
>
> # uname -a
> Linux cube 5.16.5-1-default #1 SMP PREEMPT Thu Feb 3 05:26:48 UTC 2022
> (1af4009) x86_64 x86_64 x86_64 GNU/Linux
>
> container image: v16.2.7 / v16.2.7-20220201
>
> osd debug log shows the following:
> -11> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 bluefs add_block_device
> bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 50 GiB
> -10> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> max_total_wal_size = 1073741824
> -9> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> compaction_readahead_size = 2097152
> -8> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> max_write_buffer_number = 4
> -7> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> max_background_compactions = 2
> -6> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> compression = kNoCompression
> -5> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> writable_file_max_buffer_size = 0
> -4> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> min_write_buffer_number_to_merge = 1
> -3> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> recycle_log_file_num = 4
> -2> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 set rocksdb option
> write_buffer_size = 268435456
> -1> 2022-02-10T19:14:48.383-0800 7ff1be4c3080 1 bluefs mount
> 0> 2022-02-10T19:14:48.387-0800 7ff1be4c3080 -1 *** Caught signal
> (Aborted) **
> in thread 7ff1be4c3080 thread_name:ceph-osd
>
> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)
> 1: /lib64/libpthread.so.0(+0x12c20) [0x7ff1bc465c20]
> 2: gsignal()
> 3: abort()
> 4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff1bba7c09b]
> 5: /lib64/libstdc++.so.6(+0x9653c) [0x7ff1bba8253c]
> 6: /lib64/libstdc++.so.6(+0x96597) [0x7ff1bba82597]
> 7: /lib64/libstdc++.so.6(+0x967f8) [0x7ff1bba827f8]
> 8: ceph-osd(+0x56301f) [0x559ff6d6301f]
> 9: (BlueFS::_open_super()+0x18c) [0x559ff745f08c]
> 10: (BlueFS::mount()+0xeb) [0x559ff748085b]
> 11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x559ff735e464]
> 12: (BlueStore::_prepare_db_environment(bool, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >*, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x559ff735f5b9]
> 13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x559ff73608b5]
> 14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x559ff73cba33]
> 15: (BlueStore::_mount()+0x204) [0x559ff73ce974]
> 16: (OSD::init()+0x380) [0x559ff6ea2400]
> 17: main()
> 18: __libc_start_main()
> 19: _start()
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
>
> The process log shows the following:
> 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
> dangerous and experimental features are enabled: bluestore,rocksdb
> terminate called after throwing an instance of
> 'ceph::buffer::v15_2_0::malformed_input'
> what(): void
> bluefs_super_t::decode(ceph::buffer::v15_2_0::list::const_iterator&)
> no longer understand old encoding version 2 < 143: Malformed input
> *** Caught signal (Aborted) **
> in thread 7f22869e8080 thread_name:ceph-osd
> ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503)
> pacific (stable)
> 1: /lib64/libpthread.so.0(+0x12c20) [0x7f228498ac20]
> 2: gsignal()
> 3: abort()
> 4: /lib64/libstdc++.so.6(+0x9009b) [0x7f2283fa109b]
> 5: /lib64/libstdc++.so.6(+0x9653c) [0x7f2283fa753c]
> 6: /lib64/libstdc++.so.6(+0x96597) [0x7f2283fa7597]
> 7: /lib64/libstdc++.so.6(+0x967f8) [0x7f2283fa77f8]
> 8: ceph-osd(+0x56301f) [0x55e6faf6301f]
> 9: (BlueFS::_open_super()+0x18c) [0x55e6fb65f08c]
> 10: (BlueFS::mount()+0xeb) [0x55e6fb68085b]
> 11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x55e6fb55e464]
> 12: (BlueStore::_prepare_db_environment(bool, bool,
> std::__cxx11::basic_string<char, std::char_traits<char>,
> std::allocator<char> >*, std::__cxx11::basic_string<char,
> std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55e6fb55f5b9]
> 13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x55e6fb5608b5]
> 14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x55e6fb5cba33]
> 15: (BlueStore::_mount()+0x204) [0x55e6fb5ce974]
> 16: (OSD::init()+0x380) [0x55e6fb0a2400]
> 17: main()
> 18: __libc_start_main()
> 19: _start()
> 2022-02-10T19:33:34.620-0800 7f22869e8080 -1 *** Caught signal
> (Aborted) **
> in thread 7f22869e8080 thread_name:ceph-osd
>
>
> Does anyone have any ideas what could be going on here?
>
> Thanks,
> /Chris
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx