This problem is solved. My links are indeed swapped, so bluefs has been trying to decode the wal device where it expected the db superblock. No wonder the mount aborts with "Malformed input":

host0:/var/lib/ceph/osd/ceph-0 # ls -la block*
lrwxrwxrwx 1 ceph ceph 23 Jan 15 15:13 block -> /dev/mapper/ceph-0block
lrwxrwxrwx 1 ceph ceph 24 Jan 15 15:13 block.db -> /dev/mapper/ceph--0db
lrwxrwxrwx 1 ceph ceph 25 Jan 15 15:13 block.wal -> /dev/mapper/ceph--0wal

[root@ceph_osd0 /]# ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.db
{
    "/var/lib/ceph/osd/ceph-0/block.db": {
        "osd_uuid": "7755e0c2-b4bf-4cbe-bc9a-26042d5bdc52",
        "size": 49998200832,
        "btime": "2019-04-11T08:46:36.694465-0700",
        "description": "bluefs wal"
    }
}

[root@ceph_osd0 /]# ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/block.wal
{
    "/var/lib/ceph/osd/ceph-0/block.wal": {
        "osd_uuid": "7755e0c2-b4bf-4cbe-bc9a-26042d5bdc52",
        "size": 49998200832,
        "btime": "2019-04-11T08:46:36.694465-0700",
        "description": "bluefs db"
    }
}

Good grief! How did I miss the bad LUKS labels! I've been looking at this for two days now! LOL!

host0: ~ # lsblk
nvme0n1              259:0    0  465.8G  0 disk
└─nvme0n1p1          259:1    0  465.8G  0 part
  ├─vg-ceph--0       254:3    0      1G  0 lvm
  │ └─ceph-0         254:28   0   1008M  0 crypt  /var/lib/ceph/osd/ceph-0
  ├─vg-ceph--0wal    254:4    0      1G  0 lvm
--->│ └─ceph-0db     254:29   0   1008M  0 crypt
  ├─vg-ceph--0db     254:5    0     50G  0 lvm
--->│ └─ceph-0wal    254:39   0     50G  0 crypt
  ├─vg-ceph--1       254:6    0      1G  0 lvm

I flipped the soft links manually and the osd fires up, mounts the bluestore, and starts pinging all his peeps. This was the result of bad automation that populates our /etc/crypttab. Hopefully this exercise can help the next person with some troubleshooting tips.
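For the next person, the whole check boils down to comparing each symlink's target against the label BlueStore stamped on the device behind it. A rough sketch against my paths and mapper names, adjust for your own layout:

# each label's "description" should match the link name:
#   block -> "main", block.db -> "bluefs db", block.wal -> "bluefs wal"
for l in block block.db block.wal; do
    echo "${l} -> $(readlink /var/lib/ceph/osd/ceph-0/${l})"
    ceph-bluestore-tool show-label --dev /var/lib/ceph/osd/ceph-0/${l} | grep description
done

# with the osd stopped, re-point crossed links at the devices that
# actually carry the matching labels (these targets are my setup's):
ln -sfn /dev/mapper/ceph--0wal /var/lib/ceph/osd/ceph-0/block.db
ln -sfn /dev/mapper/ceph--0db /var/lib/ceph/osd/ceph-0/block.wal

The durable fix is of course the /etc/crypttab entries themselves, so the mapper names line up with the right LVs on the next boot.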
Thanks,
/Chris

On Fri, Feb 11, 2022 at 11:09 AM Mazzystr <mazzystr@xxxxxxxxx> wrote:

> I set debug {bdev, bluefs, bluestore, osd} = 20/20 and restarted osd.0
>
> Logs are here
>
>    -15> 2022-02-11T11:07:09.944-0800 7f93546c0080 10 bluestore(/var/lib/ceph/osd/ceph-0/block.wal) _read_bdev_label got bdev(osd_uuid 7755e0c2-b4bf-4cbe-bc9a-26042d5bdc52, size 0xba4200000, btime 2019-04-11T08:46:36.694465-0700, desc bluefs db, 0 meta)
>    -14> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option max_total_wal_size = 1073741824
>    -13> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option compaction_readahead_size = 2097152
>    -12> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option max_write_buffer_number = 4
>    -11> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option max_background_compactions = 2
>    -10> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option compression = kNoCompression
>     -9> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option writable_file_max_buffer_size = 0
>     -8> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option min_write_buffer_number_to_merge = 1
>     -7> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option recycle_log_file_num = 4
>     -6> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 set rocksdb option write_buffer_size = 268435456
>     -5> 2022-02-11T11:07:09.944-0800 7f93546c0080  1 bluefs mount
>     -4> 2022-02-11T11:07:09.944-0800 7f93546c0080 10 bluefs _open_super
>     -3> 2022-02-11T11:07:09.944-0800 7f93546c0080  5 bdev(0x55d345e82800 /var/lib/ceph/osd/ceph-0/block.db) read 0x1000~1000 (direct)
>     -2> 2022-02-11T11:07:09.944-0800 7f93546c0080 20 bdev(0x55d345e82800 /var/lib/ceph/osd/ceph-0/block.db) _aio_log_start 0x1000~1000
>     -1> 2022-02-11T11:07:09.944-0800 7f93546c0080 20 bdev(0x55d345e82800 /var/lib/ceph/osd/ceph-0/block.db) _aio_log_finish 1 0x1000~1000
>      0> 2022-02-11T11:07:09.948-0800 7f93546c0080 -1 *** Caught signal (Aborted) **
>  in thread 7f93546c0080 thread_name:ceph-osd
>
>  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
>  1: /lib64/libpthread.so.0(+0x12c20) [0x7f9352662c20]
>  2: gsignal()
>  3: abort()
>  4: /lib64/libstdc++.so.6(+0x9009b) [0x7f9351c7909b]
>  5: /lib64/libstdc++.so.6(+0x9653c) [0x7f9351c7f53c]
>  6: /lib64/libstdc++.so.6(+0x96597) [0x7f9351c7f597]
>  7: /lib64/libstdc++.so.6(+0x967f8) [0x7f9351c7f7f8]
>  8: ceph-osd(+0x56301f) [0x55d339f6301f]
>  9: (BlueFS::_open_super()+0x18c) [0x55d33a65f08c]
>  10: (BlueFS::mount()+0xeb) [0x55d33a68085b]
>  11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x55d33a55e464]
>  12: (BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55d33a55f5b9]
>  13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x55d33a5608b5]
>  14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x55d33a5cba33]
>  15: (BlueStore::_mount()+0x204) [0x55d33a5ce974]
>  16: (OSD::init()+0x380) [0x55d33a0a2400]
>  17: main()
>  18: __libc_start_main()
>  19: _start()
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> On Fri, Feb 11, 2022 at 10:07 AM Mazzystr <mazzystr@xxxxxxxxx> wrote:
>
>> I'm suspicious of cross contamination of devices here. I was on CentOS for eons until Red Hat shenanigans pinned me to CentOS 7 and nautilus. I had very well defined udev rules that ensured dm devices were statically set and owned correctly and survived reboots.
>>
>> I seem to be struggling with this in the openSUSE world. Ownership on my devices flips back to root despite my long-standing udev rules being migrated over.
>>
>> I know my paths are correct though. The osd root dirs are also lv's with filesystem labels. The block, db, and wal links are correct. db and wal are lv's named appropriately (yea yea, per Sage that ship has sailed. I LV'd too). The osd drives get partitions with osd number labels. The bluestore tool also confirms.
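>> For reference, the shape of rule I mean is something like this. A sketch only; the file name is arbitrary and the DM_NAME pattern assumes my "ceph-*" mapper naming, so it is not a drop-in:
>>
>> # /etc/udev/rules.d/90-ceph-dm.rules
>> # hand every device-mapper node named ceph-* to the ceph user so the osd can open it
>> ACTION=="add|change", KERNEL=="dm-*", ENV{DM_NAME}=="ceph-*", OWNER="ceph", GROUP="ceph", MODE="0660"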
>> On Fri, Feb 11, 2022 at 9:14 AM Mazzystr <mazzystr@xxxxxxxxx> wrote:
>>
>>> I forgot to mention I freeze the cluster with 'ceph osd set no{down,out,backfill}'. Then I zyp up all hosts and reboot them. Only when everything is back up do I unset.
>>>
>>> My client IO patterns allow me to do this since it's a WORM data store with long spans of time between writes and reads. I have plenty of time to work with the community and get my store back online.
>>>
>>> This thread is really for documentation for the next person that comes along with the same problem.
>>>
>>> On Fri, Feb 11, 2022 at 9:08 AM Mazzystr <mazzystr@xxxxxxxxx> wrote:
>>>
>>>> My clusters are self-rolled. My start command is as follows:
>>>>
>>>> podman run -it --privileged --pid=host --cpuset-cpus 0,1 --memory 2g \
>>>>     --name ceph_osd0 --hostname ceph_osd0 \
>>>>     -v /dev:/dev \
>>>>     -v /etc/localtime:/etc/localtime:ro \
>>>>     -v /etc/ceph:/etc/ceph/ \
>>>>     -v /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 \
>>>>     -v /var/log/ceph:/var/log/ceph \
>>>>     -v /run/udev/:/run/udev/ \
>>>>     ceph/ceph:v16.2.7-20220201 \
>>>>     ceph-osd --id 0 -c /etc/ceph/ceph.conf --cluster ceph -f
>>>>
>>>> I jumped from the octopus img to the 16.2.7 img and have been running well for a while with no issues. The cluster was clean, no backfills in progress, etc. Then this latest zyp up and reboot, and now I have osds that don't start.
>>>>
>>>> podman image ls
>>>> quay.io/ceph/ceph   v16.2.7            231fd40524c4   9 days ago   1.39 GB
>>>> quay.io/ceph/ceph   v16.2.7-20220201   231fd40524c4   9 days ago   1.39 GB
>>>>
>>>> bluefs fails to mount, I guess? The headers are still readable via the bluestore tool:
>>>>
>>>> ceph-bluestore-tool show-label --dev /dev/mapper/ceph-0block
>>>> {
>>>>     "/dev/mapper/ceph-0block": {
>>>>         "osd_uuid": "1234abcd-1234-abcd-1234-1234 abcd1234",
>>>>         "size": 6001171365888,
>>>>         "btime": "2019-04-11T08:46:36.013428-0700",
>>>>         "description": "main",
>>>>         "bfm_blocks": "1465129728",
>>>>         "bfm_blocks_per_key": "128",
>>>>         "bfm_bytes_per_block": "4096",
>>>>         "bfm_size": "6001171365888",
>>>>         "bluefs": "1",
>>>>         "ceph_fsid": "1234abcd-1234-abcd-1234-1234 abcd1234",
>>>>         "kv_backend": "rocksdb",
>>>>         "magic": "ceph osd volume v026",
>>>>         "mkfs_done": "yes",
>>>>         "ready": "ready",
>>>>         "require_osd_release": "16",
>>>>         "whoami": "0"
>>>>     }
>>>> }
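>>>> It's also worth pinning down what each crypt mapping actually sits on, since a crossed /etc/crypttab will happily present the wrong payload under the right name. A sketch; the mapper names are from my setup:
>>>>
>>>> # cryptsetup reports the backing device for each mapping
>>>> cryptsetup status ceph-0db | grep device
>>>> cryptsetup status ceph-0wal | grep device
>>>> # or list the mapping together with the devices it is stacked on
>>>> lsblk -s /dev/mapper/ceph-0db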
>>>> On Fri, Feb 11, 2022 at 1:06 AM Eugen Block <eblock@xxxxxx> wrote:
>>>>
>>>>> Can you share some more information about how exactly you upgraded? It looks like a cephadm-managed cluster. Did you install OS updates on all nodes without waiting for the first one to recover? Maybe I'm misreading, so please clarify what your update process looked like.
>>>>>
>>>>> Zitat von Mazzystr <mazzystr@xxxxxxxxx>:
>>>>>
>>>>> > I applied the latest os updates and rebooted my hosts. Now all my osds fail to start.
>>>>> >
>>>>> > # cat /etc/os-release
>>>>> > NAME="openSUSE Tumbleweed"
>>>>> > # VERSION="20220207"
>>>>> > ID="opensuse-tumbleweed"
>>>>> > ID_LIKE="opensuse suse"
>>>>> > VERSION_ID="20220207"
>>>>> >
>>>>> > # uname -a
>>>>> > Linux cube 5.16.5-1-default #1 SMP PREEMPT Thu Feb 3 05:26:48 UTC 2022 (1af4009) x86_64 x86_64 x86_64 GNU/Linux
>>>>> >
>>>>> > container image: v16.2.7 / v16.2.7-20220201
>>>>> >
>>>>> > osd debug log shows the following
>>>>> >    -11> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 bluefs add_block_device bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 50 GiB
>>>>> >    -10> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option max_total_wal_size = 1073741824
>>>>> >     -9> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option compaction_readahead_size = 2097152
>>>>> >     -8> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option max_write_buffer_number = 4
>>>>> >     -7> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option max_background_compactions = 2
>>>>> >     -6> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option compression = kNoCompression
>>>>> >     -5> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option writable_file_max_buffer_size = 0
>>>>> >     -4> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option min_write_buffer_number_to_merge = 1
>>>>> >     -3> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option recycle_log_file_num = 4
>>>>> >     -2> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 set rocksdb option write_buffer_size = 268435456
>>>>> >     -1> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 bluefs mount
>>>>> >      0> 2022-02-10T19:14:48.387-0800 7ff1be4c3080 -1 *** Caught signal (Aborted) **
>>>>> >  in thread 7ff1be4c3080 thread_name:ceph-osd
>>>>> >
>>>>> >  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
>>>>> >  1: /lib64/libpthread.so.0(+0x12c20) [0x7ff1bc465c20]
>>>>> >  2: gsignal()
>>>>> >  3: abort()
>>>>> >  4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff1bba7c09b]
>>>>> >  5: /lib64/libstdc++.so.6(+0x9653c) [0x7ff1bba8253c]
>>>>> >  6: /lib64/libstdc++.so.6(+0x96597) [0x7ff1bba82597]
>>>>> >  7: /lib64/libstdc++.so.6(+0x967f8) [0x7ff1bba827f8]
>>>>> >  8: ceph-osd(+0x56301f) [0x559ff6d6301f]
>>>>> >  9: (BlueFS::_open_super()+0x18c) [0x559ff745f08c]
>>>>> >  10: (BlueFS::mount()+0xeb) [0x559ff748085b]
>>>>> >  11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x559ff735e464]
>>>>> >  12: (BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x559ff735f5b9]
>>>>> >  13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x559ff73608b5]
>>>>> >  14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x559ff73cba33]
>>>>> >  15: (BlueStore::_mount()+0x204) [0x559ff73ce974]
>>>>> >  16: (OSD::init()+0x380) [0x559ff6ea2400]
>>>>> >  17: main()
>>>>> >  18: __libc_start_main()
>>>>> >  19: _start()
>>>>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>> >
>>>>> > The process log shows the following
>>>>> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
>>>>> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
>>>>> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
>>>>> > terminate called after throwing an instance of 'ceph::buffer::v15_2_0::malformed_input'
>>>>> >   what():  void bluefs_super_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) no longer understand old encoding version 2 < 143: Malformed input
>>>>> > *** Caught signal (Aborted) **
>>>>> >  in thread 7f22869e8080 thread_name:ceph-osd
>>>>> >  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)
>>>>> >  1: /lib64/libpthread.so.0(+0x12c20) [0x7f228498ac20]
>>>>> >  2: gsignal()
>>>>> >  3: abort()
>>>>> >  4: /lib64/libstdc++.so.6(+0x9009b) [0x7f2283fa109b]
>>>>> >  5: /lib64/libstdc++.so.6(+0x9653c) [0x7f2283fa753c]
>>>>> >  6: /lib64/libstdc++.so.6(+0x96597) [0x7f2283fa7597]
>>>>> >  7: /lib64/libstdc++.so.6(+0x967f8) [0x7f2283fa77f8]
>>>>> >  8: ceph-osd(+0x56301f) [0x55e6faf6301f]
>>>>> >  9: (BlueFS::_open_super()+0x18c) [0x55e6fb65f08c]
>>>>> >  10: (BlueFS::mount()+0xeb) [0x55e6fb68085b]
>>>>> >  11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x55e6fb55e464]
>>>>> >  12: (BlueStore::_prepare_db_environment(bool, bool, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55e6fb55f5b9]
>>>>> >  13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x55e6fb5608b5]
>>>>> >  14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x55e6fb5cba33]
>>>>> >  15: (BlueStore::_mount()+0x204) [0x55e6fb5ce974]
>>>>> >  16: (OSD::init()+0x380) [0x55e6fb0a2400]
>>>>> >  17: main()
>>>>> >  18: __libc_start_main()
>>>>> >  19: _start()
>>>>> > 2022-02-10T19:33:34.620-0800 7f22869e8080 -1 *** Caught signal (Aborted) **
>>>>> >  in thread 7f22869e8080 thread_name:ceph-osd
>>>>> >
>>>>> > Doesn't anyone have any ideas what could be going on here?
>>>>> >
>>>>> > Thanks,
>>>>> > /Chris

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx