Re: osds won't start

I forgot to mention that I freeze the cluster with 'ceph osd set
no{down,out,backfill}'.  Then I zypper up all hosts and reboot them.  Only
when everything is back up do I unset the flags.
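
For reference, the freeze/unfreeze sequence looks roughly like this (a
sketch; these are the standard cluster flags, set before touching any host
and released only once everything is back up and healthy):

# freeze recovery/rebalance before the update
ceph osd set nodown
ceph osd set noout
ceph osd set nobackfill

# ... zypper up and reboot each host ...

# verify everything is back before releasing the flags
ceph -s

ceph osd unset nobackfill
ceph osd unset noout
ceph osd unset nodown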

My client IO patterns allow me to do this since it's a WORM data store with
long spans of time between writes and reads.  I have plenty of time to work
with the community and get my store back online.

This thread is really documentation for the next person who comes
along with the same problem.

On Fri, Feb 11, 2022 at 9:08 AM Mazzystr <mazzystr@xxxxxxxxx> wrote:

> My clusters are self-rolled.  My start command is as follows:
>
> podman run -it --privileged --pid=host --cpuset-cpus 0,1 --memory 2g
> --name ceph_osd0 --hostname ceph_osd0 -v /dev:/dev -v
> /etc/localtime:/etc/localtime:ro -v /etc/ceph:/etc/ceph/ -v
> /var/lib/ceph/osd/ceph-0:/var/lib/ceph/osd/ceph-0 -v
> /var/log/ceph:/var/log/ceph -v /run/udev/:/run/udev/
> ceph/ceph:v16.2.7-20220201 ceph-osd --id 0 -c /etc/ceph/ceph.conf --cluster
> ceph -f
>
>
> I jumped from the octopus img to the 16.2.7 img.  I've been running well
> for a while with no issues.  The cluster was clean, no backfills in
> progress, etc.  After this latest zypper up and reboot I have osds that
> don't start.
>
> podman image ls
> quay.io/ceph/ceph   v16.2.7            231fd40524c4   9 days ago   1.39 GB
> quay.io/ceph/ceph   v16.2.7-20220201   231fd40524c4   9 days ago   1.39 GB
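>
> (Both tags resolve to the same image ID; if anyone wants to double-check
> that on their own host, something like this should do it -- the tags here
> are just the ones I happen to run:)
>
> podman image inspect --format '{{.Id}}' quay.io/ceph/ceph:v16.2.7
> podman image inspect --format '{{.Id}}' quay.io/ceph/ceph:v16.2.7-20220201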
>
>
> BlueFS fails to mount, I guess?  The labels are still readable via
> ceph-bluestore-tool:
>
> ceph-bluestore-tool show-label --dev /dev/mapper/ceph-0block
> {
>     "/dev/mapper/ceph-0block": {
>         "osd_uuid": "1234abcd-1234-abcd-1234-1234 abcd1234",
>         "size": 6001171365888,
>         "btime": "2019-04-11T08:46:36.013428-0700",
>         "description": "main",
>         "bfm_blocks": "1465129728",
>         "bfm_blocks_per_key": "128",
>         "bfm_bytes_per_block": "4096",
>         "bfm_size": "6001171365888",
>         "bluefs": "1",
>         "ceph_fsid": "1234abcd-1234-abcd-1234-1234 abcd1234",
>         "kv_backend": "rocksdb",
>         "magic": "ceph osd volume v026",
>         "mkfs_done": "yes",
>         "ready": "ready",
>         "require_osd_release": "16",
>         "whoami": "0"
>     }
> }
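>
> For the next person digging into this, the companion checks I'd run are
> roughly the following (a sketch; the wal device name is a guess at my
> naming scheme, and the tool has to match the ceph version, so run it from
> inside the osd container):
>
> # labels on the other devices -- /dev/mapper/ceph-0wal is hypothetical
> ceph-bluestore-tool show-label --dev /dev/mapper/ceph-0wal
>
> # read-only consistency check of the whole osd
> ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0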
>
>
> On Fri, Feb 11, 2022 at 1:06 AM Eugen Block <eblock@xxxxxx> wrote:
>
>> Can you share some more information on how exactly you upgraded? It looks
>> like a cephadm-managed cluster. Did you install OS updates on all nodes
>> without waiting for the first one to recover? Maybe I'm misreading, so
>> please clarify what your update process looked like.
>>
>>
>> Quoting Mazzystr <mazzystr@xxxxxxxxx>:
>>
>> > I applied the latest OS updates and rebooted my hosts.  Now all my
>> > osds fail to start.
>> >
>> > # cat /etc/os-release
>> > NAME="openSUSE Tumbleweed"
>> > # VERSION="20220207"
>> > ID="opensuse-tumbleweed"
>> > ID_LIKE="opensuse suse"
>> > VERSION_ID="20220207"
>> >
>> > # uname -a
>> > Linux cube 5.16.5-1-default #1 SMP PREEMPT Thu Feb 3 05:26:48 UTC 2022
>> > (1af4009) x86_64 x86_64 x86_64 GNU/Linux
>> >
>> > container image: v16.2.7 / v16.2.7-20220201
>> >
>> > osd debug log shows the following
>> >   -11> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 bluefs add_block_device
>> > bdev 0 path /var/lib/ceph/osd/ceph-0/block.wal size 50 GiB
>> >    -10> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > max_total_wal_size = 1073741824
>> >     -9> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > compaction_readahead_size = 2097152
>> >     -8> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > max_write_buffer_number = 4
>> >     -7> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > max_background_compactions = 2
>> >     -6> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > compression = kNoCompression
>> >     -5> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > writable_file_max_buffer_size = 0
>> >     -4> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > min_write_buffer_number_to_merge = 1
>> >     -3> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > recycle_log_file_num = 4
>> >     -2> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1  set rocksdb option
>> > write_buffer_size = 268435456
>> >     -1> 2022-02-10T19:14:48.383-0800 7ff1be4c3080  1 bluefs mount
>> >      0> 2022-02-10T19:14:48.387-0800 7ff1be4c3080 -1 *** Caught signal
>> > (Aborted) **
>> >  in thread 7ff1be4c3080 thread_name:ceph-osd
>> >
>> >  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
>> > (stable)
>> >  1: /lib64/libpthread.so.0(+0x12c20) [0x7ff1bc465c20]
>> >  2: gsignal()
>> >  3: abort()
>> >  4: /lib64/libstdc++.so.6(+0x9009b) [0x7ff1bba7c09b]
>> >  5: /lib64/libstdc++.so.6(+0x9653c) [0x7ff1bba8253c]
>> >  6: /lib64/libstdc++.so.6(+0x96597) [0x7ff1bba82597]
>> >  7: /lib64/libstdc++.so.6(+0x967f8) [0x7ff1bba827f8]
>> >  8: ceph-osd(+0x56301f) [0x559ff6d6301f]
>> >  9: (BlueFS::_open_super()+0x18c) [0x559ff745f08c]
>> >  10: (BlueFS::mount()+0xeb) [0x559ff748085b]
>> >  11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x559ff735e464]
>> >  12: (BlueStore::_prepare_db_environment(bool, bool,
>> > std::__cxx11::basic_string<char, std::char_traits<char>,
>> > std::allocator<char> >*, std::__cxx11::basic_string<char,
>> > std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x559ff735f5b9]
>> >  13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x559ff73608b5]
>> >  14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x559ff73cba33]
>> >  15: (BlueStore::_mount()+0x204) [0x559ff73ce974]
>> >  16: (OSD::init()+0x380) [0x559ff6ea2400]
>> >  17: main()
>> >  18: __libc_start_main()
>> >  19: _start()
>> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>> > needed to interpret this.
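>> >
>> > (For anyone trying to get the same level of detail: a trace like the
>> > above generally needs the bluefs/bluestore debug levels turned up on
>> > the ceph-osd command line, roughly like this -- a sketch, the values
>> > are just common choices:)
>> >
>> > ceph-osd --id 0 -c /etc/ceph/ceph.conf --cluster ceph -f \
>> >     --debug-bluefs 20 --debug-bluestore 20 --debug-rocksdb 10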
>> >
>> >
>> > The process log shows the following
>> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
>> > dangerous and experimental features are enabled: bluestore,rocksdb
>> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
>> > dangerous and experimental features are enabled: bluestore,rocksdb
>> > 2022-02-10T19:33:31.852-0800 7f22869e8080 -1 WARNING: the following
>> > dangerous and experimental features are enabled: bluestore,rocksdb
>> > terminate called after throwing an instance of
>> > 'ceph::buffer::v15_2_0::malformed_input'
>> >   what():  void
>> > bluefs_super_t::decode(ceph::buffer::v15_2_0::list::const_iterator&) no
>> > longer understand old encoding version 2 < 143: Malformed input
>> > *** Caught signal (Aborted) **
>> >  in thread 7f22869e8080 thread_name:ceph-osd
>> >  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific
>> > (stable)
>> >  1: /lib64/libpthread.so.0(+0x12c20) [0x7f228498ac20]
>> >  2: gsignal()
>> >  3: abort()
>> >  4: /lib64/libstdc++.so.6(+0x9009b) [0x7f2283fa109b]
>> >  5: /lib64/libstdc++.so.6(+0x9653c) [0x7f2283fa753c]
>> >  6: /lib64/libstdc++.so.6(+0x96597) [0x7f2283fa7597]
>> >  7: /lib64/libstdc++.so.6(+0x967f8) [0x7f2283fa77f8]
>> >  8: ceph-osd(+0x56301f) [0x55e6faf6301f]
>> >  9: (BlueFS::_open_super()+0x18c) [0x55e6fb65f08c]
>> >  10: (BlueFS::mount()+0xeb) [0x55e6fb68085b]
>> >  11: (BlueStore::_open_bluefs(bool, bool)+0x94) [0x55e6fb55e464]
>> >  12: (BlueStore::_prepare_db_environment(bool, bool,
>> > std::__cxx11::basic_string<char, std::char_traits<char>,
>> > std::allocator<char> >*, std::__cxx11::basic_string<char,
>> > std::char_traits<char>, std::allocator<char> >*)+0x6d9) [0x55e6fb55f5b9]
>> >  13: (BlueStore::_open_db(bool, bool, bool)+0x155) [0x55e6fb5608b5]
>> >  14: (BlueStore::_open_db_and_around(bool, bool)+0x273) [0x55e6fb5cba33]
>> >  15: (BlueStore::_mount()+0x204) [0x55e6fb5ce974]
>> >  16: (OSD::init()+0x380) [0x55e6fb0a2400]
>> >  17: main()
>> >  18: __libc_start_main()
>> >  19: _start()
>> > 2022-02-10T19:33:34.620-0800 7f22869e8080 -1 *** Caught signal (Aborted) **
>> >  in thread 7f22869e8080 thread_name:ceph-osd
>> >
>> >
>> > Does anyone have any ideas about what could be going on here?
>> >
>> > Thanks,
>> > /Chris
>
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


