Re: resharding RocksDB after upgrade to Pacific breaks OSDs

If someone can point me at the errant docs locus I'll make it right.
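
For anyone who finds this thread later: the only visible difference between the command as documented and the command Josh posted appears to be the case of the column prefixes (upper-case O, L and P versus lower-case o, l and p), so it is worth double-checking the spec character by character before running it. Below is a minimal sketch of how the corrected reshard would be applied, assuming a plain systemd-managed OSD (osd.5 is used purely as an example; adjust the data path and unit name for your deployment, e.g. cephadm runs OSDs in containers and uses different unit names):

      # stop the OSD first; ceph-bluestore-tool needs exclusive access to the store
      systemctl stop ceph-osd@5

      # show the sharding currently applied to this OSD's RocksDB, if any
      ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-5 show-sharding

      # apply the corrected spec -- note the upper-case O, L and P
      ceph-bluestore-tool \
        --path /var/lib/ceph/osd/ceph-5 \
        --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
        reshard

      # confirm the new layout took effect, then bring the OSD back up
      ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-5 show-sharding
      systemctl start ceph-osd@5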

> On Nov 3, 2023, at 11:45, Laura Flores <lflores@xxxxxxxxxx> wrote:
> 
> Yes, Josh beat me to it; this is a case of the command being documented
> incorrectly. You can try the solution posted in the tracker issue.
> 
> On Fri, Nov 3, 2023 at 10:43 AM Josh Baergen <jbaergen@xxxxxxxxxxxxxxxx>
> wrote:
> 
>> The ticket has been updated, but it's probably important enough to
>> state on the list as well: the documentation is currently wrong in such
>> a way that running the command as documented will cause this corruption.
>> The correct command to run is:
>> 
>>       ceph-bluestore-tool \
>>         --path <data path> \
>>         --sharding="m(3) p(3,0-12) O(3,0-13)=block_cache={type=binned_lru} L P" \
>>         reshard
>> 
>> Josh
>> 
>> On Fri, Nov 3, 2023 at 7:58 AM Denis Polom <denispolom@xxxxxxxxx> wrote:
>>> 
>>> Hi,
>>> 
>>> yes, exactly. I had to recreate the OSD as well because the daemon
>>> wasn't able to start.
>>> 
>>> It's obviously a bug and should be fixed either in the documentation or
>>> in the code.
>>> 
>>> 
>>> On 11/3/23 11:45, Eugen Block wrote:
>>>> Hi,
>>>> 
>>>> this seems like a dangerous operation to me. I tried the same on two
>>>> different virtual clusters, Reef and Pacific (both upgraded from
>>>> previous releases). In Reef the reshard fails altogether and the OSD
>>>> fails to start; I had to recreate it. In Pacific the reshard reports a
>>>> successful operation, but the OSD fails to start as well, with the
>>>> same stack trace as yours. I wasn't aware of this resharding operation
>>>> before, but is it really safe? I have no idea how to fix it; I just
>>>> recreated the OSDs.
>>>> 
>>>> 
>>>> Zitat von Denis Polom <denispolom@xxxxxxxxx>:
>>>> 
>>>>> Hi
>>>>> 
>>>>> we upgraded our Ceph cluster from the latest Octopus to Pacific 16.2.14
>>>>> and then followed the docs
>>>>> (https://docs.ceph.com/en/latest/rados/configuration/bluestore-config-ref/#rocksdb-sharding)
>>>>> to reshard RocksDB on our OSDs.
>>>>> 
>>>>> Although resharding reports the operation as successful, the OSD fails
>>>>> to start.
>>>>> 
>>>>> # ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-5/ --sharding="m(3) p(3,0-12) o(3,0-13)=block_cache={type=binned_lru} l p" reshard
>>>>> reshard success
>>>>> 
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:
>>>>> /build/ceph-16.2.14/src/kv/RocksDBStore.cc: 1223: FAILED
>>>>> ceph_assert(recreate_mode)
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  ceph version 16.2.14
>>>>> (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  1:
>>>>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>> const*)+0x14b) [0x564047cb92b2]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  2:
>>>>> /usr/bin/ceph-osd(+0xaa948a) [0x564047cb948a]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  3:
>>>>> (RocksDBStore::do_open(std::ostream&, bool, bool,
>>>>> std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> > const&)+0x1609) [0x564048794829]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  4:
>>>>> (BlueStore::_open_db(bool, bool, bool)+0x601) [0x564048240421]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  5:
>>>>> (BlueStore::_open_db_and_around(bool, bool)+0x26b) [0x5640482a5f8b]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  6:
>>>>> (BlueStore::_mount()+0x9c) [0x5640482a896c]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  7: (OSD::init()+0x38a)
>>>>> [0x564047daacea]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  8: main()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  9: __libc_start_main()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  10: _start()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:      0>
>>>>> 2023-10-30T12:44:17.088+0000 7f4971ed2100 -1 *** Caught signal
>>>>> (Aborted) **
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  in thread 7f4971ed2100
>>>>> thread_name:ceph-osd
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  ceph version 16.2.14
>>>>> (238ba602515df21ea7ffc75c88db29f9e5ef12c9) pacific (stable)
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  1:
>>>>> /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7f4972921730]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  2: gsignal()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  3: abort()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  4:
>>>>> (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>>>> const*)+0x19c) [0x564047cb9303]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  5:
>>>>> /usr/bin/ceph-osd(+0xaa948a) [0x564047cb948a]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  6:
>>>>> (RocksDBStore::do_open(std::ostream&, bool, bool,
>>>>> std::__cxx11::basic_string<char, std::char_traits<char>,
>>>>> std::allocator<char> > const&)+0x1609) [0x564048794829]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  7:
>>>>> (BlueStore::_open_db(bool, bool, bool)+0x601) [0x564048240421]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  8:
>>>>> (BlueStore::_open_db_and_around(bool, bool)+0x26b) [0x5640482a5f8b]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  9:
>>>>> (BlueStore::_mount()+0x9c) [0x5640482a896c]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  10: (OSD::init()+0x38a)
>>>>> [0x564047daacea]
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  11: main()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  12: __libc_start_main()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  13: _start()
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:  NOTE: a copy of the
>>>>> executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>>> Oct 30 12:44:17 octopus2 ceph-osd[4521]:     -1>
>>>>> 2023-10-30T12:44:17.084+0000 7f4971ed2100 -1
>>>>> /build/ceph-16.2.14/src/kv/RocksDBStore.cc: In function 'int
>>>>> RocksDBStore::do_open(std::ostream&, bool, bool, const string&)'
>>>>> thread 7f4971ed2100 time 2023-10-30T12:44:17.087172+0000
>>>>> 
>>>>> I've submitted a bug report here: https://tracker.ceph.com/issues/63353
>>>>> but maybe the community has some ideas on how to fix it, unless it's
>>>>> really a bug.
>>>>> 
>>>>> Thanks
> 
> 
> -- 
> 
> Laura Flores
> 
> She/Her/Hers
> 
> Software Engineer, Ceph Storage <https://ceph.io>
> 
> Chicago, IL
> 
> lflores@xxxxxxx | lflores@xxxxxxxxxx
> M: +17087388804
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



