Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

Thanks!

I've now found the root cause. The fix is on its way...

Meanwhile, you might want to work around the issue by setting "bluestore_hybrid_alloc_mem_cap" to 0, or by using a different allocator, e.g. avl for bluestore_allocator (and optionally for bluefs_allocator too).
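
A minimal sketch of applying either workaround for a single run of the tool. It assumes these options can be passed through CEPH_ARGS in the same way as the logging options used elsewhere in this thread. The helper name `workaround_args` and the `OSD_PATH` variable are illustrative only and are not part of Ceph.

```shell
#!/bin/sh
# Hypothetical helper (not part of Ceph): prints the extra arguments for
# one of the two workarounds suggested above.
workaround_args() {
    case "$1" in
        memcap) echo "--bluestore_hybrid_alloc_mem_cap 0" ;;
        avl)    echo "--bluestore_allocator avl --bluefs_allocator avl" ;;
        *)      echo "usage: workaround_args memcap|avl" >&2; return 1 ;;
    esac
}

# Example invocation, left commented out so the snippet has no side effects.
# The path is the one from the original report; adjust it per OSD.
#   OSD_PATH=/var/lib/ceph/osd/ceph-0
#   CEPH_ARGS="$(workaround_args avl)" ceph-bluestore-tool repair --path "$OSD_PATH"
```

Setting the options only in the environment of the repair run keeps the change out of the cluster configuration, so nothing needs to be reverted once the fix is released.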


Hope this helps,

Igor.



On 9/14/2020 5:02 PM, Jean-Philippe Méthot wrote:
Alright, here’s the full log file.





Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On 14 Sept. 2020, at 06:49, Igor Fedotov <ifedotov@xxxxxxx> wrote:

Well, I can see duplicate admin socket command registration/de-registration (and the second de-registration asserts), but I don't understand how this could happen.

Would you share the full log, please?


Thanks,

Igor

On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:
Here’s the out file, as requested.




Jean-Philippe Méthot

On 11 Sept. 2020, at 10:38, Igor Fedotov <ifedotov@xxxxxxx> wrote:

Could you please run:

CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool repair --path <...> ; cat log | grep asok > out

and share the 'out' file.


Thanks,

Igor

On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:
Hi,

We’re upgrading our cluster from Mimic to Nautilus, one OSD node at a time. Some release notes recommended running the following command to fix stats after an upgrade:

ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

However, running that command gives us the following error message:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f1a5a823025]
 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.873 7f1a6467eec0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f1a5a823025]
 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
*** Caught signal (Aborted) **
 in thread 7f1a6467eec0 thread_name:ceph-bluestore-
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (()+0xf630) [0x7f1a58cf0630]
 2: (gsignal()+0x37) [0x7f1a574be387]
 3: (abort()+0x148) [0x7f1a574bfa78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f1a5a823074]
 5: (()+0x25c1ed) [0x7f1a5a8231ed]
 6: (()+0x3c7a4f) [0x55b33537ca4f]
 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 11: (main()+0x10b3) [0x55b335187493]
 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 13: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.874 7f1a6467eec0 -1 *** Caught signal (Aborted) **
 in thread 7f1a6467eec0 thread_name:ceph-bluestore-

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (()+0xf630) [0x7f1a58cf0630]
 2: (gsignal()+0x37) [0x7f1a574be387]
 3: (abort()+0x148) [0x7f1a574bfa78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f1a5a823074]
 5: (()+0x25c1ed) [0x7f1a5a8231ed]
 6: (()+0x3c7a4f) [0x55b33537ca4f]
 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 11: (main()+0x10b3) [0x55b335187493]
 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 13: (()+0x1f9b5f) [0x55b3351aeb5f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

What could be the source of this error? I haven’t found much of anything about it online.


Jean-Philippe Méthot

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx





