Re: Issues with the ceph-bluestore-tool during cluster upgrade from Mimic to Nautilus

On 9/25/2020 6:07 PM, Saber@xxxxxxxxxxxxxxxxx wrote:
Hi Igor,

The only thing abnormal about this OSD store is that it was created by Mimic 13.2.8, and I can see that its OSDs are not the same size as the others in the cluster (while they should be exactly the same size).

Could it be https://tracker.ceph.com/issues/39151?

Hmm, maybe... Did you change the hardware for this OSD's node at some point, as happened in the ticket?

Also, it's still unclear to me whether the issue is reproducible for you.

Could you please also run fsck first, and then repair, for this OSD, and collect the logs.
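For example (a rough sketch only; run with the OSD stopped, and adjust the OSD id/path, log file names and debug level to your environment):

CEPH_ARGS="--log-file fsck.log --debug-bluestore 20" ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>
CEPH_ARGS="--log-file repair.log --debug-bluestore 20" ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-<id>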


Thanks,

Igor



Thanks!
Saber
CTO @PlanetHoster

On Sep 25, 2020, at 5:46 AM, Igor Fedotov <ifedotov@xxxxxxx> wrote:

Hi Saber,

I don't think this is related. The new assertion happens along the write path, while the original one occurred on allocator shutdown.


Unfortunately there isn't much information to troubleshoot this... Are you able to reproduce the case?


Thanks,

Igor

On 9/25/2020 4:21 AM, Saber@xxxxxxxxxxxxxxxxx wrote:
Hi Igor,

We had an OSD crash a week after upgrading to Nautilus. I have attached the logs; is it related to the same bug?




Thanks,
Saber
CTO @PlanetHoster

On Sep 14, 2020, at 10:22 AM, Igor Fedotov <ifedotov@xxxxxxx> wrote:

Thanks!

I've now got the root cause. The fix is on its way...

Meanwhile, you might want to work around the issue by setting "bluestore_hybrid_alloc_mem_cap" to 0, or by using a different allocator, e.g. avl for bluestore_allocator (and optionally for bluefs_allocator too).
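For example (just a sketch; I'm assuming this goes into ceph.conf on the affected node under [osd], and that the OSD is restarted, or the tool re-run, so it picks up the change):

[osd]
bluestore_allocator = avl
bluefs_allocator = avl

or, to keep the hybrid allocator but drop its memory cap:

[osd]
bluestore_hybrid_alloc_mem_cap = 0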


Hope this helps,

Igor.



On 9/14/2020 5:02 PM, Jean-Philippe Méthot wrote:
Alright, here’s the full log file.





Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On Sep 14, 2020, at 06:49, Igor Fedotov <ifedotov@xxxxxxx> wrote:

Well, I can see a duplicate admin socket command registration/de-registration (and the second de-registration asserts), but I don't understand how this could happen.

Would you share the full log, please?


Thanks,

Igor

On 9/11/2020 7:26 PM, Jean-Philippe Méthot wrote:
Here’s the out file, as requested.




Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






On Sep 11, 2020, at 10:38, Igor Fedotov <ifedotov@xxxxxxx> wrote:

Could you please run:

CEPH_ARGS="--log-file log --debug-asok 5" ceph-bluestore-tool repair --path <...> ; cat log | grep asok > out

and share the 'out' file.


Thanks,

Igor

On 9/11/2020 5:15 PM, Jean-Philippe Méthot wrote:
Hi,

We’re upgrading our cluster, OSD node by OSD node, from Mimic to Nautilus. The release notes recommended running the following command to fix stats after the upgrade:

ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0

However, running that command gives us the following error message:

/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)
 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f1a5a823025]
 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.873 7f1a6467eec0 -1 /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: In function 'virtual Allocator::SocketHook::~SocketHook()' thread 7f1a6467eec0 time 2020-09-10 14:40:25.872353
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/gigantic/release/14.2.11/rpm/el7/BUILD/ceph-14.2.11/src/os/bluestore/Allocator.cc: 53: FAILED ceph_assert(r == 0)

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x14a) [0x7f1a5a823025]
 2: (()+0x25c1ed) [0x7f1a5a8231ed]
 3: (()+0x3c7a4f) [0x55b33537ca4f]
 4: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 5: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 6: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 7: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 8: (main()+0x10b3) [0x55b335187493]
 9: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 10: (()+0x1f9b5f) [0x55b3351aeb5f]
*** Caught signal (Aborted) **
 in thread 7f1a6467eec0 thread_name:ceph-bluestore-
ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (()+0xf630) [0x7f1a58cf0630]
 2: (gsignal()+0x37) [0x7f1a574be387]
 3: (abort()+0x148) [0x7f1a574bfa78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f1a5a823074]
 5: (()+0x25c1ed) [0x7f1a5a8231ed]
 6: (()+0x3c7a4f) [0x55b33537ca4f]
 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 11: (main()+0x10b3) [0x55b335187493]
 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 13: (()+0x1f9b5f) [0x55b3351aeb5f]
2020-09-10 14:40:25.874 7f1a6467eec0 -1 *** Caught signal (Aborted) **
 in thread 7f1a6467eec0 thread_name:ceph-bluestore-

 ceph version 14.2.11 (f7fdb2f52131f54b891a2ec99d8205561242cdaf) nautilus (stable)
 1: (()+0xf630) [0x7f1a58cf0630]
 2: (gsignal()+0x37) [0x7f1a574be387]
 3: (abort()+0x148) [0x7f1a574bfa78]
 4: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x199) [0x7f1a5a823074]
 5: (()+0x25c1ed) [0x7f1a5a8231ed]
 6: (()+0x3c7a4f) [0x55b33537ca4f]
 7: (HybridAllocator::~HybridAllocator()+0x17) [0x55b3353ac517]
 8: (BlueStore::_close_alloc()+0x42) [0x55b3351f2082]
 9: (BlueStore::_close_db_and_around(bool)+0x2f8) [0x55b335274528]
 10: (BlueStore::_fsck(BlueStore::FSCKDepth, bool)+0x2c1) [0x55b3352749a1]
 11: (main()+0x10b3) [0x55b335187493]
 12: (__libc_start_main()+0xf5) [0x7f1a574aa555]
 13: (()+0x1f9b5f) [0x55b3351aeb5f]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

What could be the source of this error? I haven’t found much of anything about it online.


Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423






_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx




_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



