Re: Octopus: Recovery and backfilling causes OSDs to crash after upgrading from nautilus to octopus

An update about the progression of this issue:

After a few hours of normal operation the problem is now back in full swing.

About ten OSDs, different ones this time, have started crashing with segfaults again.

kind regards,

Wout
42on

On 2020-07-06 09:23, Wout van Heeswijk wrote:
Hi Dan,

Yes, the require-osd-release is set to octopus. I did see the threads about the memory consumption of the omap conversion. Since we did not experience any problems at all with restarting the OSDs, in terms of load or time, I don't expect that is the issue. We regained control over the cluster by stopping the troubled OSDs. We then manually exported some of the placement group shards from the troubled OSDs and imported them into OSDs that had no problems. We are now almost there.
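
For reference, in case it helps someone else: that kind of shard move is
normally done with ceph-objectstore-tool against stopped OSDs, roughly along
these lines (the OSD ids, pgid and file name below are only placeholders):

    # on the stopped, troubled OSD: export the PG shard to a file
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.7 --op export --file /tmp/2.7.export
    # on a stopped, healthy OSD: import the shard
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-34 \
        --op import --file /tmp/2.7.export
    # afterwards the shard can be removed from the troubled OSD
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 \
        --pgid 2.7 --op remove --force

Then start the receiving OSD again and let the PG peer.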

kind regards,

Wout
42on

On 2020-07-05 22:33, Dan van der Ster wrote:
Since it's not clear from your email, I'm assuming you've also already done

    ceph osd require-osd-release octopus

and fully enabled msgr2 ?

Also, did the new octopus omap conversion already complete? There were
threads earlier reporting that it was using loads of memory
(see ceph config set osd bluestore_fsck_quick_fix_on_mount false in
the release notes).
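
If in doubt, something like this should show where things stand (just a
rough sketch):

    ceph osd dump | grep require_osd_release
    ceph mon dump | grep v2:     # mons should be listed with v2: addresses
    ceph config get osd bluestore_fsck_quick_fix_on_mount   # whether the quick-fix conversion runs at OSD start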

-- Dan



On Sun, Jul 5, 2020 at 2:43 PM Wout van Heeswijk <wout@xxxxxxxx> wrote:
Hi All,

A customer of ours has upgraded the cluster from nautilus to octopus
after experiencing issues with OSDs not being able to connect to each
other or to clients/mons/mgrs. The connectivity issues were related to
msgr2 not being enabled and the require_osd_release setting not being
set to nautilus. After fixing this, the OSDs were restarted and all
placement groups became active again.

After unsetting the norecover and nobackfill flags, some OSDs started
crashing every few minutes. The OSD log, even with high debug settings,
doesn't seem to reveal anything; it just stops mid log line.
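
Assuming the crash module and systemd-coredump are available on the hosts,
a backtrace can usually still be pulled even when the log is truncated,
e.g. (the crash id and pid are placeholders):

    ceph crash ls
    ceph crash info <crash-id>
    # or, if systemd-coredump is in use on the OSD host:
    coredumpctl list ceph-osd
    coredumpctl info <pid>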

I've created a bug report: https://tracker.ceph.com/issues/46366

Has anyone experienced something similar?

--
kind regards,

Wout
42on
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



