Hi Marc,

we just completed a third upgrade test. There are 2 ways to convert the OSDs:

A) convert along with the upgrade (bluestore_fsck_quick_fix_on_mount = true)
B) convert after setting require-osd-release=octopus (bluestore_fsck_quick_fix_on_mount = false until require-osd-release is set to octopus, then restart to initiate conversion)

There is a variation A' of A: follow A, then initiate manual compaction and restart all OSDs.

Our experiments show that paths A and B do *not* yield the same result. Following path A leads to a cluster with severely degraded performance. As of now, we cannot confirm that A' fixes this. It seems that the only way out is to zap and re-deploy all OSDs, basically what Boris is doing right now.

We have now extended our procedure by adding

bluestore_fsck_quick_fix_on_mount = false

to every ceph.conf file and executing

ceph config set osd bluestore_fsck_quick_fix_on_mount false

to catch any accidents. After the daemon upgrade, we set bluestore_fsck_quick_fix_on_mount = true host by host in the ceph.conf and restart the OSDs.

This procedure works like a charm.

I don't know what the difference between A and B is. It is possible that B executes an extra step that is missing in A. The performance degradation only shows up when snaptrim is active, but then it is very severe. I suspect that many users who complained about snaptrim in the past have at least 1 A-converted OSD in their cluster.

If you have a cluster upgraded with B-converted OSDs, it works like a native octopus cluster. There is very little performance reduction compared with mimic. In exchange, I have the impression that it operates more stably.
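To make the order of steps in path B concrete, here is a rough dry-run sketch. It is an illustration under assumptions, not our actual tooling: the host names (osd-host-1, osd-host-2), the OSD_HOSTS and DRY_RUN variables, the use of ssh, and the systemd unit ceph-osd.target (package-based, non-containerized OSDs) are all placeholders I made up for the example. Note that require-osd-release is not a ceph.conf line; it is set with a cluster command once all daemons run octopus.

```shell
#!/bin/sh
# Dry-run sketch of path B. By default it only PRINTS the commands.
# Set DRY_RUN=0 to actually execute them (assumption: systemd-managed OSDs,
# ssh access to the OSD hosts; adapt to your deployment).
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Step 1: before upgrading any daemon, make sure no OSD converts on start.
# (We additionally put bluestore_fsck_quick_fix_on_mount = false into every
# ceph.conf by hand; that manual edit is not scripted here.)
disable_conversion() {
    run ceph config set osd bluestore_fsck_quick_fix_on_mount false
}

# Step 2: after ALL daemons have been upgraded, check the versions and
# raise the release flag. This is a cluster command, not a config option.
finish_upgrade() {
    run ceph versions
    run ceph osd require-osd-release octopus
}

# Step 3: per host, AFTER setting bluestore_fsck_quick_fix_on_mount = true
# in that host's ceph.conf by hand, restart its OSDs to start the conversion.
convert_host() {
    run ssh "$1" systemctl restart ceph-osd.target
}

disable_conversion
finish_upgrade
# OSD_HOSTS is a hypothetical, space-separated list of OSD host names.
for host in ${OSD_HOSTS:-osd-host-1 osd-host-2}; do
    convert_host "$host"
done
```

Run as is, the script only prints the plan, so the order can be reviewed before anything touches the cluster. The daemon upgrade itself happens between step 1 and step 2 and is deliberately left out.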

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Marc <Marc@xxxxxxxxxxxxxxxxx>
Sent: 13 September 2022 16:28:47
To: Frank Schilder
Cc: ceph-users@xxxxxxx
Subject: RE: Re: laggy OSDs and staling krbd IO after upgrade from nautilus to octopus

>
> It might be possible that converting OSDs before setting
> require-osd-release=octopus leads to a broken state of the converted
> OSDs. I could not yet find a way out of this situation. We will soon
> perform a third upgrade test to test this hypothesis.
>

So with upgrading one should put this line in ceph.conf, before restarting the osd daemons?

require-osd-release=octopus

(I still need to upgrade from Nautilus)