Hi All,
Last week we successfully upgraded our 14.2.18 cluster to 15.2.16.
According to "ceph crash ls" we did not have a single crash while
running Nautilus \o/. In releases before Nautilus we occasionally
had issues with the MDS (hitting bugs), but since Nautilus this has
no longer been the case. Hopefully it stays that way. So kudos to
all Ceph devs and contributors!
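(For anyone wanting to do the same check before/after an upgrade:
"ceph crash ls" lists the reports collected by the mgr crash module;
the crash id below is just a placeholder.)

  $ ceph crash ls                  # all collected crash reports
  $ ceph crash info <crash-id>     # details of a single report
  $ ceph crash archive-all         # mark everything as seen/archived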
One thing that took *way* longer than expected was the bluestore fsck.
We tried an "offline" and an "online" approach on one host. Both took
the same amount of time (unlike previous releases, where online fsck
would take way longer) ... about *9 hours* on NVMe disks (Samsung
PM-983, SAMSUNG MZQLB3T8HALS-00007).
Note that we have a relatively big CephFS workload (with lots of small
files and deep directory hierarchies), so your mileage may vary. Also
note that "online" does not mean that our OSDs are UP ... they are not.
They only finish "booting" once this process has completed, so the
"bluestore_fsck_quick_fix_on_mount" parameter name is misleading here.
We decided not to proceed with the bluestore fsck and to first upgrade
all storage nodes. We are now planning to redeploy the remaining OSDs
and use "pgremapper" to drain hosts to new storage servers one by one:
less risk (no degraded data for a prolonged period of time) and
potentially even quicker.
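(In case the approach is unfamiliar: pgremapper works by manipulating
upmap entries, so PGs move to the new hosts as "remapped" instead of
"degraded". The underlying primitive looks roughly like this; pg 1.2f,
osd.12 and osd.42 are made-up ids:)

  # upmap requires all clients to be at least luminous
  $ ceph osd set-require-min-compat-client luminous
  # map pg 1.2f away from osd.12 (old host) onto osd.42 (new host);
  # the data stays fully replicated while it backfills
  $ ceph osd pg-upmap-items 1.2f 12 42
  # drop the exception again once it is no longer needed
  $ ceph osd rm-pg-upmap-items 1.2f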
FYI,
Gr. Stefan