Hi All,
Last week we successfully upgraded our 14.2.18 cluster to 15.2.16.
According to "ceph crash ls" we did not have a single crash while
running Nautilus \o/. In releases before Nautilus we occasionally
had issues with the MDS (hitting bugs), but since Nautilus this has
no longer been the case. Hopefully it stays that way. So kudos to
all Ceph devs and contributors!
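(For anyone wanting to do the same check before/after an upgrade:
"ceph crash ls" lists the reports collected by the mgr crash module;
the crash id below is just a placeholder.)

  $ ceph crash ls                  # all collected crash reports
  $ ceph crash info <crash-id>     # details of a single report
  $ ceph crash archive-all         # mark everything as seen/archived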
One thing that took *way* longer than expected was the bluestore fsck.
We tried an "offline" and an "online" approach on one host. Both took
the same amount of time (unlike previous releases, where online fsck
would take way longer) ... about *9 hours* on NVMe disks (Samsung
PM-983, SAMSUNG MZQLB3T8HALS-00007).
Note that we have a relatively big CephFS workload (with lots of small
files and deep directory hierarchies), so your mileage may vary. Also
note that "online" does not mean that our OSDs are UP ... they are not.
They only finish "booting" once this process has completed, so the
"bluestore_fsck_quick_fix_on_mount" parameter name is misleading here.
We decided not to proceed with the bluestore fsck and to first upgrade
all storage nodes. We are now planning to redeploy the remaining OSDs
and use "pgremapper" to drain hosts to new storage servers one by one:
less risk (no degraded data for a prolonged period of time) and
potentially even quicker.
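(In case the approach is unfamiliar: pgremapper works by manipulating
upmap entries, so PGs move to the new hosts as "remapped" instead of
"degraded". The underlying primitive looks roughly like this; pg 1.2f,
osd.12 and osd.42 are made-up ids:)

  # upmap requires all clients to be at least luminous
  $ ceph osd set-require-min-compat-client luminous
  # map pg 1.2f away from osd.12 (old host) onto osd.42 (new host);
  # the data stays fully replicated while it backfills
  $ ceph osd pg-upmap-items 1.2f 12 42
  # drop the exception again once it is no longer needed
  $ ceph osd rm-pg-upmap-items 1.2f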
FYI,
Gr. Stefan