Good morning everyone,

recently we have read a lot of questions and updates around cephadm. Some of you might remember that we went down the road of rook-ceph instead of cephadm, and I wanted to give a short overview of how the update from 16.2.6 to 16.2.7 is performed on rook-ceph, with the intention of spreading information about how containers/rook work.

Long story short, the full work required to upgrade a whole cluster for this release was: change the line

  image: quay.io/ceph/ceph:v16.2.6

to

  image: quay.io/ceph/ceph:v16.2.7

in the CephCluster definition (appended below for reference [0]), then git commit & git push it. From there on, it's only waiting.

We are utilising argocd [1], which picks up the cluster state from git and then updates the Kubernetes custom resource "CephCluster" using our git commit. From there on, the rook-ceph-operator, basically a process running in Kubernetes, detects that an upgrade is requested, checks the status of the monitors and upgrades them one after another, then continues with the mgr and finally upgrades the OSDs (i.e. changes their image to the new version).

The interesting bit from our side: the behaviour is pretty much "standard" in terms of how we upgrade our native ceph clusters, just fully automated and observable. Using the Kubernetes log functionality [2], we watched the operator progress and take actions depending on the cluster state (waiting for monitors to join the quorum, etc.).

This comes with the typical two sides of the same coin: the whole upgrade is fully automated, so if everything works fine, the hands-on working time is a matter of minutes rather than hours, even when upgrading dozens or hundreds of OSDs. However, if things go wrong, you'll need to work against the automation (i.e. stopping the operator, deploying things manually, etc.).
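The only manual step, bumping the image tag before the commit, is easy to script. The following is a minimal Python sketch, not part of rook or argocd; the helper name bump_ceph_image is hypothetical, and it assumes the manifest contains exactly one `image: quay.io/ceph/ceph:vX.Y.Z` line, as in [0]. It deliberately uses a line-based regular expression instead of a YAML parser, so comments and formatting in the file survive untouched.

```python
import re

# Hypothetical helper (not a rook/argocd tool): rewrites the Ceph image tag
# in a CephCluster manifest. Assumes exactly one line of the form
# "image: quay.io/ceph/ceph:v<major>.<minor>.<patch>", as in the manifest [0].
IMAGE_RE = re.compile(
    r"^([ \t]*image:[ \t]*quay\.io/ceph/ceph:v)(\d+\.\d+\.\d+)([ \t]*)$",
    re.MULTILINE,
)
VERSION_RE = re.compile(r"\d+\.\d+\.\d+")


def bump_ceph_image(manifest: str, new_version: str) -> str:
    """Return the manifest text with the Ceph image tag set to new_version."""
    if not VERSION_RE.fullmatch(new_version):
        raise ValueError(f"not a X.Y.Z version: {new_version!r}")
    found = IMAGE_RE.findall(manifest)
    if len(found) != 1:
        raise ValueError(f"expected exactly one Ceph image line, found {len(found)}")
    # Keep indentation, key and trailing whitespace; only swap the version.
    return IMAGE_RE.sub(lambda m: m.group(1) + new_version + m.group(3), manifest)


if __name__ == "__main__":
    excerpt = (
        "spec:\n"
        "  cephVersion:\n"
        "    image: quay.io/ceph/ceph:v16.2.6\n"
        "  dataDirHostPath: /var/lib/rook\n"
    )
    print(bump_ceph_image(excerpt, "16.2.7"))
```

The rewritten file would then be committed and pushed as usual; everything after that is up to argocd and the operator.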
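The order in which the operator rolls out the new image can be pictured as a small loop. This is a deliberately simplified Python model of the sequence described above (mons one at a time, guarded by a quorum check, then the mgr, then the OSDs); it is not rook's implementation, which is written in Go and performs many more health checks. The Daemon type, the majority-quorum rule and the daemon names are assumptions for illustration only, and a restarted daemon is assumed to rejoin instantly.

```python
from __future__ import annotations

from dataclasses import dataclass

# Simplified model for illustration, NOT rook's actual upgrade code.


@dataclass
class Daemon:
    kind: str   # "mon", "mgr" or "osd"
    name: str
    image: str
    up: bool = True


def quorum_after_stopping(daemons: list[Daemon], target: Daemon) -> bool:
    """Assumed rule: a strict majority of mons must stay up while target restarts."""
    mons = [d for d in daemons if d.kind == "mon"]
    still_up = sum(1 for d in mons if d.up and d is not target)
    return still_up > len(mons) // 2


def rolling_upgrade(daemons: list[Daemon], new_image: str) -> list[str]:
    """Upgrade all mons one by one, then mgrs, then OSDs; return the order used."""
    order = []
    for phase in ("mon", "mgr", "osd"):
        for d in daemons:
            if d.kind != phase or d.image == new_image:
                continue  # not this phase yet, or already upgraded
            if not quorum_after_stopping(daemons, d):
                raise RuntimeError(f"restarting {d.name} would lose mon quorum")
            d.up = False        # stop the daemon ...
            d.image = new_image  # ... switch its image ...
            d.up = True         # ... and (in this model) it rejoins immediately
            order.append(d.name)
    return order


if __name__ == "__main__":
    old, new = "quay.io/ceph/ceph:v16.2.6", "quay.io/ceph/ceph:v16.2.7"
    cluster = (
        [Daemon("mon", f"mon.{c}", old) for c in "abcde"]
        + [Daemon("mgr", "mgr.a", old)]
        + [Daemon("osd", f"osd.{i}", old) for i in range(3)]
    )
    print(rolling_upgrade(cluster, new))
    # ['mon.a', 'mon.b', 'mon.c', 'mon.d', 'mon.e', 'mgr.a', 'osd.0', 'osd.1', 'osd.2']
```

The quorum guard before each restart mirrors the "wait for the monitors to join the quorum" behaviour seen in the operator log; re-running the loop on an already upgraded cluster is a no-op.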
For us it is very interesting to see the differences between Devuan/home-made/Ceph ("we know/do everything") orchestration and Alpine/Kubernetes/Rook/Ceph ("the operator does everything").

Best regards,

Nico

p.s.: The process for updating rook itself is pretty similar, just a git commit; however, it does not restart the mons/mgr/osds.

--------------------------------------------------------------------------------
[0]

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v16.2.7
  dataDirHostPath: /var/lib/rook
  mon:
    count: 5
    allowMultiplePerNode: false
  storage:
    useAllNodes: true
    useAllDevices: true
    onlyApplyOSDPlacement: false
  mgr:
    count: 1
    modules:
    - name: pg_autoscaler
      enabled: true
  network:
    ipFamily: "IPv6"
    dualStack: false
  crashCollector:
    disable: false
    # Uncomment daysToRetain to prune ceph crash entries older than the
    # specified number of days.
    daysToRetain: 30

--------------------------------------------------------------------------------
[1] https://argo-cd.readthedocs.io/en/stable/
[2] kubectl -n rook-ceph logs -f rook-ceph-operator-85f45d468f-lhwmm

-- 
Sustainable and modern Infrastructures by ungleich.ch
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx