Big problems encountered during upgrade from hammer 0.94.5 to jewel 10.2.3

After testing on a non-production environment, we decided to upgrade our running cluster to jewel 10.2.3. The cluster has 3 monitors and 8 OSD nodes with 20 disks each, and runs hammer 0.94.5 with the CRUSH tunables set to "bobtail".
Because the cluster is in production and it was not possible to upgrade the Ceph clients at the same time, we decided to keep the tunables at "bobtail".
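For reference, we check and pin the tunables with the standard CRUSH tunables commands (a minimal sketch; the exact output of show-tunables varies by release):

    # show the currently active CRUSH tunables profile
    ceph osd crush show-tunables

    # pin the tunables to the bobtail profile so our old clients can still connect
    ceph osd crush tunables bobtail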
First step: upgrade the three monitors. No problem there.
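On each monitor, one at a time, the procedure was roughly the following (a sketch assuming a systemd-based distro; the chown is needed because jewel daemons run as the "ceph" user, per the release notes):

    # upgrade the ceph packages on this monitor host
    apt-get install ceph               # or the equivalent yum/zypper update

    # jewel daemons run as user "ceph" instead of root
    chown -R ceph:ceph /var/lib/ceph

    # restart the monitor and wait for it to rejoin the quorum
    systemctl restart ceph-mon@$(hostname -s)
    ceph -s                            # check quorum before moving to the next monitor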
Second step: set the noout flag and upgrade the first node. As soon as we stopped the OSDs on that node, the cluster went into an error state with a lot of PGs peering. We lost access to many disks on the VMs hosted by the Ceph clients, and a lot of OSDs kept flapping (down, then up) for hours.
So we decided to stop all the VMs, and therefore all I/O on the cluster, to let it stabilize; that took about 3 hours. With no I/O on the cluster, we managed to upgrade 4 of the 8 nodes.
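The per-node procedure we followed was roughly this (again a sketch assuming systemd; on the remaining hammer nodes the restart command differs):

    # on a monitor/admin node: keep CRUSH from rebalancing while OSDs restart
    ceph osd set noout

    # on the OSD node being upgraded:
    apt-get install ceph
    chown -R ceph:ceph /var/lib/ceph
    systemctl restart ceph-osd.target   # restarts every ceph-osd@<id> on the node

    # back on the admin node: wait for all PGs to return to active+clean
    ceph -s

    # only after the very last node is done:
    ceph osd unset noout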

At this point, some pools are mapped only onto these 4 nodes, which are now on jewel. But even now, if we stop a single OSD on one of these 4 nodes, PGs get stuck peering, the cluster goes into an error state, and it is no longer fit to serve production traffic.
Is this behaviour caused by the mix of jewel and hammer nodes?
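To double-check which daemons are running which release, something like this should work (ceph tell queries every OSD, so it can take a moment on a big cluster):

    # running version of every OSD
    ceph tell osd.* version

    # running version of a monitor, via its admin socket on the monitor host
    ceph daemon mon.$(hostname -s) version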

We will upgrade the last 4 nodes next weekend, so all the OSD nodes will be on jewel. Do we have to wait for the Ceph clients to be upgraded to jewel before the cluster becomes stable again? Do we have to wait until the tunables are set to "optimal"?
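For the record, these are the finishing steps we plan once every OSD is on jewel, based on my reading of the release notes (the tunables change will move a lot of data and needs jewel-capable clients, so we would only run it after the clients are upgraded):

    # recommended once ALL OSDs run jewel; do not set it while hammer OSDs remain
    ceph osd set sortbitwise

    # only after all clients are upgraded; expect significant rebalancing
    ceph osd crush tunables optimal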

I saw in the release notes that an upgrade from hammer to jewel could be done without downtime... I know there is no guarantee, but for now we still have an unstable cluster, and we are praying not to lose an OSD before the last upgrade operation.

If you have some advice, I'll take it :)

Vincent

