Hi Hervé!
Thanks for the detailed summary, much appreciated!
Best,
MJ
On 09/21/2018 09:03 AM, Hervé Ballans wrote:
Hi MJ (and all),
So we upgraded our Proxmox/Ceph cluster, and to summarize the
operation in a few words : overall, everything went well :)
The most critical operation of all is 'osd crush tunables optimal',
which I discuss in more detail below...
The Proxmox documentation is really well written and accurate and,
normally, following the documentation step by step is almost sufficient !
* first step : upgrade Ceph Jewel to Luminous :
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous
(Note here : OSDs remain in FileStore backend, no BlueStore migration)
* second step : upgrade Proxmox version 4 to 5 :
https://pve.proxmox.com/wiki/Upgrade_from_4.x_to_5.0
Just some numbers, observations and tips (based on our feedback, I'm not
an expert !) :
* Before migration, make sure you are on the latest version of Proxmox
4 (4.4-24) and Ceph Jewel (10.2.11)
* We don't use the pve repository for ceph packages but the official one
(download.ceph.com). Thus, during the upgrade of Proxmox PVE, we don't
replace the ceph.com repository with the proxmox.com Ceph repository...
* When you upgrade Ceph to Luminous (without tunables optimal), there is
no impact on Proxmox 4. VMs are still running normally.
The only side effect (it does not block the functioning of the VMs) is in
the GUI, in the Ceph menu : it can't report the status of the Ceph
cluster because of a JSON formatting error (indeed the output of the
command 'ceph -s' is completely different, and much more readable, on Luminous)
* A small step is missing from section 8 "Create Manager instances" of the
Ceph upgrade documentation. As the Ceph manager daemon is new in
Luminous, the package doesn't exist on Jewel. So you have to install the
ceph-mgr package on each node before running 'pveceph createmgr'.
* The 'osd crush tunables optimal' operation is time consuming ! In our
case : 5 nodes (PE R730xd), 58 OSDs, replicated (3/2) rbd pool with 2048
PGs and 2 million objects, 22 TB used. The tunables operation took a
little more than 24 hours !
* Choose the right time to run 'tunables optimal' carefully !
We encountered some stuck PGs and blocked requests during this
operation. In our case, the involved OSDs were those with a high number
of PGs (as they are high capacity disks).
The consequences can be critical since it can freeze some VMs (I guess
those whose replicas are stored on the stuck PGs ?).
The stuck states were corrected by rebooting the involved OSDs.
If you can move the disks of your critical VMs to another storage,
these VMs should not be impacted by the recovery (we moved some disks to
another Ceph cluster, kept the conf in the Proxmox cluster being
upgraded, and there was no impact)
Otherwise :
- verify that all your VMs have been recently backed up to an external
storage (in case you need your disaster recovery plan !)
- if you can, stop all your non-critical VMs (in order to limit client
io operations)
- if any, wait for the end of current backups then disable datacenter
backup (in order to limit client io operations). !! do not forget to
re-enable it when all is over !!
- if any and if no longer needed, delete your snapshots, it removes many
useless objects !
- start the tunables operation outside of major activity periods (night,
weekend, etc.) and take into account that it can be very slow...
There are probably some options to configure in Ceph to avoid 'stuck
PGs' states, but on our side, as we had previously moved our critical VMs'
disks, we didn't look into that !
* Anyway, the upgrade step of Proxmox PVE is done easily and quickly
(just follow the documentation). Note that you can upgrade Proxmox PVE
before doing the 'tunables optimal' operation.
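
To put rough numbers on two of the points above, here is a minimal Python sketch. It was not run as part of this upgrade. The first helper computes the average number of PG replicas per OSD from the figures in this mail. The second only builds (it does not run) a 'ceph tell ... injectargs' command that lowers osd_max_backfills and osd_recovery_max_active, two Ceph options that limit backfill and recovery concurrency. Whether they prevent stuck PGs was not tested here. The function names and the value 1 are assumptions chosen for illustration.

```python
def avg_pg_replicas_per_osd(pg_num: int, replica_size: int, osd_count: int) -> float:
    """Average number of PG replicas held by each OSD for one replicated pool."""
    return pg_num * replica_size / osd_count


def throttle_command(max_backfills: int = 1, recovery_max_active: int = 1) -> list:
    """Build a command that lowers backfill/recovery concurrency on all OSDs.

    The command is only returned, never executed. The default values of 1
    are an assumption for this sketch, not settings validated in this thread.
    """
    args = (
        f"--osd-max-backfills {max_backfills} "
        f"--osd-recovery-max-active {recovery_max_active}"
    )
    return ["ceph", "tell", "osd.*", "injectargs", args]


# Figures from this mail: 2048 PGs, replica size 3, 58 OSDs.
print(round(avg_pg_replicas_per_osd(2048, 3, 58)))  # 106

# Show the command as it would be typed in a shell (quoting added for display).
cmd = throttle_command()
print(" ".join(cmd[:4]) + " '" + cmd[4] + "'")
```

The average of about 106 hides the spread: OSDs with a larger CRUSH weight (the high capacity disks) carry proportionally more PGs, which matches where the stuck PGs showed up. Settings changed with injectargs apply at runtime only, so they would need to be restored or re-applied after an OSD restart.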
Hoping that you will find this information useful, good luck with your
upcoming migration !
Hervé
On 13/09/2018 at 22:04, mj wrote:
Hi Hervé,
No answer from me, but just to say that I have exactly the same
upgrade path ahead of me. :-)
Please report here any tips, tricks, or things you encountered doing
the upgrades. It could potentially save us a lot of time. :-)
Thanks!
MJ
On 09/13/2018 05:23 PM, Hervé Ballans wrote:
Dear list,
I am currently in the process of upgrading Proxmox 4/Jewel to
Proxmox 5/Luminous.
I also have a new node to add to my Proxmox cluster.
What I plan to do is the following (from
https://pve.proxmox.com/wiki/Ceph_Jewel_to_Luminous):
* upgrade Jewel to Luminous
* let the "ceph osd crush tunables optimal" command run
* upgrade my proxmox to v5
* add the new node (already up to date in v5)
* add the new OSDs
* let ceph rebalance the lot
A couple of questions I have :
* would it be a good idea to add the new node+OSDs and run the
"tunables optimal" command immediately after, which might save
a little time and avoid two successive PG rebalancing operations ?
* did I miss anything in this plan?
Regards,
Hervé
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com