On 06/12/17 09:17, Caspar Smit wrote:
> 2017-12-05 18:39 GMT+01:00 Richard Hesketh <richard.hesketh@xxxxxxxxxxxx <mailto:richard.hesketh@xxxxxxxxxxxx>>:
> > On 05/12/17 17:10, Graham Allan wrote:
> > > On 12/05/2017 07:20 AM, Wido den Hollander wrote:
> > > > Hi,
> > > >
> > > > I haven't tried this before but I expect it to work, but I wanted to check before proceeding.
> > > >
> > > > I have a Ceph cluster which is running with manually formatted FileStore XFS disks, Jewel, sysvinit and Ubuntu 14.04.
> > > >
> > > > I would like to upgrade this system to Luminous, but since I have to re-install all servers and re-format all disks I'd like to move it to BlueStore at the same time.
> > >
> > > You don't *have* to update the OS in order to update to Luminous, do you? Luminous is still supported on Ubuntu 14.04 AFAIK.
> > >
> > > Though obviously I understand your desire to upgrade; I only ask because I am in the same position (Ubuntu 14.04, xfs, sysvinit), though happily with a smaller cluster. Personally I was planning to upgrade ours entirely to Luminous while still on Ubuntu 14.04, before later going through the same process of decommissioning one machine at a time to reinstall with CentOS 7 and BlueStore. I too don't see any reason the mixed Jewel/Luminous cluster wouldn't work, but I still felt less comfortable with extending the upgrade duration.
> > >
> > > Graham
> >
> > Yes, you can run Luminous on Trusty; one of my clusters is currently Luminous/BlueStore/Trusty as I've not had time to sort out doing OS upgrades on it. I second the suggestion that it would be better to do the Luminous upgrade first, retaining the existing FileStore OSDs, and then do the OS upgrade/OSD recreation on each node in sequence. I don't think there should realistically be any problems with running a mixed cluster for a while, but doing the Jewel->Luminous upgrade on the existing installs first shouldn't add significant extra effort or time, as you're already predicting at least two months to upgrade everything, and it does minimise the amount of change at any one time in case things do start going horribly wrong.
> >
> > Also, at 48 nodes, I would've thought you could get away with cycling more than one of them at once. Assuming they're homogeneous, taking out even 4 at a time should only raise utilisation on the rest of the cluster to a little over 65%, which still seems safe to me, and you'd waste way less time waiting for recovery. (I recognise that depending on the nature of your employment situation this may not actually be desirable...)
>
> Assuming size=3, min_size=2 and failure-domain=host:
>
> I always thought that bringing down more than 1 host causes data inaccessibility right away, because there is a chance that a PG will have OSDs on both of those hosts. Only if the failure domain is higher than host (rack or something) can you safely bring more than 1 host down (within the same failure domain, of course).
>
> Am I right?
>
> Kind regards,
> Caspar

Oh, yeah, if you just bring them down immediately without rebalancing first, you'll have problems. But the intention is that rather than just killing the nodes, you first weight them to 0 and then wait for the cluster to rebalance the data off them, so they are empty and harmless when you do shut them down. You minimise time spent waiting and overall data movement if you do this sort of replacement in larger batches. Others have correctly pointed out, though, that the larger the change you make at any one time, the more likely something might go wrong overall...
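Roughly, the drain step I have in mind looks something like the following (an untested sketch - the host bucket name "node01" and OSD id 12 are just placeholders for whatever your CRUSH tree actually contains):

    # Push the CRUSH weight of every OSD under the host bucket to 0 so data
    # starts migrating off it; reweight-subtree covers the whole host in one go.
    ceph osd crush reweight-subtree node01 0

    # Watch the rebalance and wait until all PGs are active+clean again.
    ceph -s
    ceph osd df tree

    # Once its OSDs hold no data, stop and remove them, e.g. for osd.12:
    ceph osd out osd.12
    /etc/init.d/ceph stop osd.12    # or however your init system stops OSDs
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm osd.12

(Once you're on Luminous, "ceph osd purge osd.12 --yes-i-really-mean-it" rolls the last three commands into one.)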
I suspect a good rule of thumb is that you should try to add/replace/remove nodes/OSDs in batches of as many as you can get away with at once without stretching outside the failure domain.

Rich
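P.S. If you want to double-check what your replication settings and failure domain actually are before deciding on a batch size, something along these lines shows the relevant bits (the pool name "rbd" is just an example, substitute your own):

    # Replica counts for a pool - size is the number of copies,
    # min_size is how many must be available for I/O to continue.
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size

    # Dump the CRUSH rules; the "type" in the chooseleaf step is the
    # failure domain (host, rack, ...).
    ceph osd crush rule dump

    # Per-host/per-OSD utilisation, to judge how many nodes you can
    # afford to drain at once.
    ceph osd df tree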