Dear all,
I want to share my experience of upgrading my experimental single-host
Ceph cluster from v13.2.0 to v13.2.1.
First, I fetched the new packages and installed them using 'apt
dist-upgrade', which went smoothly as usual.
Then I noticed from 'lsof' that the Ceph daemons had not been restarted
after the upgrade ('ceph osd versions' still showed 13.2.0).
Following the instructions for the Luminous->Mimic upgrade, I decided to
restart the ceph-{mon,mgr,osd}.target units.
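For reference, the restarts looked roughly like this (just a sketch; these
are the stock systemd target units shipped with the Ceph packages):

    # restart the monitor, manager and all OSD daemons on this host
    systemctl restart ceph-mon.target
    systemctl restart ceph-mgr.target
    systemctl restart ceph-osd.target
    # then confirm the OSDs report the new version
    ceph osd versions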
And sure enough, on restarting ceph-osd.target, the iSCSI sessions were
broken on the tcmu-runner side ('Timing out cmd', 'Handler connection
lost'), and the Windows (2008 R2) clients lost their iSCSI devices.
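(The messages above came from tcmu-runner's log; on my host it runs as a
stock systemd service, so something along these lines shows them, assuming
the default unit name:)

    # watch tcmu-runner's log on the Ceph host while the OSDs restart
    journalctl -u tcmu-runner -f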
But that was only the beginning of the surprises that followed.
Looking into Windows Disk Management, I noticed that the iSCSI disks had
been re-detected with a size about 0.12 GB larger, i.e. 2794.52 GB instead
of 2794.40 GB, and of course the system lost sight of their GPT labels. I
quickly checked 'rbd info' on the Ceph side and did not notice any
increase in the RBD images. They were still exactly 715398 4 MB objects,
as I had intended initially.
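For completeness, the check on the Ceph side was along these lines (the
pool/image name 'rbd/win-disk1' is just a placeholder, not the real one):

    # object count and object size are unchanged
    rbd info rbd/win-disk1
    # exact provisioned size in bytes, if finer granularity is needed
    rbd info rbd/win-disk1 --format json --pretty-format | grep '"size"'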
Restarting the iSCSI Initiator service on Windows did not help. Rebooting
Windows entirely did not help. Restarting tcmu-runner on the Ceph side did
not help. What resolved the problem, to my great surprise, was
_removing/re-adding the MPIO feature and re-adding iSCSI multipath support_.
After that, Windows detected the iSCSI disks with the proper size again
and restored visibility of the GPT partitions, dynamic disk metadata and
all the rest.
OK, I avoided data loss this time, but I have some remaining questions:
1. Can Ceph minor version upgrades be made less disruptive and
traumatic? For example, some kind of staged/rolling restart of the OSD
daemons within a single upgraded host, without losing librbd sessions?
(I sketch what I mean below, after question 3.)
2. Is Windows (2008 R2) MPIO support really that screwed up and crippled?
Were there any improvements in Win2012/2016? I have physical servers
with Windows 2008 R2, and I would like to mirror their volumes to Ceph
iSCSI targets and then convert them into QEMU/KVM virtual machines where
the same data will be accessed with librbd. During my initial
experiments, I found that reinstalling MPIO and re-enabling iSCSI
multipath fixed most problems with Windows iSCSI access, but I would
like a faster way of resetting the iSCSI+MPIO state when something goes
wrong on the Windows side, as in this case.
3. Does anybody have an idea where these 0.12 GB (probably 120 or 128 MB)
came from?
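To clarify what I mean by "staged/rolling" in question 1, I am thinking of
something like the usual noout + per-OSD restart dance (a sketch only; the
OSD ids are placeholders and the "settle" check is a crude grep over the
'ceph pg stat' output):

    # prevent OSDs from being marked out while they bounce one by one
    ceph osd set noout
    for id in 0 1 2 3; do
        systemctl restart ceph-osd@$id
        # crude settle check: wait until all PGs report active+clean
        # again before touching the next OSD
        until ceph pg stat | grep -Eq '([0-9]+) pgs: \1 active\+clean'; do
            sleep 10
        done
    done
    ceph osd unset noout

What I do not know is whether this keeps the librbd/tcmu-runner sessions
alive any better than restarting ceph-osd.target as a whole.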
Thank you in advance for your responses.