Hi Jason,
Your guesses were correct. Thank you for your support.
Just in case someone else stumbles upon this thread, some more links and a quick recap:
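The caps change behind Jason's suggestion looks roughly like this (a sketch only; client.cinder is just an example user name, so adjust the user, the osd caps and the pool names to your own OpenStack setup):

# old-style caps: add the blacklist permission to the mon cap and keep your
# existing osd caps (shown here as a typical OpenStack example)
# -- see step 6 in the Luminous upgrade notes
ceph auth caps client.cinder \
    mon 'allow r, allow command "osd blacklist"' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes'

# or, preferably, the newer profile-based caps
ceph auth caps client.cinder \
    mon 'profile rbd' \
    osd 'profile rbd pool=volumes'

With either of these in place, the client can blacklist a dead client and break its stale exclusive lock on its own, so the manual object-map repair described below should no longer be needed as a workaround.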
Jason Dillaman <jdillama@xxxxxxxxxx> wrote on Fri, 22 Jun 2018 at 22:58:
It sounds like your OpenStack users do not have the correct caps to blacklist dead clients. See step 6 in the upgrade section of Luminous’ release notes or (preferably) use the new “profile rbd”-style caps if you don’t use older clients.

The reason why repairing the object map seemed to fix everything is, I suspect, that you performed the op using the admin user, which had the caps necessary to blacklist the dead clients and clean up the dirty exclusive lock on the image.

On Fri, Jun 22, 2018 at 4:47 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

On Fri, Jun 22, 2018 at 2:26 AM Christian Zunker <christian.zunker@codecentric.cloud> wrote:

Hi List,

we are running a ceph cluster (12.2.5) as backend to our OpenStack cloud. Yesterday our datacenter had a power outage. As if that weren't enough, our ceph cluster was also split because of networking problems.

First of all, thanks a lot to the ceph developers. After the network was back to normal, ceph recovered itself. You saved us from a lot of downtime, lack of sleep and insanity.

Now to our problem/question:
After ceph recovered, we tried to bring up our VMs. They have cinder volumes stored in ceph. None of the VMs would start because of I/O problems during boot:

[ 4.393246] JBD2: recovery failed
[ 4.395949] EXT4-fs (vda1): error loading journal
[ 4.400811] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
mount: mounting /dev/vda1 on /root failed: Input/output error
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory

We tried to recover the disks with different methods, but all failed for different reasons. What helped us in the end was rebuilding the object map of each image:

rbd object-map rebuild volumes/<uuid>

From what we understood, the object map is a feature for ceph-internal speedup. How can this lead to I/O errors in our VMs? Is this the expected way to recover? Did we miss something? Is there any documentation describing what leads to invalid object maps and how to recover? (We did not find a doc on that topic...)

An object map definitely shouldn't lead to IO errors in your VMs; in fact I thought it auto-repaired itself if necessary. Maybe the RBD guys can chime in here about probable causes of trouble.

My *guess* is that perhaps your VMs or QEMU were configured to ignore barriers or some similar thing, so that when the power failed a write was "lost" as it got written to a new RBD object but not committed into the object map, but the FS or database journal recorded it as complete. I can't be sure about that though.
-Greg

regards
Christian
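For reference, a minimal sketch of finding and repairing images whose object map was marked invalid, assuming the volumes pool from above (the exact flag output can vary between releases):

# an affected image shows an invalid-object-map flag
rbd info volumes/<uuid> | grep flags
#   flags: object map invalid

# rebuild the object map of a single image
rbd object-map rebuild volumes/<uuid>

# or loop over all images in the pool
for img in $(rbd ls volumes); do
    rbd object-map rebuild "volumes/$img"
done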
--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com