Hi Jason,
Your guesses were correct. Thank you for your support.
Just in case someone else stumbles upon this thread, some more links and a quick recap:
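The caps change behind Jason's suggestion looks roughly like this (a sketch only; client.cinder is just an example user name, so adjust the user, the osd caps and the pool names to your own OpenStack setup):

# old-style caps: add the blacklist permission to the mon cap and keep your
# existing osd caps (shown here as a typical OpenStack example)
# -- see step 6 in the Luminous upgrade notes
ceph auth caps client.cinder \
    mon 'allow r, allow command "osd blacklist"' \
    osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes'

# or, preferably, the newer profile-based caps
ceph auth caps client.cinder \
    mon 'profile rbd' \
    osd 'profile rbd pool=volumes'

With either of these in place, the client can blacklist a dead client and break its stale exclusive lock on its own, so the manual object-map repair described below should no longer be needed as a workaround.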
Jason Dillaman <jdillama@xxxxxxxxxx> wrote on Fri, 22 Jun 2018 at 22:58:
It sounds like your OpenStack users do not have the correct caps to blacklist dead clients. See step 6 in the upgrade section of Luminous’ release notes or (preferably) use the new “profile rbd”-style caps if you don’t use older clients.

The reason why repairing the object map seemed to fix everything is, I suspect, that you performed the op using the admin user, which had the caps necessary to blacklist the dead clients and clean up the dirty exclusive lock on the image.

On Fri, Jun 22, 2018 at 4:47 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:

On Fri, Jun 22, 2018 at 2:26 AM Christian Zunker <christian.zunker@codecentric.cloud> wrote:

Hi List,

we are running a ceph cluster (12.2.5) as backend to our OpenStack cloud. Yesterday our datacenter had a power outage. As if that weren't enough, our ceph cluster was also split because of networking problems.

First of all, thanks a lot to the ceph developers. After the network was back to normal, ceph recovered itself. You saved us from a lot of downtime, lack of sleep and insanity.

Now to our problem/question:
After ceph recovered, we tried to bring up our VMs. They have cinder volumes stored in ceph. None of the VMs would start because of I/O problems during boot:

[ 4.393246] JBD2: recovery failed
[ 4.395949] EXT4-fs (vda1): error loading journal
[ 4.400811] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
mount: mounting /dev/vda1 on /root failed: Input/output error
done.
Begin: Running /scripts/local-bottom ... done.
Begin: Running /scripts/init-bottom ... mount: mounting /dev on /root/dev failed: No such file or directory

We tried to recover the disks with different methods, but all failed for different reasons. What helped us in the end was rebuilding the object map of each image:

rbd object-map rebuild volumes/<uuid>

From what we understood, the object map is a feature for ceph-internal speedup. How can this lead to I/O errors in our VMs? Is this the expected way to recover? Did we miss something? Is there any documentation describing what leads to invalid object maps and how to recover? (We did not find a doc on that topic...)

An object map definitely shouldn't lead to IO errors in your VMs; in fact I thought it auto-repaired itself if necessary. Maybe the RBD guys can chime in here about probable causes of trouble.

My *guess* is that perhaps your VMs or QEMU were configured to ignore barriers or some similar thing, so that when the power failed a write was "lost" as it got written to a new RBD object but not committed into the object map, but the FS or database journal recorded it as complete. I can't be sure about that though.
-Greg

regards
Christian
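For reference, a minimal sketch of finding and repairing images whose object map was marked invalid, assuming the volumes pool from above (the exact flag output can vary between releases):

# an affected image shows an invalid-object-map flag
rbd info volumes/<uuid> | grep flags
#   flags: object map invalid

# rebuild the object map of a single image
rbd object-map rebuild volumes/<uuid>

# or loop over all images in the pool
for img in $(rbd ls volumes); do
    rbd object-map rebuild "volumes/$img"
done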
--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com