+Paul

On Mon, Jun 25, 2018 at 5:14 AM, Christian Zunker
<christian.zunker@codecentric.cloud> wrote:
> Hi Jason,
>
> your guesses were correct. Thank you for your support.
>
> Just in case someone else stumbles upon this thread, here are some more links:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-September/020722.html
> http://docs.ceph.com/docs/luminous/rados/operations/user-management/#authorization-capabilities
> http://docs.ceph.com/docs/luminous/rbd/rbd-openstack/#setup-ceph-client-authentication
> https://github.com/ceph/ceph/pull/15991
>
> Jason Dillaman <jdillama@xxxxxxxxxx> wrote on Fri, 22 Jun 2018 at 22:58:
>>
>> It sounds like your OpenStack users do not have the correct caps to
>> blacklist dead clients. See step 6 in the upgrade section of the Luminous
>> release notes or (preferably) use the new "profile rbd"-style caps if you
>> don't use older clients.
>>
>> The reason why repairing the object map seemed to fix everything is, I
>> suspect, that you performed the operation as the admin user, which had
>> the caps necessary to blacklist the dead clients and clean up the dirty
>> exclusive lock on the image.
>>
>> On Fri, Jun 22, 2018 at 4:47 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>>
>>> On Fri, Jun 22, 2018 at 2:26 AM Christian Zunker
>>> <christian.zunker@codecentric.cloud> wrote:
>>>>
>>>> Hi List,
>>>>
>>>> we are running a ceph cluster (12.2.5) as the backend to our OpenStack
>>>> cloud.
>>>>
>>>> Yesterday our datacenter had a power outage. As if that weren't enough,
>>>> our ceph cluster was also split apart by networking problems.
>>>>
>>>> First of all, thanks a lot to the ceph developers. After the network was
>>>> back to normal, ceph recovered itself. You saved us from a lot of
>>>> downtime, lack of sleep and insanity.
>>>>
>>>> Now to our problem/question:
>>>> After ceph recovered, we tried to bring up our VMs. They have cinder
>>>> volumes stored in ceph. None of the VMs started because of I/O problems
>>>> during boot:
>>>> [ 4.393246] JBD2: recovery failed
>>>> [ 4.395949] EXT4-fs (vda1): error loading journal
>>>> [ 4.400811] VFS: Dirty inode writeback failed for block device vda1 (err=-5).
>>>> mount: mounting /dev/vda1 on /root failed: Input/output error
>>>> done.
>>>> Begin: Running /scripts/local-bottom ... done.
>>>> Begin: Running /scripts/init-bottom ... mount: mounting /dev on
>>>> /root/dev failed: No such file or directory
>>>>
>>>> We tried to recover the disks with different methods, but all failed for
>>>> different reasons. What helped in the end was rebuilding the object map
>>>> of each image:
>>>> rbd object-map rebuild volumes/<uuid>
>>>>
>>>> From what we understood, the object map is a feature for ceph-internal
>>>> speedup. How can it lead to I/O errors in our VMs?
>>>> Is this the expected way to recover?
>>>> Did we miss something?
>>>> Is there any documentation describing what leads to invalid object maps
>>>> and how to recover? (We did not find a doc on that topic...)
>>>
>>> An object map definitely shouldn't lead to IO errors in your VMs; in fact
>>> I thought it auto-repaired itself if necessary. Maybe the RBD guys can
>>> chime in here about probable causes of trouble.
>>>
>>> My *guess* is that perhaps your VMs or QEMU were configured to ignore
>>> barriers or some similar thing, so that when the power failed a write was
>>> "lost" as it got written to a new RBD object but not committed into the
>>> object map, but the FS or database journal recorded it as complete. I
>>> can't be sure about that though.
>>> -Greg
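
For the archives, the caps change Jason describes would look roughly like the
following. This is only a sketch based on the Luminous rbd-openstack doc linked
above; "client.cinder" and the "vms" pool are the usual names from that doc
(the "volumes" pool is from Christian's example), so substitute your own user
and pools:

    # inspect the caps the OpenStack user currently has
    ceph auth get client.cinder

    # Luminous-style caps; the mon 'profile rbd' part is what grants the
    # client permission to blacklist dead clients, which the old plain
    # 'allow r' mon cap did not. Add one 'profile rbd' clause per pool
    # the user needs write access to.
    ceph auth caps client.cinder \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms'

And to spot an image in the state Christian describes, "rbd info" should show
the invalid flag before the rebuild (again just a sketch, <uuid> as in his
example):

    rbd info volumes/<uuid>
    # look for a line like: flags: object map invalid, fast diff invalid
    rbd object-map rebuild volumes/<uuid>
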
>>>>
>>>> regards
>>>> Christian
>>
>> --
>> Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com