.... also, I should point out that if you've already upgraded to
Luminous, you can just use the new RBD caps profiles (a la mon
'profile rbd' osd 'profile rbd') [1]. The explicit blacklist caps
mentioned in the upgrade guide are only required because pre-Luminous
clusters didn't support the RBD caps profiles.

[1] http://docs.ceph.com/docs/master/rbd/rbd-openstack/#setup-ceph-client-authentication

On Thu, May 10, 2018 at 10:11 AM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> It only bites you if you have a hard failure of a VM (i.e. the RBD
> image wasn't cleanly closed and the lock wasn't cleanly released). In
> that case, the next librbd client to attempt to acquire the lock will
> notice the dead lock owner and will attempt to blacklist it from the
> cluster to ensure it cannot write to the image.
>
> On Thu, May 10, 2018 at 10:08 AM, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:
>> On Thu, May 10, 2018 at 09:55:15AM -0700, Jason Dillaman wrote:
>> :My immediate guess is that your caps are incorrect for your OpenStack
>> :Ceph user. Please refer to step 6 from the Luminous upgrade guide to
>> :ensure your RBD users have permission to blacklist dead peers [1]
>> :
>> :[1] http://docs.ceph.com/docs/master/releases/luminous/#upgrade-from-jewel-or-kraken
>>
>> Good spotting! Thanks for the fast reply. Next question is why this
>> took so long to bite me, since we've been on Luminous for 6 months,
>> but I'm not going to worry too much about that last question.
>>
>> Hopefully that was the problem (it definitely was a problem).
>>
>> Thanks,
>> -Jon
>>
>> :On Thu, May 10, 2018 at 9:49 AM, Jonathan Proulx <jon@xxxxxxxxxxxxx> wrote:
>> :> Hi All,
>> :>
>> :> Recently I saw a number of RBD-backed VMs in my OpenStack cloud fail
>> :> to reboot after a hypervisor crash, with errors similar to:
>> :>
>> :> [ 5.279393] blk_update_request: I/O error, dev vda, sector 2048
>> :> [ 5.281427] Buffer I/O error on dev vda1, logical block 0, lost async page write
>> :> [ 5.284114] Buffer I/O error on dev vda1, logical block 1, lost async page write
>> :> [ 5.286600] Buffer I/O error on dev vda1, logical block 2, lost async page write
>> :> [ 5.289022] Buffer I/O error on dev vda1, logical block 3, lost async page write
>> :> [ 5.291515] Buffer I/O error on dev vda1, logical block 4, lost async page write
>> :> [ 5.338981] blk_update_request: I/O error, dev vda, sector 3088
>> :>
>> :> for many blocks and sectors. I was able to export the RBD images and
>> :> they seemed fine; also, 'rbd flatten' made them boot again with no
>> :> errors.
>> :>
>> :> I found this puzzling and concerning, but given the crash and limited
>> :> time I didn't really follow up.
>> :>
>> :> Today I intentionally rebooted a VM on a healthy hypervisor and had it
>> :> land in the same condition, so now I'm really worried.
>> :>
>> :> running:
>> :> Ubuntu 16.04
>> :> ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable) (on hypervisor)
>> :> {
>> :>     "mon": {
>> :>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 3
>> :>     },
>> :>     "mgr": {
>> :>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 3
>> :>     },
>> :>     "osd": {
>> :>         "ceph version 12.2.2 (cf0baeeeeba3b47f9427c6c97e2144b094b7e5ba) luminous (stable)": 102,
>> :>         "ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable)": 10,
>> :>         "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)": 62
>> :>     }
>> :> }
>> :> libvirt-bin 1.3.1-1ubuntu10.21
>> :> qemu-system 1:2.5+dfsg-5ubuntu10.24
>> :> OpenStack Mitaka
>> :>
>> :> Anyone seen anything like this, or have suggestions on where to look for more details?
>> :>
>> :> -Jon
>> :
>> :--
>> :Jason

--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
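For readers hitting the same issue, here is a minimal sketch of the caps
changes discussed above. The client and pool names (client.cinder, volumes,
vms, images) are only the example values from the referenced OpenStack
guide; substitute whatever your deployment actually uses.

On a Luminous (or later) cluster, the RBD caps profiles already include the
blacklist permission, so switching the existing user to the profile form
should be enough:

    ceph auth caps client.cinder \
        mon 'profile rbd' \
        osd 'profile rbd pool=volumes, profile rbd pool=vms, profile rbd-read-only pool=images'

The explicit pre-Luminous form from step 6 of the upgrade notes keeps the
user's existing OSD caps (the ones shown here are just the typical OpenStack
example) and adds the blacklist permission to the mon caps:

    ceph auth caps client.cinder \
        mon 'allow r, allow command "osd blacklist"' \
        osd 'allow class-read object_prefix rbd_children, allow rwx pool=volumes, allow rwx pool=vms, allow rx pool=images'

To check whether a stale exclusive lock from a crashed hypervisor is what is
blocking writes to an image, something like the following should show the
old lock owner and any current blacklist entries:

    rbd lock list volumes/<image-name>
    ceph osd blacklist ls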