I presume QEMU is using librbd instead of a mapped krbd block device, correct? If that is the case, can you add "debug rbd = 20" and "debug objecter = 20" to your ceph.conf and boot up your last remaining broken VM? (A minimal sketch of such a logging setup is included below.)

On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius <nico.schottelius@xxxxxxxxxxx> wrote:
>
> Good morning,
>
> yesterday we had an unpleasant surprise that I would like to discuss:
>
> Many (not all!) of our VMs suddenly died (the qemu process exited), and
> when we tried to restart them, the guest OS inside qemu saw I/O errors
> on its disks and was not able to start (i.e. it stopped in the
> initramfs).
>
> When we exported the image from rbd and loop-mounted it, however, there
> were no I/O errors and the filesystem could be mounted cleanly [-1].
>
> We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are
> some problems reported with kernels < 3.16.39, so we upgraded one host
> that serves as VM host and also runs ceph OSDs to Devuan ascii with
> kernel 4.9.0-3-amd64.
>
> Trying to start the VM again on this host, however, resulted in the
> same I/O problem.
>
> We then took the "stupid" approach of exporting an image and importing
> it again under the same name [0]. Surprisingly, this reproducibly
> solved the problem for all affected VMs and allowed us to go back
> online.
>
> We intentionally left one broken VM (a test VM) in our system so that
> we have a chance to debug further what happened and how we can prevent
> it from happening again.
>
> As you might have guessed, there were some events prior to this:
>
> - Some weeks before, we upgraded our cluster from kraken to luminous
>   (in the right order: mons first, then adding mgrs).
>
> - About a week ago we added the first hdd to our cluster and modified
>   the crushmap so that the "one" pool (used by opennebula) still
>   selects only ssds.
>
> - Some hours before (roughly 3 hours prior to the event), we took one
>   of the 5 hosts out of the ceph cluster, as we intended to replace its
>   filesystem-based OSDs with bluestore.
>
> - Shortly before the event we re-added an OSD, but did not mark it
>   "up".
>
> To our understanding, none of these actions should have triggered this
> behaviour; however, we are aware that the upgrade to luminous also
> updated the client libraries, and not all qemu processes were restarted
> afterwards. [1]
>
> After this long story, I was wondering about the following things:
>
> - Why did this happen at all? What is different after we reimported the
>   image? Can it be related to disconnecting the image from its parent
>   (opennebula creates clones prior to starting a VM)?
>
> - We have one broken VM left - is there a way to get it back running
>   without doing the export/import dance?
>
> - Is http://tracker.ceph.com/issues/18807 related to our issue, and if
>   so, how? How is the kernel involved in running VMs that use librbd?
>   "rbd showmapped" does not show any mapped images, as qemu connects
>   directly to ceph.
>
> We tried upgrading one host to Devuan ascii, which uses 4.9.0-3-amd64,
> but that did not fix our problem.
>
> We would appreciate any pointer!
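As one such pointer, a minimal sketch of the logging setup suggested at the top of this reply - assuming the qemu processes read /etc/ceph/ceph.conf on the VM host; the "log file" line and its path are illustrative additions, not options named above:

    # append verbose librbd/objecter logging for all rbd clients on this host;
    # the quoted 'EOF' keeps $pid literal so ceph expands it per process
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
        debug rbd = 20
        debug objecter = 20
        log file = /var/log/ceph/qemu-guest-$pid.log
    EOF

The log directory must be writable by the user qemu runs as (e.g. libvirt-qemu); once the broken VM is booted, its per-process log should show where the reads fail.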
>
> Best,
>
> Nico
>
>
> [-1]
> losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img
> mkdir /tmp/monitoring1-mnt
> mount /dev/loop0p1 /tmp/monitoring1-mnt/
>
>
> [0]
>
> rbd export one/$img /var/tmp/one-staging/$img
> rbd rm one/$img
> rbd import /var/tmp/one-staging/$img one/$img
> rm /var/tmp/one-staging/$img
>
> [1]
> [14:05:34] server5:~# ceph features
> {
>     "mon": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 3
>         }
>     },
>     "osd": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 49
>         }
>     },
>     "client": {
>         "group": {
>             "features": "0xffddff8ee84fffb",
>             "release": "kraken",
>             "num": 1
>         },
>         "group": {
>             "features": "0xffddff8eea4fffb",
>             "release": "luminous",
>             "num": 4
>         },
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 61
>         }
>     }
> }
>
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch

--
Jason
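Regarding the question of reviving the remaining broken VM without the export/import dance in [0]: a minimal sketch of how one might test the clone-parent hypothesis in place. The image name "one/one-42-0" is purely illustrative, and whether flattening alone restores a bootable image is unverified here:

    # a cloned image still shows a "parent:" line in its metadata
    rbd info one/one-42-0 | grep parent
    # flatten copies all data from the parent into the clone and detaches it,
    # leaving a self-contained image (as the export/import cycle also produces)
    rbd flatten one/one-42-0

If the reimport helped precisely because it produced a parent-less image, flattening should have the same effect without deleting the image first.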