Good morning,

yesterday we had an unpleasant surprise that I would like to discuss: Many
(not all!) of our VMs were suddenly dying (the qemu process exiting) and when
trying to restart them, we saw I/O errors on the disks inside the qemu
process and the OS was not able to start (i.e. it stopped in the initramfs).

When we exported the image from rbd and loop mounted it, there were however
no I/O errors and the filesystem could be cleanly mounted [-1].

We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are some
problems reported with kernels < 3.16.39, so we upgraded one host that serves
as VM host and runs ceph OSDs to Devuan ascii with 4.9.0-3-amd64. Trying to
start the VM again on this host, however, resulted in the same I/O problem.

We then tried the "stupid" approach of exporting an image and importing it
again under the same name [0]. Surprisingly, this solved our problem
reproducibly for all affected VMs and allowed us to go back online.

We intentionally left one broken VM (a test VM) in our system so that we have
a chance of debugging further what happened and how we can prevent it from
happening again.

As you might have guessed, there were some events prior to this:

- Some weeks before, we upgraded our cluster from kraken to luminous (in the
  right order: mons first, then adding mgrs)
- About a week ago we added the first hdd to our cluster and modified the
  crushmap so that the "one" pool (used by opennebula) still selects only ssds
- Some hours before, we took one of the 5 hosts out of the ceph cluster, as
  we intended to replace the filesystem-based OSDs with bluestore (roughly 3
  hours prior to the event)
- A short time before the event we re-added an OSD, but did not "up" it

To our understanding, none of these actions should have triggered this
behaviour; however, we are aware that with the upgrade to luminous the client
libraries were also updated and not all qemu processes were restarted. [1]

After this long story, I was wondering about the following things:

- Why did this happen at all? And what is different after we reimported the
  image? Can it be related to disconnecting the image from its parent
  (opennebula creates clones prior to starting a VM)?
- We have one broken VM left - is there a way to get it back running without
  doing the export/import dance? ([2] sketches an alternative we have been
  wondering about, but have not tried.)
- Is http://tracker.ceph.com/issues/18807 related to our issue, and if so,
  how? How is the kernel involved in running VMs that use librbd? rbd
  showmapped does not show any mapped VMs, as qemu connects directly to ceph
  ([3] shows how we understand the disks are attached). We tried upgrading
  one host to Devuan ascii, which uses 4.9.0-3-amd64, but this did not fix
  our problem.

We would appreciate any pointers!

Best,

Nico

[-1]
losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img
mkdir /tmp/monitoring1-mnt
mount /dev/loop0p1 /tmp/monitoring1-mnt/

[0]
rbd export one/$img /var/tmp/one-staging/$img
rbd rm one/$img
rbd import /var/tmp/one-staging/$img one/$img
rm /var/tmp/one-staging/$img

[1]
[14:05:34] server5:~# ceph features
{
    "mon": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 3
        }
    },
    "osd": {
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 49
        }
    },
    "client": {
        "group": {
            "features": "0xffddff8ee84fffb",
            "release": "kraken",
            "num": 1
        },
        "group": {
            "features": "0xffddff8eea4fffb",
            "release": "luminous",
            "num": 4
        },
        "group": {
            "features": "0x1ffddff8eea4fffb",
            "release": "luminous",
            "num": 61
        }
    }
}
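[2]
A rough sketch of what we are considering for the remaining broken VM instead
of the export/import dance - we have not tried this yet, and the image name
is only a placeholder. Since opennebula starts VMs from clones, the idea
would be to check whether the image still references a parent and, if so,
detach it in place:

# show whether the image is still a clone (a "parent:" line appears if so)
rbd info one/$img

# copy the parent's data into the clone and remove the parent link
rbd flatten one/$img

If the problem really is related to the parent link, this would avoid
removing and re-creating the image.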
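[3]
For completeness, how we understand the disks are attached (the image name
and cephx user below are placeholders; the real command line is generated by
opennebula/libvirt): qemu opens the image through librbd in userspace, so no
kernel rbd mapping is involved, roughly like:

qemu-system-x86_64 ... \
    -drive format=raw,if=virtio,file=rbd:one/$img:id=libvirt:conf=/etc/ceph/ceph.conf

If that understanding is correct, it would explain why rbd showmapped shows
nothing and why the host kernel version should only matter indirectly.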
--
Modern, affordable, Swiss Virtual Machines.
Visit www.datacenterlight.ch