I presume QEMU is using librbd instead of a mapped krbd block device, correct? If that is the case, can you add "debug rbd = 20" and "debug objecter = 20" to your ceph.conf and boot up your last remaining broken VM? (A minimal sketch of such a logging setup is included below.)

On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius <nico.schottelius@xxxxxxxxxxx> wrote:
>
> Good morning,
>
> yesterday we had an unpleasant surprise that I would like to discuss:
>
> Many (not all!) of our VMs suddenly died (the qemu process exited), and
> when we tried to restart them, the guest OS inside qemu saw I/O errors
> on its disks and was not able to start (i.e. it stopped in the
> initramfs).
>
> When we exported the image from rbd and loop-mounted it, however, there
> were no I/O errors and the filesystem could be mounted cleanly [-1].
>
> We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are
> some problems reported with kernels < 3.16.39, so we upgraded one host
> that serves as VM host and also runs ceph OSDs to Devuan ascii with
> kernel 4.9.0-3-amd64.
>
> Trying to start the VM again on this host, however, resulted in the
> same I/O problem.
>
> We then took the "stupid" approach of exporting an image and importing
> it again under the same name [0]. Surprisingly, this reproducibly
> solved the problem for all affected VMs and allowed us to go back
> online.
>
> We intentionally left one broken VM (a test VM) in our system so that
> we have a chance to debug further what happened and how we can prevent
> it from happening again.
>
> As you might have guessed, there were some events prior to this:
>
> - Some weeks before, we upgraded our cluster from kraken to luminous
>   (in the right order: mons first, then adding mgrs).
>
> - About a week ago we added the first hdd to our cluster and modified
>   the crushmap so that the "one" pool (used by opennebula) still
>   selects only ssds.
>
> - Some hours before (roughly 3 hours prior to the event), we took one
>   of the 5 hosts out of the ceph cluster, as we intended to replace its
>   filesystem-based OSDs with bluestore.
>
> - Shortly before the event we re-added an OSD, but did not mark it
>   "up".
>
> To our understanding, none of these actions should have triggered this
> behaviour; however, we are aware that the upgrade to luminous also
> updated the client libraries, and not all qemu processes were restarted
> afterwards. [1]
>
> After this long story, I was wondering about the following things:
>
> - Why did this happen at all? What is different after we reimported the
>   image? Can it be related to disconnecting the image from its parent
>   (opennebula creates clones prior to starting a VM)?
>
> - We have one broken VM left - is there a way to get it back running
>   without doing the export/import dance?
>
> - Is http://tracker.ceph.com/issues/18807 related to our issue, and if
>   so, how? How is the kernel involved in running VMs that use librbd?
>   "rbd showmapped" does not show any mapped images, as qemu connects
>   directly to ceph.
>
> We tried upgrading one host to Devuan ascii, which uses 4.9.0-3-amd64,
> but that did not fix our problem.
>
> We would appreciate any pointer!
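As one such pointer, a minimal sketch of the logging setup suggested at the top of this reply - assuming the qemu processes read /etc/ceph/ceph.conf on the VM host; the "log file" line and its path are illustrative additions, not options named above:

    # append verbose librbd/objecter logging for all rbd clients on this host;
    # the quoted 'EOF' keeps $pid literal so ceph expands it per process
    cat >> /etc/ceph/ceph.conf <<'EOF'
    [client]
        debug rbd = 20
        debug objecter = 20
        log file = /var/log/ceph/qemu-guest-$pid.log
    EOF

The log directory must be writable by the user qemu runs as (e.g. libvirt-qemu); once the broken VM is booted, its per-process log should show where the reads fail.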
>
> Best,
>
> Nico
>
>
> [-1]
> losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img
> mkdir /tmp/monitoring1-mnt
> mount /dev/loop0p1 /tmp/monitoring1-mnt/
>
>
> [0]
>
> rbd export one/$img /var/tmp/one-staging/$img
> rbd rm one/$img
> rbd import /var/tmp/one-staging/$img one/$img
> rm /var/tmp/one-staging/$img
>
> [1]
> [14:05:34] server5:~# ceph features
> {
>     "mon": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 3
>         }
>     },
>     "osd": {
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 49
>         }
>     },
>     "client": {
>         "group": {
>             "features": "0xffddff8ee84fffb",
>             "release": "kraken",
>             "num": 1
>         },
>         "group": {
>             "features": "0xffddff8eea4fffb",
>             "release": "luminous",
>             "num": 4
>         },
>         "group": {
>             "features": "0x1ffddff8eea4fffb",
>             "release": "luminous",
>             "num": 61
>         }
>     }
> }
>
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch

--
Jason
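Regarding the question of reviving the remaining broken VM without the export/import dance in [0]: a minimal sketch of how one might test the clone-parent hypothesis in place. The image name "one/one-42-0" is purely illustrative, and whether flattening alone restores a bootable image is unverified here:

    # a cloned image still shows a "parent:" line in its metadata
    rbd info one/one-42-0 | grep parent
    # flatten copies all data from the parent into the clone and detaches it,
    # leaving a self-contained image (as the export/import cycle also produces)
    rbd flatten one/one-42-0

If the reimport helped precisely because it produced a parent-less image, flattening should have the same effect without deleting the image first.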