Re: RBD I/O errors with QEMU [luminous upgrade/osd change]

Hello Jason,

I think there is a slight misunderstanding:
there is only one broken *VM* left, not an OSD that we did not start.

Or does librbd also read ceph.conf, and would that make the qemu process
emit those debug messages?
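
If so, I assume something along these lines in /etc/ceph/ceph.conf on the
VM host would do it (the [client] section and the log path are my guess):

[client]
    debug rbd = 20
    debug objecter = 20
    log file = /var/log/ceph/qemu-$pid.log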

Best,

Nico

Jason Dillaman <jdillama@xxxxxxxxxx> writes:

> I presume QEMU is using librbd instead of a mapped krbd block device,
> correct? If that is the case, can you add "debug-rbd=20" and "debug
> objecter=20" to your ceph.conf and boot up your last remaining broken
> OSD?
>
> On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius
> <nico.schottelius@xxxxxxxxxxx> wrote:
>>
>> Good morning,
>>
>> yesterday we had an unpleasant surprise that I would like to discuss:
>>
>> Many (not all!) of our VMs suddenly died (the qemu processes exited),
>> and when we tried to restart them, the guests saw I/O errors on their
>> disks and the OS was not able to boot (i.e. it stopped in the
>> initramfs).
>>
>> However, when we exported an affected image from rbd and loop-mounted
>> it, there were no I/O errors and the filesystem mounted cleanly [-1].
>>
>> We are running Devuan with kernel 3.16.0-4-amd64 and saw that some
>> problems are reported for kernels < 3.16.39, so we upgraded one host
>> (which serves as a VM host and also runs ceph OSDs) to Devuan ascii
>> with kernel 4.9.0-3-amd64.
>>
>> However, trying to start the VM again on this host resulted in the same
>> I/O problem.
>>
>> We then took the "stupid" approach of exporting an image and importing
>> it again under the same name [0]. Surprisingly, this solved our problem
>> reproducibly for all affected VMs and allowed us to go back online.
>>
>> We intentionally left one broken VM in our system (a test VM) so that we
>> have a chance to debug further what happened and how we can prevent it
>> from happening again.
>>
>> As you might have guessed, there were some events prior to this:
>>
>> - Some weeks before, we upgraded our cluster from kraken to luminous
>>   (in the right order: mons first, then adding mgrs)
>>
>> - About a week ago we added the first HDD to our cluster and modified
>>   the crushmap so that the "one" pool (from OpenNebula) still selects
>>   only SSDs (roughly as sketched after this list)
>>
>> - Some hours before (roughly 3 hours prior to the event) we took one of
>>   the 5 hosts out of the ceph cluster, as we intended to replace its
>>   filestore (filesystem-based) OSDs with bluestore
>>
>> - A short time before the event we re-added an OSD, but did not "up" it
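>>
>>   (The crushmap change mentioned above could, with luminous device
>>   classes, be expressed roughly like this; the rule name is only
>>   illustrative, not necessarily what we actually used:)
>>
>>   ceph osd crush rule create-replicated one-ssd default host ssd
>>   ceph osd pool set one crush_rule one-ssd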
>>
>> To our understanding, none of these actions should have triggered this
>> behaviour; however, we are aware that the upgrade to luminous also
>> updated the client libraries, and that not all qemu processes were
>> restarted afterwards. [1]
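>>
>> (A rough way to check which qemu processes still map the pre-upgrade
>> library; the process name pattern is an assumption:)
>>
>> for pid in $(pgrep -f qemu-system); do
>>     # a qemu started before the upgrade still maps the old, now
>>     # deleted, librbd from the kraken packages
>>     grep -q 'librbd.*(deleted)' /proc/$pid/maps \
>>         && echo "qemu pid $pid still uses the old librbd"
>> done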
>>
>> After this long story, I was wondering about the following things:
>>
>> - Why did this happen at all?
>>   And what is different after we re-imported the image?
>>   Could it be related to disconnecting the image from its parent
>>   (OpenNebula creates clones prior to starting a VM)?
>>
>> - We have one broken VM left - is there a way to get it back running
>>   without doing the export/import dance? (A few things we could still
>>   check on it are sketched after this list.)
>>
>> - Is http://tracker.ceph.com/issues/18807 related to our issue, and if
>>   so, how? How is the kernel involved in running VMs that use librbd?
>>   rbd showmapped does not show any mapped images, as qemu connects
>>   directly to ceph.
>>
>>   We tried upgrading one host to Devuan ascii, which uses 4.9.0-3-amd64,
>>   but this did not fix our problem.
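>>
>>   (For the remaining broken image, a few things we could still look at;
>>   $img stands for the affected image and the commands are just a sketch:)
>>
>>   rbd info one/$img     # does it still have a parent (clone) relationship?
>>   rbd status one/$img   # any stale watchers left from dead qemu processes?
>>   rbd flatten one/$img  # would detach it from the parent without export/import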
>>
>> We would appreciate any pointer!
>>
>> Best,
>>
>> Nico
>>
>>
>> [-1]
>> losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img
>> mkdir /tmp/monitoring1-mnt
>> mount /dev/loop0p1 /tmp/monitoring1-mnt/
>>
>>
>> [0]
>>
>> rbd export one/$img /var/tmp/one-staging/$img
>> rbd rm one/$img
>> rbd import /var/tmp/one-staging/$img one/$img
>> rm /var/tmp/one-staging/$img
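>> # Note: as far as we understand, the image created by "rbd import" is a
>> # flat, standalone image, whereas the original OpenNebula image was a
>> # clone with a parent snapshot.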
>>
>> [1]
>> [14:05:34] server5:~# ceph features
>> {
>>     "mon": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 3
>>         }
>>     },
>>     "osd": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 49
>>         }
>>     },
>>     "client": {
>>         "group": {
>>             "features": "0xffddff8ee84fffb",
>>             "release": "kraken",
>>             "num": 1
>>         },
>>         "group": {
>>             "features": "0xffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 4
>>         },
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 61
>>         }
>>     }
>> }
>>
>>
>> --
>> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch


--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


