Re: RBD I/O errors with QEMU [luminous upgrade/osd change]

Sorry -- meant VM. Yes, librbd uses ceph.conf for configuration settings.
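
A minimal sketch of what that could look like in ceph.conf (the [client]
section applies to all librbd clients; the log file path below is only an
assumption, pick one the qemu process can write to):

[client]
    debug rbd = 20        # verbose librbd logging
    debug objecter = 20   # verbose OSD-request (objecter) logging
    log file = /var/log/ceph/qemu-guest.$pid.log   # assumed path, writable by qemu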

On Sun, Sep 10, 2017 at 9:22 AM, Nico Schottelius
<nico.schottelius@xxxxxxxxxxx> wrote:
>
> Hello Jason,
>
> I think there is a slight misunderstanding:
> There is only one *VM* left that we did not start, not an OSD.
>
> Or does librbd also read ceph.conf and will that cause qemu to output
> debug messages?
>
> Best,
>
> Nico
>
> Jason Dillaman <jdillama@xxxxxxxxxx> writes:
>
>> I presume QEMU is using librbd instead of a mapped krbd block device,
>> correct? If that is the case, can you add "debug rbd = 20" and "debug
>> objecter = 20" to your ceph.conf and boot up your last remaining broken
>> OSD?
>>
>> On Sun, Sep 10, 2017 at 8:23 AM, Nico Schottelius
>> <nico.schottelius@xxxxxxxxxxx> wrote:
>>>
>>> Good morning,
>>>
>>> yesterday we had an unpleasant surprise that I would like to discuss:
>>>
>>> Many (not all!) of our VMs suddenly died (the qemu process exited), and
>>> when we tried to restart them, the qemu process reported I/O errors on
>>> the disks and the OS was not able to boot (i.e. it stopped in the
>>> initramfs).
>>>
>>> When we exported the image from rbd and loop mounted it, however, there
>>> were no I/O errors and the filesystem could be mounted cleanly [-1].
>>>
>>> We are running Devuan with kernel 3.16.0-4-amd64 and saw that some
>>> problems are reported for kernels < 3.16.39, so we upgraded one host that
>>> serves as a VM host and also runs ceph osds to Devuan ascii with kernel
>>> 4.9.0-3-amd64.
>>>
>>> Trying to start the VM again on this host however resulted in the same
>>> I/O problem.
>>>
>>> We then tried the "stupid" approach of exporting an image and importing
>>> it again under the same name [0]. Surprisingly, this reproducibly solved
>>> the problem for all affected VMs and allowed us to go back online.
>>>
>>> We intentionally left one broken VM (a test VM) in our system so that we
>>> have a chance to debug further what happened and how we can prevent it
>>> from happening again.
>>>
>>> As you might have guessed, there were some events prior to this:
>>>
>>> - Some weeks before, we upgraded our cluster from kraken to luminous (in
>>>   the right order: mons first, then adding mgrs)
>>>
>>> - About a week ago we added the first hdd to our cluster and modified the
>>>   crushmap so that the "one" pool (from opennebula) still selects
>>>   only ssds
>>>
>>> - Some hours before (roughly 3 hours prior to the event) we took one of
>>>   the 5 hosts out of the ceph cluster, as we intended to replace the
>>>   filesystem-based OSDs with bluestore
>>>
>>> - A short time before the event we re-added an osd, but did not "up" it
>>>
>>> To our understanding, none of these actions should have triggered this
>>> behaviour; however, we are aware that the upgrade to luminous also
>>> updated the client libraries, and that not all qemu processes were
>>> restarted. [1]
>>>
>>> After this long story, I was wondering about the following things:
>>>
>>> - Why did this happen at all?
>>>   What is different after we reimported the image?
>>>   Could it be related to disconnecting the image from its parent
>>>   (i.e. opennebula creates clones prior to starting a VM)?
>>>
>>> - We have one broken VM left - is there a way to get it back running
>>>   without doing the export/import dance?
>>>
>>> - Is http://tracker.ceph.com/issues/18807 related to our issue, and if
>>>   so, how? How is the kernel involved in running VMs that use librbd?
>>>   rbd showmapped does not show any mapped images, as qemu connects
>>>   directly to ceph.
>>>
>>>   We tried upgrading one host to Devuan ascii, which uses 4.9.0-3-amd64,
>>>   but that did not fix our problem.
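
A sketch of the two I/O paths in question, with hypothetical pool, image and
user names: with librbd, qemu opens the image entirely in userspace, so the
kernel rbd module is never involved and rbd showmapped stays empty; krbd only
shows up there once an image is explicitly mapped.

# librbd path: qemu talks to the cluster in userspace, nothing is mapped
qemu-system-x86_64 -m 1024 -drive format=raw,file=rbd:one/one-123-disk-0:id=libvirt

# krbd path: the kernel maps the image to a block device, visible in showmapped
rbd map one/one-123-disk-0 --id libvirt
rbd showmapped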
>>>
>>> We would appreciate any pointer!
>>>
>>> Best,
>>>
>>> Nico
>>>
>>>
>>> [-1]
>>> # attach the exported image to a loop device and scan its partition table
>>> losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img
>>> mkdir /tmp/monitoring1-mnt
>>> # the first partition mounts cleanly, with no I/O errors
>>> mount /dev/loop0p1 /tmp/monitoring1-mnt/
>>>
>>>
>>> [0]
>>>
>>> # dump the affected image to a local file, delete it from the pool,
>>> # re-import it under the same name, then remove the local copy
>>> rbd export one/$img /var/tmp/one-staging/$img
>>> rbd rm one/$img
>>> rbd import /var/tmp/one-staging/$img one/$img
>>> rm /var/tmp/one-staging/$img
>>>
>>> [1]
>>> [14:05:34] server5:~# ceph features
>>> {
>>>     "mon": {
>>>         "group": {
>>>             "features": "0x1ffddff8eea4fffb",
>>>             "release": "luminous",
>>>             "num": 3
>>>         }
>>>     },
>>>     "osd": {
>>>         "group": {
>>>             "features": "0x1ffddff8eea4fffb",
>>>             "release": "luminous",
>>>             "num": 49
>>>         }
>>>     },
>>>     "client": {
>>>         "group": {
>>>             "features": "0xffddff8ee84fffb",
>>>             "release": "kraken",
>>>             "num": 1
>>>         },
>>>         "group": {
>>>             "features": "0xffddff8eea4fffb",
>>>             "release": "luminous",
>>>             "num": 4
>>>         },
>>>         "group": {
>>>             "features": "0x1ffddff8eea4fffb",
>>>             "release": "luminous",
>>>             "num": 61
>>>         }
>>>     }
>>> }
>>>
>>>
>>> --
>>> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@xxxxxxxxxxxxxx
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
> --
> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch



-- 
Jason
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


