Re: RBD I/O errors with QEMU [luminous upgrade/osd change]

Good morning Lionel,

it's great to hear that we are not the only ones affected!

I am not sure what you refer to by "glance" images, but what we see is
that we can spawn a new VM based on an existing image and that one runs.

Can I invite you (and anyone else who has problems with the Luminous
upgrade) to join our chat at https://brandnewchat.ungleich.ch/ so that we
can discuss these real-world problems online?

For us it is currently very unclear how to proceed: whether it is even safe
to rejoin the host to the cluster, or whether a downgrade would even make
sense.

Best,

Nico

p.s.: This cluster was installed with kraken, so no old jewel clients or
osds have ever existed in it.

Beard Lionel (BOSTON-STORAGE) <lbeard@xxxxxx> writes:

> Hi,
>
> We also have the same issue with Openstack instances (QEMU/libvirt) after upgrading from kraken to luminous, and just after starting the osd migration from btrfs to bluestore.
> We were able to restart failed VMs by mapping all disks from a Linux box with rbd map and running fsck on them (a rough sketch follows below).
> The QEMU hosts are running Ubuntu with a kernel > 4.4.
> We have noticed that one of our QEMU hosts was still running the jewel ceph client (due to an error during installation), and the issue doesn't happen on that one.
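>
> (For reference, a rough sketch of the rbd map / fsck recovery described above; pool, image and partition names are placeholders and need to be adapted to your setup:)
>
> # map the image on a recovery host, check the filesystem, then unmap again
> dev=$(rbd map mypool/$img)
> fsck -y "${dev}p1"    # or fsck "$dev" directly if the image has no partition table
> rbd unmap "$dev"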
>
> Don't you have issues with some glance images?
> We do (we are unable to spawn an instance from them), and it was fixed by following this ticket: http://tracker.ceph.com/issues/19413
>
> Regards,
> Lionel
>
>> -----Original Message-----
>> From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>> Nico Schottelius
>> Sent: Sunday, 10 September 2017 14:23
>> To: ceph-users <ceph-users@xxxxxxxx>
>> Cc: kamila.souckova@xxxxxxxxxxx
>> Subject:  RBD I/O errors with QEMU [luminous upgrade/osd
>> change]
>>
>>
>> Good morning,
>>
>> yesterday we had an unpleasant surprise that I would like to discuss:
>>
>> Many (not all!) of our VMs suddenly died (the qemu process exited), and
>> when we tried to restart them, the qemu guests saw I/O errors on their
>> disks and the OS was not able to start (i.e. it stopped in the initramfs).
>>
>> When we exported the image from rbd and loop-mounted it, however, there
>> were no I/O errors and the filesystem could be mounted cleanly [-1].
>>
>> We are running Devuan with kernel 3.16.0-4-amd64 and saw that there are
>> some problems reported with kernels < 3.16.39, so we upgraded one host
>> that serves as a VM host and also runs ceph osds to Devuan ascii with
>> kernel 4.9.0-3-amd64.
>>
>> Trying to start the VM again on this host however resulted in the same I/O
>> problem.
>>
>> We then tried the "stupid" approach of exporting an image and importing it
>> again under the same name [0]. Surprisingly, this solved our problem
>> reproducibly for all affected VMs and allowed us to go back online.
>>
>> We intentionally left one broken VM in our system (a test VM) so that we
>> have the chance of debugging further what happened and how we can
>> prevent it from happening again.
>>
>> As you might have guessed, there were some events prior to this:
>>
>> - Some weeks before, we upgraded our cluster from kraken to luminous
>>   (in the right order: mons first, then adding mgrs)
>>
>> - About a week ago we added the first hdd to our cluster and modified the
>>   crushmap so that the "one" pool (from opennebula) still selects
>>   only ssds (see the sketch after this list)
>>
>> - Some hours before, we took one of the 5 hosts out of the ceph cluster,
>>   as we intended to replace the filesystem-based OSDs with bluestore
>>   (roughly 3 hours prior to the event)
>>
>> - A short time before the event we re-added an osd, but did not "up" it
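>>
>>   (For reference only, not what we actually ran: on luminous, keeping a pool
>>   on ssds can also be done with device classes instead of hand-editing the
>>   crushmap; the rule name below is just an example:)
>>
>>   # create a replicated rule restricted to the "ssd" device class
>>   ceph osd crush rule create-replicated ssd-only default host ssd
>>   # point the pool at that rule
>>   ceph osd pool set one crush_rule ssd-only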
>>
>> To our understanding, none of these actions should have triggered this
>> behaviour; however, we are aware that the client libraries were also updated
>> with the upgrade to luminous and that not all qemu processes were restarted
>> afterwards [1].
>>
>> After this long story, I was wondering about the following things:
>>
>> - Why did this happen at all?
>>   And what is different after we reimported the image?
>>   Can it be related to disconnecting the image from its parent
>>   (i.e. opennebula creates clones prior to starting a VM)?
>>
>> - We have one broken VM left - is there a way to get it back running
>>   without doing the export/import dance?
>>
>> - How is http://tracker.ceph.com/issues/18807 related to our issue, if at all?
>>   How is the kernel involved in running VMs that use librbd?
>>   rbd showmapped does not show any mapped images, as qemu connects
>>   directly to ceph (a way to check which librbd a qemu process uses is
>>   sketched below).
>>
>>   We tried upgrading one host to Devuan ascii, which uses 4.9.0-3-amd64,
>>   but this did not fix our problem.
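>>
>>   (A rough sketch for checking which librbd a running qemu process has
>>   loaded, assuming a Linux /proc filesystem; the process name is just an
>>   example and may differ on your hosts:)
>>
>>   for pid in $(pidof qemu-system-x86_64); do
>>       echo "qemu pid $pid uses:"
>>       # a "(deleted)" suffix means the process still maps a library that was
>>       # replaced on disk, i.e. it was not restarted after the upgrade
>>       grep librbd /proc/$pid/maps | awk '{print $6, $7}' | sort -u
>>   done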
>>
>> We would appreciate any pointers!
>>
>> Best,
>>
>> Nico
>>
>>
>> [-1]
>> # loop-mount the exported image (it contains a partition table) and verify
>> # that the filesystem mounts cleanly
>> losetup -P /dev/loop0 /var/tmp/one-staging/monitoring1-disk.img
>> mkdir /tmp/monitoring1-mnt
>> mount /dev/loop0p1 /tmp/monitoring1-mnt/
>>
>>
>> [0]
>>
>> rbd export one/$img /var/tmp/one-staging/$img
>> rbd rm one/$img
>> rbd import /var/tmp/one-staging/$img one/$img
>> rm /var/tmp/one-staging/$img
>>
>> [1]
>> [14:05:34] server5:~# ceph features
>> {
>>     "mon": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 3
>>         }
>>     },
>>     "osd": {
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 49
>>         }
>>     },
>>     "client": {
>>         "group": {
>>             "features": "0xffddff8ee84fffb",
>>             "release": "kraken",
>>             "num": 1
>>         },
>>         "group": {
>>             "features": "0xffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 4
>>         },
>>         "group": {
>>             "features": "0x1ffddff8eea4fffb",
>>             "release": "luminous",
>>             "num": 61
>>         }
>>     }
>> }
>>
>>
>> --
>> Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@xxxxxxxxxxxxxx
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>


--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



