Hi Stefan - we simply disabled exclusive-lock on all older (pre-Jewel) images. We still allow the default Jewel feature set for newly created images because, as you mention, the issue does not seem to affect them.
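For anyone following along, the bulk disable can be scripted. This is a hedged sketch, not necessarily Brian's exact procedure: the helper name is made up, the pool name is an example, and fast-diff/object-map depend on exclusive-lock so they go in the same disable call.

```shell
# Sketch only: disable exclusive-lock (plus the features that depend on
# it) on every image in a pool. The pool name is passed in as an argument.
disable_exclusive_lock_pool() {
  local pool="$1"
  rbd ls "$pool" | while read -r img; do
    # fast-diff and object-map require exclusive-lock, so they must be
    # disabled along with (or before) it.
    rbd feature disable "$pool/$img" fast-diff,object-map,exclusive-lock
  done
}
# Usage: disable_exclusive_lock_pool rbd
```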
On Thu, May 4, 2017 at 10:19 AM, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
Hello Brian,
this really sounds like the same issue. I don't see this on a cluster with
only images created AFTER Jewel, and it seems to have started happening
after I enabled exclusive-lock on all images.
Did you just use "rbd feature disable <image> exclusive-lock,fast-diff,object-map",
or did you also restart all those VMs?
Greets,
Stefan
Am 04.05.2017 um 19:11 schrieb Brian Andrus:
> Sounds familiar... and discussed in "disk timeouts in libvirt/qemu VMs..."
>
> We have not had this issue since disabling exclusive-lock, though it was
> suggested this was not the cause. So far the change has held up for us:
> not a single corrupt filesystem since then.
>
> On some images (ones created after the Jewel upgrade) the feature could
> not be disabled, but those don't seem to be affected. Of course, we never
> did pinpoint the cause of the timeouts, so it's entirely possible
> something else was causing them, but no other major changes went into effect.
>
> One thing to look for that might confirm the same issue is timeouts in
> the guest VM. Most OS kernels will report a hung task in conjunction with
> the hang-up/lock/corruption. Wondering if you're seeing that too.
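To check for those hung-task messages inside a guest, something like this works on most Linux guests (a sketch; the patterns are the stock kernel hung-task strings, e.g. "INFO: task ... blocked for more than 120 seconds"):

```shell
# Sketch: scan the guest kernel log for hung-task warnings that tend to
# accompany the I/O stall described above.
grep_hung_tasks() {
  dmesg | grep -iE 'blocked for more than|hung_task'
}
```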
>
> On Wed, May 3, 2017 at 10:49 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx <mailto:s.priebe@xxxxxxxxxxxx>> wrote:
>
> Hello,
>
> since we upgraded from Hammer to Jewel 10.2.7 and enabled
> exclusive-lock,object-map,fast-diff, we've had problems with VM
> filesystems getting corrupted.
>
> Sometimes the VMs just crash with FS errors and a restart solves the
> problem. Sometimes the whole VM is not even bootable and we need to
> import a backup.
>
> All of them share the same symptom: you can't revert to an older
> snapshot. The rbd command just hangs at 99% forever.
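One thing worth checking before a rollback (a diagnostic sketch; the image spec is a placeholder) is whether a client still holds the exclusive lock or is watching the image, since a running VM holding the lock is a plausible reason a rollback stalls:

```shell
# Sketch: list watchers and lock holders for an image before trying
# "rbd snap rollback". The "pool/image" spec is a placeholder.
check_rbd_locks() {
  local image="$1"
  rbd status "$image"     # active watchers (clients with the image open)
  rbd lock list "$image"  # current lock holders, if any
}
# Usage: check_rbd_locks rbd/vm-disk-1
```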
>
> Is this a known issue - anything we can check?
>
> Greets,
> Stefan
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Brian Andrus | Cloud Systems Engineer | DreamHost
> brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com <http://www.dreamhost.com>
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com