Hi Stefan - we simply disabled exclusive-lock on all older (pre-Jewel) images. We still allow the default Jewel feature set for newly created images because, as you mention, the issue does not seem to affect them.
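For anyone following along, the bulk disable can be scripted. This is a hedged sketch, not necessarily Brian's exact procedure: the helper name is made up, the pool name is an example, and fast-diff/object-map depend on exclusive-lock so they go in the same disable call.

```shell
# Sketch only: disable exclusive-lock (plus the features that depend on
# it) on every image in a pool. The pool name is passed in as an argument.
disable_exclusive_lock_pool() {
  local pool="$1"
  rbd ls "$pool" | while read -r img; do
    # fast-diff and object-map require exclusive-lock, so they must be
    # disabled along with (or before) it.
    rbd feature disable "$pool/$img" fast-diff,object-map,exclusive-lock
  done
}
# Usage: disable_exclusive_lock_pool rbd
```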
On Thu, May 4, 2017 at 10:19 AM, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
Hello Brian,
this really sounds like the same issue. I don't see this on a cluster with
only images created AFTER Jewel, and it seems to have started happening
after I enabled exclusive-lock on all images.
Did you just use "rbd feature disable <image> exclusive-lock,fast-diff,object-map",
or did you also restart all those VMs?
Greets,
Stefan
Am 04.05.2017 um 19:11 schrieb Brian Andrus:
> Sounds familiar... and discussed in "disk timeouts in libvirt/qemu VMs..."
>
> We have not had this issue since disabling exclusive-lock, though it was
> suggested this was not the cause. So far the change has held up for us:
> not a single corrupt filesystem since then.
>
> On some images (ones created after the Jewel upgrade) the feature could
> not be disabled, but those don't seem to be affected. Of course, we never
> did pinpoint the cause of the timeouts, so it's entirely possible
> something else was causing them, but no other major changes went into effect.
>
> One thing to look for that might confirm the same issue is timeouts in
> the guest VM. Most OS kernels will report a hung task in conjunction with
> the hang-up/lock/corruption. Wondering if you're seeing that too.
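To check for those hung-task messages inside a guest, something like this works on most Linux guests (a sketch; the patterns are the stock kernel hung-task strings, e.g. "INFO: task ... blocked for more than 120 seconds"):

```shell
# Sketch: scan the guest kernel log for hung-task warnings that tend to
# accompany the I/O stall described above.
grep_hung_tasks() {
  dmesg | grep -iE 'blocked for more than|hung_task'
}
```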
>
> On Wed, May 3, 2017 at 10:49 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx <mailto:s.priebe@xxxxxxxxxxxx>> wrote:
>
> Hello,
>
> since we upgraded from Hammer to Jewel 10.2.7 and enabled
> exclusive-lock,object-map,fast-diff, we've had problems with VM
> filesystems getting corrupted.
>
> Sometimes the VMs just crash with FS errors and a restart solves the
> problem. Sometimes the whole VM is not even bootable and we need to
> import a backup.
>
> All of them share the same symptom: you can't revert to an older
> snapshot. The rbd command just hangs at 99% forever.
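One thing worth checking before a rollback (a diagnostic sketch; the image spec is a placeholder) is whether a client still holds the exclusive lock or is watching the image, since a running VM holding the lock is a plausible reason a rollback stalls:

```shell
# Sketch: list watchers and lock holders for an image before trying
# "rbd snap rollback". The "pool/image" spec is a placeholder.
check_rbd_locks() {
  local image="$1"
  rbd status "$image"     # active watchers (clients with the image open)
  rbd lock list "$image"  # current lock holders, if any
}
# Usage: check_rbd_locks rbd/vm-disk-1
```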
>
> Is this a known issue - anything we can check?
>
> Greets,
> Stefan
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Brian Andrus | Cloud Systems Engineer | DreamHost
> brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com <http://www.dreamhost.com>
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com