Re: corrupted rbd filesystems since jewel

Hi Stefan - we simply disabled exclusive-lock on all older (pre-jewel) images. We still allow the default jewel feature set for newly created images because, as you mention, the issue does not seem to affect them.
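
For reference, the per-image commands were along these lines (pool and image
names below are placeholders), disabling the features in dependency order,
since fast-diff depends on object-map and object-map depends on
exclusive-lock:

    rbd feature disable <pool>/<image> fast-diff
    rbd feature disable <pool>/<image> object-map
    rbd feature disable <pool>/<image> exclusive-lock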

On Thu, May 4, 2017 at 10:19 AM, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx> wrote:
Hello Brian,

this really sounds like the same issue. I don't see this on a cluster with only
images created AFTER jewel. And it seems to have started happening after I
enabled exclusive-lock on all images.

Did you just use feature disable with exclusive-lock,fast-diff,object-map, or
did you also restart all those VMs?

Greets,
Stefan

Am 04.05.2017 um 19:11 schrieb Brian Andrus:
> Sounds familiar... and discussed in "disk timeouts in libvirt/qemu VMs..."
>
> We have not had this issue since disabling exclusive-lock, though it was
> suggested that this was not the cause. So far it's held up for us: not a
> single corrupt filesystem since then.
>
> On some images (ones created after the Jewel upgrade) the feature could not
> be disabled, but these don't seem to be affected. Of course, we never
> did pinpoint the cause of the timeouts, so it's entirely possible something
> else was causing them, but no other major changes went into effect.
>
> One thing to look for that might confirm the same issue is timeouts in
> the guest VM. Most OS kernels will report a hung task in conjunction with
> the hang-up/lockup/corruption. Wondering if you're seeing that too.
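>
> As a rough example of what to check in the guest (exact wording varies by
> kernel version), dmesg will usually contain something along the lines of
> "INFO: task xyz:1234 blocked for more than 120 seconds", so a quick check is:
>
>   dmesg | grep -i "blocked for more than"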
>
> On Wed, May 3, 2017 at 10:49 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>
>     Hello,
>
>     since we upgraded from hammer to jewel 10.2.7 and enabled
>     exclusive-lock,object-map,fast-diff, we've had problems with corrupted VM
>     filesystems.
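>
>     For reference, which features are enabled on a given image can be
>     checked with something like the following (pool/image names are
>     placeholders):
>
>         rbd info <pool>/<image> | grep features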
>
>     Sometimes the VMs just crash with FS errors and a restart
>     solves the problem. Sometimes the whole VM is not even bootable and we
>     need to restore from a backup.
>
>     All of them have the same problem: you can't roll back to an older
>     snapshot. The rbd command just hangs at 99% forever.
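>
>     The command that hangs is along the lines of the following (with
>     placeholder names):
>
>         rbd snap rollback <pool>/<image>@<snapshot>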
>
>     Is this a known issue - anything we can check?
>
>     Greets,
>     Stefan
>     _______________________________________________
>     ceph-users mailing list
>     ceph-users@xxxxxxxxxxxxxx
>     http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
>
>
>
> --
> Brian Andrus | Cloud Systems Engineer | DreamHost
> brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com



--
Brian Andrus | Cloud Systems Engineer | DreamHost
brian.andrus@xxxxxxxxxxxxx | www.dreamhost.com
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
