Hello Jason,

I got some further hints. Please see below.

On 15.05.2017 at 22:25, Jason Dillaman wrote:
> On Mon, May 15, 2017 at 3:54 PM, Stefan Priebe - Profihost AG
> <s.priebe@xxxxxxxxxxxx> wrote:
>> Would it be possible that the problem is the same you fixed?
>
> No, I would not expect it to be related to the other issues you are
> seeing. The issue I just posted a fix against only occurs when a
> client requests the lock from the current owner, which will only occur
> under the following scenarios: (1) attempt to write to the image
> locked by another client, (2) attempt to disable image features on an
> image locked by another client, (3) demote a primary mirrored image
> when locked by another client, or (4) the rbd CLI attempted to perform
> an operation not supported by the currently running lock owner client
> due to version mismatch.

Ah OK. Hmm, none of those is anything I would expect here.

> I am assuming you are not running two VMs concurrently using the same
> backing RBD image, so that would eliminate possibility (1).

No, I do not.

I spent a lot of time analyzing the log files. What I can tell so far
is:

1.) It happens very often when we issue an fstrim command on the root
device of a VM. We're using the Qemu virtio-scsi backend with:
cache=writeback,aio=threads,detect-zeroes=unmap,discard=on
(a sketch of the full invocation is in the P.S. below)

2.) It also happens on other, so far unidentified operations - but
fstrim seems to trigger it most reliably.

3.) It happens once or twice a night while doing around 1500-2000
backups, so it looks like a race to me.

4.) It still happens on pre-Jewel images, even after they were
restarted / killed and reinitialized. In that case they now have the
asok socket available. Should I issue any command against the socket
to get logs out of the hanging VM? (My first guess is in the P.P.S.
below.) Qemu is still responding; just the Ceph / disk I/O is stalled.

Greets,
Stefan
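
P.S.: For reference, a minimal sketch of how such a disk is attached on
our side and how we trigger the trim. Pool, image and device names are
placeholders, not our actual configuration:

  # virtio-scsi controller plus an RBD-backed SCSI disk using the
  # cache/aio/discard/detect-zeroes options quoted above
  qemu-system-x86_64 ... \
    -device virtio-scsi-pci,id=scsi0 \
    -drive file=rbd:rbd/vm-disk-1,format=raw,if=none,id=drive0,cache=writeback,aio=threads,discard=on,detect-zeroes=unmap \
    -device scsi-hd,bus=scsi0.0,drive=drive0

  # inside the guest, the command that most reliably reproduces the hang
  fstrim -v /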
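P.P.S.: Regarding the socket: unless you suggest something else, my
first guess would be something like the following (the .asok path is
just an example; the real name contains the client id and pid):

  # list in-flight OSD requests of the librbd client - a stuck request
  # should show up here
  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok objecter_requests

  # raise client-side rbd logging on the live process
  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok config set debug_rbd 20

  # dump the perf counters of the client
  ceph --admin-daemon /var/run/ceph/ceph-client.admin.12345.asok perf dump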