> 3.) it still happens on pre-Jewel images even when they got restarted
> / killed and reinitialized. In that case they have the asok socket
> available for now. Should I issue any command to the socket to get a
> log out of the hanging VM? Qemu is still responding, just Ceph / disk
> I/O gets stalled.

By "restarted" I meant restarting qemu.

Stefan

On 16.05.2017 at 08:12, Stefan Priebe - Profihost AG wrote:
> Hello Jason,
>
> I got some further hints. Please see below.
>
> On 15.05.2017 at 22:25, Jason Dillaman wrote:
>> On Mon, May 15, 2017 at 3:54 PM, Stefan Priebe - Profihost AG
>> <s.priebe@xxxxxxxxxxxx> wrote:
>>> Would it be possible that the problem is the same you fixed?
>>
>> No, I would not expect it to be related to the other issues you are
>> seeing. The issue I just posted a fix against only occurs when a
>> client requests the lock from the current owner, which will only occur
>> under the following scenarios: (1) attempt to write to the image
>> locked by another client, (2) attempt to disable image features on an
>> image locked by another client, (3) demote a primary mirrored image
>> when locked by another client, or (4) the rbd CLI attempted to perform
>> an operation not supported by the currently running lock owner client
>> due to version mismatch.
>
> Ah, OK. Hmm, none of those is something I would expect here.
>
>> I am assuming you are not running two VMs concurrently using the same
>> backing RBD image, so that would eliminate possibility (1).
>
> No, I do not.
>
> I invested a lot of time in analyzing the log files. What I can tell
> so far is:
>
> 1.) it happens very often when we issue an fstrim command on the root
> device of a VM. We're using the Qemu virtio-scsi backend with:
>
> cache=writeback,aio=threads,detect-zeroes=unmap,discard=on
>
> 2.) but it also happens on other unknown "operations"; fstrim just
> seems to be the most reliable trigger
>
> 3.) it happens once or twice a night while doing around 1500-2000
> backups. So it looks like a race to me.
>
> 3.) it still happens on pre-Jewel images even when they got restarted /
> killed and reinitialized. In that case they have the asok socket available
> for now. Should I issue any command to the socket to get a log out of the
> hanging VM? Qemu is still responding, just Ceph / disk I/O gets stalled.
>
> Regards,
> Stefan
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com