Hi,
We're hitting an odd issue on our Ceph cluster:
- machine1 has mapped an RBD image that has the exclusive-lock feature enabled.
- machine2 tries to take a snapshot of that image, but fails to acquire the lock.
An strace of the rbd snap process on machine2 shows it looping, sending "lockget" commands over and over without ever making progress.
In rbd status, we see that machine1 is a watcher on the image, which is expected. What is not expected is that the rbd snap process can't get the lock.
This commit, which shipped in 10.2.10 (the version we are running), sounds related: https://github.com/ceph/ceph/commit/475dda114a7e25b43dc9066b9808a64fc0c6dc89
However, there is no equivalent change in ceph-client's code, where we would expect one as well. That said, I don't have a full understanding of this area, so I might be off-base.
Am I wrong to expect an equivalent change in ceph-client (i.e. the Linux kernel)? Am I completely off-base about what is going wrong here? Can I provide any additional information to help with debugging?
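If it is useful, the watcher and lock state of the image could be collected with something like the sketch below. It only wraps the read-only commands rbd status, rbd lock list and rbd info. The helper names and the pool/image arguments are made up for illustration, and this has not been run against the affected cluster.

```python
import subprocess


def diagnostic_commands(pool, image):
    """Build the read-only rbd commands that report watcher and lock state.

    `pool` and `image` are placeholders for the real pool and image names.
    """
    spec = f"{pool}/{image}"
    return [
        ["rbd", "status", spec],        # lists the watchers on the image
        ["rbd", "lock", "list", spec],  # lists the current lock holders
        ["rbd", "info", spec],          # shows the enabled image features
    ]


def collect(pool, image):
    """Run each command and return its output keyed by the command line."""
    results = {}
    for argv in diagnostic_commands(pool, image):
        key = " ".join(argv)
        try:
            proc = subprocess.run(
                argv,
                stdout=subprocess.PIPE,
                stderr=subprocess.STDOUT,
                universal_newlines=True,
            )
            results[key] = proc.stdout
        except FileNotFoundError:
            # The rbd CLI is not installed on this machine.
            results[key] = "rbd command not found\n"
    return results
```

Calling collect("mypool", "myimage") on both machine1 and machine2, while the rbd snap process is stuck, would give output that can be pasted into a reply.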
Regards,
Florian