CephFS currently deadlocks under CTDB's ping_pong POSIX locking test when run
concurrently on multiple nodes.
The deadlock is caused by failed removal of a waiting_locks entry when the
waiting lock is merged with an existing lock, e.g.:

Initial MDS state (two clients, same file):
held_locks --
start: 0, length: 1, client: 4116, pid: 7899, type: 2
start: 2, length: 1, client: 4110, pid: 40767, type: 2
waiting_locks --
start: 1, length: 1, client: 4116, pid: 7899, type: 2

Waiting lock entry 4116@1:1 fires:
handle_client_file_setlock: start: 1, length: 1, client: 4116, pid: 7899, type: 2

MDS state after lock is obtained:
held_locks --
start: 0, length: 2, client: 4116, pid: 7899, type: 2
start: 2, length: 1, client: 4110, pid: 40767, type: 2
waiting_locks --
start: 1, length: 1, client: 4116, pid: 7899, type: 2

Note that the waiting 4116@1:1 lock entry is merged with the existing
4116@0:1 held lock to become a 4116@0:2 held lock. However, the now handled
4116@1:1 waiting_locks entry remains.

When handling a lock request, the MDS calls adjust_locks() to merge the new
lock with available neighbours. If the new lock is merged, then the
waiting_locks entry is not located in the subsequent remove_waiting() call.
This fix ensures that the waiting_locks entry is removed prior to
modification during merge.

Signed-off-by: David Disseldorp <ddiss@xxxxxxx>
---
 src/mds/flock.cc | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/src/mds/flock.cc b/src/mds/flock.cc
index e83c5ee..5e329af 100644
--- a/src/mds/flock.cc
+++ b/src/mds/flock.cc
@@ -75,12 +75,14 @@ bool ceph_lock_state_t::add_lock(ceph_filelock& new_lock,
       } else {
         //yay, we can insert a shared lock
         dout(15) << "inserting shared lock" << dendl;
+        remove_waiting(new_lock);
         adjust_locks(self_overlapping_locks, new_lock, neighbor_locks);
         held_locks.insert(pair<uint64_t, ceph_filelock>(new_lock.start, new_lock));
         ret = true;
       }
     }
   } else { //no overlapping locks except our own
+    remove_waiting(new_lock);
     adjust_locks(self_overlapping_locks, new_lock, neighbor_locks);
     dout(15) << "no conflicts, inserting " << new_lock << dendl;
     held_locks.insert(pair<uint64_t, ceph_filelock>
@@ -89,7 +91,6 @@
   }
   if (ret) {
     ++client_held_lock_counts[(client_t)new_lock.client];
-    remove_waiting(new_lock);
   } else if (wait_on_fail && !replay)
     ++client_waiting_lock_counts[(client_t)new_lock.client];
 
@@ -306,7 +307,7 @@ void ceph_lock_state_t::adjust_locks(list<multimap<uint64_t, ceph_filelock>::ite
     old_lock = &(*iter)->second;
     old_lock_client = old_lock->client;
     dout(15) << "lock to coalesce: " << *old_lock << dendl;
-    /* because if it's a neibhoring lock there can't be any self-overlapping
+    /* because if it's a neighboring lock there can't be any self-overlapping
        locks that covered it */
     if (old_lock->type == new_lock.type) { //merge them
       if (0 == new_lock.length) {
-- 
1.8.1.4
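
P.S.: for readers unfamiliar with the MDS lock bookkeeping, the ordering
problem can be reproduced in isolation with a small standalone sketch. The
filelock struct, lock_map alias and remove_waiting() helper below are
simplified stand-ins for ceph_filelock and the real waiting_locks multimap,
not the actual MDS code; they only model the "waiting entry is keyed by the
pre-merge start offset" issue described above.

// Toy model of the waiting_locks bookkeeping (simplified, not real Ceph types),
// showing why remove_waiting() must run before adjust_locks() merges new_lock.
#include <cstdint>
#include <iostream>
#include <map>

struct filelock {
  uint64_t start;
  uint64_t length;
  int64_t client;
};

// keyed by lock start offset, like the MDS waiting_locks multimap
using lock_map = std::multimap<uint64_t, filelock>;

// Mimics remove_waiting(): look up waiting entries by the lock's current
// start offset and drop the one that matches.
static void remove_waiting(lock_map &waiting, const filelock &fl) {
  auto range = waiting.equal_range(fl.start);
  for (auto it = range.first; it != range.second; ++it) {
    if (it->second.length == fl.length && it->second.client == fl.client) {
      waiting.erase(it);
      return;
    }
  }
  std::cout << "no waiting entry found for start=" << fl.start << "\n";
}

int main() {
  lock_map waiting;
  filelock new_lock{1, 1, 4116};           // the 4116@1:1 waiting lock fires
  waiting.emplace(new_lock.start, new_lock);

  // Buggy order: merge with the neighbouring 4116@0:1 held lock first,
  // so new_lock becomes 4116@0:2 before the waiting entry is removed.
  filelock merged = new_lock;
  merged.start = 0;
  merged.length = 2;
  remove_waiting(waiting, merged);         // lookup at start=0 misses the entry
  std::cout << "waiting entries left (buggy order): " << waiting.size() << "\n";

  // Fixed order: remove the waiting entry before the merge mutates new_lock.
  waiting.emplace(new_lock.start, new_lock);
  remove_waiting(waiting, new_lock);       // lookup at start=1 succeeds
  std::cout << "waiting entries left (fixed order): " << waiting.size() << "\n";
  return 0;
}

With the buggy ordering the lookup runs against the post-merge start offset 0
and the stale 4116@1:1 entry stays behind, which is exactly what the patch
avoids by calling remove_waiting() ahead of adjust_locks().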