Hello,

I am trying to understand how mutexes work in the kernel, and I think there might be a race between mutex_trylock() and mutex_unlock(). More specifically, the race is between the functions __mutex_trylock_slowpath() and __mutex_unlock_common_slowpath() (both defined in kernel/locking/mutex.c).

Consider the following sequence of events:

0. Suppose a mutex is locked by task A and has no waiters.

1. Task B calls mutex_trylock().

2. mutex_trylock() calls the architecture-specific __mutex_fastpath_trylock(), with __mutex_trylock_slowpath() as fail_fn.

3. According to the description of __mutex_fastpath_trylock() (for example in include/asm-generic/mutex-dec.h), "if the architecture has no effective trylock variant, it should call the fail_fn spinlock-based trylock variant unconditionally". So __mutex_fastpath_trylock() may now call __mutex_trylock_slowpath().

4. Task A releases the mutex (the count becomes 1).

5. Task B, in __mutex_trylock_slowpath(), executes:

	/* No need to trylock if the mutex is locked. */
	if (mutex_is_locked(lock))
		return 0;

   Since the mutex is no longer locked, the function continues.

6. Task C, which runs on a different CPU than task B, locks the mutex again (the count becomes 0).

7. Task B, in __mutex_trylock_slowpath(), continues:

	spin_lock_mutex(&lock->wait_lock, flags);

	prev = atomic_xchg(&lock->count, -1);
	if (likely(prev == 1)) {
		mutex_set_owner(lock);
		mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_);
	}

   At this point task B holds mutex->wait_lock, prev is 0 (task C holds the mutex and there are no waiters, so the count was 0), and the mutex count has been set to -1.

8. Task C calls mutex_unlock() to unlock the mutex.

9. mutex_unlock() calls the architecture-specific function __mutex_fastpath_unlock(), which fails (because the mutex count is -1), so it now calls __mutex_unlock_slowpath(), which calls __mutex_unlock_common_slowpath().

10. __mutex_unlock_common_slowpath() sets the mutex count to 1 unconditionally, before spinning on mutex->wait_lock.

11. Task B, in __mutex_trylock_slowpath(), continues:

	/* Set it back to 0 if there are no waiters: */
	if (likely(list_empty(&lock->wait_list)))
		atomic_set(&lock->count, 0);

	spin_unlock_mutex(&lock->wait_lock, flags);

	return prev == 1;

    mutex->wait_list is still empty, so the code sets the mutex count back to 0 (which means the mutex is locked), releases mutex->wait_lock, and returns 0 (which means that the mutex is held by someone else and cannot be acquired).

12. Task C, in __mutex_unlock_common_slowpath(), acquires mutex->wait_lock, releases it immediately (because there are no waiters to wake up) and returns.

The end result is that the mutex count is 0 (locked), although the owner has just released it and nobody else is holding the mutex. So it can no longer be acquired by anyone.

Am I missing something that prevents the above scenario from happening? If not, should I post a patch that fixes it to LKML? Or is it considered too "theoretical" and cannot happen in practice?

Thanks.
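
P.S. In case it helps the discussion, the fix I had in mind is to move the atomic_set() in __mutex_unlock_common_slowpath() inside the wait_lock critical section, so that it is serialized against the count manipulation that __mutex_trylock_slowpath() also performs under wait_lock. Something roughly like the following (an untested sketch against the code as I read it; the function body other than the moved atomic_set() is meant to be unchanged, and there may well be reasons, e.g. performance on the unlock path, why the release is currently done before taking the spinlock):

	static inline void
	__mutex_unlock_common_slowpath(atomic_t *lock_count, int nested)
	{
		struct mutex *lock = container_of(lock_count, struct mutex, count);
		unsigned long flags;

		spin_lock_mutex(&lock->wait_lock, flags);

		/*
		 * Release the lock only after taking wait_lock: a concurrent
		 * __mutex_trylock_slowpath() manipulates the count under
		 * wait_lock too, so it can no longer overwrite our 1 with a 0
		 * between our atomic_set() and our critical section.
		 */
		if (__mutex_slowpath_needs_to_unlock())
			atomic_set(&lock->count, 1);

		/* ... debug checks and waiter wakeup exactly as before ... */

		spin_unlock_mutex(&lock->wait_lock, flags);
	}

With this ordering, either task B's critical section in __mutex_trylock_slowpath() completes before task C sets the count to 1 (so task C's 1 survives and the mutex ends up unlocked), or task C sets the count to 1 before task B's atomic_xchg(), in which case task B sees prev == 1 and legitimately acquires the mutex.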