On Wed, 19 Feb 2025 21:15:08 -0500 Waiman Long <llong@xxxxxxxxxx> wrote: > > On 2/19/25 8:41 PM, Steven Rostedt wrote: > > On Wed, 19 Feb 2025 20:36:13 -0500 > > Waiman Long <llong@xxxxxxxxxx> wrote: > > > > > >>>>>> this field, we don't need to take lock, though taking the wait_lock may > >>>>>> still be needed to examine other information inside the mutex. > >>> Do we need to take it just for accessing owner, which is in an atomic? > >> Right. I forgot it is an atomic_long_t. In that case, no lock should be > >> needed. > > Now if we have a two fields to read: > > > > block_flags (for the type of lock) and blocked_on (for the lock) > > > > We need a way to synchronize the two. What happens if we read the type, and > > the task wakes up and and then blocks on a different type of lock? > > > > Then the lock read from blocked_on could be a different type of lock than > > what is expected. > > That is different from reading the owner. In this case, we need to use > smp_rmb()/wmb() to sequence the read and write operations unless it is > guaranteed that they are in the same cacheline. One possible way is as > follows: > > Writer - setting them: > > WRITE_ONCE(lock) > smp_wmb() > WRITE_ONCE(type) > > Clearing them: > > WRITE_ONCE(type, 0) > smp_wmb() > WRITE_ONCE(lock, NULL) > > Reader: > > READ_ONCE(type) > again: > smp_rmb() > READ_ONCE(lock) > smp_rmb() > if (READ_ONCE(type) != type) > goto again What about mutex-rwsem-mutex case? mutex_lock(&lock1); down_read(&lock2); mutex_lock(&lock3); The worst scenario is; WRITE_ONCE(lock, &lock1) smp_wmb() WRITE_ONCE(type, MUTEX) READ_ONCE(type) -> MUTEX WRITE_ONCE(type, 0) smp_wmb() WRITE_ONCE(lock, NULL) WRITE_ONCE(lock, &lock2) READ_ONCE(lock) -> &lock2 smp_wmb() WRITE_ONCE(type, RWSEM) WRITE_ONCE(type, 0) smp_wmb() WRITE_ONCE(lock, NULL) WRITE_ONCE(lock, &lock3) smp_wmb() WRITE_ONCE(type, MUTEX) READ_ONCE(type) -> MUTEX == MUTEX WRITE_ONCE(type, 0) smp_wmb() WRITE_ONCE(lock, NULL) "OK, lock2 is a MUTEX!" So unless stopping the blocker task, we can not ensure this works. But unless decode the lock, we don't know the blocker task. Maybe we can run the hung_task in stop_machine()? (or introduce common_lock) Thank you, -- Masami Hiramatsu (Google) <mhiramat@xxxxxxxxxx>