On Tue, 6 Jan 2009, Andrew Morton wrote: > On Tue, 06 Jan 2009 12:40:31 +0100 Peter Zijlstra <peterz@xxxxxxxxxxxxx> wrote: > > > Subject: mutex: adaptive spin > > From: Peter Zijlstra <a.p.zijlstra@xxxxxxxxx> > > Date: Tue Jan 06 12:32:12 CET 2009 > > > > Based on the code in -rt, provide adaptive spins on generic mutexes. > > > > How dumb is it to send a lump of uncommented, changelogged code as an > rfc, only to have Ingo reply, providing the changelog for you? Sounds smart to me. He was able to get someone else to do the work. ;-) > > Sigh. > > > --- linux-2.6.orig/kernel/mutex.c > > +++ linux-2.6/kernel/mutex.c > > @@ -46,6 +46,7 @@ __mutex_init(struct mutex *lock, const c > > atomic_set(&lock->count, 1); > > spin_lock_init(&lock->wait_lock); > > INIT_LIST_HEAD(&lock->wait_list); > > + lock->owner = NULL; > > > > debug_mutex_init(lock, name, key); > > } > > @@ -120,6 +121,28 @@ void __sched mutex_unlock(struct mutex * > > > > EXPORT_SYMBOL(mutex_unlock); > > > > +#ifdef CONFIG_SMP > > +static int adaptive_wait(struct mutex_waiter *waiter, > > + struct task_struct *owner, long state) > > +{ > > + for (;;) { > > + if (signal_pending_state(state, waiter->task)) > > + return 0; > > + if (waiter->lock->owner != owner) > > + return 0; > > + if (!task_is_current(owner)) > > + return 1; > > + cpu_relax(); > > + } > > +} > > Each of the tests in this function should be carefully commented. It's > really the core piece of the design. Yep, I agree here too. > > > +#else > > +static int adaptive_wait(struct mutex_waiter *waiter, > > + struct task_struct *owner, long state) > > +{ > > + return 1; > > +} > > +#endif > > + > > /* > > * Lock a mutex (possibly interruptible), slowpath: > > */ > > @@ -127,7 +150,7 @@ static inline int __sched > > __mutex_lock_common(struct mutex *lock, long state, unsigned int subclass, > > unsigned long ip) > > { > > - struct task_struct *task = current; > > + struct task_struct *owner, *task = current; > > struct mutex_waiter waiter; > > unsigned int old_val; > > unsigned long flags; > > @@ -141,6 +164,7 @@ __mutex_lock_common(struct mutex *lock, > > /* add waiting tasks to the end of the waitqueue (FIFO): */ > > list_add_tail(&waiter.list, &lock->wait_list); > > waiter.task = task; > > + waiter.lock = lock; > > > > old_val = atomic_xchg(&lock->count, -1); > > if (old_val == 1) > > @@ -175,11 +199,19 @@ __mutex_lock_common(struct mutex *lock, > > debug_mutex_free_waiter(&waiter); > > return -EINTR; > > } > > - __set_task_state(task, state); > > > > - /* didnt get the lock, go to sleep: */ > > + owner = lock->owner; > > What prevents *owner from exitting right here? Yeah, this should be commented. Why this is not a race is because the owner has the mutex here, and we have the lock->wait_lock. When the owner releases the mutex it must go into the slow unlock path and grab the lock->wait_lock spinlock. Thus it will block and not go away. Of course if there's a bug elsewhere in the kernel that lets the task exit without releasing the mutex, this will fail. But in that case, there's bigger problems that need to be fixed. > > > + get_task_struct(owner); > > spin_unlock_mutex(&lock->wait_lock, flags); Here, we get the owner before releasing the wait_lock. > > - schedule(); > > + > > + if (adaptive_wait(&waiter, owner, state)) { > > + put_task_struct(owner); > > + __set_task_state(task, state); > > + /* didnt get the lock, go to sleep: */ > > + schedule(); > > + } else > > + put_task_struct(owner); > > + > > spin_lock_mutex(&lock->wait_lock, flags); > > } > > > > @@ -187,7 +219,7 @@ done: > > lock_acquired(&lock->dep_map, ip); > > /* got the lock - rejoice! */ > > mutex_remove_waiter(lock, &waiter, task_thread_info(task)); > > - debug_mutex_set_owner(lock, task_thread_info(task)); > > + lock->owner = task; > > > > /* set it to 0 if there are no waiters left: */ > > if (likely(list_empty(&lock->wait_list))) > > @@ -260,7 +292,7 @@ __mutex_unlock_common_slowpath(atomic_t > > wake_up_process(waiter->task); > > } > > > > - debug_mutex_clear_owner(lock); > > + lock->owner = NULL; > > > > spin_unlock_mutex(&lock->wait_lock, flags); > > } > > @@ -352,7 +384,7 @@ static inline int __mutex_trylock_slowpa > > > > prev = atomic_xchg(&lock->count, -1); > > if (likely(prev == 1)) { > > - debug_mutex_set_owner(lock, current_thread_info()); > > + lock->owner = current; > > mutex_acquire(&lock->dep_map, 0, 1, _RET_IP_); > > } > > /* Set it back to 0 if there are no waiters: */ > > Index: linux-2.6/kernel/sched.c > > =================================================================== > > --- linux-2.6.orig/kernel/sched.c > > +++ linux-2.6/kernel/sched.c > > @@ -697,6 +697,11 @@ int runqueue_is_locked(void) > > return ret; > > } > > > > +int task_is_current(struct task_struct *p) > > +{ > > + return task_rq(p)->curr == p; > > +} > > Please don't add kernel-wide infrastructure and leave it completely > undocumented. Particularly functions which are as vague and dangerous > as this one. > > What locking must the caller provide? What are the semantics of the > return value? > > What must the caller do to avoid oopses if *p is concurrently exiting? > > etc. > > > The overall design intent seems very smart to me, as long as the races > can be plugged, if they're indeed present. Thanks for the complement on the design. It came about lots of work by many people in the -rt tree. What we need here is to comment the code for those that did not see the evolution of the changes that were made. -- Steve -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html