On Mon, 2012-03-19 at 16:49 +0000, Adamson, Andy wrote: > On Mar 19, 2012, at 12:44 PM, Andy Adamson wrote: > > > > > The reason rpc_wake_up() does not wake up all the tasks, is that it uses list_for_each_entry_safe which assigns the "next" task->u.tk_wait.lists pointer in it's for loop without taking into consideration the priority queue tasks hanging off the task->u.tk_wait.links list (note "list" vrs "link") assigned in the for loop body by the call to rpc_wake_up_task_queue_locked. > > > > Say we have a wait queue with the following: One task in the list (T1) and two priority tasks in T1's "links" list (T2, T3). > > > > H -> T1 -> H the next pointers in the "lists" list (T1->u.tk_wait.list) hanging off the list head "H" > > > > T1 -> T2 -> T3->T1 the next pointers in the "links" list. (T1->u.tk_wait.links is the list head) > > > > Here is what rpc_wake_up_task_queue_locked does (it calls __rpc_remove_wait_queue_priority on priority queues) > > > > H -> T2 ->H > > > > T2 -> T3 -> T2 > > > > with T1 removed. This is exactly what should happen, T1 is removed, T1's u.tk_wait.links list is spliced onto T2's u.tk_wait.links, and T2 is moved to H's list of u.tk_wait.list tasks. > > > > > > #define list_for_each_entry_safe(pos, n, head, member) \ > > for (pos = list_entry((head)->next, typeof(*pos), member), \ > > n = list_entry(pos->member.next, typeof(*pos), member); \ > > &pos->member != (head); \ > > pos = n, n = list_entry(n->member.next, typeof(*n), member)) > > > > > > BUT! the for loop in list_for_each_entry_safe(task, next, head, u.tk_wait.list) does the following on the above example: > > > > Start: > > > > H-> T1 -> H > > T1 -> T2 -> T3 -> T1 > > > > for loop Iniitializer: > > > > task = list_entry((head)->next, typeof(*task), u.tk_wait.list): > > next = list_entry(task->u.tk_wait.list.next, typeof(*task), u.tk_wait.list) > > > > so task = T1, next = H. > > > > for loop test: > > task != H. This is FALSE > > oops - got my TRUE/FALSE mixed ;) > this is TRUE so execute body > > > > > for loop body: > > call rpc_wake_up_task_queue_locked > > > > H -> T2 -> H > > T2 ->.T3 -> T2 > > > > for loop increment step > > task = next; > > next = list_entry(next->u.tk_wait.list.next, typeof(*next), u.tk_wait.list); > > > > so task = H, next = H. > > > > for loop test > > > > task != H This is TRUE > > and this is FALSE do don't execute body > > -->Andy > > > > so stop. > > > > Note that only T1 was processed - T2 and T3 are still on the queue. > > > > So list_for_each_entry_safe will not process any u.tk_wait.links tasks, because the next pointer is assigned prior to the call to rpc_wake_up_task_queue_locked, so the for loop increment (or initialization) setting of: > > > > task = next: > > > > does NOT pick up the fact that (in our example) T1 was REPLACED by T2, not just removed. > > > > On the other hand, the rpc_drain_queue() patch uses while(!list_empty()) instead of list_for_each_entry_safe() and happily processes all of the tasks on the queue: > > > > H -> T1 -> H > > > > lilst is not empty, so call rpc_wake_up_task_queue_locked. > > > > H -> T2 -> H > > > > list is not empty; so call rpc_wake_up _task_queue_locked. > > > > H -> T3 -> H > > > > list is not empty; so call rpc_wake_up_task_queue_locked > > > > list is empty. > > > > > > -->Andy > > Thanks Andy!!! Your explanation makes perfect sense. I can't believe that we've missed this... I'll write a patch for the various cases that we need to fix. -- Trond Myklebust Linux NFS client maintainer NetApp Trond.Myklebust@xxxxxxxxxx www.netapp.com ��.n��������+%������w��{.n�����{��w���jg��������ݢj����G�������j:+v���w�m������w�������h�����٥