On Mar 19, 2012, at 12:44 PM, Andy Adamson wrote:

> The reason rpc_wake_up() does not wake up all the tasks is that it uses
> list_for_each_entry_safe(), which assigns the "next" task->u.tk_wait.list
> pointer in its for loop without taking into consideration the priority
> queue tasks hanging off the task->u.tk_wait.links list (note "list" vs.
> "links"), which the loop body manipulates via the call to
> rpc_wake_up_task_queue_locked().
>
> Say we have a wait queue with the following: one task on the list (T1)
> and two priority tasks on T1's "links" list (T2, T3).
>
> H -> T1 -> H            the next pointers in the "list" list
>                         (T1->u.tk_wait.list) hanging off the list head "H"
>
> T1 -> T2 -> T3 -> T1    the next pointers in the "links" list
>                         (T1->u.tk_wait.links is the list head)
>
> Here is what rpc_wake_up_task_queue_locked() does (it calls
> __rpc_remove_wait_queue_priority() on priority queues):
>
> H -> T2 -> H
>
> T2 -> T3 -> T2
>
> with T1 removed. This is exactly what should happen: T1 is removed, T1's
> u.tk_wait.links list is spliced onto T2's u.tk_wait.links, and T2 takes
> T1's place on H's list of u.tk_wait.list tasks.
>
> #define list_for_each_entry_safe(pos, n, head, member)			\
> 	for (pos = list_entry((head)->next, typeof(*pos), member),	\
> 		n = list_entry(pos->member.next, typeof(*pos), member);	\
> 	     &pos->member != (head);					\
> 	     pos = n, n = list_entry(n->member.next, typeof(*n), member))
>
> BUT! The for loop in list_for_each_entry_safe(task, next, head,
> u.tk_wait.list) does the following on the above example:
>
> Start:
>
> H -> T1 -> H
> T1 -> T2 -> T3 -> T1
>
> for loop initializer:
>
> task = list_entry((head)->next, typeof(*task), u.tk_wait.list);
> next = list_entry(task->u.tk_wait.list.next, typeof(*task), u.tk_wait.list);
>
> so task = T1, next = H.
>
> for loop test:
>
> task != H. This is TRUE, so execute the body.
>
> for loop body:
>
> call rpc_wake_up_task_queue_locked():
>
> H -> T2 -> H
> T2 -> T3 -> T2
>
> for loop increment step:
>
> task = next;
> next = list_entry(next->u.tk_wait.list.next, typeof(*next), u.tk_wait.list);
>
> so task = H (and next = T2, though that no longer matters).
>
> for loop test:
>
> task != H. This is FALSE, so don't execute the body.
>
> So stop. Note that only T1 was processed - T2 and T3 are still on the
> queue.
>
> So list_for_each_entry_safe() will not process any u.tk_wait.links
> tasks, because the next pointer is assigned prior to the call to
> rpc_wake_up_task_queue_locked(), and the for loop increment (or
> initialization) setting of
>
> task = next;
>
> does NOT pick up the fact that (in our example) T1 was REPLACED by T2,
> not just removed.
>
> On the other hand, the rpc_drain_queue() patch uses while (!list_empty())
> instead of list_for_each_entry_safe() and happily processes all of the
> tasks on the queue:
>
> H -> T1 -> H
>
> list is not empty, so call rpc_wake_up_task_queue_locked():
>
> H -> T2 -> H
>
> list is not empty, so call rpc_wake_up_task_queue_locked():
>
> H -> T3 -> H
>
> list is not empty, so call rpc_wake_up_task_queue_locked():
>
> list is empty, so stop.
>
> -->Andy
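
The fix this analysis points at is to make rpc_wake_up() re-fetch the first
entry on every iteration, exactly as the rpc_drain_queue() patch quoted below
does, rather than caching a "next" pointer across the wake-up call. Here is a
sketch of rpc_wake_up() restructured that way (illustrative only, mirroring
the loop shape of the patch below; not necessarily the exact change that was
eventually merged upstream):

void rpc_wake_up(struct rpc_wait_queue *queue)
{
	struct list_head *head;

	spin_lock_bh(&queue->lock);
	head = &queue->tasks[queue->maxpriority];
	for (;;) {
		while (!list_empty(head)) {
			struct rpc_task *task;

			/* Re-fetch the first entry each time around: if
			 * waking this task promotes one of its
			 * u.tk_wait.links tasks into its place, the next
			 * iteration sees the replacement. */
			task = list_first_entry(head, struct rpc_task,
						u.tk_wait.list);
			rpc_wake_up_task_queue_locked(queue, task);
		}
		if (head == &queue->tasks[0])
			break;
		head--;
	}
	spin_unlock_bh(&queue->lock);
}
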
>
> On Mar 16, 2012, at 11:25 AM, Adamson, Andy wrote:
>
>> On Mar 16, 2012, at 11:21 AM, Myklebust, Trond wrote:
>>
>>> On Fri, 2012-03-16 at 11:13 -0400, Andy Adamson wrote:
>>>> On Thu, Mar 15, 2012 at 8:10 PM, Myklebust, Trond
>>>> <Trond.Myklebust@xxxxxxxxxx> wrote:
>>>>> On Thu, 2012-03-15 at 14:40 -0400, andros@xxxxxxxxxx wrote:
>>>>>> From: Andy Adamson <andros@xxxxxxxxxx>
>>>>>>
>>>>>> Signed-off-by: Andy Adamson <andros@xxxxxxxxxx>
>>>>>> ---
>>>>>>  include/linux/sunrpc/sched.h |    1 +
>>>>>>  net/sunrpc/sched.c           |   27 +++++++++++++++++++++++++++
>>>>>>  2 files changed, 28 insertions(+), 0 deletions(-)
>>>>>>
>>>>>> diff --git a/include/linux/sunrpc/sched.h b/include/linux/sunrpc/sched.h
>>>>>> index dc0c3cc..fce0873 100644
>>>>>> --- a/include/linux/sunrpc/sched.h
>>>>>> +++ b/include/linux/sunrpc/sched.h
>>>>>> @@ -235,6 +235,7 @@ void rpc_sleep_on_priority(struct rpc_wait_queue *,
>>>>>>  void rpc_wake_up_queued_task(struct rpc_wait_queue *,
>>>>>>  					struct rpc_task *);
>>>>>>  void rpc_wake_up(struct rpc_wait_queue *);
>>>>>> +void rpc_drain_queue(struct rpc_wait_queue *);
>>>>>>  struct rpc_task *rpc_wake_up_next(struct rpc_wait_queue *);
>>>>>>  struct rpc_task *rpc_wake_up_first(struct rpc_wait_queue *,
>>>>>>  					bool (*)(struct rpc_task *, void *),
>>>>>> diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
>>>>>> index 1c570a8..11928ff 100644
>>>>>> --- a/net/sunrpc/sched.c
>>>>>> +++ b/net/sunrpc/sched.c
>>>>>> @@ -551,6 +551,33 @@ void rpc_wake_up(struct rpc_wait_queue *queue)
>>>>>>  EXPORT_SYMBOL_GPL(rpc_wake_up);
>>>>>>
>>>>>>  /**
>>>>>> + * rpc_drain_queue - empty the queue and wake up all rpc_tasks
>>>>>> + * @queue: rpc_wait_queue on which the tasks are sleeping
>>>>>> + *
>>>>>> + * Grabs queue->lock
>>>>>> + */
>>>>>> +void rpc_drain_queue(struct rpc_wait_queue *queue)
>>>>>> +{
>>>>>> +	struct rpc_task *task;
>>>>>> +	struct list_head *head;
>>>>>> +
>>>>>> +	spin_lock_bh(&queue->lock);
>>>>>> +	head = &queue->tasks[queue->maxpriority];
>>>>>> +	for (;;) {
>>>>>> +		while (!list_empty(head)) {
>>>>>> +			task = list_entry(head->next, struct rpc_task,
>>>>>> +					  u.tk_wait.list);
>>>>>> +			rpc_wake_up_task_queue_locked(queue, task);
>>>>>> +		}
>>>>>> +		if (head == &queue->tasks[0])
>>>>>> +			break;
>>>>>> +		head--;
>>>>>> +	}
>>>>>> +	spin_unlock_bh(&queue->lock);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(rpc_drain_queue);
>>>>>> +
>>>>>
>>>>> Confused... How is this function any different from rpc_wake_up()?
>>>>
>>>> Because it actually drains the queues where rpc_wake_up does not. See
>>>> the attached output where I added the same printks to both
>>>> rpc_drain_queue and rpc_wake_up.
>>>
>>> So you are seeing a bug in rpc_wake_up()? I'm surprised; a bug of that
>>> magnitude should have caused a lot of hangs. Can you please look into
>>> what is happening.
>>
>> OK. Didn't know it was a bug - just thought the comment was off...
>>
>> -->Andy
>>
>>> --
>>> Trond Myklebust
>>> Linux NFS client maintainer
>>>
>>> NetApp
>>> Trond.Myklebust@xxxxxxxxxx
>>> www.netapp.com
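
The stale-next behavior walked through above can also be reproduced in
userspace with a simplified model of the priority queue. The program below is
hypothetical demo code: struct task, wake(), and main() are inventions for
illustration, with wake() mimicking the promote-and-splice behavior of
__rpc_remove_wait_queue_priority() as described in the analysis, while the
list helpers and list_for_each_entry_safe() follow their kernel counterparts.
Built with gcc, the safe-iteration loop prints only "woke T1"; the drain loop
then wakes T2 and T3.

#include <stdio.h>
#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

#define LIST_HEAD_INIT(name) { &(name), &(name) }

static void __list_add(struct list_head *entry, struct list_head *prev,
		       struct list_head *next)
{
	next->prev = entry;
	entry->next = next;
	entry->prev = prev;
	prev->next = entry;
}

static void list_del(struct list_head *entry)
{
	entry->prev->next = entry->next;
	entry->next->prev = entry->prev;
}

/* Kernel list_move(): delete @entry, then add it right after @head. */
static void list_move(struct list_head *entry, struct list_head *head)
{
	list_del(entry);
	__list_add(entry, head, head->next);
}

static void list_add_tail(struct list_head *entry, struct list_head *head)
{
	__list_add(entry, head->prev, head);
}

/* Kernel list_splice_init(): move everything on @list onto @head. */
static void list_splice_init(struct list_head *list, struct list_head *head)
{
	if (list->next != list) {
		struct list_head *first = list->next, *last = list->prev;

		first->prev = head;
		last->next = head->next;
		head->next->prev = last;
		head->next = first;
		list->next = list->prev = list;
	}
}

#define list_entry(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

#define list_for_each_entry_safe(pos, n, head, member)			\
	for (pos = list_entry((head)->next, typeof(*pos), member),	\
	     n = list_entry(pos->member.next, typeof(*pos), member);	\
	     &pos->member != (head);					\
	     pos = n, n = list_entry(n->member.next, typeof(*n), member))

struct task {
	const char *name;
	struct list_head list;	/* stands in for u.tk_wait.list  */
	struct list_head links;	/* stands in for u.tk_wait.links */
};

/* Mimics rpc_wake_up_task_queue_locked() on a priority queue: promote
 * the first task on @t's "links" list into @t's place on the main list,
 * splice the rest of "links" over to it, then remove @t. */
static void wake(struct task *t)
{
	if (t->links.next != &t->links) {
		struct task *n = list_entry(t->links.next, struct task, links);

		list_del(&n->links);
		n->links.next = n->links.prev = &n->links; /* list_del_init() */
		list_move(&n->list, &t->list);		   /* n takes t's place */
		list_splice_init(&t->links, &n->links);
	}
	list_del(&t->list);
	printf("woke %s\n", t->name);
}

int main(void)
{
	struct list_head H = LIST_HEAD_INIT(H);
	struct task t1 = { "T1", LIST_HEAD_INIT(t1.list), LIST_HEAD_INIT(t1.links) };
	struct task t2 = { "T2", LIST_HEAD_INIT(t2.list), LIST_HEAD_INIT(t2.links) };
	struct task t3 = { "T3", LIST_HEAD_INIT(t3.list), LIST_HEAD_INIT(t3.links) };
	struct task *task, *next;

	/* Build H -> T1 -> H, with T2 and T3 on T1's "links" list. */
	list_add_tail(&t1.list, &H);
	list_add_tail(&t2.links, &t1.links);
	list_add_tail(&t3.links, &t1.links);

	/* Prints only "woke T1": "next" was computed (as H) before wake()
	 * moved T2 onto the main list, so the loop stops early. */
	list_for_each_entry_safe(task, next, &H, list)
		wake(task);

	/* Draining, as rpc_drain_queue() does, picks up the rest. */
	while (H.next != &H)
		wake(list_entry(H.next, struct task, list));

	return 0;
}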