Hi Dave, Thanks for the review and comments. On Sat, 2011-11-19 at 12:14 +1100, Dave Chinner wrote: > On Fri, Nov 18, 2011 at 01:20:54PM -0600, Chandra Seetharaman wrote: > > l_reserveq and l_writeq maintains a list of processes waiting to get log > > space. Processes are supposed to get in the list when the amount of free > > space available in the log is less than what they need. > > > > When space becomes available current code, wakes up the processes, but > > expect the processes to remove themselves from the queue. > > > > Since the lock protecting the list is only acquired later by the woken > > up process, there is a window of time were a new process that is looking > > for space can wrongly get into the queue while there is enough space > > available. > > > > Since there is enough space available, this process can never be woken > > up, which leads to the hang of the process. > > Excellent work, Chandra. Thanks Dave. Made all the changes (except one as noted below). Testing the new code and will sent it soon. : : : <snip> > > @@ -2550,8 +2552,7 @@ redo: > > free_bytes = xlog_space_left(log, &log->l_grant_reserve_head); > > if (free_bytes < need_bytes) { > > spin_lock(&log->l_grant_reserve_lock); > > - if (list_empty(&tic->t_queue)) > > - list_add_tail(&tic->t_queue, &log->l_reserveq); > > + list_add_tail(&tic->t_queue, &log->l_reserveq); > > > > trace_xfs_log_grant_sleep2(log, tic); > > > > Ok, we now have the assumption that when we enter this code the > ticket is not on any queue at all. Can you add an > "ASSERT(list_empty(&tic->t_queue));" to the code before the above > xlog_space_left() call? That way we'll know if we violate that > assumption as potentially corrupt memory.... As Christoph pointed, we might have been woken up by some system event. So, I changed the code a bit around here. Please comment on the new code. <snip> _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs