On Sun, Nov 20, 2011 at 07:22:35PM +1100, Dave Chinner wrote: > Be nice to factor these into a single function - they do exactly the > same thing except for the queue they walk and the need_bytes > calculation. I thought about it, and I think we can do it as a broader refactoring of the two list/lock/head tuples for the reserve/grant queues. But I'd rather do a simple minimal patch that can easily be backported first. > > > + > > void > > xfs_log_move_tail(xfs_mount_t *mp, > > xfs_lsn_t tail_lsn) > > Also, wouldn't it be a good idea to use these helpers in > xfs_log_move_tail() and remove the code duplication from there as > well? i.e. the four places we do this open coded wakeup walk could > be replaced with a single implementation.... Yes, I even hinted at that in the changelog. To make it not ugly it requires removing the opportunistic wakers. Fortunately I already have patches for that, which I plan to sumit for 3.3. We an easily do that factoring as part of that patchset. > Same with this function and xlog_writeq_wait - they are identical > code just operating on a different grant head. If we had > > struct xlog_grant_head { > spinlock_t lock; > struct list_head queue; > atomic64_t head; > }; > > Rather than the current open-coded definitions, then this could all > share the same code rather than continuing to duplicate it. If we've > got to restructure the code to fix the bug, it makes sense to me to > remove the code duplication as well... I agree. But let's fix the issue first and to the cleanups later. > ..... > > free_bytes = xlog_space_left(log, &log->l_grant_reserve_head); > > + if (!list_empty_careful(&log->l_reserveq)) { > > spin_lock(&log->l_grant_reserve_lock); > ..... > > + if (!xlog_wake_reserveq(log, free_bytes)) > > + error = xlog_reserveq_wait(log, tic, need_bytes); > > + spin_unlock(&log->l_grant_reserve_lock); > > + } else if (free_bytes < need_bytes) { > > spin_lock(&log->l_grant_reserve_lock); > > - list_del_init(&tic->t_queue); > > + error = xlog_reserveq_wait(log, tic, need_bytes); > > spin_unlock(&log->l_grant_reserve_lock); > > } > > I don't think this logic is quite correct. When we have a waiter on > the queue, we try to wake all the waiters before sleeping if that > did not succeed. If we woke everyone, we allow this process to > continue. The problem is that we don't check to see if there is > space remaining for this process to continue after doing all the > wakeups. > > i.e. if need_bytes = 10000 and free_bytes = 50000, and the waiters > on the queue total 45000 bytes, then we fall through with free_bytes > = 5000 and need_bytes = 10000. In that case, we should have gone to > sleep because we are 5000 bytes short of our required space but we > don't and potentially deadlock due to oversubscribing the log space. > > IOWs, xlog_wake_reserveq() needs to return the amount of remaining > free space after the wakeups so we can then run the if (free_bytes < > need_bytes) check unconditionally.... Indeed. I think the better way to do it is to make the free_bytes in xlog_wake_reserveq a pointer, so that the caller sees the update and then do: free_bytes = xlog_space_left(log, &log->l_grant_reserve_head); if (!list_empty_careful(&log->l_reserveq)) { spin_lock(&log->l_grant_reserve_lock); if (!xlog_wake_reserveq(log, &free_bytes) || free_bytes < need_bytes) error = xlog_reserveq_wait(log, tic, need_bytes); spin_unlock(&log->l_grant_reserve_lock); } else if (free_bytes < need_bytes) { spin_lock(&log->l_grant_reserve_lock); list_del_init(&tic->t_queue); error = xlog_reserveq_wait(log, tic, need_bytes); spin_unlock(&log->l_grant_reserve_lock); } _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs