Hi, On Mon, Nov 28, 2011 at 03:17:36AM -0500, Christoph Hellwig wrote: > Apply the scheme used in log_regrant_write_log_space to wake up any other > threads waiting for log space before the newly added one to > log_regrant_write_log_space as well, and factor the code into readable > helpers. For each of the queues we have add two helpers: > > - one to try to wake up all waiting threads. This helper will also be > usable by xfs_log_move_tail once we remove the current opportunistic > wakeups in it. > - one to sleep on t_wait until enough log space is available, loosely > modelled after Linux waitqueues. > > And use them to reimplement the guts of log_regrant_write_log_space and > log_regrant_write_log_space. These two function now use one and the same > algorithm for waiting on log space instead of subtly different ones before, > with an option to completely unify them in the near future. > > Also move the filesystem shutdown handling to the common caller given > that we had to touch it anyway. > > Based on hard debugging and an earlier patch from > Chandra Seetharaman <sekharan@xxxxxxxxxx>. > > Signed-off-by: Christoph Hellwig <hch@xxxxxx> I'd like to make sure that I understand the race that Chandra debugged and reported. 2499 STATIC int 2500 xlog_grant_log_space(xlog_t *log, 2501 xlog_ticket_t *tic) 2502 { 2503 int free_bytes; 2504 int need_bytes; ... 2517 /* something is already sleeping; insert new transaction at end */ 2518 if (!list_empty_careful(&log->l_reserveq)) { 2519 spin_lock(&log->l_grant_reserve_lock); 2520 /* recheck the queue now we are locked */ 2521 if (list_empty(&log->l_reserveq)) { 2522 spin_unlock(&log->l_grant_reserve_lock); 2523 goto redo; 2524 } 2525 list_add_tail(&tic->t_queue, &log->l_reserveq); 2526 2527 trace_xfs_log_grant_sleep1(log, tic); 2528 2529 /* 2530 * Gotta check this before going to sleep, while we're 2531 * holding the grant lock. 2532 */ 2533 if (XLOG_FORCED_SHUTDOWN(log)) 2534 goto error_return; 2535 2536 XFS_STATS_INC(xs_sleep_logspace); 2537 xlog_wait(&tic->t_wait, &log->l_grant_reserve_lock); 2538 2539 /* 2540 * If we got an error, and the filesystem is shutting down, 2541 * we'll catch it down below. So just continue... 2542 */ 2543 trace_xfs_log_grant_wake1(log, tic); 2544 } 2545 2546 redo: 2547 if (XLOG_FORCED_SHUTDOWN(log)) 2548 goto error_return_unlocked; 2549 2550 free_bytes = xlog_space_left(log, &log->l_grant_reserve_head); 2551 if (free_bytes < need_bytes) { 2552 spin_lock(&log->l_grant_reserve_lock); 2553 if (list_empty(&tic->t_queue)) 2554 list_add_tail(&tic->t_queue, &log->l_reserveq); 2555 2556 trace_xfs_log_grant_sleep2(log, tic); 2557 2558 if (XLOG_FORCED_SHUTDOWN(log)) 2559 goto error_return; 2560 2561 xlog_grant_push_ail(log, need_bytes); 2562 2563 XFS_STATS_INC(xs_sleep_logspace); 2564 xlog_wait(&tic->t_wait, &log->l_grant_reserve_lock); 2565 2566 trace_xfs_log_grant_wake2(log, tic); 2567 goto redo; 2568 } 2569 2570 if (!list_empty(&tic->t_queue)) { 2571 spin_lock(&log->l_grant_reserve_lock); 2572 list_del_init(&tic->t_queue); 2573 spin_unlock(&log->l_grant_reserve_lock); 2574 } So the race that we're looking at here is: process A was added to the reserve queue at either 2525 or 2554 and, pushes the AIL at 2561, xfsaild frees up enough log space for process A (possibly B?), eventually xfs_log_move_tail is called to wake process A, process A wakes at line 2564, and he is on the reserveq already, process B sees that there are tickets on the queue at 2518 and gets the grant reserve lock at 2519, process A spins at 2571 waiting for the grant reserve lock, process B adds itself to the queue at 2525, process B drops the grant reserve lock and goes to sleep at 2537 process A takes the grant reserve lock at 2571 and removes it's ticket from the list. ...and there is nothing to wake process B until the ail is pushed by some other process. Is that about right? Thanks, Ben _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs