On Thu 23-06-11 16:19:08, Moffett, Kyle D wrote:
> On Jun 23, 2011, at 16:55, Sean Ryle wrote:
> > Maybe I am wrong here, but shouldn't the cast be to (unsigned long)
> > or to (sector_t)?
> >
> > Line 534 of commit.c:
> >         jbd_debug(4, "JBD: got buffer %llu (%p)\n",
> >                 (unsigned long long)bh->b_blocknr, bh->b_data);
>
> No, that printk() is fine, the format string says "%llu" so the cast is
> unsigned long long.
>
> Besides which, line 534 in the Debian 2.6.32 kernel I am using is this
> one:
>
>   J_ASSERT(commit_transaction->t_nr_buffers <=
>            commit_transaction->t_outstanding_credits);
  Hmm, OK, so we've used more metadata buffers than we told JBD2 to
reserve. I suppose you are not using data=journal mode and the filesystem
was created as ext4 (i.e. not converted from ext3), right? Are you using
quotas?

> If somebody can tell me what information would help to debug this I'd be
> more than happy to throw a whole bunch of debug printks under that error
> condition and try to trigger the crash with that.
>
> Alternatively I could remove that J_ASSERT() and instead add some debug
> further down around the "commit_transaction->t_outstanding_credits--;"
> to try to see exactly what IO it's handling when it runs out of credits.
  The trouble is that the problem is likely in some journal list shuffling
code, because if some operation had simply underestimated the number of
buffers it needs, we'd fail the assertion in jbd2_journal_dirty_metadata()
instead:
        J_ASSERT_JH(jh, handle->h_buffer_credits > 0);

  The patch below might catch the problem closer to the place where it
happens...

  You could also try a current kernel to see whether the bug happens
there or not.

								Honza
-- 
Jan Kara <jack@xxxxxxx>
SUSE Labs, CR

diff -rupX /crypted/home/jack/.kerndiffexclude linux-2.6.32-SLE11-SP1/fs/jbd2/transaction.c linux-2.6.32-SLE11-SP1-1-jbd2-credits-bug//fs/jbd2/transaction.c
--- linux-2.6.32-SLE11-SP1/fs/jbd2/transaction.c	2011-06-23 23:01:55.600988795 +0200
+++ linux-2.6.32-SLE11-SP1-1-jbd2-credits-bug//fs/jbd2/transaction.c	2011-06-24 15:43:40.569213743 +0200
@@ -416,6 +416,7 @@ int jbd2_journal_restart(handle_t *handl
 	spin_lock(&journal->j_state_lock);
 	spin_lock(&transaction->t_handle_lock);
 	transaction->t_outstanding_credits -= handle->h_buffer_credits;
+	WARN_ON(transaction->t_outstanding_credits < transaction->t_nr_buffers);
 	transaction->t_updates--;
 
 	if (!transaction->t_updates)
@@ -1317,6 +1318,7 @@ int jbd2_journal_stop(handle_t *handle)
 	spin_lock(&journal->j_state_lock);
 	spin_lock(&transaction->t_handle_lock);
 	transaction->t_outstanding_credits -= handle->h_buffer_credits;
+	WARN_ON(transaction->t_outstanding_credits < transaction->t_nr_buffers);
 	transaction->t_updates--;
 	if (!transaction->t_updates) {
 		wake_up(&journal->j_wait_updates);
@@ -1924,6 +1926,7 @@ void __jbd2_journal_file_buffer(struct j
 		return;
 	case BJ_Metadata:
 		transaction->t_nr_buffers++;
+		WARN_ON(transaction->t_outstanding_credits < transaction->t_nr_buffers);
 		list = &transaction->t_buffers;
 		break;
 	case BJ_Forget:
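
To make the invariant behind the three WARN_ON() calls concrete, here is a
rough userspace sketch in C. It is only a toy model and not kernel code:
struct toy_transaction, toy_check(), toy_start(), toy_stop() and
toy_file_metadata() are hypothetical names made up for illustration, and
only the two counter names are borrowed from the patch. The idea is to
re-check "credits >= filed buffers" at every point where either counter
changes, so a violation is reported where it is introduced and not later
at commit time.

```c
#include <stdio.h>

/* Hypothetical userspace stand-in for the two transaction counters
 * discussed above; this is NOT the kernel's transaction_t. */
struct toy_transaction {
	int t_outstanding_credits;	/* buffers reserved by running handles */
	int t_nr_buffers;		/* metadata buffers actually filed */
};

static int toy_warnings;	/* how many times the check has fired */

/* Stand-in for WARN_ON(): report and count, but keep running. */
static void toy_check(const struct toy_transaction *t, const char *where)
{
	if (t->t_outstanding_credits < t->t_nr_buffers) {
		toy_warnings++;
		fprintf(stderr, "WARN at %s: credits=%d < buffers=%d\n",
			where, t->t_outstanding_credits, t->t_nr_buffers);
	}
}

/* A handle starts and reserves 'credits' buffers (assumed model). */
static void toy_start(struct toy_transaction *t, int credits)
{
	t->t_outstanding_credits += credits;
}

/* A handle stops and returns the credits it did not use. */
static void toy_stop(struct toy_transaction *t, int unused)
{
	t->t_outstanding_credits -= unused;
	toy_check(t, "toy_stop");
}

/* One metadata buffer is filed on the transaction's list. */
static void toy_file_metadata(struct toy_transaction *t)
{
	t->t_nr_buffers++;
	toy_check(t, "toy_file_metadata");
}
```

In this model, a handle that reserves two credits, files two buffers and
then stops with nothing left to return stays quiet. Filing a third buffer
afterwards trips the check inside toy_file_metadata(), which mirrors the
case where the patch would warn at filing time.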