Forgot to mention that the metadump from this repro is here: https://drive.google.com/file/d/0ByBy89zr3kJNeWdBQ1RFbmJESTg/edit?usp=sharing On Sun, Aug 24, 2014 at 12:20 PM, Alex Lyakas <alex@xxxxxxxxxxxxxxxxx> wrote: > Hi Brian, > > On Thu, Aug 21, 2014 at 10:18 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote: >> XFS log recovery builds up an xlog_recover object as it passes through >> the log operations on the physical log. These structures are managed via >> a hash table and are allocated when a new transaction is encountered and >> freed once a commit operation for the transaction is encountered. >> >> This state machine for active transactions is implemented by a >> combination of xlog_do_recovery_pass(), which walks through the log >> buffers and xlog_recover_process_data() which processes log operations >> within each buffer. The latter function decides whether to allocate a >> new xlog_recover, add to it or commit and ultimately free it. If an >> error occurs at any point during the lifecycle of a particular >> xlog_recover, xlog_recover_process_data() frees the object and returns >> an error. >> >> xlog_recover_commit_trans() handles the final processing of the >> transaction. It submits whatever I/O is required for the transaction and >> frees xlog_recover object along with the transaction items it tracks. If >> an error occurs at the final stages of the commit operation, such as I/O >> failure, both xlog_recover_commit_trans() and >> xlog_recover_process_data() attempt to free the trans object. >> >> Modify xlog_recover_commit_trans() to only free the trans object on >> successful completion of the trans, including any I/O errors that might >> occur when recovering the log. >> >> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> >> --- >> >> Hi all, >> >> I found that the recent buffer I/O rework fixes didn't address the crash >> reproduced by the dm-flakey/log recovery test case I posted recently. I >> tracked the crash down to this, which allows the test to pass. This >> addresses the crash I saw when running the reproducer manually with the >> metadump that Alex posted as well. >> >> FWIW, I also went back and tested the xfs_buf_iowait() experiment in >> both scenarios (Alex's metadump and xfstests test) and they all >> reproduce the same crash for me. I think that either I'm still not >> reproducing the original problem, something else might have contaminated >> the original xfs_buf_iowait() test to give a false positive, or >> something else entirely is going on. >> >> Alex, >> >> If you have a chance, I think it might be interesting to see whether you >> reproduce any problems with this patch. It looks like this is a >> regression introduced by: >> >> 2a84108f xfs: free the list of recovery items on error >> >> ... but I have no idea if that's in whatever kernel you're running. > I am running kernel 3.8.13 with some changes (published at > https://github.com/zadarastorage/zadara-xfs-pushback), but this > problem also happens on pristine 3.8.13 from > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git, > branch linux-stable-3.8.y. > > I do not have commit 2a84108f in this kernel. It was introduced in 3.14. > I applied your patch to 3.8.13, but it doesn't fix the issue. Same > problem happens when testing scenario that I described in > http://oss.sgi.com/pipermail/xfs/2014-August/037637.html. > > Thanks, > Alex. > >> >> Brian >> >> fs/xfs/xfs_log_recover.c | 11 ++++++++--- >> 1 file changed, 8 insertions(+), 3 deletions(-) >> >> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c >> index 176c4b3..daca9a6 100644 >> --- a/fs/xfs/xfs_log_recover.c >> +++ b/fs/xfs/xfs_log_recover.c >> @@ -3528,10 +3528,15 @@ out: >> if (!list_empty(&done_list)) >> list_splice_init(&done_list, &trans->r_itemq); >> >> - xlog_recover_free_trans(trans); >> - >> error2 = xfs_buf_delwri_submit(&buffer_list); >> - return error ? error : error2; >> + >> + if (!error) >> + error = error2; >> + /* caller frees trans on error */ >> + if (!error) >> + xlog_recover_free_trans(trans); >> + >> + return error; >> } >> >> STATIC int >> -- >> 1.8.3.1 >> _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs