On Sun, Aug 31, 2014 at 11:50:52AM +0300, Alex Lyakas wrote: > Hi Brian, Dave, > I tested this patch on kernel 3.16, top commit: > > commit 19583ca584d6f574384e17fe7613dfaeadcdc4a6 > Author: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> > Date: Sun Aug 3 15:25:02 2014 -0700 > > Linux 3.16 > > and, yes, it appears to fix the issue. > Thanks. That settles that then, I think. We're reproducing different problems on the 3.8 stable kernel vs. a recent kernel using the same test case. > Trouble is that our production kernel is 3.8.13, and we cannot upgrade to > mainline kernel easily. Question is whether we can expect some patch > suitable for our kernel, or, since our kernel is EOL and not a long-term > one, we cannot? > Dave wrote a patch specifically to resolve this problem on older kernels: http://oss.sgi.com/archives/xfs/2014-08/msg00204.html Brian > Thanks for your help, > Alex. > > > -----Original Message----- From: Brian Foster > Sent: 25 August, 2014 5:20 PM > To: Alex Lyakas > Cc: xfs@xxxxxxxxxxx ; Dave Chinner > Subject: Re: [PATCH] xfs: fix double free of trans in log recovery on I/O > error > > On Sun, Aug 24, 2014 at 12:20:20PM +0300, Alex Lyakas wrote: > >Hi Brian, > > > >On Thu, Aug 21, 2014 at 10:18 PM, Brian Foster <bfoster@xxxxxxxxxx> wrote: > >> XFS log recovery builds up an xlog_recover object as it passes through > >> the log operations on the physical log. These structures are managed via > >> a hash table and are allocated when a new transaction is encountered and > >> freed once a commit operation for the transaction is encountered. > >> > >> This state machine for active transactions is implemented by a > >> combination of xlog_do_recovery_pass(), which walks through the log > >> buffers and xlog_recover_process_data() which processes log operations > >> within each buffer. The latter function decides whether to allocate a > >> new xlog_recover, add to it or commit and ultimately free it. If an > >> error occurs at any point during the lifecycle of a particular > >> xlog_recover, xlog_recover_process_data() frees the object and returns > >> an error. > >> > >> xlog_recover_commit_trans() handles the final processing of the > >> transaction. It submits whatever I/O is required for the transaction and > >> frees xlog_recover object along with the transaction items it tracks. If > >> an error occurs at the final stages of the commit operation, such as I/O > >> failure, both xlog_recover_commit_trans() and > >> xlog_recover_process_data() attempt to free the trans object. > >> > >> Modify xlog_recover_commit_trans() to only free the trans object on > >> successful completion of the trans, including any I/O errors that might > >> occur when recovering the log. > >> > >> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx> > >> --- > >> > >> Hi all, > >> > >> I found that the recent buffer I/O rework fixes didn't address the crash > >> reproduced by the dm-flakey/log recovery test case I posted recently. I > >> tracked the crash down to this, which allows the test to pass. This > >> addresses the crash I saw when running the reproducer manually with the > >> metadump that Alex posted as well. > >> > >> FWIW, I also went back and tested the xfs_buf_iowait() experiment in > >> both scenarios (Alex's metadump and xfstests test) and they all > >> reproduce the same crash for me. I think that either I'm still not > >> reproducing the original problem, something else might have contaminated > >> the original xfs_buf_iowait() test to give a false positive, or > >> something else entirely is going on. > >> > >> Alex, > >> > >> If you have a chance, I think it might be interesting to see whether you > >> reproduce any problems with this patch. It looks like this is a > >> regression introduced by: > >> > >> 2a84108f xfs: free the list of recovery items on error > >> > >> ... but I have no idea if that's in whatever kernel you're running. > >I am running kernel 3.8.13 with some changes (published at > >https://github.com/zadarastorage/zadara-xfs-pushback), but this > >problem also happens on pristine 3.8.13 from > >git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git, > >branch linux-stable-3.8.y. > > > >I do not have commit 2a84108f in this kernel. It was introduced in 3.14. > >I applied your patch to 3.8.13, but it doesn't fix the issue. Same > >problem happens when testing scenario that I described in > >http://oss.sgi.com/pipermail/xfs/2014-August/037637.html. > > > > Ok, thanks. Yeah, I don't see the double free regression in the 3.8.13 > stable branch. I went back to that kernel to try and confirm some > things. I do reproduce the problem with your metadump as well as the > test case I put together. I tested Dave's buf hold across sync I/O patch > and that does indeed prevent the problem. > > For whatever reason, neither the test case nor your metadump reproduce > the same problem on latest kernels. Instead, they reproduce this double > free regression. I suspect this is what you ran into when you reproduced > on a more recent kernel. If you'd like, feel free to try and verify that > by running your reproducer again on a recent kernel with this patch and > see if you can still reproduce a crash as with the 3.8.13 kernel. > > Brian > > >Thanks, > >Alex. > > > >> > >> Brian > >> > >> fs/xfs/xfs_log_recover.c | 11 ++++++++--- > >> 1 file changed, 8 insertions(+), 3 deletions(-) > >> > >> diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > >> index 176c4b3..daca9a6 100644 > >> --- a/fs/xfs/xfs_log_recover.c > >> +++ b/fs/xfs/xfs_log_recover.c > >> @@ -3528,10 +3528,15 @@ out: > >> if (!list_empty(&done_list)) > >> list_splice_init(&done_list, &trans->r_itemq); > >> > >> - xlog_recover_free_trans(trans); > >> - > >> error2 = xfs_buf_delwri_submit(&buffer_list); > >> - return error ? error : error2; > >> + > >> + if (!error) > >> + error = error2; > >> + /* caller frees trans on error */ > >> + if (!error) > >> + xlog_recover_free_trans(trans); > >> + > >> + return error; > >> } > >> > >> STATIC int > >> -- > >> 1.8.3.1 > >> > _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs