On Sun, Aug 21, 2016 at 05:30:20PM +0200, Willy Tarreau wrote: > From: Dave Chinner <dchinner@xxxxxxxxxx> > > commit b1438f477934f5a4d5a44df26f3079a7575d5946 upstream. > > When a failure due to an inode buffer occurs, the error handling > fails to abort the inode writeback correctly. This can result in the > inode being reclaimed whilst still in the AIL, leading to > use-after-free situations as well as filesystems that cannot be > unmounted as the inode log items left in the AIL never get removed. > > Fix this by ensuring fatal errors from xfs_imap_to_bp() result in > the inode flush being aborted correctly. > > Reported-by: Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx> > Diagnosed-by: Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx> > Tested-by: Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx> > Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> > Reviewed-by: Christoph Hellwig <hch@xxxxxx> > Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx> > Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> > Signed-off-by: Willy Tarreau <w@xxxxxx> .... > @@ -2768,14 +2768,22 @@ xfs_iflush( > } > > /* > - * Get the buffer containing the on-disk inode. > + * Get the buffer containing the on-disk inode. We are doing a try-lock > + * operation here, so we may get an EAGAIN error. In that case, we > + * simply want to return with the inode still dirty. > + * > + * If we get any other error, we effectively have a corruption situation > + * and we cannot flush the inode, so we treat it the same as failing > + * xfs_iflush_int(). > */ > error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK, > 0); > - if (error || !bp) { > + if (error == -EAGAIN) { Wrong. Errors changed sign in XFS in 3.17. /rant So, after just having to point this out (again!) for a different stable kernel patchset review, and this specific problem causing user-reported stable kernel regression and filesystem corruption *months ago*. That resulted in discussion and new stable commits to fix the problem. So now I'm left to wonder about the process of stable kernels. AFAICT, stable kernel maintainers are not watching what happens with other stable kernels, nor are they talking to other stable kernel maintainers. I should not have to tell every single stable kernel maintainer that a specific patch needs to be changed after it's already been reported broken, triaged and fixed in other stable kernels. You've all got a record that the patch needs to be included in a stable kernel, but nobody is seems to notice when it comes to fixing problems with a stable patch even when that all happens on stable@xxxxxxxxxxxxxxx. Seriously, guys, pick up your act a bit and start talking between yourselvesi and tracking regressions and fixes so the burden of catching known reported and fixed problems with backports doesn't rely on the upstream developers noticing the problem when hundreds of patches for random stable kernels go past on lkml every week... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe stable" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html