Re: [PATCH 3.10 090/180] xfs: xfs_iflush_cluster fails to abort on error

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 22 Aug 2016 14:21:08 +1000

On Sun, Aug 21, 2016 at 05:30:20PM +0200, Willy Tarreau wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
> 
> commit b1438f477934f5a4d5a44df26f3079a7575d5946 upstream.
> 
> When a failure due to an inode buffer occurs, the error handling
> fails to abort the inode writeback correctly. This can result in the
> inode being reclaimed whilst still in the AIL, leading to
> use-after-free situations as well as filesystems that cannot be
> unmounted as the inode log items left in the AIL never get removed.
> 
> Fix this by ensuring fatal errors from xfs_imap_to_bp() result in
> the inode flush being aborted correctly.
> 
> Reported-by: Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx>
> Diagnosed-by: Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx>
> Tested-by: Shyam Kaushik <shyam@xxxxxxxxxxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> Reviewed-by: Christoph Hellwig <hch@xxxxxx>
> Signed-off-by: Dave Chinner <david@xxxxxxxxxxxxx>
> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx>
> Signed-off-by: Willy Tarreau <w@xxxxxx>

....

> @@ -2768,14 +2768,22 @@ xfs_iflush(
>  	}
>  
>  	/*
> -	 * Get the buffer containing the on-disk inode.
> +	 * Get the buffer containing the on-disk inode. We are doing a try-lock
> +	 * operation here, so we may get  an EAGAIN error. In that case, we
> +	 * simply want to return with the inode still dirty.
> +	 *
> +	 * If we get any other error, we effectively have a corruption situation
> +	 * and we cannot flush the inode, so we treat it the same as failing
> +	 * xfs_iflush_int().
>  	 */
>  	error = xfs_imap_to_bp(mp, NULL, &ip->i_imap, &dip, &bp, XBF_TRYLOCK,
>  			       0);
> -	if (error || !bp) {
> +	if (error == -EAGAIN) {

Wrong. Errors changed sign in XFS in 3.17.

/rant

So, after just having to point this out (again!) for a different
stable kernel patchset review, and this specific problem causing
user-reported stable kernel regression and filesystem corruption
*months ago*. That resulted in discussion and new stable commits to
fix the problem. So now I'm left to wonder about the process of
stable kernels.

AFAICT, stable kernel maintainers are not watching what happens with
other stable kernels, nor are they talking to other stable kernel
maintainers. I should not have to tell every single stable kernel
maintainer that a specific patch needs to be changed after it's
already been reported broken, triaged and fixed in other stable
kernels. You've all got a record that the patch needs to be included
in a stable kernel, but nobody is seems to notice when it comes to
fixing problems with a stable patch even when that all happens on
stable@xxxxxxxxxxxxxxx.

Seriously, guys, pick up your act a bit and start talking between
yourselvesi and tracking regressions and fixes so the burden of
catching known reported and fixed problems with backports doesn't
rely on the upstream developers noticing the problem when hundreds
of patches for random stable kernels go past on lkml every week...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html