Re: [PATCH 1/2] xfs: fix up xfs_swap_extent_forks inline extent handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Nov 04, 2016 at 09:24:16PM -0500, Eric Sandeen wrote:
> There have been several reports over the years of NULL pointer
> dereferences in xfs_trans_log_inode during xfs_fsr processes,
> when the process is doing an fput and tearing down extents
> on the temporary inode, something like:
> 
> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
> PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
>     [exception RIP: xfs_trans_log_inode+0x10]
>  #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
> #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
> #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
> #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
> #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
> #14 [ffff8800a57bbe00] evict at ffffffff811e1b67
> #15 [ffff8800a57bbe28] iput at ffffffff811e23a5
> #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
> #17 [ffff8800a57bbe88] dput at ffffffff811dd06c
> #18 [ffff8800a57bbea8] __fput at ffffffff811c823b
> #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
> #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
> #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
> #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d
> 
> As it turns out, this is because the i_itemp pointer, along
> with the d_ops pointer, has been overwritten with zeros
> when we tear down the extents during truncate.  When the in-core
> inode fork on the temporary inode used by xfs_fsr was originally
> set up during the extent swap, we mistakenly looked at di_nextents
> to determine whether all extents fit inline, but this misses extents
> generated by speculative preallocation; we should be using if_bytes
> instead.
> 

Neat bug. :P The fix looks fine, but technically di_nextents doesn't
include any delalloc extents, right? From taking a quick skim through
the test case, it looks like the problematic situation is when the file
flush doesn't end up converting post-eof delalloc (created via
speculative prealloc) due to free space fragmentation.

So in other words, in most cases a file flush will convert all delalloc,
including post-eof preallocation, by virtue of being part of an extent
that is partly within file eof. In the fragmented free space situation,
however, the file flush is not necessarily guaranteed to convert all
delalloc blocks past eof, thus di_nextents is not consistent with the
in-core inode state, and badness ensues...

If I'm following that correctly it would be nice to just clarify that a
bit in the commit log. Otherwise looks good to me:

Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>

> This mistake corrupts the in-memory inode, and code in
> xfs_iext_remove_inline eventually gets bad inputs, causing
> it to memmove and memset incorrect ranges; this became apparent
> because the two values in ifp->if_u2.if_inline_ext[1] contained
> what should have been in d_ops and i_itemp; they were memmoved due
> to incorrect array indexing and then the original locations
> were zeroed with memset, again due to an array overrun.
> 
> Fix this by properly using i_df.if_bytes to determine the number
> of extents, not di_nextents.
> 
> Thanks to dchinner for looking at this with me and spotting the
> root cause.
> 
> Cc: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> ---
> 
> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
> index 552465e..47074e0 100644
> --- a/fs/xfs/xfs_bmap_util.c
> +++ b/fs/xfs/xfs_bmap_util.c
> @@ -1792,6 +1792,7 @@
>  	struct xfs_ifork	tempifp, *ifp, *tifp;
>  	int			aforkblks = 0;
>  	int			taforkblks = 0;
> +	xfs_extnum_t		nextents;
>  	__uint64_t		tmp;
>  	int			error;
>  
> @@ -1881,7 +1882,8 @@
>  		 * pointer.  Otherwise it's already NULL or
>  		 * pointing to the extent.
>  		 */
> -		if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) {
> +		nextents = ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
> +		if (nextents <= XFS_INLINE_EXTS) {
>  			ifp->if_u1.if_extents =
>  				ifp->if_u2.if_inline_ext;
>  		}
> @@ -1900,7 +1902,8 @@
>  		 * pointer.  Otherwise it's already NULL or
>  		 * pointing to the extent.
>  		 */
> -		if (tip->i_d.di_nextents <= XFS_INLINE_EXTS) {
> +		nextents = tip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
> +		if (nextents <= XFS_INLINE_EXTS) {
>  			tifp->if_u1.if_extents =
>  				tifp->if_u2.if_inline_ext;
>  		}
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux