Re: [PATCH 1/2] xfs: fix up xfs_swap_extent_forks inline extent handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/7/16 8:57 AM, Brian Foster wrote:
> On Fri, Nov 04, 2016 at 09:24:16PM -0500, Eric Sandeen wrote:
>> There have been several reports over the years of NULL pointer
>> dereferences in xfs_trans_log_inode during xfs_fsr processes,
>> when the process is doing an fput and tearing down extents
>> on the temporary inode, something like:
>>
>> BUG: unable to handle kernel NULL pointer dereference at 0000000000000018
>> PID: 29439  TASK: ffff880550584fa0  CPU: 6   COMMAND: "xfs_fsr"
>>     [exception RIP: xfs_trans_log_inode+0x10]
>>  #9 [ffff8800a57bbbe0] xfs_bunmapi at ffffffffa037398e [xfs]
>> #10 [ffff8800a57bbce8] xfs_itruncate_extents at ffffffffa0391b29 [xfs]
>> #11 [ffff8800a57bbd88] xfs_inactive_truncate at ffffffffa0391d0c [xfs]
>> #12 [ffff8800a57bbdb8] xfs_inactive at ffffffffa0392508 [xfs]
>> #13 [ffff8800a57bbdd8] xfs_fs_evict_inode at ffffffffa035907e [xfs]
>> #14 [ffff8800a57bbe00] evict at ffffffff811e1b67
>> #15 [ffff8800a57bbe28] iput at ffffffff811e23a5
>> #16 [ffff8800a57bbe58] dentry_kill at ffffffff811dcfc8
>> #17 [ffff8800a57bbe88] dput at ffffffff811dd06c
>> #18 [ffff8800a57bbea8] __fput at ffffffff811c823b
>> #19 [ffff8800a57bbef0] ____fput at ffffffff811c846e
>> #20 [ffff8800a57bbf00] task_work_run at ffffffff81093b27
>> #21 [ffff8800a57bbf30] do_notify_resume at ffffffff81013b0c
>> #22 [ffff8800a57bbf50] int_signal at ffffffff8161405d
>>
>> As it turns out, this is because the i_itemp pointer, along
>> with the d_ops pointer, has been overwritten with zeros
>> when we tear down the extents during truncate.  When the in-core
>> inode fork on the temporary inode used by xfs_fsr was originally
>> set up during the extent swap, we mistakenly looked at di_nextents
>> to determine whether all extents fit inline, but this misses extents
>> generated by speculative preallocation; we should be using if_bytes
>> instead.
>>
> 
> Neat bug. :P The fix looks fine, but technically di_nextents doesn't
> include any delalloc extents, right? From taking a quick skim through
> the test case, it looks like the problematic situation is when the file
> flush doesn't end up converting post-eof delalloc (created via
> speculative prealloc) due to free space fragmentation.
> 
> So in other words, in most cases a file flush will convert all delalloc,
> including post-eof preallocation, by virtue of being part of an extent
> that is partly within file eof. In the fragmented free space situation,
> however, the file flush is not necessarily guaranteed to convert all
> delalloc blocks past eof, thus di_nextents is not consistent with the
> in-core inode state, and badness ensues...

To be honest I hadn't thought it through that far.  Could modify the
testcase to free up more contiguous space prior to the last flush, perhaps,
and see what happens.

-Eric

> If I'm following that correctly it would be nice to just clarify that a
> bit in the commit log. Otherwise looks good to me:
> 
> Reviewed-by: Brian Foster <bfoster@xxxxxxxxxx>
> 
>> This mistake corrupts the in-memory inode, and code in
>> xfs_iext_remove_inline eventually gets bad inputs, causing
>> it to memmove and memset incorrect ranges; this became apparent
>> because the two values in ifp->if_u2.if_inline_ext[1] contained
>> what should have been in d_ops and i_itemp; they were memmoved due
>> to incorrect array indexing and then the original locations
>> were zeroed with memset, again due to an array overrun.
>>
>> Fix this by properly using i_df.if_bytes to determine the number
>> of extents, not di_nextents.
>>
>> Thanks to dchinner for looking at this with me and spotting the
>> root cause.
>>
>> Cc: stable@xxxxxxxxxxxxxxx
>> Signed-off-by: Eric Sandeen <sandeen@xxxxxxxxxx>
>> ---
>>
>> diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
>> index 552465e..47074e0 100644
>> --- a/fs/xfs/xfs_bmap_util.c
>> +++ b/fs/xfs/xfs_bmap_util.c
>> @@ -1792,6 +1792,7 @@
>>  	struct xfs_ifork	tempifp, *ifp, *tifp;
>>  	int			aforkblks = 0;
>>  	int			taforkblks = 0;
>> +	xfs_extnum_t		nextents;
>>  	__uint64_t		tmp;
>>  	int			error;
>>  
>> @@ -1881,7 +1882,8 @@
>>  		 * pointer.  Otherwise it's already NULL or
>>  		 * pointing to the extent.
>>  		 */
>> -		if (ip->i_d.di_nextents <= XFS_INLINE_EXTS) {
>> +		nextents = ip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>> +		if (nextents <= XFS_INLINE_EXTS) {
>>  			ifp->if_u1.if_extents =
>>  				ifp->if_u2.if_inline_ext;
>>  		}
>> @@ -1900,7 +1902,8 @@
>>  		 * pointer.  Otherwise it's already NULL or
>>  		 * pointing to the extent.
>>  		 */
>> -		if (tip->i_d.di_nextents <= XFS_INLINE_EXTS) {
>> +		nextents = tip->i_df.if_bytes / (uint)sizeof(xfs_bmbt_rec_t);
>> +		if (nextents <= XFS_INLINE_EXTS) {
>>  			tifp->if_u1.if_extents =
>>  				tifp->if_u2.if_inline_ext;
>>  		}
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux