Re: [PATCH 1/4] xfs: don't unconditionally null args->pag in xfs_bmap_btalloc_at_eof

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Apr 27, 2023 at 03:48:59PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
> 
> xfs/170 on a filesystem with su=128k,sw=4 produces this splat:
> 
> BUG: kernel NULL pointer dereference, address: 0000000000000010
> #PF: supervisor write access in kernel mode
> #PF: error_code(0x0002) - not-present page
> PGD 0 P4D 0
> Oops: 0002 [#1] PREEMPT SMP
> CPU: 1 PID: 4022907 Comm: dd Tainted: G        W          6.3.0-xfsx #2 6ebeeffbe9577d32
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-bu
> RIP: 0010:xfs_perag_rele+0x10/0x70 [xfs]
> RSP: 0018:ffffc90001e43858 EFLAGS: 00010217
> RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000100
> RDX: ffffffffa054e717 RSI: 0000000000000005 RDI: 0000000000000000
> RBP: ffff888194eea000 R08: 0000000000000000 R09: 0000000000000037
> R10: ffff888100ac1cb0 R11: 0000000000000018 R12: 0000000000000000
> R13: ffffc90001e43a38 R14: ffff888194eea000 R15: ffff888194eea000
> FS:  00007f93d1a0e740(0000) GS:ffff88843fc80000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 0000000000000010 CR3: 000000018a34f000 CR4: 00000000003506e0
> Call Trace:
>  <TASK>
>  xfs_bmap_btalloc+0x1a7/0x5d0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_bmapi_allocate+0xee/0x470 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_bmapi_write+0x539/0x9e0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_iomap_write_direct+0x1bb/0x2b0 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_direct_write_iomap_begin+0x51c/0x710 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  iomap_iter+0x132/0x2f0
>  __iomap_dio_rw+0x2f8/0x840
>  iomap_dio_rw+0xe/0x30
>  xfs_file_dio_write_aligned+0xad/0x180 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  xfs_file_write_iter+0xfb/0x190 [xfs f85291d6841cbb3dc740083f1f331c0327394518]
>  vfs_write+0x2eb/0x410
>  ksys_write+0x65/0xe0
>  do_syscall_64+0x2b/0x80
> 
> This crash occurs under the "out_low_space" label.  We grabbed a perag
> reference, passed it via args->pag into xfs_bmap_btalloc_at_eof, and
> afterwards args->pag is NULL.  Fix the second function not to clobber
> args->pag if the caller had passed one in.
> 
> Fixes: 85843327094f ("xfs: factor xfs_bmap_btalloc()")
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_bmap.c |    5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> 
> diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
> index b512de0540d5..cd8870a16fd1 100644
> --- a/fs/xfs/libxfs/xfs_bmap.c
> +++ b/fs/xfs/libxfs/xfs_bmap.c
> @@ -3494,8 +3494,10 @@ xfs_bmap_btalloc_at_eof(
>  		if (!caller_pag)
>  			args->pag = xfs_perag_get(mp, XFS_FSB_TO_AGNO(mp, ap->blkno));
>  		error = xfs_alloc_vextent_exact_bno(args, ap->blkno);
> -		if (!caller_pag)
> +		if (!caller_pag) {
>  			xfs_perag_put(args->pag);
> +			args->pag = NULL;
> +		}
>  		if (error)
>  			return error;
>  
> @@ -3505,7 +3507,6 @@ xfs_bmap_btalloc_at_eof(
>  		 * Exact allocation failed. Reset to try an aligned allocation
>  		 * according to the original allocation specification.
>  		 */
> -		args->pag = NULL;
>  		args->alignment = stripe_align;
>  		args->minlen = nextminlen;
>  		args->minalignslop = 0;

Yup, that'll fix the problem.

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

FWIW, I'm working on some patches to take this further by always
providing a caller perag to this function.  That gets rid of all the
conditional code here and it gets rid of the only case where we call
xfs_alloc_vextent_near_bno() without args->pag set so the
conditional caller_pag code goes from there, too. This makes
xfs_alloc_vextent_{near,exact}_bno() identical except for one line
so they can be collapsed. And now that we always specify args->pag
for these functions, we can ignore the agno part of the target block
meaning we can select the perag to allocate from at a much higher
level via setting up args->pag when we initially scan the AGs to
set up args->minlen/maxlen based on the the nearest AG with 
contiguous free space large enough for the whole allocation. This
avoids attempting allocations that are guaranteed to fail before
we fall back to iterating from the target block agno and eventually
finding the same AG we originally found that had enough contiguous
free space in it for a maxlen allocation....

I think I can take it further an make both filestreams and the
normal allocator use the same "this ag" algorithm for EOF and
aligned allocation and then have them both fall back to the same
unaligned allocation attempts, which will then allow a bunch of code
to be collapsed in the xfs_bmap_btalloc() path...

-Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux