On Mon, Sep 18, 2023 at 01:05:09PM +0200, Hannes Reinecke wrote: > We now support block sizes larger than PAGE_SIZE, so this > check is pointless. > > Signed-off-by: Hannes Reinecke <hare@xxxxxxx> > --- > fs/xfs/xfs_super.c | 12 ------------ > 1 file changed, 12 deletions(-) > > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c > index 1f77014c6e1a..67dcdd4dcf2d 100644 > --- a/fs/xfs/xfs_super.c > +++ b/fs/xfs/xfs_super.c > @@ -1651,18 +1651,6 @@ xfs_fs_fill_super( > goto out_free_sb; > } > > - /* > - * Until this is fixed only page-sized or smaller data blocks work. > - */ > - if (mp->m_sb.sb_blocksize > PAGE_SIZE) { > - xfs_warn(mp, > - "File system with blocksize %d bytes. " > - "Only pagesize (%ld) or less will currently work.", > - mp->m_sb.sb_blocksize, PAGE_SIZE); > - error = -ENOSYS; > - goto out_free_sb; > - } This really needs to be replaced with an EXPERIMENTAL warning - we're not going to support these LBS configurations until we are sure it doesn't eat data. Anyway, smoke tests.... # mkfs.xfs -f -b size=64k /dev/pmem0 meta-data=/dev/pmem0 isize=512 agcount=4, agsize=32768 blks = sectsz=4096 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=0 = reflink=1 bigtime=1 inobtcount=1 nrext64=0 data = bsize=65536 blocks=131072, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=65536 ascii-ci=0, ftype=1 log =internal log bsize=65536 blocks=1024, version=2 = sectsz=4096 sunit=1 blks, lazy-count=1 realtime =none extsz=65536 blocks=0, rtextents=0 # mount /dev/pmem0 /mnt/test Message from syslogd@test3 at Sep 20 11:23:32 ... kernel:[ 73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134 Message from syslogd@test3 at Sep 20 11:23:32 ... kernel:[ 73.521819] XFS: Assertion failed: PAGE_SHIFT >= sbp->sb_blocklog, file: fs/xfs/xfs_mount.c, line: 134 Segmentation fault # Looks like this hasn't been tested with CONFIG_XFS_DEBUG=y. If that's the case, I expect that none of this actually works... :/ I've attached a patch at the end of the email that allows XFS filesystems to mount with debug enabled. Next problem, filesystem created with a 32kB sector size: # mkfs.xfs -f -b size=64k -s size=32k /dev/pmem0 meta-data=/dev/pmem0 isize=512 agcount=4, agsize=32768 blks = sectsz=32768 attr=2, projid32bit=1 = crc=1 finobt=1, sparse=1, rmapbt=1 = reflink=1 bigtime=1 inobtcount=1 nrext64=0 data = bsize=65536 blocks=131072, imaxpct=25 = sunit=0 swidth=0 blks naming =version 2 bsize=65536 ascii-ci=0, ftype=1 log =internal log bsize=65536 blocks=2709, version=2 = sectsz=32768 sunit=1 blks, lazy-count=1 realtime =none extsz=65536 blocks=0, rtextents=0 # and then running xfs_db on it to change the UUID: # xfs_admin -U generate /dev/pmem0 Results in a kernel panic: [ 132.151886] XFS (pmem0): Mounting V5 Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a [ 132.161673] XFS (pmem0): Ending clean mount [ 175.824015] XFS (pmem0): Unmounting Filesystem 3d96f860-2aa2-4e50-970c-134508b7954a [ 185.759251] Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0 [ 185.766632] CPU: 1 PID: 4383 Comm: xfs_db Not tainted 6.6.0-rc2-dgc+ #1903 [ 185.771882] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 185.778827] Call Trace: [ 185.780706] <TASK> [ 185.782318] dump_stack_lvl+0x37/0x50 [ 185.785069] dump_stack+0x10/0x20 [ 185.787557] panic+0x15b/0x300 [ 185.789763] ? do_mpage_readpage+0x7e5/0x7f0 [ 185.792705] __stack_chk_fail+0x14/0x20 [ 185.795183] do_mpage_readpage+0x7e5/0x7f0 [ 185.797826] ? blkdev_write_begin+0x30/0x30 [ 185.800500] ? blkdev_readahead+0x15/0x20 [ 185.802894] ? read_pages+0x5c/0x230 [ 185.805023] ? page_cache_ra_order+0x2ae/0x310 [ 185.807538] ? ondemand_readahead+0x1f1/0x3a0 [ 185.809899] ? page_cache_async_ra+0x26/0x30 [ 185.812175] ? filemap_get_pages+0x540/0x6d0 [ 185.814327] ? _copy_to_iter+0x65/0x4c0 [ 185.816283] ? filemap_read+0xfc/0x3a0 [ 185.818086] ? __fsnotify_parent+0x107/0x340 [ 185.820142] ? __might_sleep+0x42/0x70 [ 185.821854] ? blkdev_read_iter+0x6d/0x150 [ 185.823697] ? vfs_read+0x1b1/0x300 [ 185.825307] ? __x64_sys_pread64+0x8f/0xc0 [ 185.827159] ? irqentry_exit+0x33/0x40 [ 185.828858] ? do_syscall_64+0x35/0x80 [ 185.830485] ? entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 185.832677] </TASK> [ 185.834068] Kernel Offset: disabled [ 185.835510] ---[ end Kernel panic - not syncing: stack-protector: Kernel stack is corrupted in: do_mpage_readpage+0x7e5/0x7f0 ]--- Probably related to the block device having a 32kB block size set on the pmem device by xfs_db when it only really has a 4kB sector size.... Anyway, back to just using 64k fsb, 4k sector. generic/001 ASSERT fails immediately with: [ 111.785796] run fstests generic/001 at 2023-09-20 11:50:19 [ 113.346797] XFS: Assertion failed: imap.br_startblock != DELAYSTARTBLOCK, file: fs/xfs/xfs_reflink.c, line: 1392 [ 113.352512] ------------[ cut here ]------------ [ 113.354793] kernel BUG at fs/xfs/xfs_message.c:102! [ 113.358444] invalid opcode: 0000 [#1] PREEMPT SMP [ 113.360769] CPU: 8 PID: 7581 Comm: cp Not tainted 6.6.0-rc2-dgc+ #1903 [ 113.364183] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014 [ 113.369784] RIP: 0010:assfail+0x35/0x40 [ 113.372038] Code: c9 48 c7 c2 00 d8 6c 82 48 89 e5 48 89 f1 48 89 fe 48 c7 c7 d8 d3 60 82 e8 a8 fd ff ff 80 3d d1 36 ec 02 00 75 04 0f 0b 5d c3 <0f> 0b 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 63 f6 49 89 [ 113.384178] RSP: 0018:ffffc9000962bca0 EFLAGS: 00010202 [ 113.387467] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 000000007fffffff [ 113.392326] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff8260d3d8 [ 113.397235] RBP: ffffc9000962bca0 R08: 0000000000000000 R09: 000000000000000a [ 113.401449] R10: 000000000000000a R11: 0fffffffffffffff R12: 0000000000000000 [ 113.406312] R13: 00000000ffffff8b R14: 0000000000000000 R15: ffff88810de00f00 [ 113.410562] FS: 00007f4f6219b500(0000) GS:ffff8885fec00000(0000) knlGS:0000000000000000 [ 113.415506] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 113.418688] CR2: 0000564951bfcf50 CR3: 00000005cd6b8005 CR4: 0000000000060ee0 [ 113.422825] Call Trace: [ 113.424302] <TASK> [ 113.425566] ? show_regs+0x61/0x70 [ 113.427444] ? die+0x37/0x90 [ 113.429175] ? do_trap+0xec/0x100 [ 113.431016] ? do_error_trap+0x6c/0x90 [ 113.432951] ? assfail+0x35/0x40 [ 113.434677] ? exc_invalid_op+0x52/0x70 [ 113.436755] ? assfail+0x35/0x40 [ 113.438659] ? asm_exc_invalid_op+0x1b/0x20 [ 113.440864] ? assfail+0x35/0x40 [ 113.442671] xfs_reflink_remap_blocks+0x197/0x350 [ 113.445278] xfs_file_remap_range+0xf3/0x320 [ 113.447504] do_clone_file_range+0xfe/0x2b0 [ 113.449689] vfs_clone_file_range+0x3f/0x150 [ 113.452080] ioctl_file_clone+0x52/0xa0 [ 113.453600] do_vfs_ioctl+0x485/0x8d0 [ 113.455054] ? selinux_file_ioctl+0x96/0x120 [ 113.456637] ? selinux_file_ioctl+0x96/0x120 [ 113.458213] __x64_sys_ioctl+0x73/0xd0 [ 113.459598] do_syscall_64+0x35/0x80 [ 113.460840] entry_SYSCALL_64_after_hwframe+0x63/0xcd Problems with unexpected extent types in the reflink remap code. This is due to the reflink operation finding a delalloc extent where it should be finding a real extent. this implies the xfs_flush_unmap_range() call in xfs_reflink_remap_prep() didn't flush the full data range it was supposed to. xfs_flush_unmap_range() is supposed to round the range out to: rounding = max_t(xfs_off_t, mp->m_sb.sb_blocksize, PAGE_SIZE); the block size for these cases, so it is LBS aware. So maybe there's a problem with filemap_write_and_wait_range() and/or truncate_pagecache_range() when dealing with LBS enabled page cache? So, yeah, debug XFS builds tell us straight away that important stuff does not appear to be not working correctly. Debug really needs to be enabled, otherwise silent data corruption situations like this can go unnoticed by fstests tests... -Dave -- Dave Chinner david@xxxxxxxxxxxxx xfs: fix page cache size validation for LBS configuations From: Dave Chinner <dchinner@xxxxxxxxxx> The page cache can index larger filesystem sizes on 32 bit systems if large block sizes are enabled. Fix this code to take these configurations into account. Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx> --- fs/xfs/xfs_mount.c | 10 +++++++--- 1 file changed, 7 insertions(+), 3 deletions(-) diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c index 0a0fd19573d8..35dfcbc1e576 100644 --- a/fs/xfs/xfs_mount.c +++ b/fs/xfs/xfs_mount.c @@ -131,11 +131,15 @@ xfs_sb_validate_fsb_count( xfs_sb_t *sbp, uint64_t nblocks) { - ASSERT(PAGE_SHIFT >= sbp->sb_blocklog); - ASSERT(sbp->sb_blocklog >= BBSHIFT); + ASSERT(sbp->sb_blocklog <= XFS_MAX_BLOCKSIZE_LOG); + ASSERT(sbp->sb_blocklog >= XFS_MIN_BLOCKSIZE_LOG); /* Limited by ULONG_MAX of page cache index */ - if (nblocks >> (PAGE_SHIFT - sbp->sb_blocklog) > ULONG_MAX) + if (PAGE_SHIFT >= sbp->sb_blocklog) + nblocks >>= (PAGE_SHIFT - sbp->sb_blocklog); + else + nblocks <<= (sbp->sb_blocklog - PAGE_SHIFT); + if (nblocks > ULONG_MAX) return -EFBIG; return 0; }