XFS metadata CRC errors on zram block device on ppc64le architecture

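In Fedora CoreOS we hit a problem on ppc64le when an XFS filesystem is placed on a zram block device: mounting a freshly created filesystem reports metadata CRC errors and shuts the filesystem down. The issue is tracked here: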

- https://github.com/coreos/fedora-coreos-tracker/issues/1489
- https://bugzilla.redhat.com/show_bug.cgi?id=2221314

The dmesg output shows several errors:

```
[ 3247.206007] XFS (zram0): Mounting V5 Filesystem 0b7d6149-614c-4f4c-9a1f-a80a9810f58f
[ 3247.210781] XFS (zram0): Metadata CRC error detected at xfs_agf_read_verify+0x108/0x150 [xfs], xfs_agf block 0x80008 
[ 3247.211121] XFS (zram0): Unmount and run xfs_repair
[ 3247.211198] XFS (zram0): First 128 bytes of corrupted metadata buffer:
[ 3247.211293] 00000000: fe ed ba be 00 00 00 00 00 00 00 02 00 00 00 00 ................
[ 3247.211405] 00000010: 00 00 00 00 00 00 00 18 00 00 00 01 00 00 00 00  ................
[ 3247.211515] 00000020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211625] 00000030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211735] 00000040: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211842] 00000050: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.211951] 00000060: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.212063] 00000070: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[ 3247.212171] XFS (zram0): metadata I/O error in "xfs_read_agf+0xb4/0x180 [xfs]" at daddr 0x80008 len 8 error 74
[ 3247.212485] XFS (zram0): Error -117 reserving per-AG metadata reserve pool.
[ 3247.212497] XFS (zram0): Corruption of in-memory data (0x8) detected at xfs_fs_reserve_ag_blocks+0x1e0/0x220 [xfs] (fs/xfs/xfs_fsops.c:587).  Shutting down filesystem.
[ 3247.212828] XFS (zram0): Please unmount the filesystem and rectify the problem(s)
[ 3247.212943] XFS (zram0): Ending clean mount
[ 3247.212970] XFS (zram0): Error -5 reserving per-AG metadata reserve pool.
```
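
For readers decoding the numeric error values in the log above, a small lookup can help. This is only a minimal sketch: `decode_errno` is a hypothetical helper name, and the table covers just the three values seen here, using the Linux errno numbering and the aliases XFS defines in its headers (EFSBADCRC for EBADMSG, EFSCORRUPTED for EUCLEAN).

```shell
#!/bin/sh
# decode_errno: hypothetical helper, maps the errno values seen in the log
# above to their Linux names. Accepts "74" as well as "-74".
decode_errno() {
    n=${1#-}   # drop a leading minus sign, if any
    case "$n" in
        5)   echo "EIO" ;;
        74)  echo "EBADMSG (XFS alias: EFSBADCRC)" ;;
        117) echo "EUCLEAN (XFS alias: EFSCORRUPTED)" ;;
        *)   echo "unknown ($n)" ;;
    esac
}

# Decode the three values in the order they appear in the log.
for code in 74 -117 -5; do
    printf '%s => %s\n' "$code" "$(decode_errno "$code")"
done
```

Read this way, the log shows a bad-CRC error on the AGF read (74), followed by a corruption error while reserving the per-AG pool (-117), and a plain I/O error (-5) once the filesystem has been shut down.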

The issue can be reproduced easily with a simple script:

```
[root@p8 ~]# cat test.sh 
#!/bin/bash
set -eux -o pipefail
modprobe zram num_devices=0
read dev < /sys/class/zram-control/hot_add
echo 10G > /sys/block/zram"${dev}"/disksize
mkfs.xfs /dev/zram"${dev}"
mkdir -p /tmp/foo
mount -t xfs /dev/zram"${dev}" /tmp/foo
```

We ran a kernel bisect, which identified af8b04c63708 ("zram: simplify bvec iteration in __zram_make_request") as the first bad commit:

```
[root@ibm-p8-kvm-03-guest-02 linux]# git bisect good
af8b04c63708fa730c0257084fab91fb2a9cecc4 is the first bad commit
commit af8b04c63708fa730c0257084fab91fb2a9cecc4
Author: Christoph Hellwig <hch@xxxxxx>
Date:   Tue Apr 11 19:14:46 2023 +0200

    zram: simplify bvec iteration in __zram_make_request
    
    bio_for_each_segment synthetize bvecs that never cross page boundaries, so
    don't duplicate that work in an inner loop.
    
    Link: https://lkml.kernel.org/r/20230411171459.567614-5-hch@xxxxxx
    Signed-off-by: Christoph Hellwig <hch@xxxxxx>
    Reviewed-by: Sergey Senozhatsky <senozhatsky@xxxxxxxxxxxx>
    Acked-by: Minchan Kim <minchan@xxxxxxxxxx>
    Cc: Jens Axboe <axboe@xxxxxxxxx>
    Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>

 drivers/block/zram/zram_drv.c | 42 +++++++++++-------------------------------
 1 file changed, 11 insertions(+), 31 deletions(-)
```
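
A bisect like this repeats the same pass/fail check on every candidate kernel. As a rough sketch of how that check could be scripted (not necessarily how it was done here): `classify_dmesg` is a hypothetical helper that reads kernel log text on stdin and returns exit codes following the `git bisect run` convention, 0 for good and 1 for bad. It assumes the log was captured after booting the candidate kernel and running test.sh.

```shell
#!/bin/sh
# classify_dmesg: hypothetical helper. Reads kernel log text on stdin and
# returns 1 ("bad") if the XFS metadata CRC error on a zram device is
# present, 0 ("good") otherwise.
classify_dmesg() {
    if grep -Eq 'XFS \(zram[0-9]+\): Metadata CRC error detected'; then
        return 1
    fi
    return 0
}

# Example use on a booted candidate kernel (test.sh needs root):
#   ./test.sh
#   if dmesg | classify_dmesg; then
#       git bisect good
#   else
#       git bisect bad
#   fi
```

Because each step of a kernel bisect needs a rebuild and a reboot, the helper only classifies one boot's log; driving the rebuild and reboot cycle is left out of the sketch.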

Any ideas on how to fix the problem?

Thanks!
Dusty


