Re: [RFC PATCH] xfs_db: sanitize geometry on load

On Wed, Jan 11, 2017 at 12:01:22PM +0200, Amir Goldstein wrote:
> On Wed, Jan 11, 2017 at 10:34 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > On Tue, Jan 10, 2017 at 9:42 PM, Darrick J. Wong
> > <darrick.wong@xxxxxxxxxx> wrote:
> >> xfs_db doesn't check the filesystem geometry when it's mounting, which
> >> means that garbage agcount values can cause OOMs when we try to allocate
> >> all the per-AG incore metadata.  If we see geometry that looks
> >> suspicious, try to derive the actual AG geometry to avoid crashing the
> >> system.  This should help with xfs/1301 fuzzing.
> >>
> >> Also fix up xfs_repair to use the min/max dblocks macros.
> >>
> >> Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> >
> > Test machine is back to health with this patch, but some tests are failing
> > due to the new error messages.
> > I guess it's no surprise to you.
> >
> >
> > xfs/1300 5s ... 5s
> > xfs/1301         - output mismatch (see
> > /home/amir/src/xfstests-dev/results//xfs/1301.out.bad)
> >     --- tests/xfs/1301.out      2017-01-08 15:35:07.647897359 +0200
> >     +++ /home/amir/src/xfstests-dev/results//xfs/1301.out.bad
> > 2017-01-11 09:58:10.981678272 +0200
> >     @@ -1,4 +1,61 @@
> >      QA output created by 1301
> >      Format and populate
> >      Fuzz superblock
> >     +xfs_db: device /dev/mapper/storage-scratch AG geometry is insane.
> > Using agcount=4.
> >     +SB sanity check failed
> >     +Metadata corruption detected at xfs_sb block 0x0/0x200
> >     +xfs_db: device /dev/mapper/storage-scratch AG geometry is insane.
> > Using agcount=4.
> >     ...
> >     (Run 'diff -u tests/xfs/1301.out
> > /home/amir/src/xfstests-dev/results//xfs/1301.out.bad'  to see the

Uh... this is odd; all that stuff should go into 1301.full.

> > entire diff)
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (c) (see
> > /home/amir/src/xfstests-dev/results//xfs/1301.full)
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (r) (see
> > /home/amir/src/xfstests-dev/results//xfs/1301.full)
> > xfs/1302         - output mismatch (see
> > /home/amir/src/xfstests-dev/results//xfs/1302.out.bad)
> >     --- tests/xfs/1302.out      2017-01-08 15:35:07.647897359 +0200
> >     +++ /home/amir/src/xfstests-dev/results//xfs/1302.out.bad
> > 2017-01-11 10:05:16.710031113 +0200
> >     @@ -1,4 +1,26 @@
> >      QA output created by 1302
> >      Format and populate
> >      Fuzz AGF
> >     +Metadata corruption detected at xfs_agf block 0x1/0x200
> >     +xfs_db: cannot init perag data (117). Continuing anyway.
> >     +Metadata corruption detected at xfs_agf block 0x1/0x200
> >     +xfs_db: cannot init perag data (117). Continuing anyway.

I just ran 1302, and all the output goes into 1302.full.

Now I wonder what's different between your setup and mine.

> >     ...
> >     (Run 'diff -u tests/xfs/1302.out
> > /home/amir/src/xfstests-dev/results//xfs/1302.out.bad'  to see the
> > entire diff)
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (c) (see
> > /home/amir/src/xfstests-dev/results//xfs/1302.full)
> > xfs/1303 132s ... 130s
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (c) (see
> > /home/amir/src/xfstests-dev/results//xfs/1303.full)
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (r) (see
> > /home/amir/src/xfstests-dev/results//xfs/1303.full)
> > xfs/1304         - output mismatch (see
> > /home/amir/src/xfstests-dev/results//xfs/1304.out.bad)
> >     --- tests/xfs/1304.out      2017-01-08 15:35:07.647897359 +0200
> >     +++ /home/amir/src/xfstests-dev/results//xfs/1304.out.bad
> > 2017-01-11 10:12:26.506167776 +0200
> >     @@ -1,4 +1,12 @@
> >      QA output created by 1304
> >      Format and populate
> >      Fuzz AGI
> >     +Metadata corruption detected at xfs_agi block 0x2/0x200
> >     +xfs_db: cannot init perag data (117). Continuing anyway.
> >     +Metadata corruption detected at xfs_agi block 0x2/0x200
> >     +xfs_db: cannot init perag data (117). Continuing anyway.
> >     ...
> >     (Run 'diff -u tests/xfs/1304.out
> > /home/amir/src/xfstests-dev/results//xfs/1304.out.bad'  to see the
> > entire diff)
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (c) (see
> > /home/amir/src/xfstests-dev/results//xfs/1304.full)
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (r) (see
> > /home/amir/src/xfstests-dev/results//xfs/1304.full)
> > xfs/1305 224s ... 218s
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (c) (see
> > /home/amir/src/xfstests-dev/results//xfs/1305.full)
> > _check_xfs_filesystem: filesystem on /dev/mapper/storage-scratch is
> > inconsistent (r) (see
> > /home/amir/src/xfstests-dev/results//xfs/1305.full)
> > xfs/1306 239s ... 234s
> 
> Now I am hitting these xfs_db crashes during xfs/1316, which are apparently
> not related to the OOM killer. I saw them in the last run as well, but dmesg
> is quiet now.
> 
> xfs/1316        *** Error in `/usr/sbin/xfs_db': free(): invalid
> pointer: 0x00007f9dbf036b78 ***
> ======= Backtrace: =========
> /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f9dbecea725]
> /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f9dbecf2f4a]
> /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f9dbecf6abc]
> /usr/sbin/xfs_db[0x414961]
> /usr/sbin/xfs_db[0x4154de]
> /usr/sbin/xfs_db[0x420d38]
> /usr/sbin/xfs_db[0x420926]
> /usr/sbin/xfs_db[0x405125]
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f9dbec93830]
> /usr/sbin/xfs_db[0x405179]

Ok well I definitely don't see /this/ happening.  I gather you built
xfsprogs with the insane geometry patch; if so, against what git commit?
And, did the binary get installed as /usr/sbin/xfs_db, or is this just
the system xfs_db?
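
Something along these lines would answer the last question.  This is only
a rough sketch: resolve_tool is a made-up helper name (not part of
xfsprogs or fstests), and I'm assuming the harness picks the binary up
through $XFS_DB_PROG, as the ./common/xfs lines in your output suggest.

```shell
# resolve_tool is a hypothetical helper, not part of xfsprogs or fstests.
resolve_tool() {
    local path
    # command -v prints the path the shell would execute for this name
    path=$(command -v "$1") || { echo "$1: not found on PATH"; return 1; }
    echo "$1 -> $path"
    # size and mtime help tell a fresh build from the distro package
    ls -l "$path"
}

resolve_tool xfs_db || true
# the test helpers run $XFS_DB_PROG, so show what that is set to as well
echo "XFS_DB_PROG=${XFS_DB_PROG:-<unset>}"
```

If the mtime on that file predates your build, the tests are most likely
still running the distro binary.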

In any case, all that extra output is supposed to end up in
$seqres.full, not on stdout.  At most you should see admonishments about
scrub or repair failing to detect/fix things; those messages look like:

"offline repair failed (4) with $field = $fuzzverb"
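
To illustrate the split I have in mind, here is a minimal sketch.  It is
not the actual test code: fake_xfs_db is a hypothetical stand-in for the
real tool, and $seqres is faked with a temp dir because the harness
normally sets it.

```shell
# seqres is normally set by the fstests harness; faked here so this runs alone.
seqres="$(mktemp -d)/1301"

# fake_xfs_db is a hypothetical stand-in that is noisy on both streams.
fake_xfs_db() {
    echo "agcount = 4"
    echo "Metadata corruption detected at xfs_sb block 0x0/0x200" >&2
}

echo "Fuzz superblock"                    # golden output, stays on stdout
fake_xfs_db >> "$seqres.full" 2>&1        # stdout and stderr both go to the log
```

With that redirection, only "Fuzz superblock" is compared against the
.out file; the corruption warnings end up in the .full log.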

--D

> ======= Memory map: ========
> 00400000-0049e000 r-xp 00000000 08:01 15995842
>   /usr/sbin/xfs_db
> 0069d000-0069e000 r--p 0009d000 08:01 15995842
>   /usr/sbin/xfs_db
> 0069e000-006a1000 rw-p 0009e000 08:01 15995842
>   /usr/sbin/xfs_db
> 006a1000-006b0000 rw-p 00000000 00:00 0
> 0119e000-011e0000 rw-p 00000000 00:00 0                                  [heap]
> 7f9db8000000-7f9db8021000 rw-p 00000000 00:00 0
> 7f9db8021000-7f9dbc000000 ---p 00000000 00:00 0
> 7f9dbea5d000-7f9dbea73000 r-xp 00000000 08:01 6033829
>   /lib/x86_64-linux-gnu/libgcc_s.so.1
> 7f9dbea73000-7f9dbec72000 ---p 00016000 08:01 6033829
>   /lib/x86_64-linux-gnu/libgcc_s.so.1
> 7f9dbec72000-7f9dbec73000 rw-p 00015000 08:01 6033829
>   /lib/x86_64-linux-gnu/libgcc_s.so.1
> 7f9dbec73000-7f9dbee33000 r-xp 00000000 08:01 6033791
>   /lib/x86_64-linux-gnu/libc-2.23.so
> 7f9dbee33000-7f9dbf032000 ---p 001c0000 08:01 6033791
>   /lib/x86_64-linux-gnu/libc-2.23.so
> 7f9dbf032000-7f9dbf036000 r--p 001bf000 08:01 6033791
>   /lib/x86_64-linux-gnu/libc-2.23.so
> 7f9dbf036000-7f9dbf038000 rw-p 001c3000 08:01 6033791
>   /lib/x86_64-linux-gnu/libc-2.23.so
> 7f9dbf038000-7f9dbf03c000 rw-p 00000000 00:00 0
> 7f9dbf03c000-7f9dbf054000 r-xp 00000000 08:01 6033937
>   /lib/x86_64-linux-gnu/libpthread-2.23.so
> 7f9dbf054000-7f9dbf253000 ---p 00018000 08:01 6033937
>   /lib/x86_64-linux-gnu/libpthread-2.23.so
> 7f9dbf253000-7f9dbf254000 r--p 00017000 08:01 6033937
>   /lib/x86_64-linux-gnu/libpthread-2.23.so
> 7f9dbf254000-7f9dbf255000 rw-p 00018000 08:01 6033937
>   /lib/x86_64-linux-gnu/libpthread-2.23.so
> 7f9dbf255000-7f9dbf259000 rw-p 00000000 00:00 0
> 7f9dbf259000-7f9dbf25d000 r-xp 00000000 08:01 6033975
>   /lib/x86_64-linux-gnu/libuuid.so.1.3.0
> 7f9dbf25d000-7f9dbf45c000 ---p 00004000 08:01 6033975
>   /lib/x86_64-linux-gnu/libuuid.so.1.3.0
> 7f9dbf45c000-7f9dbf45d000 r--p 00003000 08:01 6033975
>   /lib/x86_64-linux-gnu/libuuid.so.1.3.0
> 7f9dbf45d000-7f9dbf45e000 rw-p 00004000 08:01 6033975
>   /lib/x86_64-linux-gnu/libuuid.so.1.3.0
> 7f9dbf45e000-7f9dbf484000 r-xp 00000000 08:01 6033763
>   /lib/x86_64-linux-gnu/ld-2.23.so
> 7f9dbf667000-7f9dbf66b000 rw-p 00000000 00:00 0
> 7f9dbf680000-7f9dbf683000 rw-p 00000000 00:00 0
> 7f9dbf683000-7f9dbf684000 r--p 00025000 08:01 6033763
>   /lib/x86_64-linux-gnu/ld-2.23.so
> 7f9dbf684000-7f9dbf685000 rw-p 00026000 08:01 6033763
>   /lib/x86_64-linux-gnu/ld-2.23.so
> 7f9dbf685000-7f9dbf686000 rw-p 00000000 00:00 0
> 7ffdde2cb000-7ffdde2ed000 rw-p 00000000 00:00 0                          [stack]
> 7ffdde366000-7ffdde368000 r--p 00000000 00:00 0                          [vvar]
> 7ffdde368000-7ffdde36a000 r-xp 00000000 00:00 0                          [vdso]
> ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0
>   [vsyscall]
> *** Error in `/usr/sbin/xfs_db': free(): invalid pointer: 0x00007f202b108b78 ***
> 
> ...
> 
>  - output mismatch (see /home/amir/src/xfstests-dev/results//xfs/1316.out.bad)
>     --- tests/xfs/1316.out      2017-01-08 15:35:07.647897359 +0200
>     +++ /home/amir/src/xfstests-dev/results//xfs/1316.out.bad
> 2017-01-11 11:56:06.156948852 +0200
>     @@ -2,4 +2,20 @@
>      Format and populate
>      Find bmbt block
>      Fuzz bmbt
>     +./common/xfs: line 157: 19209 Aborted                 (core
> dumped) $XFS_DB_PROG "$@" $(_scratch_xfs_db_options)
>     +./common/xfs: line 157: 19219 Aborted                 (core
> dumped) $XFS_DB_PROG "$@" $(_scratch_xfs_db_options)
>     +./common/xfs: line 157: 19256 Aborted                 (core
> dumped) $XFS_DB_PROG "$@" $(_scratch_xfs_db_options)
>     +./common/xfs: line 157: 19264 Aborted                 (core
> dumped) $XFS_DB_PROG "$@" $(_scratch_xfs_db_options)
>     ...
>     (Run 'diff -u tests/xfs/1316.out
> /home/amir/src/xfstests-dev/results//xfs/1316.out.bad'  to see the
> entire diff)
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html