Re: [PATCH v4 00/47] xfs: online scrub/repair support


 



On Thu, Jan 12, 2017 at 07:18:05PM +0200, Amir Goldstein wrote:
> On Tue, Jan 10, 2017 at 10:42 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> > On Tue, Jan 10, 2017 at 10:13 AM, Amir Goldstein <amir73il@xxxxxxxxx> wrote:
> >> On Tue, Jan 10, 2017 at 9:54 AM, Eryu Guan <eguan@xxxxxxxxxx> wrote:
> >>> On Mon, Jan 09, 2017 at 01:15:40PM -0800, Darrick J. Wong wrote:
> ...
> >>>>
> >>>> All the tests?  The full dmesg output would be useful to narrow it down to
> >>>> a specific xfstest number, field name, and fuzz verb.  I'm running them
> >>>
> >>
> >> In my case, yes, most of the tests (51 out of 65) failed due to
> >> some sort of crash, but the system is so unstable from all the OOM
> >> killing that the entire dmesg output is a big mess.
> >>
> >> I'll rerun only 1301 to send my logs.
> >>
> >>
> >
> > See attached:
> >
> > 1. full results of first run of ./check -g dangerous_scrub,scrub
> >     with TEST_XFS_SCRUB=1
> >
> > 2. dmesg from the same run (51 out of 65 failed)
> >
> > 3. dmesg from a rerun of a few selected tests without TEST_XFS_SCRUB=1
> >     (all tests failed)
> 
> Darrick,
> 
> Before I am heading home for the weekend, here is another dump of test results
> from re-running 1301 and 1316.
> 
> The changes I had to make in order to get to these results are:
> 
> 1. Apply your patch for geometry sanity check to xfs_db/xfs_repair
> 75581a8 xfs_db: sanitize geometry on load
> 2efc292 xfs_scrub: create a script to scrub all xfs filesystems
> 
> 2. Apply my patch to common/fuzzy
> 0bf843b fuzzy: use xfs_db -c to execute commands
> 1377e1e xfs: fuzz every field of every structure
> 
> 3. Convert ASSERT() with XFS_DEBUG=y to asswarn(), because fuzzing
> keeps tripping up the kernel with ASSERTs (see attached dmesg logs)
> 
> In the attached results, both tests get ASSERTs (see *.dmesg).
> In test 1301, xfs_repair gets several SIGSEGV and SIGFPE (see 1301.full)
> and xfs_db gets several SIGFPE (see 1301.out.bad).
> 
> This is a sample backtrace from SIGFPE in xfs_repair:

The xfs_repair problems I think stem from trying to use the fubar'd AG0
superblock instead of giving up on it and searching for another sb.  Try
the patch "xfs_repair: strengthen geometry checks" to see if the repair
crashes go away.

As for xfs_db, yeah, bonkers geometry can make it explode... not clear
what we realistically can do about that; second-guessing the geometry
hasn't proven popular.

--D

> Core was generated by `/sbin/xfs_repair -n /dev/mapper/storage-scratch'.
> Program terminated with signal SIGFPE, Arithmetic exception.
> #0  0x0000000000434053 in libxfs_mount (mp=mp@entry=0x7ffd5e9cdd50, sb=sb@entry=0x7ffd5e9cdc40, dev=64514, logdev=<optimized out>, rtdev=<optimized out>, flags=flags@entry=0) at init.c:702
> 702                     mp->m_maxicount = ((mp->m_maxicount / mp->m_ialloc_blks) *
> (gdb) bt
> #0  0x0000000000434053 in libxfs_mount (mp=mp@entry=0x7ffd5e9cdd50, sb=sb@entry=0x7ffd5e9cdc40, dev=64514, logdev=<optimized out>, rtdev=<optimized out>, flags=flags@entry=0) at init.c:702
> #1  0x0000000000403758 in main (argc=<optimized out>, argv=<optimized out>) at xfs_repair.c:724
> (gdb) p mp->m_ialloc_blks
> $1 = 0
> (gdb)
> 
> Here is another from SIGSEGV in xfs_repair:
> 
> Core was generated by `/sbin/xfs_repair -n /dev/mapper/storage-scratch'.
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0  xfs_inode_buf_verify (bp=0x7fe02400d410, readahead=false) at xfs_inode_buf.c:102
> 102                     di_ok = dip->di_magic == cpu_to_be16(XFS_DINODE_MAGIC) &&
> [Current thread is 1 (Thread 0x7fe042d18700 (LWP 22284))]
> (gdb) bt
> #0  xfs_inode_buf_verify (bp=0x7fe02400d410, readahead=false) at xfs_inode_buf.c:102
> #1  0x0000000000436b53 in libxfs_readbuf_verify (bp=bp@entry=0x7fe02400d410, ops=<optimized out>) at rdwr.c:966
> #2  0x0000000000426d6d in pf_read_inode_dirs (bp=0x7fe02400d410, args=0xcdcaf0) at prefetch.c:402
> #3  pf_batch_read (args=args@entry=0xcdcaf0, which=which@entry=PF_PRIMARY, buf=buf@entry=0x7fe03c026400) at prefetch.c:599
> #4  0x000000000042705c in pf_io_worker (param=0xcdcaf0) at prefetch.c:661
> #5  0x00007fe052ee06fa in start_thread (arg=0x7fe042d18700) at pthread_create.c:333
> #6  0x00007fe0529d5b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
> (gdb) p dip
> $1 = (xfs_dinode_t *) 0x7fe02420d600
> (gdb) p mp
> $2 = (struct xfs_mount *) 0x7ffca0f3dd80
> (gdb)
> 
> And here is one from SIGFPE in xfs_db, which is similar to the xfs_repair one:
> 
> Core was generated by `/usr/sbin/xfs_db -F -i -p xfs_check -c check /dev/mapper/storage-scratch'.
> Program terminated with signal SIGFPE, Arithmetic exception.
> #0  0x0000000000426a63 in libxfs_mount (mp=mp@entry=0x6af480 <xmount>, sb=sb@entry=0x6af480 <xmount>, dev=64514, logdev=<optimized out>, rtdev=<optimized out>, flags=flags@entry=1) at init.c:702
> 702                     mp->m_maxicount = ((mp->m_maxicount / mp->m_ialloc_blks) *
> (gdb) bt
> #0  0x0000000000426a63 in libxfs_mount (mp=mp@entry=0x6af480 <xmount>, sb=sb@entry=0x6af480 <xmount>, dev=64514, logdev=<optimized out>, rtdev=<optimized out>, flags=flags@entry=1) at init.c:702
> #1  0x0000000000418233 in init (argc=<optimized out>, argv=argv@entry=0x7ffd0a100058) at init.c:222
> #2  0x0000000000404fd7 in main (argc=<optimized out>, argv=0x7ffd0a100058) at init.c:267
> (gdb) p mp->m_ialloc_blks
> $1 = 0
> (gdb)
> 
> Hope this will help you narrow down the suspects.
> 
> Amir.




