Re: Fragmentation metadata checks incomplete in process_bmbt_reclist()

Eric Sandeen <sandeen@xxxxxxxxxxx> · Sat, 19 Oct 2019 18:45:10 -0500

On 10/16/19 7:24 PM, "Marc Schönefeld" wrote:
> Hi all, 
>  
> there seems to be a problem with correctly rejecting invalid metadata when using the frag command. This was tested with xfsprogs-dev, the 5.2.1 tarball, and 4.19.0 (as in CentOS 8).
>  
> xfsprogs-dev/db/xfs_db -c frag ../xfsprogs_xfs_db_c_frag_convert_extent_invalid_read.xfsfile
>  
> Metadata CRC error detected at 0x42c836, xfs_agf block 0x1/0x200
> xfs_db: cannot init perag data (74). Continuing anyway.
> Metadata CRC error detected at 0x457316, xfs_agi block 0x2/0x200
> Metadata CRC error detected at 0x45e2ed, xfs_inobt block 0x18/0x1000
> Metadata corruption detected at 0x429885, xfs_inode block 0x1b00/0x8000
>  
> Program received signal SIGSEGV, Segmentation fault.
> convert_extent (rp=rp@entry=0x1537000, op=op@entry=0x7ffd95f7e020, sp=sp@entry=0x7ffd95f7e028, cp=cp@entry=0x7ffd95f7e018, 
>     fp=fp@entry=0x7ffd95f7e014) at ../include/xfs_arch.h:249
>  249 return (uint64_t)get_unaligned_be32(p) << 32 |
>  250                            get_unaligned_be32(p + 4);
>  251 }

As Dave mentioned elsewhere, xfs_db is a developer tool and it does its best
to carry on in the face of trouble...

The "frag" command is largely useless (as its output states) and the normal course
of action if it detects corruption (even if it coredumps as a result) would be to
run xfs_repair to fix it, and try again.

If you want to send a patch to handle this more gracefully, I'd review it, but I'm
not likely to spend any time digging into it because this is not a problem any
user is likely to face.  If their filesystem is corrupted, inability to run "frag"
is the least of their problems.
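
For anyone who does want to take a stab at it, the general shape of a fix is
to check the record count against the space actually available before walking
the records. What follows is a minimal standalone sketch, not xfs_db code: the
helper name, the buffer arguments, and the return convention are all made up
for illustration. Only the 16-byte record size and the 21-bit block count in
the second word are taken from the on-disk extent record format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define REC_SIZE 16u /* on-disk bmbt record: two big-endian 64-bit words */

/* Read a big-endian 64-bit value one byte at a time (no alignment needed). */
static uint64_t get_be64(const unsigned char *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/*
 * Hypothetical helper, not the xfsprogs API: sum the block counts of
 * numrecs packed records, refusing to read past buflen bytes.
 * Returns 0 on success, -1 if the record count does not fit the buffer.
 */
static int sum_blockcounts_checked(const unsigned char *buf, size_t buflen,
				   uint64_t numrecs, uint64_t *total)
{
	assert(buf != NULL && total != NULL);

	/* Divide rather than multiply so a huge numrecs cannot overflow. */
	if (numrecs > buflen / REC_SIZE)
		return -1;

	*total = 0;
	for (uint64_t i = 0; i < numrecs; i++) {
		const unsigned char *rec = buf + i * REC_SIZE;
		uint64_t l1 = get_be64(rec + 8);

		/* The low 21 bits of the second word hold the block count. */
		*total += l1 & ((UINT64_C(1) << 21) - 1);
	}
	return 0;
}
```

With a 32-byte buffer, asking for 2 records succeeds and asking for 3 is
rejected up front instead of faulting on the first out-of-range read. Where
the real bound would come from in frag.c (fork size, block size, or something
else) is exactly the part somebody would have to dig into.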

-Eric

> (gdb) bt
> #0  convert_extent (rp=rp@entry=0x1537000, op=op@entry=0x7ffd95f7e020, sp=sp@entry=0x7ffd95f7e028, 
>     cp=cp@entry=0x7ffd95f7e018, fp=fp@entry=0x7ffd95f7e014) at ../include/xfs_arch.h:249
> #1  0x0000000000416211 in process_bmbt_reclist (rp=0x1537000, numrecs=<optimized out>, extmapp=extmapp@entry=0x7ffd95f7e068)
>     at frag.c:229
> #2  0x0000000000416685 in process_btinode (whichfork=<optimized out>, extmapp=<optimized out>, dip=<optimized out>)
>     at ../include/xfs_arch.h:145
> #3  process_fork (dip=dip@entry=0x150e800, whichfork=whichfork@entry=0) at frag.c:287
> #4  0x0000000000416a81 in process_inode (agf=0x1506a00, dip=0x150e800, agino=6913) at frag.c:337
> #5  scanfunc_ino (block=0x1508200, level=level@entry=0, agf=agf@entry=0x1506a00) at frag.c:513
> #6  0x0000000000416cc5 in scan_sbtree (agf=agf@entry=0x1506a00, root=3, nlevels=1, btype=TYP_INOBT, 
>     func=0x416750 <scanfunc_ino>) at frag.c:416
> #7  0x0000000000416f2d in scan_ag (agno=0) at ../include/xfs_arch.h:158
> #8  frag_f (argc=<optimized out>, argv=<optimized out>) at frag.c:155
> #9  0x00000000004029e0 in main (argc=<optimized out>, argv=<optimized out>) at init.c:195
> (gdb) disass $pc,$pc+10
> Dump of assembler code from 0x405210 to 0x40521a:
> => 0x0000000000405210 <convert_extent+0>: mov    rax,QWORD PTR [rdi]
>    0x0000000000405213 <convert_extent+3>: mov    rdi,QWORD PTR [rdi+0x8]
>    0x0000000000405217 <convert_extent+7>: bswap  rax
> End of assembler dump.
> (gdb) info registers rdi
> rdi            0x1537000           22245376
> (gdb) x/4 $rdi
> 0x1537000: Cannot access memory at address 0x1537000
>  
> If required I can provide an image that triggers the issue via pm. 
>  
> Regards
> Marc Schoenefeld
>