Re: [PATCH] xfs_repair: -EFSBADCRC needs action when read verifier detects it.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 26, 2025 at 10:20:02AM -0800, Darrick J. Wong wrote:
> On Wed, Feb 26, 2025 at 11:32:22AM -0600, bodonnel@xxxxxxxxxx wrote:
> > From: Bill O'Donnell <bodonnel@xxxxxxxxxx>
> > 
> > For xfs_repair, there is a case when -EFSBADCRC is encountered but not
> > acted on. Modify da_read_buf to check for and repair. The current
> > implementation fails for the case:
> > 
> > $ xfs_repair xfs_metadump_hosting.dmp.image
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> >         - zero log...
> >         - scan filesystem freespace and inode maps...
> >         - found root inode chunk
> > Phase 3 - for each AG...
> >         - scan and clear agi unlinked lists...
> >         - process known inodes and perform inode discovery...
> >         - agno = 0
> > Metadata CRC error detected at 0x46cde8, xfs_dir3_block block 0xd3c50/0x1000
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > corrupt directory block 0 for inode 867467
> 
> Curious -- this corrupt directory block fails the magic checks but
> process_dir2_data returns 0 because it didn't find any corruption.
> So it looks like we release the directory buffer (without dirtying it to
> reset the checksum)...
> 
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> >         - process newly discovered inodes...
> > Phase 4 - check for duplicate blocks...
> >         - setting up duplicate extent list...
> >         - check for inodes claiming duplicate blocks...
> >         - agno = 0
> >         - agno = 1
> >         - agno = 3
> >         - agno = 2
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> 
> ...and then it shows up here again...
> 
> > Phase 5 - rebuild AG headers and trees...
> >         - reset superblock...
> > Phase 6 - check inode connectivity...
> >         - resetting contents of realtime bitmap and summary inodes
> >         - traversing filesystem ...
> > bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> 
> ...and again here.  Now we reset the magic and dirty the buffer...
> 
> >         - traversal finished ...
> >         - moving disconnected inodes to lost+found ...
> > Phase 7 - verify and correct link counts...
> > Metadata corruption detected at 0x46cc88, xfs_dir3_block block 0xd3c50/0x1000
> 
> ...but I guess we haven't fixed anything in the buffer, so the verifier
> trips.  What code does 0x46cc88 map to in the dir3 block verifier
> function?  That might reflect some missing code in process_dir2_data.

0x46cc88 maps to xfs_dir3_block_verify (bp=bp@entry=0x7fffbc103610) at xfs_dir2_block.c:56

> 
> > libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0xd3c50/0x8
> > xfs_repair: Releasing dirty buffer to free list!
> > xfs_repair: Refusing to write a corrupt buffer to the data device!
> > xfs_repair: Lost a write to the data device!
> > 
> > fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
> > 
> > 
> > With the patch applied:
> > $ xfs_repair xfs_metadump_hosting.dmp.image
> > Phase 1 - find and verify superblock...
> > Phase 2 - using internal log
> >         - zero log...
> >         - scan filesystem freespace and inode maps...
> >         - found root inode chunk
> > Phase 3 - for each AG...
> >         - scan and clear agi unlinked lists...
> >         - process known inodes and perform inode discovery...
> >         - agno = 0
> > Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> >         - process newly discovered inodes...
> > Phase 4 - check for duplicate blocks...
> >         - setting up duplicate extent list...
> >         - check for inodes claiming duplicate blocks...
> >         - agno = 0
> >         - agno = 1
> >         - agno = 2
> >         - agno = 3
> > bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > Phase 5 - rebuild AG headers and trees...
> >         - reset superblock...
> > Phase 6 - check inode connectivity...
> >         - resetting contents of realtime bitmap and summary inodes
> >         - traversing filesystem ...
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > rebuilding directory inode 867467
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> > cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> > cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> >         - traversal finished ...
> >         - moving disconnected inodes to lost+found ...
> > Phase 7 - verify and correct link counts...
> > done
> > 
> > Signed-off-by: Bill O'Donnell <bodonnel@xxxxxxxxxx>
> > ---
> >  repair/da_util.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/repair/da_util.c b/repair/da_util.c
> > index 7f94f4012062..0a4785e6f69b 100644
> > --- a/repair/da_util.c
> > +++ b/repair/da_util.c
> > @@ -66,6 +66,9 @@ da_read_buf(
> >  	}
> >  	libxfs_buf_read_map(mp->m_dev, map, nex, LIBXFS_READBUF_SALVAGE,
> >  			&bp, ops);
> > +	if (bp->b_error == -EFSBADCRC) {
> > +		libxfs_buf_relse(bp);
> 
> This introduces a use-after-free on the buffer pointer.
> 
> --D
> 
> > +	}
> >  	if (map != map_array)
> >  		free(map);
> >  	return bp;
> > -- 
> > 2.48.1
> > 
> > 
> 





[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux