Re: [PATCH] xfs_repair: -EFSBADCRC needs action when read verifier detects it.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 26, 2025 at 11:32:22AM -0600, bodonnel@xxxxxxxxxx wrote:
> From: Bill O'Donnell <bodonnel@xxxxxxxxxx>
> 
> For xfs_repair, there is a case when -EFSBADCRC is encountered but not
> acted on. Modify da_read_buf to check for and repair. The current
> implementation fails for the case:
> 
> $ xfs_repair xfs_metadump_hosting.dmp.image
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> Metadata CRC error detected at 0x46cde8, xfs_dir3_block block 0xd3c50/0x1000
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> corrupt directory block 0 for inode 867467

Curious -- this corrupt directory block fails the magic checks but
process_dir2_data returns 0 because it didn't find any corruption.
So it looks like we release the directory buffer (without dirtying it to
reset the checksum)...

>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 3
>         - agno = 2
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467

...and then it shows up here again...

> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
> bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233

...and again here.  Now we reset the magic and dirty the buffer...

>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> Metadata corruption detected at 0x46cc88, xfs_dir3_block block 0xd3c50/0x1000

...but I guess we haven't fixed anything in the buffer, so the verifier
trips.  What code does 0x46cc88 map to in the dir3 block verifier
function?  That might reflect some missing code in process_dir2_data.

> libxfs_bwrite: write verifier failed on xfs_dir3_block bno 0xd3c50/0x8
> xfs_repair: Releasing dirty buffer to free list!
> xfs_repair: Refusing to write a corrupt buffer to the data device!
> xfs_repair: Lost a write to the data device!
> 
> fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.
> 
> 
> With the patch applied:
> $ xfs_repair xfs_metadump_hosting.dmp.image
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
> Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> bad directory block magic # 0x16011664 in block 0 for directory inode 867467
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> Metadata CRC error detected at 0x46ce28, xfs_dir3_block block 0xd3c50/0x1000
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> bad directory block magic # 0x16011664 for directory inode 867467 block 0: fixing magic # to 0x58444233
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> rebuilding directory inode 867467
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
> cache_node_put: node put on refcount 0 (node=0x7f46ac0c5610)
> cache_node_put: node put on node (0x7f46ac0c5610) in MRU list
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> 
> Signed-off-by: Bill O'Donnell <bodonnel@xxxxxxxxxx>
> ---
>  repair/da_util.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/repair/da_util.c b/repair/da_util.c
> index 7f94f4012062..0a4785e6f69b 100644
> --- a/repair/da_util.c
> +++ b/repair/da_util.c
> @@ -66,6 +66,9 @@ da_read_buf(
>  	}
>  	libxfs_buf_read_map(mp->m_dev, map, nex, LIBXFS_READBUF_SALVAGE,
>  			&bp, ops);
> +	if (bp->b_error == -EFSBADCRC) {
> +		libxfs_buf_relse(bp);

This introduces a use-after-free on the buffer pointer.

--D

> +	}
>  	if (map != map_array)
>  		free(map);
>  	return bp;
> -- 
> 2.48.1
> 
> 




[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux