On Mon, Jun 03, 2024 at 12:14:10PM +0800, lei lu wrote: > Make sure xlog_op_header don't stray beyond valid memory region. > Check if there is enough space for the fixed members of each > xlog_op_header before visiting. > > Signed-off-by: lei lu <llfamsec@xxxxxxxxx> Finding a corrupted ophdr chain this deep in recovery implies that there is either a runtime bug when writing out the log buffers during checkpointing or there is memory corruption occurring during log recovery after the log items have been read from the journal and attached to the transaction structure for processing. If that's the case, then addressing the buffer overrun is not fixing the underlying issue that needs to be triaged and fixed. The only other way I can think of that this might occur is malicious corruption of the journal structures on disk. If that's the case, then you need to state this explicitly so we know that this isn't a potentially critical runtime bug we need to spend time on immediately. Hence we need to know how you found the issue, what it's symptoms are, how to reproduce the issue, etc. ie. you need to explain where you found a CRC validated journal structure that is internally inconsistent. That's -critical information- that we need to assess the scope of the issue that you have uncovered. I asked for you to provide this sort of detail to be provided with your last overrun fix - we need to know how such failures come about because they often are just a symptom of some other issue we failed to capture. You might have all the details in your head, but none of us know anything about what you are doing. Can you describe what tools and processes you are using to find these issues, and please try to include such descriptions in the commit message for future fixes to issues you find? > --- > fs/xfs/xfs_log_recover.c | 6 +++++- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/fs/xfs/xfs_log_recover.c b/fs/xfs/xfs_log_recover.c > index 1251c81e55f9..660e79a07ce6 100644 > --- a/fs/xfs/xfs_log_recover.c > +++ b/fs/xfs/xfs_log_recover.c > @@ -2361,6 +2361,11 @@ xlog_recover_process_ophdr( > unsigned int len; > int error; > > + if (dp > end) { > + xfs_warn(log->l_mp, "%s: op header overrun", __func__); > + return -EFSCORRUPTED; > + } Why did you put the check here, rather than... > + > /* Do we understand who wrote this op? */ > if (ohead->oh_clientid != XFS_TRANSACTION && > ohead->oh_clientid != XFS_LOG) { > @@ -2456,7 +2461,6 @@ xlog_recover_process_data( > > ohead = (struct xlog_op_header *)dp; > dp += sizeof(*ohead); > - ASSERT(dp <= end); .... where the pointer is advanced and the overrun is exposed? -Dave. -- Dave Chinner david@xxxxxxxxxxxxx