Re: [PATCH v3 4/6] jbd2: Fix wrongly judgement for buffer head removing while doing checkpoint

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2023/6/15 4:37, Theodore Ts'o wrote:
> On Wed, Jun 14, 2023 at 09:25:28PM +0800, Zhang Yi wrote:
>>
>> Sorry about the regression, I found that this issue is not introduced
>> by the first patch in this patch series ("jbd2: recheck chechpointing
>> non-dirty buffer"), is d9eafe0afafa ("jbd2: factor out journal
>> initialization from journal_get_superblock()") [1] on your dev branch.
>>
>> The problem is the journal super block had been failed to write out
>> due to IO fault, it's uptodate bit was cleared by
>> end_buffer_write_syn() and didn't reset yet in jbd2_write_superblock().
>> And it raced by jbd2_journal_revoke()->jbd2_journal_set_features()->
>> jbd2_journal_check_used_features()->journal_get_superblock()->bh_read(),
>> unfortunately, the read IO is also fail, so the error handling in
>> journal_fail_superblock() clear the journal->j_sb_buffer, finally lead
>> to above NULL pointer dereference issue.
> 
> Thanks for looking into this.  What I believe you are saying is that
> the root cause is that earlier patch, but it is still something about
> the patch "jbd2: recheck chechpointing non-dirty buffer" which is
> changing the timing enough that we're hitting this buffer (because
> without the "recheck checkpointing" patch, I'm not seeing the NULL
> pointer dereference.

I have send out a separate patch names "jbd2: skip reading super block
if it has been verified" to fix above NULL pointer dereference issue, I
have been runing ext3 generic/475 about 12hours and have not reproduced
the problem again (I will also do more tests later). Please take a look
at it.

> 
> As far as the e2fsck bug that was causing it to hang in the ext4/adv
> test scenario, the patch was a simple one, and I have also checked in
> a test case which was a reliable reproducer of the problem.  (See
> attached for the patches and more detail.)
> 
> It is really interesting that "recheck checkpointing" patch is making
> enough of a difference that it is unmasking these bugs.  If you could
> take a look at these changes and perhaps think about how this patch
> series could be changing the nature of the corruption (specifically,
> how symlink inodes referenced from inline directories could be
> corupted with "rechecking checkpointing", thus unmasking the
> e2fsprogs, I'd really appreciate it.
> 

Sure, we will take a look at it for details.

Thanks,
Yi.




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux