Re: [PATCH] ext4: flush s_error_work before journal destroy in ext4_fill_super

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





在 2021/7/23 20:11, Theodore Ts'o 写道:
On Tue, Jul 20, 2021 at 02:24:09PM +0800, yangerkun wrote:
'commit c92dc856848f ("ext4: defer saving error info from atomic
context")' and '2d01ddc86606 ("ext4: save error info to sb through journal
if available")' add s_error_work to fix checksum error problem. But the
error path in ext4_fill_super can lead the follow BUG_ON.

Can you share with me your test case?  Your patch will result in the
shrinker potentially not getting released in some error paths (which
will cause other kernel panics), and in any case, once the journal is
destroyed here:

@@ -5173,15 +5173,15 @@ static int ext4_fill_super(struct super_block *sb, void *data, int silent)
ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
  	sbi->s_ea_block_cache = NULL;
+failed_mount3a:
+	ext4_es_unregister_shrinker(sbi);
+failed_mount3:
+	flush_work(&sbi->s_error_work);
if (sbi->s_journal) {
  		jbd2_journal_destroy(sbi->s_journal);
  		sbi->s_journal = NULL;
  	}
-failed_mount3a:
-	ext4_es_unregister_shrinker(sbi);
-failed_mount3:
-	flush_work(&sbi->s_error_work);

sbi->s_journal is set to NULL, which means that in
flush_stashed_error_work(), journal will be NULL, which means we won't
call start_this_handle and so this change will not make a difference
given the kernel stack trace in the commit description.


For example, before wo goto failed_mount_wq, we may meet some error and will goto ext4_handle_error which can call schedule_work(&EXT4_SB(sb)->s_error_work). So the work may start concurrent with ext4_fill_super goto failed_mount_wq. There does not have any lock to protect the concurrent read and modifies for sbi->s_journal.


Injection fault some delay between jbd2_journal_destory and sbi->s_journal and

We can injection fault while we do mount and add some delay like follow. We will get some panic report easily...


 failed_mount_wq:
         ext4_xattr_destroy_cache(sbi->s_ea_inode_cache);
         sbi->s_ea_inode_cache = NULL;

         ext4_xattr_destroy_cache(sbi->s_ea_block_cache);
         sbi->s_ea_block_cache = NULL;

         if (sbi->s_journal) {
                 jbd2_journal_destroy(sbi->s_journal);
                 msleep(300); <---- add some delay
                 sbi->s_journal = NULL;
         }


So we need to make sure to work will exists while we destroy the journal. It maybe better to fix it by move the flush_work before destory journal.



Thanks,

						- Ted
.




[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux