Re: possible deadlock in start_this_handle (2)

Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx> · Fri, 19 Feb 2021 19:15:33 +0900

On 2021/02/15 23:29, Jan Kara wrote:
> On Mon 15-02-21 23:06:15, Tetsuo Handa wrote:
>> On 2021/02/15 21:45, Jan Kara wrote:
>>> On Sat 13-02-21 23:26:37, Tetsuo Handa wrote:
>>>> Excuse me, but it seems to me that nothing prevents
>>>> ext4_xattr_set_handle() from reaching ext4_xattr_inode_lookup_create()
>>>> without memalloc_nofs_save() when hitting ext4_get_nojournal() path.
>>>> Will you explain when ext4_get_nojournal() path is executed?
>>>
>>> That's a good question but sadly I don't think that's it.
>>> ext4_get_nojournal() is called when the filesystem is created without a
>>> journal. In that case we also don't acquire jbd2_handle lockdep map. In the
>>> syzbot report we can see:
>>
>> Since syzbot can test filesystem images, syzbot might have tested a filesystem
>> image created both with and without journal within this boot.
> 
> a) I think that syzbot reboots the VM between executing different tests to
> get reproducible conditions. But in theory I agree the test may have
> contained one image with and one image without a journal.

syzkaller reboots the VM upon a crash.
syzkaller can test multiple filesystem images within one boot.

https://storage.googleapis.com/syzkaller/cover/ci-qemu-upstream-386.html (this
statistic snapshot is volatile) reports that ext4_get_nojournal() is partially covered
( https://github.com/google/syzkaller/blob/master/docs/coverage.md ) by syzkaller.

      /* Just increment the non-pointer handle value */
      static handle_t *ext4_get_nojournal(void)
      {
   86         handle_t *handle = current->journal_info;
              unsigned long ref_cnt = (unsigned long)handle;

              BUG_ON(ref_cnt >= EXT4_NOJOURNAL_MAX_REF_COUNT);

   86         ref_cnt++;
              handle = (handle_t *)ref_cnt;

              current->journal_info = handle;
 2006         return handle;
      }

      /* Decrement the non-pointer handle value */
      static void ext4_put_nojournal(handle_t *handle)
      {
              unsigned long ref_cnt = (unsigned long)handle;

   85         BUG_ON(ref_cnt == 0);

   85         ref_cnt--;
              handle = (handle_t *)ref_cnt;

              current->journal_info = handle;
      }

      handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
                                        int type, int blocks, int rsv_blocks,
                                        int revoke_creds)
      {
              journal_t *journal;
              int err;

 2006         trace_ext4_journal_start(sb, blocks, rsv_blocks, revoke_creds,
 2006                                  _RET_IP_);
 2006         err = ext4_journal_check_start(sb);
              if (err < 0)
                      return ERR_PTR(err);

 2005         journal = EXT4_SB(sb)->s_journal;
 1969         if (!journal || (EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY))
 2006                 return ext4_get_nojournal();
 1969         return jbd2__journal_start(journal, blocks, rsv_blocks, revoke_creds,
                                         GFP_NOFS, type, line);
      }

> 
> *but*
> 
> b) as I wrote in the email you are replying to, the jbd2_handle key is
> private per filesystem. Thus for lockdep to complain about
> jbd2_handle->fs_reclaim->jbd2_handle deadlock, those jbd2_handle lockdep
> maps must come from the same filesystem.
> 
> *and*
> 
> c) filesystem without journal doesn't use jbd2_handle lockdep map at all so
> for such filesystems lockdep creates no dependency for jbd2_handle map.
> 

What about "EXT4_SB(sb)->s_mount_state & EXT4_FC_REPLAY)" case?
Does this case happen on filesystem with journal, and can this case be executed
by fuzzing a crafted (a sort of erroneous) filesystem with journal, and are
the jbd2_handle for calling ext4_get_nojournal() case and the jbd2_handle for
calling jbd2__journal_start() case the same?

Also, I worry that jbd2__journal_restart() is problematic, for it calls
stop_this_handle() (which calls memalloc_nofs_restore()) and then calls
start_this_handle() (which fails to call memalloc_nofs_save() if an error
occurs). An error from start_this_handle() from journal restart operation
might need special handling (i.e. either remount-ro or panic) ?