Re: [PATCH v2 2/5] ext4: for committing inode, make ext4_fc_track_inode wait

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hmm, after removing ext4_fc_track_inode() from ext4_journal_start(), I
see one deadlock - there are some places in code where
ext4_mark_inode_dirty gets called while holding i_data_sem. Commit
path requires i_data_sem to commit inode data (via ext4_map_blocks).
So, if an inode is being committed, ext4_mark_inode_dirty may start
waiting for the inode to commit while holding i_data_sem and the
commit path may wait on i_data_sem. The right way to fix this is to
call ext4_fc_track_inode in such places before acquiring i_data_sem in
write mode. But that would mean we sprinkle code with more
ext4_fc_track_inode() calls which is something that I preferably
wanted to avoid.

This makes me wonder though, for fast commits, is it a terrible idea
to extend the meaning of ext4_journal_start() from "start a new
handle" to "start a new handle with an intention to modify the passed
inode"? Most of the handles modify only one inode, and for other
places where we do modify multiple inodes, ext4_reserve_inode_write()
would ensure that those inodes are tracked as well. All of the
existing places where inode gets modified after grabbing i_data_sem,
i_data_sem is grabbed only after starting the handle. This would take
care of the deadlock mentioned above and similar deadlocks. Another
advantage with doing this is that developers wouldn't need to worry
about adding explicit ext4_fc_track_inode() calls for future changes.

If we decide to do this, we would need to do a thorough code review to
ensure that the above rule is followed everywhere. But note that
ext4_fc_track_inode() is idempotent, so it doesn't matter if this
function gets called multiple times in the same handle. So to avoid
breaking fast commits, we can be super careful and in the first
version, we can have ext4_fc_track_range() calls in
ext4_reserve_inode_dirty(), ext4_journal_start(), inline.c and in
handles where i_data_sem gets acquired in write mode. We can then
carefully evaluate each code path and remove redundant
ext4_fc_track_range() calls.

What do you think?

- Harshad

On Thu, 10 Mar 2022 at 20:17, harshad shirwadkar
<harshadshirwadkar@xxxxxxxxx> wrote:
>
> Thanks for the reviews Jan! I'll update inline.c as you mentioned in
> the next version.
>
> - Harshad
>
> On Wed, 9 Mar 2022 at 02:14, Jan Kara <jack@xxxxxxx> wrote:
> >
> > On Tue 08-03-22 08:33:16, Harshad Shirwadkar wrote:
> > > From: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx>
> > >
> > > If the inode that's being requested to track using ext4_fc_track_inode
> > > is being committed, then wait until the inode finishes the commit.
> > >
> > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx>
> >
> > One comment below...
> >
> > > --- a/fs/ext4/ext4_jbd2.c
> > > +++ b/fs/ext4/ext4_jbd2.c
> > > @@ -106,6 +106,18 @@ handle_t *__ext4_journal_start_sb(struct super_block *sb, unsigned int line,
> > >                                  GFP_NOFS, type, line);
> > >  }
> > >
> > > +handle_t *__ext4_journal_start(struct inode *inode, unsigned int line,
> > > +                               int type, int blocks, int rsv_blocks,
> > > +                               int revoke_creds)
> > > +{
> > > +     handle_t *handle = __ext4_journal_start_sb(inode->i_sb, line,
> > > +                                                type, blocks, rsv_blocks,
> > > +                                                revoke_creds);
> > > +     if (ext4_handle_valid(handle) && !IS_ERR(handle))
> > > +             ext4_fc_track_inode(handle, inode);
> > > +     return handle;
> > > +}
> > > +
> >
> > Please fix fs/ext4/inline.c rather than papering over the problem like
> > this. Because it is just a landmine waiting to explode in some strange
> > cornercase when someone does not call ext4_journal_start() but other handle
> > starting function.
> >
> >                                                                 Honza
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR



[Index of Archives]     [Reiser Filesystem Development]     [Ceph FS]     [Kernel Newbies]     [Security]     [Netfilter]     [Bugtraq]     [Linux FS]     [Yosemite National Park]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Samba]     [Device Mapper]     [Linux Media]

  Powered by Linux