On Thu 19-05-22 07:28:11, harshad shirwadkar wrote: > On Wed, 27 Apr 2022 at 08:50, Jan Kara <jack@xxxxxxx> wrote: > > > > On Tue 19-04-22 10:31:39, Harshad Shirwadkar wrote: > > > From: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx> > > > > > > If the inode that's being requested to track using ext4_fc_track_inode > > > is being committed, then wait until the inode finishes the > > > commit. Also, add calls to ext4_fc_track_inode at the right places. > > > > > > With this patch, now calling ext4_reserve_inode_write() results in > > > inode being tracked for next fast commit. A subtle lock ordering > > > requirement with i_data_sem (which is documented in the code) requires > > > that ext4_fc_track_inode() be called before grabbing i_data_sem. So, > > > this patch also adds explicit ext4_fc_track_inode() calls in places > > > where i_data_sem grabbed. > > > > > > Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@xxxxxxxxx> > > > --- > > > fs/ext4/fast_commit.c | 38 ++++++++++++++++++++++++++++++++++++++ > > > fs/ext4/inline.c | 3 +++ > > > fs/ext4/inode.c | 5 ++++- > > > 3 files changed, 45 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c > > > index c278060a15bc..55f4c5ddd8e5 100644 > > > --- a/fs/ext4/fast_commit.c > > > +++ b/fs/ext4/fast_commit.c > > > + /* > > > + * If we come here, we may sleep while waiting for the inode to > > > + * commit. We shouldn't be holding i_data_sem in write mode when we go > > > + * to sleep since the commit path needs to grab the lock while > > > + * committing the inode. > > > + */ > > > + WARN_ON(lockdep_is_held_type(&ei->i_data_sem, 1)); > > > > Note that we can deadlock even if we had i_data_sem for reading because > > another reader is not allowed to get the rwsem if there is writer waiting > > for it. So we need to check even that case here. > I turned the above WARN_ON to check if data_sem is held in either read > or write mode and now I am seeing many other places where data_sem > gets grabbed in read mode before calling ext4_fc_track_inode(). Hum, that's unpleasant. Which places BTW? I'd expect this mostly happens in ext4_map_blocks() paths. Anywhere else? > We either need to call ext4_fc_track_inode() before all > ext4_reserve_inode_write() in all those cases, or ensure that places that > acquire in data_sem in write mode, wait if there's an ongoing commit and > only then lock data_sem. Neither is particularly appealing I guess. As we discussed on the call, probably using extent status tree for the fastcommit code might be a cleaner option. > > > + while (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) { > > > +#if (BITS_PER_LONG < 64) > > > + DEFINE_WAIT_BIT(wait, &ei->i_state_flags, > > > + EXT4_STATE_FC_COMMITTING); > > > + wq = bit_waitqueue(&ei->i_state_flags, > > > + EXT4_STATE_FC_COMMITTING); > > > +#else > > > + DEFINE_WAIT_BIT(wait, &ei->i_flags, > > > + EXT4_STATE_FC_COMMITTING); > > > + wq = bit_waitqueue(&ei->i_flags, > > > + EXT4_STATE_FC_COMMITTING); > > > +#endif > > > + prepare_to_wait(wq, &wait.wq_entry, TASK_UNINTERRUPTIBLE); > > > + if (ext4_test_inode_state(inode, EXT4_STATE_FC_COMMITTING)) > > > + schedule(); > > > + finish_wait(wq, &wait.wq_entry); > > > + } > > > + > > > ret = ext4_fc_track_template(handle, inode, __track_inode, NULL, 1); > > > trace_ext4_fc_track_inode(handle, inode, ret); > > > > As we discussed in the call we should tell lockdep that this is equivalent > > to lock+unlock of let's say fc_committing_lock and the fastcommit code > > setting / clearing EXT4_STATE_FC_COMMITTING is equivalent to lock / unlock > > of fc_committing_lock. That way we get proper lockdep tracking of this > > waiting primitive. > Sure, so you mean just adding __acquires() / __releases() annotations > in these places right? No. __acquires() and __releases() are sparse annotations. Sparse does also some lock checking but it is a static checker and is pretty trivial. Here you need to instrument lockdep. We do similar thing in jbd2 to tell lockdep that starting a transaction handle effectively behaves as a lock - see the rwsem_acquire_read() and rwsem_release() in start_this_handle() and stop_this_handle(), respectively. Honza -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR