Re: [PATCH v4] ext4: fix fast commit inode enqueueing during a full journal commit

On Tue, Jul 16 2024, Jan Kara wrote:

> On Thu 11-07-24 09:35:20, Luis Henriques (SUSE) wrote:
>> When a full journal commit is on-going, any fast commit has to be enqueued
>> into a different queue: FC_Q_STAGING instead of FC_Q_MAIN.  This enqueueing
>> is done only once, i.e. if an inode is already queued in a previous fast
>> commit entry it won't be enqueued again.  However, if a full commit starts
>> _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to
>> be done into FC_Q_STAGING.  And this is not being done in function
>> ext4_fc_track_template().
>> 
>> This patch fixes the issue by re-enqueuing an inode into the STAGING queue
>> during the fast commit clean-up callback if it has a tid (i_sync_tid)
>> greater than the one being handled.  The STAGING queue will then be spliced
>> back into MAIN.
>> 
>> This bug was found using fstest generic/047.  This test creates several
>> 32k-byte files, sync'ing each of them after its creation, and then shuts
>> down the filesystem.  Some data may be lost in this operation; for example,
>> a file may have its size truncated to zero.
>> 
>> Signed-off-by: Luis Henriques (SUSE) <luis.henriques@xxxxxxxxx>
>
> ...
>
>> diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c
>> index 3926a05eceee..facbc8dbbaa2 100644
>> --- a/fs/ext4/fast_commit.c
>> +++ b/fs/ext4/fast_commit.c
>> @@ -1290,6 +1290,16 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid)
>>  				       EXT4_STATE_FC_COMMITTING);
>>  		if (tid_geq(tid, iter->i_sync_tid))
>>  			ext4_fc_reset_inode(&iter->vfs_inode);
>> +		} else if (tid) {
>> +			/*
>> +			 * If the tid is valid (i.e. non-zero) re-enqueue the
>> +			 * inode into STAGING, which will then be spliced back
>> +			 * into MAIN
>> +			 */
>> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
>> +				      &sbi->s_fc_q[FC_Q_STAGING]);
>> +		}
>
> I don't think this is going to work (even if we fix the assumption that
> tid 0 is special). With this there would be a race like:
>
> Task 1					Task 2
> modify inode I
> ext4_fc_commit()
>   jbd2_fc_begin_commit()
>   commits changes
>   jbd2_fc_end_commit()
>     __jbd2_fc_end_commit(journal, 0, false)
>       jbd2_journal_unlock_updates(journal)
> 					jbd2_journal_start()
> 					modify inode I
> 					...
> 					ext4_mark_iloc_dirty()
> 					  ext4_fc_track_inode()
> 					    ext4_fc_track_template()
> 					      - doesn't add inode anywhere
> 					      because i_fc_list is not empty
>       ext4_fc_cleanup(journal, 0, 0)
>         removes inode I from i_fc_list => next fastcommit will not
>         properly flush it.
>
> To avoid this race I think we could move the
> journal->j_fc_cleanup_callback() call to happen before we call
> jbd2_journal_unlock_updates(). Then we are sure that inode cannot be
> modified (journal is locked) until we are done processing the fastcommit
> lists when doing fastcommit. Hence your patch could then be changed like:
>
> +		} else if (full) {
> +			/*
> +			 * We are called after a full commit, inode has been
> +			 * modified while the commit was running. Re-enqueue
> +			 * the inode into STAGING, which will then be spliced
> +			 * back into MAIN. This cannot happen during
> +			 * fastcommit because the journal is locked all the
> +			 * time in that case (and tid doesn't increase so
> +			 * tid check above isn't reliable).
> +			 */
> +			list_add_tail(&EXT4_I(&iter->vfs_inode)->i_fc_list,
> +				      &sbi->s_fc_q[FC_Q_STAGING]);
> +		}
>
> Later, Harshad's patches change the code to use EXT4_STATE_FC_COMMITTING
> for protecting inodes during fastcommit and that will also deal with these
> races without having to keep the whole journal locked.

OK, this looks like it should fix all the issues I was trying to fix
(g/047, g/472, and a few others Ted pointed out).  I'll go run a few more
tests on this to try to catch any possible regression.
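
Just to check my understanding, here is how I read the suggested clean-up
decision, written as a standalone userspace sketch and not as the actual
kernel change.  fc_cleanup_decide() and enum fc_cleanup_action are
hypothetical names made up for illustration; tid_geq() is meant to mirror the
wraparound-safe jbd2 helper.  It also shows why testing 'tid' against zero is
not a reliable substitute for testing 'full'.

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int tid_t;

/* Wraparound-safe "x >= y" for transaction IDs, modelled on jbd2's tid_geq(). */
static bool tid_geq(tid_t x, tid_t y)
{
	int difference = (int)(x - y);

	return difference >= 0;
}

/* Hypothetical labels for what the clean-up does with one inode. */
enum fc_cleanup_action {
	FC_RESET,	/* commit covers the inode's last change: reset its state */
	FC_REQUEUE,	/* modified during a full commit: re-enqueue into STAGING */
	FC_DROP,	/* nothing more to do; inode was already taken off the list */
};

/*
 * Toy model of the per-inode decision with the "else if (full)" variant.
 * It assumes the clean-up callback runs while the journal is still locked
 * for fast commits, so an inode cannot be modified in that case.
 */
static enum fc_cleanup_action fc_cleanup_decide(bool full, tid_t tid,
						tid_t i_sync_tid)
{
	if (tid_geq(tid, i_sync_tid))
		return FC_RESET;
	if (full)
		return FC_REQUEUE;
	return FC_DROP;
}

/* A few cases, including tid 0, which is a legitimate tid after wraparound. */
static void fc_model_selfcheck(void)
{
	assert(fc_cleanup_decide(true, 10, 9) == FC_RESET);
	assert(fc_cleanup_decide(true, 10, 11) == FC_REQUEUE);
	assert(fc_cleanup_decide(true, 0, 1) == FC_REQUEUE);
	assert(fc_cleanup_decide(false, 0, 11) == FC_DROP);
	assert(tid_geq(0, 0xffffffffu));
}
```

The third case is the interesting one: a full commit whose tid happens to be
0 would be skipped by an 'else if (tid)' test, while 'else if (full)' still
re-enqueues the inode.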

Once again, thanks a lot for your help, Jan.

Cheers,
-- 
Luís
