Re: [PATCH V3 4/5] ext4: get discard out of jbd2 commit kthread contex

Wang Jianchao <jianchao.wan9@xxxxxxxxx> · Thu, 26 Aug 2021 15:15:55 +0800

On 2021/8/4 11:45 PM, Jan Kara wrote:
> On Sat 24-07-21 15:41:23, Wang Jianchao wrote:
>> From: Wang Jianchao <wangjianchao@xxxxxxxxxxxx>
>>
>> Right now, discard is issued and waited to be completed in jbd2
>> commit kthread context after the logs are committed. When large
>> amount of files are deleted and discard is flooding, jbd2 commit
>> kthread can be blocked for long time. Then all of the metadata
>> operations can be blocked to wait the log space.
>>
>> One case is the page fault path with read mm->mmap_sem held, which
>> wants to update the file time but has to wait for the log space.
>> When other threads in the task wants to do mmap, then write mmap_sem
>> is blocked. Finally all of the following read mmap_sem requirements
>> are blocked, even the ps command which need to read the /proc/pid/
>> -cmdline. Our monitor service which needs to read /proc/pid/cmdline
>> used to be blocked for 5 mins.
>>
>> This patch frees the blocks back to buddy after commit and then do
>> discard in a async kworker context in fstrim fashion, namely,
>>  - mark blocks to be discarded as used if they have not been allocated
>>  - do discard
>>  - mark them free
>> After this, jbd2 commit kthread won't be blocked any more by discard
>> and we won't get NOSPC even if the discard is slow or throttled.
>>
>> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2
>> Suggested-by: Theodore Ts'o <tytso@xxxxxxx>
>> Signed-off-by: Wang Jianchao <wangjianchao@xxxxxxxxxxxx>
> 
> Looks good to me. Just one small comment below. With that addressed feel
> free to add:
> 
> Reviewed-by: Jan Kara <jack@xxxxxxx>
> 
> 
>> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb)
>>  	struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits);
>>  	int count;
>>  
>> +	if (test_opt(sb, DISCARD)) {
>> +		/*
>> +		 * wait the discard work to drain all of ext4_free_data
>> +		 */
>> +		queue_work(ext4_discard_wq, &sbi->s_discard_work);
> 
> Do we really need to queue the work here? The filesystem should be
> quiescent by now, we take care to queue the work whenever we add item to
> empty list. So it should be enough to have flush_work() here and then
> possibly
> 
> 	WARN_ON_ONCE(!list_empty(&sbi->s_discard_list))
> 
> Or am I missing something?

queue_work here is indeed redundant.

Thanks so much for you point out this.
Jianchao

> 
> 								Honza
> 
>> +		flush_work(&sbi->s_discard_work);
>> +	}
>> +