On 2021/8/4 11:45 PM, Jan Kara wrote: > On Sat 24-07-21 15:41:23, Wang Jianchao wrote: >> From: Wang Jianchao <wangjianchao@xxxxxxxxxxxx> >> >> Right now, discard is issued and waited to be completed in jbd2 >> commit kthread context after the logs are committed. When large >> amount of files are deleted and discard is flooding, jbd2 commit >> kthread can be blocked for long time. Then all of the metadata >> operations can be blocked to wait the log space. >> >> One case is the page fault path with read mm->mmap_sem held, which >> wants to update the file time but has to wait for the log space. >> When other threads in the task wants to do mmap, then write mmap_sem >> is blocked. Finally all of the following read mmap_sem requirements >> are blocked, even the ps command which need to read the /proc/pid/ >> -cmdline. Our monitor service which needs to read /proc/pid/cmdline >> used to be blocked for 5 mins. >> >> This patch frees the blocks back to buddy after commit and then do >> discard in a async kworker context in fstrim fashion, namely, >> - mark blocks to be discarded as used if they have not been allocated >> - do discard >> - mark them free >> After this, jbd2 commit kthread won't be blocked any more by discard >> and we won't get NOSPC even if the discard is slow or throttled. >> >> Link: https://marc.info/?l=linux-kernel&m=162143690731901&w=2 >> Suggested-by: Theodore Ts'o <tytso@xxxxxxx> >> Signed-off-by: Wang Jianchao <wangjianchao@xxxxxxxxxxxx> > > Looks good to me. Just one small comment below. With that addressed feel > free to add: > > Reviewed-by: Jan Kara <jack@xxxxxxx> > > >> @@ -3474,6 +3530,14 @@ int ext4_mb_release(struct super_block *sb) >> struct kmem_cache *cachep = get_groupinfo_cache(sb->s_blocksize_bits); >> int count; >> >> + if (test_opt(sb, DISCARD)) { >> + /* >> + * wait the discard work to drain all of ext4_free_data >> + */ >> + queue_work(ext4_discard_wq, &sbi->s_discard_work); > > Do we really need to queue the work here? The filesystem should be > quiescent by now, we take care to queue the work whenever we add item to > empty list. So it should be enough to have flush_work() here and then > possibly > > WARN_ON_ONCE(!list_empty(&sbi->s_discard_list)) > > Or am I missing something? queue_work here is indeed redundant. Thanks so much for you point out this. Jianchao > > Honza > >> + flush_work(&sbi->s_discard_work); >> + } >> +