Cc: Al Viro <viro@xxxxxxxxxxxxxxxxxx> On 08/19/2013 04:06 PM, Jan Kara wrote: > On Sat 17-08-13 11:36:03, Junxiao Bi wrote: >> There is a race between mark inode dirty and writeback thread, >> see the following scenario. In this case, writeback thread will >> not run though there is dirty_io. >> >> __mark_inode_dirty() bdi_writeback_workfn() >> ... ... >> spin_lock(&inode->i_lock); >> ... >> if (bdi_cap_writeback_dirty(bdi)) { >> <<< assume wb has dirty_io, so wakeup_bdi is false. >> <<< the following inode_dirty also have wakeup_bdi false. >> if (!wb_has_dirty_io(&bdi->wb)) >> wakeup_bdi = true; >> } >> spin_unlock(&inode->i_lock); >> <<< assume last dirty_io is removed here. >> pages_written = wb_do_writeback(wb); >> ... >> <<< work_list empty and wb has no dirty_io, >> <<< delayed_work will not be queued. >> if (!list_empty(&bdi->work_list) || >> (wb_has_dirty_io(wb) && dirty_writeback_interval)) >> queue_delayed_work(bdi_wq, &wb->dwork, >> msecs_to_jiffies(dirty_writeback_interval * 10)); >> spin_lock(&bdi->wb.list_lock); >> inode->dirtied_when = jiffies; >> <<< new dirty_io is added. >> list_move(&inode->i_wb_list, &bdi->wb.b_dirty); >> spin_unlock(&bdi->wb.list_lock); >> >> <<< though there is dirty_io, but wakeup_bdi is false, >> <<< so writeback thread will not be waked up and >> <<< the new dirty_io will not be flushed. >> if (wakeup_bdi) >> bdi_wakeup_thread_delayed(bdi); >> >> Writeback will run until there is a new flush work queued. >> This may cause a lot of dirty pages stay in memory for a long time. >> >> Cc: Jan Kara <jack@xxxxxxx> >> Cc: Fengguang Wu <fengguang.wu@xxxxxxxxx> >> Signed-off-by: Junxiao Bi <junxiao.bi@xxxxxxxxxx> > Looks good. You can add: > Reviewed-by: Jan Kara <jack@xxxxxxx> > > Honza > >> --- >> fs/fs-writeback.c | 4 ++-- >> 1 file changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c >> index 68851ff..053ec91 100644 >> --- a/fs/fs-writeback.c >> +++ b/fs/fs-writeback.c >> @@ -1173,6 +1173,8 @@ void __mark_inode_dirty(struct inode *inode, int flags) >> bool wakeup_bdi = false; >> bdi = inode_to_bdi(inode); >> >> + spin_unlock(&inode->i_lock); >> + spin_lock(&bdi->wb.list_lock); >> if (bdi_cap_writeback_dirty(bdi)) { >> WARN(!test_bit(BDI_registered, &bdi->state), >> "bdi-%s not registered\n", bdi->name); >> @@ -1187,8 +1189,6 @@ void __mark_inode_dirty(struct inode *inode, int flags) >> wakeup_bdi = true; >> } >> >> - spin_unlock(&inode->i_lock); >> - spin_lock(&bdi->wb.list_lock); >> inode->dirtied_when = jiffies; >> list_move(&inode->i_wb_list, &bdi->wb.b_dirty); >> spin_unlock(&bdi->wb.list_lock); >> -- >> 1.7.9.5 >> -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html