On Thu, Jul 11, 2013 at 4:58 AM, Jan Kara <jack@xxxxxxx> wrote: > On Thu 11-07-13 12:53:46, Jan Kara wrote: >> On Wed 10-07-13 16:12:36, Paul Taysom wrote: >> > The following commit introduced a 10x regression for >> > syncing inodes in ext4 with relatime enabled where just >> > the atime had been modified. >> > >> > commit 4ea425b63a3dfeb7707fc7cc7161c11a51e871ed >> > Author: Jan Kara <jack@xxxxxxx> >> > Date: Tue Jul 3 16:45:34 2012 +0200 >> > vfs: Avoid unnecessary WB_SYNC_NONE writeback during sys_sync and reorder sync passes >> > >> > See also: http://www.kernelhub.org/?msg=93100&p=2 >> > >> > Fixed by putting back in the call to writeback_inodes_sb. >> > >> > I'll attach the test in a reply to this e-mail. >> > >> > The test starts by creating 512 files, syncing, reading one byte >> > from each of those files, syncing, and then deleting each file >> > and syncing. The time to do each sync is printed. The process >> > is then repeated for 1024 files and then the next power of >> > two up to 262144 files. >> > >> > Note, when running the test, the slow down doesn't always happen >> > but most of the tests will show a slow down. >> > >> > In response to crbug.com/240422 >> > >> > Signed-off-by: Paul Taysom <taysom@xxxxxxxxxxxx> >> Thanks for report. Rather than blindly reverting the change, I'd like to >> understand why you see so huge regression. As the changelog in the patch >> says, flusher thread should be doing async writeback equivalent to the >> removed one because it gets woken via wakeup_flusher_threads(). But my >> guess is that for some reason we end up doing all the writeback from >> sync_inodes_one_sb(). I'll try to reproduce your results and investigate... > Hum, so it must be something timing sensitive. I wasn't able to reproduce > the issue on my test machine in 4 runs of your test program. I was able to > reproduce it on my laptop on every second run of the test program but once > I've enabled some tracepoints, the issue disappeared and I didn't see it in > about 10 runs. > > That being said I think that reverting my patch is just papering over the > problem. We will do the async pass over inodes twice instead of once > and thus the timing changes enough that you aren't able to observe the > problem. > > I'm looking into this more... > > Honza >> > --- >> > fs/sync.c | 7 +++++++ >> > 1 file changed, 7 insertions(+) >> > >> > diff --git a/fs/sync.c b/fs/sync.c >> > index 905f3f6..55c3316 100644 >> > --- a/fs/sync.c >> > +++ b/fs/sync.c >> > @@ -73,6 +73,12 @@ static void sync_inodes_one_sb(struct super_block *sb, void *arg) >> > sync_inodes_sb(sb); >> > } >> > >> > +static void writeback_inodes_one_sb(struct super_block *sb, void *arg) >> > +{ >> > + if (!(sb->s_flags & MS_RDONLY)) >> > + writeback_inodes_sb(sb, WB_REASON_SYNC); >> > +} >> > + >> > static void sync_fs_one_sb(struct super_block *sb, void *arg) >> > { >> > if (!(sb->s_flags & MS_RDONLY) && sb->s_op->sync_fs) >> > @@ -104,6 +110,7 @@ SYSCALL_DEFINE0(sync) >> > int nowait = 0, wait = 1; >> > >> > wakeup_flusher_threads(0, WB_REASON_SYNC); >> > + iterate_supers(writeback_inodes_one_sb, NULL); >> > iterate_supers(sync_inodes_one_sb, NULL); >> > iterate_supers(sync_fs_one_sb, &nowait); >> > iterate_supers(sync_fs_one_sb, &wait); >> > -- >> > 1.8.3 >> > >> -- >> Jan Kara <jack@xxxxxxx> >> SUSE Labs, CR > -- > Jan Kara <jack@xxxxxxx> > SUSE Labs, CR I've tried Dave Chinner's patch but it doesn't seem to help. Looking at the references to WB_SYNC_NONE flag I found the interesting comment in fs/ext4/inodes.c write_cache_pages_da: ... lock_page(page); /* * If the page is no longer dirty, or its * mapping no longer corresponds to inode we * are writing (which means it has been * truncated or invalidated), or the page is * already under writeback and we are not * doing a data integrity writeback, skip the page */ if (!PageDirty(page) || (PageWriteback(page) && (wbc->sync_mode == WB_SYNC_NONE)) || unlikely(page->mapping != mapping)) { unlock_page(page); continue; } wait_on_page_writeback(page); BUG_ON(PageWriteback(page)); ... I'm wondering if one inode is getting written out then the next inode in the same page waits for the writeback to finish. writeback_inodes_sb_nt sets the sync_mode to WB_SYNC_NONE sync_inodes_sb set the sync_mode to WB_SYNC_ALL. -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html