Hi Jaegeuk, > -----Original Message----- > From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx] > Sent: Wednesday, October 07, 2015 7:22 AM > To: Chao Yu > Cc: linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; > linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [f2fs-dev] [PATCH] f2fs: merge meta writes as many possible > > Hi Chao, > > On Tue, Oct 06, 2015 at 10:54:12PM +0800, Chao Yu wrote: > > Hi Jaegeuk, > > > > > -----Original Message----- > > > From: Jaegeuk Kim [mailto:jaegeuk@xxxxxxxxxx] > > > Sent: Saturday, October 03, 2015 12:48 AM > > > To: linux-kernel@xxxxxxxxxxxxxxx; linux-fsdevel@xxxxxxxxxxxxxxx; > > > linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; > > > linux-fsdevel@xxxxxxxxxxxxxxx; linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx > > > Cc: Jaegeuk Kim; Jaegeuk Kim > > > Subject: [f2fs-dev] [PATCH] f2fs: merge meta writes as many possible > > > > > > This patch tries to merge IOs as many as possible when background flusher > > > conducts flushing the dirty meta pages. > > > > > > [Before] > > > > > > ... > > > 2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124320, size = 4096 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124560, size = 32768 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95720, size = 987136 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123928, size = 4096 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123944, size = 8192 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123968, size = 45056 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124064, size = 4096 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 97648, size = 1007616 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123776, size = 8192 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123800, size = 32768 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 124624, size = 4096 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 99616, size = 921600 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123608, size = 4096 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123624, size = 77824 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123792, size = 4096 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 123864, size = 32768 > > > ... > > > > > > [After] > > > > > > ... > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 92168, size = 892928 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 93912, size = 753664 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 95384, size = 716800 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 96784, size = 712704 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104160, size = 364544 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 104872, size = 356352 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 105568, size = 278528 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106112, size = 319488 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 106736, size = 258048 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107240, size = 270336 > > > f2fs_submit_write_bio: dev = (8,18), WRITE_SYNC(MP), META, sector = 107768, size = 180224 > > > ... > > > > > > Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx> > > > --- > > > fs/f2fs/checkpoint.c | 8 +++++++- > > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > > > diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c > > > index ff53405..da5ee7e 100644 > > > --- a/fs/f2fs/checkpoint.c > > > +++ b/fs/f2fs/checkpoint.c > > > @@ -257,7 +257,7 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, > > > long nr_to_write) > > > { > > > struct address_space *mapping = META_MAPPING(sbi); > > > - pgoff_t index = 0, end = LONG_MAX; > > > + pgoff_t index = 0, end = LONG_MAX, prev = LONG_MAX; > > > struct pagevec pvec; > > > long nwritten = 0; > > > struct writeback_control wbc = { > > > @@ -277,6 +277,11 @@ long sync_meta_pages(struct f2fs_sb_info *sbi, enum page_type type, > > > for (i = 0; i < nr_pages; i++) { > > > struct page *page = pvec.pages[i]; > > > > > > + if (prev == LONG_MAX) > > > + prev = page->index - 1; > > > + if (nr_to_write != LONG_MAX && page->index != prev + 1) If we reach this condition, should we break the 'while' loop outside? Because it's not possible to meet the page with 'prev + 1' index in any page vec found afterward. > > > > Does this mean we only writeback consecutive meta page of SSA region? If these meta > > pages are updated randomly (in collapse range or insert range case), we will writeback > > very few meta pages in one round of flush, it may cause low performance since FTL will > > do the force GC on our meta page update region if we writeback meta pages in one segment > > repeatly. > > I don't think this would degrade the performance pretty much. Rather than that, > as the above traces, I could see more possitive impact on IO patterns when I > gave some delay on flushing meta pages. (I got the traces when copying kernel > source tree.) Moreover, the amount of the meta page writes is relatively lower > than other data types. Previously I only track collapse/insert case, so I'm just worry about those corner cases. Today I track the normal case, the trace info looks good to me! :) > > > IMO, when the distribution of dirty SSA pages are random, at least we should writeback > > all dirty meta pages in SSA region which align to our segment size under block plug. > > > > One more thing is that I found in xfstest tests/generic/019 always cause inconsistent > > between ssa info and meta info of inode (i_blocks != total_blk_cnt). The reason of the > > problem is in ->falloc::collapse, we will update ssa meta page and node page info > > together, however, unfortunately ssa meta page is writeback by kworker, node page will > > no longer be persisted since fail_make_request made f2fs flagged as CP_ERROR_FLAG. > > > > So writeback SSA pages leads the potential risk of producing consistent issue. I think > > there are some ways can fix this: > > 1) don't writeback SSA pages in kworker, but side-effect is more memory will be cost > > since cp is executed, of course, periodical cp can fix the memory cost issue. > > 2) add some tag in somewhere, we can recover the ssa info with dnode page, but this is > > really a big change. > > > > How do you think? Any better idea? > > Good catch! > I think we can allocate a new block address for this case like the below patch. > > From 4979a1cbd7c718cff946cb421fbc0763a3dbd0df Mon Sep 17 00:00:00 2001 > From: Jaegeuk Kim <jaegeuk@xxxxxxxxxx> > Date: Tue, 6 Oct 2015 15:41:58 -0700 > Subject: [PATCH] f2fs: avoid SSA corruption when replacing block addresses > > If block addresses are pointed by the previous checkpoint, we should not update > the SSA when replacing block addresses by f2fs_collapse_range. > > This patch allocates a new block address if there is an exising block address. This patch doesn't fix that issue, because SSA block related to new_blkaddr will be overwritten, not the one related to old_blkaddr. So still I can reproduce the issue after applying this patch. Thanks, > > Signed-off-by: Jaegeuk Kim <jaegeuk@xxxxxxxxxx> > --- > fs/f2fs/segment.c | 12 ++++++++++++ > 1 file changed, 12 insertions(+) > > diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c > index 1d86a35..3c546eb 100644 > --- a/fs/f2fs/segment.c > +++ b/fs/f2fs/segment.c > @@ -1391,8 +1391,20 @@ void f2fs_replace_block(struct f2fs_sb_info *sbi, struct dnode_of_data > *dn, > { > struct f2fs_summary sum; > > + /* > + * we should not use the existing address to avoid SSA > + * corruption. > + */ > set_summary(&sum, dn->nid, dn->ofs_in_node, version); > > + if (recover_curseg && old_addr != NULL_ADDR && old_addr != NEW_ADDR) { > + allocate_data_block(sbi, NULL, dn->data_blkaddr, > + &dn->data_blkaddr, &sum, CURSEG_WARM_DATA); > + set_data_blkaddr(dn); > + f2fs_update_extent_cache(dn); > + old_addr = dn->data_blkaddr; > + } > + > __f2fs_replace_block(sbi, &sum, old_addr, new_addr, recover_curseg); > > dn->data_blkaddr = new_addr; > -- > 2.1.1 -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html