Re: [f2fs-dev] [PATCH 2/2] f2fs: read contiguous sit entry pages by merging for mount performance

Gu Zheng <guz.fnst@xxxxxxxxxxxxxx> · Thu, 14 Nov 2013 18:31:03 +0800

Hi Yu,
On 11/13/2013 04:10 PM, Chao Yu wrote:

> Hi Gu,
> 
>> -----Original Message-----
>> From: Gu Zheng [mailto:guz.fnst@xxxxxxxxxxxxxx]
>> Sent: Wednesday, November 13, 2013 11:39 AM
>> To: Chao Yu
>> Cc: ???; linux-fsdevel@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx; 谭姝
>> Subject: Re: [f2fs-dev] [PATCH 2/2] f2fs: read contiguous sit entry pages by merging for mount performance
>>
>> Hi Yu,
>> On 11/12/2013 01:18 PM, Chao Yu wrote:
>>
>>> Previously we read sit entries page one by one, this method lost the chance of reading contiguous page together.
>>> So we read pages as contiguous as possible for better mount performance.
>>>
>>> Signed-off-by: Chao Yu <chao2.yu@xxxxxxxxxxx>
>>> ---
>>>  fs/f2fs/f2fs.h    |    2 ++
>>>  fs/f2fs/segment.c |   65 ++++++++++++++++++++++++++++++++++++++++++++++++++---
>>>  fs/f2fs/segment.h |    2 ++
>>>  3 files changed, 66 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
>>> index 0afdcec..bfe9d87 100644
>>> --- a/fs/f2fs/f2fs.h
>>> +++ b/fs/f2fs/f2fs.h
>>> @@ -1113,6 +1113,8 @@ struct page *find_data_page(struct inode *, pgoff_t, bool);
>>>  struct page *get_lock_data_page(struct inode *, pgoff_t);
>>>  struct page *get_new_data_page(struct inode *, struct page *, pgoff_t, bool);
>>>  int f2fs_readpage(struct f2fs_sb_info *, struct page *, block_t, int);
>>> +void f2fs_submit_read_bio(struct f2fs_sb_info *, int);
>>> +void submit_read_page(struct f2fs_sb_info *, struct page *, block_t, int);
>>
>> Better to move these declarations into PATCH 1/2.
> 
> Okay, I will move it to the right place.
> 
>>
>>>  int do_write_data_page(struct page *);
>>>
>>>  /*
>>> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
>>> index 86dc289..414c351 100644
>>> --- a/fs/f2fs/segment.c
>>> +++ b/fs/f2fs/segment.c
>>> @@ -1474,19 +1474,72 @@ static int build_curseg(struct f2fs_sb_info *sbi)
>>>  	return restore_curseg_summaries(sbi);
>>>  }
>>>
>>> +static int ra_sit_pages(struct f2fs_sb_info *sbi, int start,
>>> +					int nrpages, bool *is_order)
>>> +{
>>> +	struct address_space *mapping = sbi->meta_inode->i_mapping;
>>> +	struct sit_info *sit_i = SIT_I(sbi);
>>> +	struct page *page;
>>> +	block_t blk_addr;
>>> +	int blkno, readcnt = 0;
>>> +	int sit_blk_cnt = SIT_BLK_CNT(sbi);
>>> +
>>> +	for (blkno = start; blkno < start + nrpages; blkno++) {
>>> +
>>> +		if (blkno >= sit_blk_cnt)
>>
>> Merge these two judgements:
>> for (blkno = start; blkno < start + nrpages && blkno < sit_blk_cnt; blkno++)
> 
> Right, but the line may over 80 characters, if we split this line, it seems not suitable.
> So how about this?
> 	int blkno = start, readcnt = 0;
> 	int sit_blk_cnt = SIT_BLK_CNT(sbi);
> 
> 	for (; blkno < start + nrpages && blkno < sit_blk_cnt; blkno++) {

More neat！

> 
>>
>>> +			goto out;
>>
>>> +		if ((!f2fs_test_bit(blkno, sit_i->sit_bitmap) ^ !*is_order)) {
>>> +			*is_order = !*is_order;
>>> +			goto out;
>>
>> 'Break' seems more suitable.
> 
> Yes, you are right.
> 
>>
>>> +		}
>>> +
>>> +		blk_addr = sit_i->sit_base_addr + blkno;
>>> +		if (*is_order)
>>> +			blk_addr += sit_i->sit_blocks;
>>> +repeat:
>>> +		page = grab_cache_page(mapping, blk_addr);
>>> +		if (!page) {
>>> +			cond_resched();
>>> +			goto repeat;
>>> +		}
>>> +		if (PageUptodate(page)) {
>>> +			f2fs_put_page(page, 1);
>>> +			readcnt++;
>>> +			goto out;
>>
>> Here may be 'Continue'.
> 
> 'Out' label could be removed after this modification.
> It seems more neat.

Right.

> 
>>
>>> +		}
>>> +
>>> +		submit_read_page(sbi, page, blk_addr, READ_SYNC);
>>> +
>>> +		page_cache_release(page);
>>
>> Put page here seems not a good idea, otherwise all your work may be in vain.
> 
> You mean that pages could be reclaimed by VM when out of memory?
> IMO, it is designed more like VM read ahead because we should concern 
> memory state of system, and still we have second chance to read these pages.
> 

Yes, but we can avoid to read the same page secondly, that's a serious waste, if we still
reread the page in get_current_sit_page(), all the improvement will disappear.

> Could we use mark_page_accessed () to delay VM reclaimed them?

IMO, this is the right way.

> 
>>
>>> +		readcnt++;
>>> +	}
>>> +out:
>>> +	f2fs_submit_read_bio(sbi, READ_SYNC);
>>> +	return readcnt;
>>> +}
>>> +
>>>  static void build_sit_entries(struct f2fs_sb_info *sbi)
>>>  {
>>>  	struct sit_info *sit_i = SIT_I(sbi);
>>>  	struct curseg_info *curseg = CURSEG_I(sbi, CURSEG_COLD_DATA);
>>>  	struct f2fs_summary_block *sum = curseg->sum_blk;
>>> -	unsigned int start;
>>> +	bool is_order = f2fs_test_bit(0, sit_i->sit_bitmap) ? true : false;
>>> +	int sit_blk_cnt = SIT_BLK_CNT(sbi);
>>> +	int bio_blocks = MAX_BIO_BLOCKS(max_hw_blocks(sbi));
>>> +	unsigned int i, start, end;
>>> +	unsigned int readed, start_blk = 0;
>>>
>>> -	for (start = 0; start < TOTAL_SEGS(sbi); start++) {
>>> +next:
>>> +	readed = ra_sit_pages(sbi, start_blk, bio_blocks, &is_order);
>>
>> In fact, you know how many blocks that you want to read(SIT_BLK_CNT(sbi)),
>> so here sit_blk_cnt is more suitable than a MAX one, and it also can make
>> the logic of ra_sit_pages more simple.
> 
> Right.
> 
> BTW, I am considering that maybe we should send dynamical cnt which 
> depend on memory state of system more than the logic of ra_sit_pages.
> May it's the work of anther patch. How do you think?

Agree. It's the same as the way that I suggested previously, and it can make
the contiguous pages reading api more common.

Regards,
Gu

> 
>>
>>> +
>>> +	start = start_blk * sit_i->sents_per_block;
>>> +	end = (start_blk + readed) * sit_i->sents_per_block;
>>> +
>>> +	for (; start < end && start < TOTAL_SEGS(sbi); start++) {
>>>  		struct seg_entry *se = &sit_i->sentries[start];
>>>  		struct f2fs_sit_block *sit_blk;
>>>  		struct f2fs_sit_entry sit;
>>>  		struct page *page;
>>> -		int i;
>>>
>>>  		mutex_lock(&curseg->curseg_mutex);
>>>  		for (i = 0; i < sits_in_cursum(sum); i++) {
>>> @@ -1497,6 +1550,7 @@ static void build_sit_entries(struct f2fs_sb_info *sbi)
>>>  			}
>>>  		}
>>>  		mutex_unlock(&curseg->curseg_mutex);
>>> +
>>>  		page = get_current_sit_page(sbi, start);
>>>  		sit_blk = (struct f2fs_sit_block *)page_address(page);
>>>  		sit = sit_blk->entries[SIT_ENTRY_OFFSET(sit_i, start)];
>>> @@ -1509,6 +1563,11 @@ got_it:
>>>  			e->valid_blocks += se->valid_blocks;
>>>  		}
>>>  	}
>>> +
>>> +	start_blk += readed;
>>> +	if (start_blk >= sit_blk_cnt)
>>> +		return;
>>> +	goto next;
>>
>> Using do {...} while(start_blk < sit_blk_cnt) rather than the so big upstream goto.
> 
> Yes, you are right.
> 
>>
>>>  }
>>>
>>>  static void init_free_segmap(struct f2fs_sb_info *sbi)
>>> diff --git a/fs/f2fs/segment.h b/fs/f2fs/segment.h
>>> index 269f690..ad5b9f1 100644
>>> --- a/fs/f2fs/segment.h
>>> +++ b/fs/f2fs/segment.h
>>> @@ -83,6 +83,8 @@
>>>  	(segno / SIT_ENTRY_PER_BLOCK)
>>>  #define	START_SEGNO(sit_i, segno)		\
>>>  	(SIT_BLOCK_OFFSET(sit_i, segno) * SIT_ENTRY_PER_BLOCK)
>>> +#define SIT_BLK_CNT(sbi)			\
>>> +	((TOTAL_SEGS(sbi) + SIT_ENTRY_PER_BLOCK - 1) / SIT_ENTRY_PER_BLOCK)
>>>  #define f2fs_bitmap_size(nr)			\
>>>  	(BITS_TO_LONGS(nr) * sizeof(unsigned long))
>>>  #define TOTAL_SEGS(sbi)	(SM_I(sbi)->main_segments)
> 
> Thanks!
> 
> Regards,
> Yu
> 
> 
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html