On Wed, Apr 24, 2024 at 06:52:46PM +0530, Kundan Kumar wrote:
> On 22/04/24 01:14PM, Christoph Hellwig wrote:
>>> +	folio = page_folio(page);
>>> +
>>> +	if (!folio_test_large(folio) ||
>>> +	    (bio_op(bio) == REQ_OP_ZONE_APPEND)) {
>>
>> I don't understand why you need this branch.  All the arithmetics
>> below should also work just fine for non-large folios.
>
> The branch helps to skip these calculations for zero-order folios:
> A) folio_offset = (folio_page_idx(folio, page) << PAGE_SHIFT) + offset;
> B) folio_size(folio)

Well, we'll need to just handle folios and stop special-casing order-0
ones eventually.

> If we convert bio_iov_add_page() to bio_iov_add_folio()/bio_add_folio(),
> we see a decline of about 11% for 4K I/O.  When mTHP is enabled we may
> get a large-order folio even for a 4K I/O.  The folio_offset may become
> larger than 4K and we end up using the expensive mempool_alloc during
> nvme_map_data in the NVMe driver [1].
>
> [1]
> static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> 		struct nvme_command *cmnd)
> {
> 	...
> 	...
> 	if (bv.bv_offset + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2)

We can replace this with:

	if ((bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) + bv.bv_len <=
	    NVME_CTRL_PAGE_SIZE * 2)

as nvme_setup_prp_simple just masks away the high bits anyway.