On Wed, Apr 24, 2024 at 06:52:46PM +0530, Kundan Kumar wrote:
> On 22/04/24 01:14PM, Christoph Hellwig wrote:
>>> +	folio = page_folio(page);
>>> +
>>> +	if (!folio_test_large(folio) ||
>>> +	    (bio_op(bio) == REQ_OP_ZONE_APPEND)) {
>>
>> I don't understand why you need this branch.  All the arithmetics
>> below should also work just fine for non-large folios.
>
> The branch helps to skip these calculations for zero-order folios:
> A) folio_offset = (folio_page_idx(folio, page) << PAGE_SHIFT) + offset;
> B) folio_size(folio)

Well, we'll need to just handle folios and stop special-casing order-0
ones eventually.

> If we convert bio_iov_add_page() to bio_iov_add_folio()/bio_add_folio(),
> we see a decline of about 11% for 4K I/O.  When mTHP is enabled we may
> get a large-order folio even for a 4K I/O.  The folio_offset may become
> larger than 4K and we end up using the expensive mempool_alloc during
> nvme_map_data in the NVMe driver [1].
>
> [1]
> static blk_status_t nvme_map_data(struct nvme_dev *dev, struct request *req,
> 		struct nvme_command *cmnd)
> {
> 	...
> 	...
> 	if (bv.bv_offset + bv.bv_len <= NVME_CTRL_PAGE_SIZE * 2)

We can replace this with:

	if ((bv.bv_offset & (NVME_CTRL_PAGE_SIZE - 1)) + bv.bv_len <=
	    NVME_CTRL_PAGE_SIZE * 2)

as nvme_setup_prp_simple just masks away the high bits anyway.