Thanks Long.

Hello Jason,

I am the author of the patch. Regarding your comment below:

"As I've already said, you are supposed to set the value that limits to ib_sge and *NOT* the value that is related to ib_umem_find_best_pgsz. It is usually 2G because the ib_sge's typically work on a 32 bit length."

The ib_sge is limited by __sg_alloc_table_from_pages(), which uses ib_dma_max_seg_size(), which in turn returns the value the Ethernet driver sets with dma_set_max_seg_size(). Currently our HW does not support PTEs larger than 2M.

ib_umem_find_best_pgsz() takes a page-size bitmap (PAGE_SZ_BM) as input. The bitmap has a bit set for every page size supported by the HW:

#define PAGE_SZ_BM (SZ_4K | SZ_8K | SZ_16K | SZ_32K | SZ_64K | SZ_128K \
		    | SZ_256K | SZ_512K | SZ_1M | SZ_2M)

Are you suggesting that the bitmap we are passing is too restrictive? Or that we should not set this bitmap and instead let the function choose the default?

Regards,
Ajay

-----Original Message-----
From: Jason Gunthorpe <jgg@xxxxxxxx>
Sent: Tuesday, May 17, 2022 5:04 PM
To: Long Li <longli@xxxxxxxxxxxxx>
Cc: Ajay Sharma <sharmaajay@xxxxxxxxxxxxx>; KY Srinivasan <kys@xxxxxxxxxxxxx>; Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>; Wei Liu <wei.liu@xxxxxxxxxx>; Dexuan Cui <decui@xxxxxxxxxxxxx>; David S. Miller <davem@xxxxxxxxxxxxx>; Jakub Kicinski <kuba@xxxxxxxxxx>; Paolo Abeni <pabeni@xxxxxxxxxx>; Leon Romanovsky <leon@xxxxxxxxxx>; linux-hyperv@xxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx
Subject: [EXTERNAL] Re: [PATCH 05/12] net: mana: Set the DMA device max page size
On Tue, May 17, 2022 at 08:04:58PM +0000, Long Li wrote:
> > Subject: Re: [PATCH 05/12] net: mana: Set the DMA device max page
> > size
> >
> > On Tue, May 17, 2022 at 07:32:51PM +0000, Long Li wrote:
> > > > Subject: Re: [PATCH 05/12] net: mana: Set the DMA device max
> > > > page size
> > > >
> > > > On Tue, May 17, 2022 at 02:04:29AM -0700,
> > > > longli@xxxxxxxxxxxxxxxxx wrote:
> > > > > From: Long Li <longli@xxxxxxxxxxxxx>
> > > > >
> > > > > The system chooses default 64K page size if the device does
> > > > > not specify the max page size the device can handle for DMA.
> > > > > This do not work well when device is registering large chunk
> > > > > of memory in that a large page size is more efficient.
> > > > >
> > > > > Set it to the maximum hardware supported page size.
> > > >
> > > > For RDMA devices this should be set to the largest segment size
> > > > an ib_sge can take in when posting work. It should not be the
> > > > page size of MR. 2M is a weird number for that, are you sure it is right?
> > >
> > > Yes, this is the maximum page size used in hardware page tables.
> >
> > As I said, it should be the size of the sge in the WQE, not the
> > "hardware page tables"
>
> This driver uses the following code to figure out the largest page
> size for memory registration with hardware:
>
> page_sz = ib_umem_find_best_pgsz(mr->umem, PAGE_SZ_BM, iova);
>
> In this function, mr->umem is created with ib_dma_max_seg_size() as
> its max segment size when creating its sgtable.
>
> The purpose of setting DMA page size to 2M is to make sure this
> function returns the largest possible MR size that the hardware can
> take. Otherwise, this function will return 64k: the default DMA size.

As I've already said, you are supposed to set the value that limits to
ib_sge and *NOT* the value that is related to ib_umem_find_best_pgsz.

It is usually 2G because the ib_sge's typically work on a 32 bit
length.

Jason