(resending in plain text)

> -----Original Message-----
> From: Michael Kelley <mhklinux@xxxxxxxxxxx>
> Sent: Tuesday, June 11, 2024 12:35 PM
> To: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>; linux-hyperv@xxxxxxxxxxxxxxx;
> netdev@xxxxxxxxxxxxxxx
> Cc: Dexuan Cui <decui@xxxxxxxxxxxxx>; stephen@xxxxxxxxxxxxxxxxxx;
> KY Srinivasan <kys@xxxxxxxxxxxxx>; Paul Rosswurm <paulros@xxxxxxxxxxxxx>;
> olaf@xxxxxxxxx; vkuznets <vkuznets@xxxxxxxxxx>; davem@xxxxxxxxxxxxx;
> wei.liu@xxxxxxxxxx; edumazet@xxxxxxxxxx; kuba@xxxxxxxxxx;
> pabeni@xxxxxxxxxx; leon@xxxxxxxxxx; Long Li <longli@xxxxxxxxxxxxx>;
> ssengar@xxxxxxxxxxxxxxxxxxx; linux-rdma@xxxxxxxxxxxxxxx;
> daniel@xxxxxxxxxxxxx; john.fastabend@xxxxxxxxx; bpf@xxxxxxxxxxxxxxx;
> ast@xxxxxxxxxx; hawk@xxxxxxxxxx; tglx@xxxxxxxxxxxxx;
> shradhagupta@xxxxxxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx
> Subject: RE: [PATCH net-next] net: mana: Add support for variable page
> sizes of ARM64
>
> From: LKML haiyangz <lkmlhyz@xxxxxxxxxxxxx> On Behalf Of Haiyang Zhang
> Sent: Monday, June 10, 2024 2:23 PM
> >
> > As defined by the MANA Hardware spec, the queue size for DMA is 4KB
> > minimal, and power of 2.
>
> You say the hardware requires 4K "minimal". But the definitions in this
> patch hardcode to 4K, as if that's the only choice. Is the hardcoding to
> 4K a design decision made to simplify the MANA driver?

The HWC queue size has to be exactly 4 KB, which is by HW design. Other
"regular" queues can be 2^n >= 4 KB.

> > To support variable page sizes (4KB, 16KB, 64KB) of ARM64, define
>
> A minor nit, but "variable" page size doesn't seem like quite the right
> description -- both here and in the Subject line. On ARM64, the page size
> is a choice among a few fixed options. Perhaps call it support for "page
> sizes other than 4K"?

"page sizes other than 4K" sounds good.

> > the minimal queue size as a macro separate from the PAGE_SIZE, which
> > we always assumed to be 4KB before supporting ARM64.
> > Also, update the relevant code related to size alignment, DMA region
> > calculations, etc.
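
For illustration, the size rule described above could be written as the
following sketch (mana_queue_size_valid() is a hypothetical helper, not
part of this patch):

	#include <linux/log2.h>
	#include <linux/types.h>

	#define MANA_MIN_QSIZE	(1 << 12)	/* 4 KB, fixed by the HW spec */

	/* The HWC queue must be exactly 4 KB; any other queue is a
	 * power of 2 that is at least 4 KB.
	 */
	static bool mana_queue_size_valid(bool is_hwc, u32 size)
	{
		if (is_hwc)
			return size == MANA_MIN_QSIZE;

		return size >= MANA_MIN_QSIZE && is_power_of_2(size);
	}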
> >
> > Signed-off-by: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
> > ---
> >  drivers/net/ethernet/microsoft/Kconfig        |  2 +-
> >  .../net/ethernet/microsoft/mana/gdma_main.c   |  8 +++----
> >  .../net/ethernet/microsoft/mana/hw_channel.c  | 22 +++++++++----------
> >  drivers/net/ethernet/microsoft/mana/mana_en.c |  8 +++----
> >  .../net/ethernet/microsoft/mana/shm_channel.c |  9 ++++----
> >  include/net/mana/gdma.h                       |  7 +++++-
> >  include/net/mana/mana.h                       |  3 ++-
> >  7 files changed, 33 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/net/ethernet/microsoft/Kconfig
> > b/drivers/net/ethernet/microsoft/Kconfig
> > index 286f0d5697a1..901fbffbf718 100644
> > --- a/drivers/net/ethernet/microsoft/Kconfig
> > +++ b/drivers/net/ethernet/microsoft/Kconfig
> > @@ -18,7 +18,7 @@ if NET_VENDOR_MICROSOFT
> >  config MICROSOFT_MANA
> >  	tristate "Microsoft Azure Network Adapter (MANA) support"
> >  	depends on PCI_MSI
> > -	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN && ARM64_4K_PAGES)
> > +	depends on X86_64 || (ARM64 && !CPU_BIG_ENDIAN)
> >  	depends on PCI_HYPERV
> >  	select AUXILIARY_BUS
> >  	select PAGE_POOL
> > diff --git a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > index 1332db9a08eb..c9df942d0d02 100644
> > --- a/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > +++ b/drivers/net/ethernet/microsoft/mana/gdma_main.c
> > @@ -182,7 +182,7 @@ int mana_gd_alloc_memory(struct gdma_context *gc, unsigned int length,
> >  	dma_addr_t dma_handle;
> >  	void *buf;
> >
> > -	if (length < PAGE_SIZE || !is_power_of_2(length))
> > +	if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> >  		return -EINVAL;
> >
> >  	gmi->dev = gc->dev;
> > @@ -717,7 +717,7 @@ EXPORT_SYMBOL_NS(mana_gd_destroy_dma_region, NET_MANA);
> >  static int mana_gd_create_dma_region(struct gdma_dev *gd,
> >  				     struct gdma_mem_info *gmi)
> >  {
> > -	unsigned int num_page = gmi->length / PAGE_SIZE;
> > +	unsigned int num_page = gmi->length / MANA_MIN_QSIZE;
>
> This calculation seems a bit weird when using MANA_MIN_QSIZE. The
> number of pages, and the construction of the page_addr_list array
> a few lines later, seem unrelated to the concept of a minimum queue
> size. Is the right concept really a "mapping chunk", and num_page
> would conceptually be "num_chunks", or something like that? Then
> a queue must be at least one chunk in size, but that's derived from the
> chunk size, and is not the core concept.

I think calling it "num_chunks" is fine. May I use "num_chunks" in the
next version?

> Another approach might be to just call it "MANA_PAGE_SIZE", like
> has been done with HV_HYP_PAGE_SIZE. HV_HYP_PAGE_SIZE exists to
> handle exactly the same issue of the guest PAGE_SIZE potentially
> being different from the fixed 4K size that must be used in host-guest
> communication on Hyper-V. Same thing here with MANA.

I actually called it "MANA_PAGE_SIZE" in my previous internal patch.
But Paul from the Hostnet team opposed using that name, because 4 KB is
the minimum queue size, and MANA doesn't have a "page" concept at the
HW level.
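
Applying the rename from above, the mapping code would read roughly as
below (a sketch of a possible next version, not the posted patch); a
64 KB region yields 16 chunk addresses regardless of the guest
PAGE_SIZE:

	unsigned int num_chunks = gmi->length / MANA_MIN_QSIZE;

	...

	for (i = 0; i < num_chunks; i++)
		req->page_addr_list[i] = gmi->dma_handle + i * MANA_MIN_QSIZE;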
> >  	struct gdma_create_dma_region_req *req = NULL;
> >  	struct gdma_create_dma_region_resp resp = {};
> >  	struct gdma_context *gc = gd->gdma_context;
> > @@ -727,7 +727,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
> >  	int err;
> >  	int i;
> >
> > -	if (length < PAGE_SIZE || !is_power_of_2(length))
> > +	if (length < MANA_MIN_QSIZE || !is_power_of_2(length))
> >  		return -EINVAL;
> >
> >  	if (offset_in_page(gmi->virt_addr) != 0)
> > @@ -751,7 +751,7 @@ static int mana_gd_create_dma_region(struct gdma_dev *gd,
> >  	req->page_addr_list_len = num_page;
> >
> >  	for (i = 0; i < num_page; i++)
> > -		req->page_addr_list[i] = gmi->dma_handle + i * PAGE_SIZE;
> > +		req->page_addr_list[i] = gmi->dma_handle + i * MANA_MIN_QSIZE;
> >
> >  	err = mana_gd_send_request(gc, req_msg_size, req, sizeof(resp), &resp);
> >  	if (err)
> > diff --git a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > index bbc4f9e16c98..038dc31e09cd 100644
> > --- a/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/hw_channel.c
> > @@ -362,12 +362,12 @@ static int mana_hwc_create_cq(struct hw_channel_context *hwc, u16 q_depth,
> >  	int err;
> >
> >  	eq_size = roundup_pow_of_two(GDMA_EQE_SIZE * q_depth);
> > -	if (eq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > -		eq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > +	if (eq_size < MANA_MIN_QSIZE)
> > +		eq_size = MANA_MIN_QSIZE;
> >
> >  	cq_size = roundup_pow_of_two(GDMA_CQE_SIZE * q_depth);
> > -	if (cq_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > -		cq_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > +	if (cq_size < MANA_MIN_QSIZE)
> > +		cq_size = MANA_MIN_QSIZE;
> >
> >  	hwc_cq = kzalloc(sizeof(*hwc_cq), GFP_KERNEL);
> >  	if (!hwc_cq)
> > @@ -429,7 +429,7 @@ static int mana_hwc_alloc_dma_buf(struct hw_channel_context *hwc, u16 q_depth,
> >
> >  	dma_buf->num_reqs = q_depth;
> >
> > -	buf_size = PAGE_ALIGN(q_depth * max_msg_size);
> > +	buf_size = MANA_MIN_QALIGN(q_depth * max_msg_size);
> >
> >  	gmi = &dma_buf->mem_info;
> >  	err = mana_gd_alloc_memory(gc, buf_size, gmi);
> > @@ -497,8 +497,8 @@ static int mana_hwc_create_wq(struct hw_channel_context *hwc,
> >  	else
> >  		queue_size = roundup_pow_of_two(GDMA_MAX_SQE_SIZE * q_depth);
> >
> > -	if (queue_size < MINIMUM_SUPPORTED_PAGE_SIZE)
> > -		queue_size = MINIMUM_SUPPORTED_PAGE_SIZE;
> > +	if (queue_size < MANA_MIN_QSIZE)
> > +		queue_size = MANA_MIN_QSIZE;
> >
> >  	hwc_wq = kzalloc(sizeof(*hwc_wq), GFP_KERNEL);
> >  	if (!hwc_wq)
> > @@ -628,10 +628,10 @@ static int mana_hwc_establish_channel(struct gdma_context *gc, u16 *q_depth,
> >  	init_completion(&hwc->hwc_init_eqe_comp);
> >
> >  	err = mana_smc_setup_hwc(&gc->shm_channel, false,
> > -				 eq->mem_info.dma_handle,
> > -				 cq->mem_info.dma_handle,
> > -				 rq->mem_info.dma_handle,
> > -				 sq->mem_info.dma_handle,
> > +				 virt_to_phys(eq->mem_info.virt_addr),
> > +				 virt_to_phys(cq->mem_info.virt_addr),
> > +				 virt_to_phys(rq->mem_info.virt_addr),
> > +				 virt_to_phys(sq->mem_info.virt_addr),
>
> This change seems unrelated to handling guest PAGE_SIZE values
> other than 4K. Does it belong in a separate patch? Or maybe it just
> needs an explanation in the commit message of this patch?

I know dma_handle is usually just the physical address, but this is not
always true if an IOMMU is used... I have no problem putting it in a
separate patch if desired.
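
To illustrate the distinction (a sketch only; "dev" and "size" are
placeholders, with includes <linux/dma-mapping.h> and <linux/io.h>):

	dma_addr_t dma_handle;
	void *va = dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL);

	/* CPU physical address of the buffer */
	phys_addr_t pa = virt_to_phys(va);

	/*
	 * With direct-mapped DMA, dma_handle == pa. Behind an IOMMU,
	 * dma_handle is an IO virtual address and may differ, so the
	 * SHM channel setup must use virt_to_phys() to pass the real
	 * physical addresses.
	 */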
> >  				 eq->eq.msix_index);
> >  	if (err)
> >  		return err;
> > diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > index d087cf954f75..6a891dbce686 100644
> > --- a/drivers/net/ethernet/microsoft/mana/mana_en.c
> > +++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
> > @@ -1889,10 +1889,10 @@ static int mana_create_txq(struct mana_port_context *apc,
> >  	 * to prevent overflow.
> >  	 */
> >  	txq_size = MAX_SEND_BUFFERS_PER_QUEUE * 32;
> > -	BUILD_BUG_ON(!PAGE_ALIGNED(txq_size));
> > +	BUILD_BUG_ON(!MANA_MIN_QALIGNED(txq_size));
> >
> >  	cq_size = MAX_SEND_BUFFERS_PER_QUEUE * COMP_ENTRY_SIZE;
> > -	cq_size = PAGE_ALIGN(cq_size);
> > +	cq_size = MANA_MIN_QALIGN(cq_size);
> >
> >  	gc = gd->gdma_context;
> >
> > @@ -2189,8 +2189,8 @@ static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
> >  	if (err)
> >  		goto out;
> >
> > -	rq_size = PAGE_ALIGN(rq_size);
> > -	cq_size = PAGE_ALIGN(cq_size);
> > +	rq_size = MANA_MIN_QALIGN(rq_size);
> > +	cq_size = MANA_MIN_QALIGN(cq_size);
> >
> >  	/* Create RQ */
> >  	memset(&spec, 0, sizeof(spec));
> > diff --git a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > index 5553af9c8085..9a54a163d8d1 100644
> > --- a/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > +++ b/drivers/net/ethernet/microsoft/mana/shm_channel.c
> > @@ -6,6 +6,7 @@
> >  #include <linux/io.h>
> >  #include <linux/mm.h>
> >
> > +#include <net/mana/gdma.h>
> >  #include <net/mana/shm_channel.h>
> >
> >  #define PAGE_FRAME_L48_WIDTH_BYTES 6
> > @@ -183,7 +184,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
> >
> >  	/* EQ addr: low 48 bits of frame address */
> >  	shmem = (u64 *)ptr;
> > -	frame_addr = PHYS_PFN(eq_addr);
> > +	frame_addr = MANA_PFN(eq_addr);
> >  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
>
> In mana_smc_setup_hwc() a few lines above this change, code using
> PAGE_ALIGNED() is unchanged. Is it correct that the eq/cq/rq/sq addresses
> must be aligned to 64K if PAGE_SIZE is 64K?

Since we are still using PHYS_PFN() on them, the lower bits may be lost
if they are not aligned to PAGE_SIZE. (You said the same below.)

> Related, I wonder about how MANA_PFN() is defined. If PAGE_SIZE is 64K,
> MANA_PFN() will first right-shift 16, then left-shift 4. The net is
> right-shift 12, corresponding to the 4K chunks that MANA expects. But
> that approach guarantees that the rightmost 4 bits of the MANA PFN will
> always be zero. That's consistent with requiring the addresses to be
> PAGE_ALIGNED() to 64K, but I'm unclear whether that is really the
> requirement. You might compare with the definition of HVPFN_DOWN(),
> which has a similar goal for Linux guests communicating with Hyper-V.

@Paul Rosswurm: You said MANA HW has "no page concept". So the
"frame_addr" in mana_smc_setup_hwc() is NOT related to the physical
page number, correct? Can we just use phys_addr >> 12 like below?

#define MANA_MIN_QSHIFT 12
#define MANA_PFN(a) ((a) >> MANA_MIN_QSHIFT)

	/* EQ addr: low 48 bits of frame address */
	shmem = (u64 *)ptr;
-	frame_addr = PHYS_PFN(eq_addr);
+	frame_addr = MANA_PFN(eq_addr);
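
For reference, the difference between the two definitions on a 64K-page
kernel, as a small standalone demo (PAGE_SHIFT of 16 and the sample
address are assumptions for illustration):

	#include <stdio.h>
	#include <stdint.h>

	#define PAGE_SHIFT	16	/* assumed: 64K-page ARM64 kernel */
	#define MANA_MIN_QSHIFT	12

	int main(void)
	{
		/* 4K-aligned, but not 64K-aligned */
		uint64_t addr = 0x12345000ULL;

		/* patch's MANA_PFN(): right-shift 16, then left-shift 4 */
		uint64_t pfn_patch = (addr >> PAGE_SHIFT) <<
				     (PAGE_SHIFT - MANA_MIN_QSHIFT);
		/* proposed MANA_PFN(): plain right-shift 12 */
		uint64_t pfn_plain = addr >> MANA_MIN_QSHIFT;

		/* prints "patch: 0x12340, plain: 0x12345" -- the low
		 * 4 bits are lost unless addr is 64K-aligned
		 */
		printf("patch: %#llx, plain: %#llx\n",
		       (unsigned long long)pfn_patch,
		       (unsigned long long)pfn_plain);
		return 0;
	}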
> > @@ -191,7 +192,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
> >
> >  	/* CQ addr: low 48 bits of frame address */
> >  	shmem = (u64 *)ptr;
> > -	frame_addr = PHYS_PFN(cq_addr);
> > +	frame_addr = MANA_PFN(cq_addr);
> >  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > @@ -199,7 +200,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
> >
> >  	/* RQ addr: low 48 bits of frame address */
> >  	shmem = (u64 *)ptr;
> > -	frame_addr = PHYS_PFN(rq_addr);
> > +	frame_addr = MANA_PFN(rq_addr);
> >  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > @@ -207,7 +208,7 @@ int mana_smc_setup_hwc(struct shm_channel *sc, bool reset_vf, u64 eq_addr,
> >
> >  	/* SQ addr: low 48 bits of frame address */
> >  	shmem = (u64 *)ptr;
> > -	frame_addr = PHYS_PFN(sq_addr);
> > +	frame_addr = MANA_PFN(sq_addr);
> >  	*shmem = frame_addr & PAGE_FRAME_L48_MASK;
> >  	all_addr_h4bits |= (frame_addr >> PAGE_FRAME_L48_WIDTH_BITS) <<
> >  		(frame_addr_seq++ * PAGE_FRAME_H4_WIDTH_BITS);
> > diff --git a/include/net/mana/gdma.h b/include/net/mana/gdma.h
> > index 27684135bb4d..b392559c33e9 100644
> > --- a/include/net/mana/gdma.h
> > +++ b/include/net/mana/gdma.h
> > @@ -224,7 +224,12 @@ struct gdma_dev {
> >  	struct auxiliary_device *adev;
> >  };
> >
> > -#define MINIMUM_SUPPORTED_PAGE_SIZE PAGE_SIZE
> > +/* These are defined by HW */
> > +#define MANA_MIN_QSHIFT 12
> > +#define MANA_MIN_QSIZE (1 << MANA_MIN_QSHIFT)
> > +#define MANA_MIN_QALIGN(x) ALIGN((x), MANA_MIN_QSIZE)
> > +#define MANA_MIN_QALIGNED(addr) IS_ALIGNED((unsigned long)(addr), MANA_MIN_QSIZE)
> > +#define MANA_PFN(a) (PHYS_PFN(a) << (PAGE_SHIFT - MANA_MIN_QSHIFT))
>
> See comments above about how this is defined.

Replied above. Thank you for all the detailed comments!

- Haiyang