On 2021/11/26 1:50, Jason Gunthorpe wrote: > On Mon, Nov 22, 2021 at 10:58:09AM +0200, Leon Romanovsky wrote: >> On Mon, Nov 22, 2021 at 11:38:01AM +0800, Wenpeng Liang wrote: >>> From: Yixing Liu <liuyixing1@xxxxxxxxxx> >>> >>> Add direct wqe enable switch and address mapping. >>> >>> Signed-off-by: Yixing Liu <liuyixing1@xxxxxxxxxx> >>> Signed-off-by: Wenpeng Liang <liangwenpeng@xxxxxxxxxx> >>> drivers/infiniband/hw/hns/hns_roce_device.h | 8 +-- >>> drivers/infiniband/hw/hns/hns_roce_main.c | 38 ++++++++++++--- >>> drivers/infiniband/hw/hns/hns_roce_pd.c | 3 ++ >>> drivers/infiniband/hw/hns/hns_roce_qp.c | 54 ++++++++++++++++++++- >>> include/uapi/rdma/hns-abi.h | 2 + >>> 5 files changed, 94 insertions(+), 11 deletions(-) >> >> <...> >> >>> entry = to_hns_mmap(rdma_entry); >>> pfn = entry->address >> PAGE_SHIFT; >>> - prot = vma->vm_page_prot; >>> >>> - if (entry->mmap_type != HNS_ROCE_MMAP_TYPE_TPTR) >>> - prot = pgprot_noncached(prot); >>> + switch (entry->mmap_type) { >>> + case HNS_ROCE_MMAP_TYPE_DB: >>> + prot = pgprot_noncached(vma->vm_page_prot); >>> + break; >>> + case HNS_ROCE_MMAP_TYPE_TPTR: >>> + prot = vma->vm_page_prot; >>> + break; >>> + case HNS_ROCE_MMAP_TYPE_DWQE: >>> + prot = pgprot_device(vma->vm_page_prot); >> >> Everything fine, except this pgprot_device(). You probably need to check >> WC internally in your driver and use or pgprot_writecombine() or >> pgprot_noncached() explicitly. > > pgprot_device is only used in two places in the kernel > pci_mmap_resource_range() for setting up the sysfs resourceXX mmap > > And in pci_remap_iospace() as part of emulationg PIO on mmio > architectures > > So, a PCI device should always be using pgprot_device() in its mmap > function > > The question is why is pgprot_noncached() being used at all? The only > difference on ARM is that noncached is non-Early Write Acknowledgement > and devices is not. > > At the very least this should be explained in a comment why nE vs E is > required in all these cases. > > Jason > . > HIP09 is a SoC device, and our CPU only optimizes ST4 instructions for device attributes. Therefore, we set device attributes to obtain optimization effects. The device attribute allows early ack, so it is faster compared with noncached. In order to ensure the early ack works correctly. Even if the data is incomplete, our device still knocks on the doorbell according to the content of the first 8 bytes to complete the data transmission. Thanks Wenpeng