On Sat, Feb 25, 2023 at 06:02:53PM +0800, Haoyue Xu wrote:

> +
> +set_source_files_properties(hns_roce_u_hw_v2.c PROPERTIES COMPILE_FLAGS "${SVE_FLAGS}")
> diff --git a/providers/hns/hns_roce_u_hw_v2.c b/providers/hns/hns_roce_u_hw_v2.c
> index 3a294968..bd457217 100644
> --- a/providers/hns/hns_roce_u_hw_v2.c
> +++ b/providers/hns/hns_roce_u_hw_v2.c
> @@ -299,6 +299,11 @@ static void hns_roce_update_sq_db(struct hns_roce_context *ctx,
>  	hns_roce_write64(qp->sq.db_reg, (__le32 *)&sq_db);
>  }
>
> +static void hns_roce_sve_write512(uint64_t *dest, uint64_t *val)
> +{
> +	mmio_memcpy_x64_sve(dest, val);
> +}

This is not the right way, you should make this work like the x86 SSE
stuff, using a "__attribute__((target(xx)))"

Look in util/mmio.c and implement a mmio_memcpy_x64 for ARM SVE

mmio_memcpy_x64 is defined to try to generate a 64 byte PCI-E TLP.

If you don't want or can't handle that then you should write your own
loop of 8 byte stores.

>  static void hns_roce_write512(uint64_t *dest, uint64_t *val)
>  {
>  	mmio_memcpy_x64(dest, val, sizeof(struct hns_roce_rc_sq_wqe));
> @@ -314,7 +319,10 @@ static void hns_roce_write_dwqe(struct hns_roce_qp *qp, void *wqe)
>  	hr_reg_write(rc_sq_wqe, RCWQE_DB_SL_H, qp->sl >> HNS_ROCE_SL_SHIFT);
>  	hr_reg_write(rc_sq_wqe, RCWQE_WQE_IDX, qp->sq.head);
>
> -	hns_roce_write512(qp->sq.db_reg, wqe);
> +	if (qp->flags & HNS_ROCE_QP_CAP_SVE_DIRECT_WQE)

Why do you need a device flag here?

> +		hns_roce_sve_write512(qp->sq.db_reg, wqe);
> +	else
> +		hns_roce_write512(qp->sq.db_reg, wqe);

Isn't this function being called on WC memory already? The usual way
to make the 64 byte write is with stores to WC memory..

Jason
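
For illustration only, a minimal sketch of the fallback mentioned above,
a plain loop of 8 byte stores. The helper name example_write512_by_qwords
is hypothetical and is not part of rdma-core. Real provider code would
presumably go through the accessors in util/mmio.h and add whatever
ordering barriers the device needs, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from rdma-core): copy one 64 byte WQE as
 * eight separate 8 byte stores. Nothing here asks the CPU to merge the
 * stores into a single 64 byte PCI-E TLP, so the device must tolerate
 * receiving the data in pieces.
 */
static void example_write512_by_qwords(volatile uint64_t *dest,
				       const uint64_t *val)
{
	size_t i;

	/* 8 byte stores to device memory should be naturally aligned. */
	assert(((uintptr_t)dest & 7) == 0);

	/* volatile keeps the compiler from fusing or dropping the stores. */
	for (i = 0; i < 8; i++)
		dest[i] = val[i];
}
```

Unlike mmio_memcpy_x64, this makes no attempt at a single 64 byte write,
so it only fits hardware that does not depend on one.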