On Fri, Mar 31, 2023 at 11:38:26AM +0800, xuhaoyue (A) wrote:
>
>
> On 2023/3/30 21:01:20, Jason Gunthorpe wrote:
> > On Thu, Mar 30, 2023 at 08:57:41PM +0800, xuhaoyue (A) wrote:
> >>
> >>
> >> On 2023/3/27 20:55:59, Jason Gunthorpe wrote:
> >>> On Mon, Mar 27, 2023 at 08:53:35PM +0800, xuhaoyue (A) wrote:
> >>>
> >>>>>>  static void hns_roce_write512(uint64_t *dest, uint64_t *val)
> >>>>>>  {
> >>>>>>      mmio_memcpy_x64(dest, val, sizeof(struct hns_roce_rc_sq_wqe));
> >>>>>> @@ -314,7 +319,10 @@ static void hns_roce_write_dwqe(struct hns_roce_qp *qp, void *wqe)
> >>>>>>      hr_reg_write(rc_sq_wqe, RCWQE_DB_SL_H, qp->sl >> HNS_ROCE_SL_SHIFT);
> >>>>>>      hr_reg_write(rc_sq_wqe, RCWQE_WQE_IDX, qp->sq.head);
> >>>>>>
> >>>>>> -    hns_roce_write512(qp->sq.db_reg, wqe);
> >>>>>> +    if (qp->flags & HNS_ROCE_QP_CAP_SVE_DIRECT_WQE)
> >>>>>
> >>>>> Why do you need a device flag here?
> >>>>
> >>>> Our CPU die can support NEON instructions and SVE instructions,
> >>>> but some CPU dies only have SVE instructions that can accelerate our direct WQE performance.
> >>>> Therefore, we need to add such a flag bit to distinguish.
> >>>
> >>> NEON vs SVE is available to userspace already, it shouldn't come
> >>> through a driver flag. You need another reason to add this flag
> >>>
> >>> The userspace should detect the right instruction to use based on the
> >>> cpu flags using the attribute stuff I pointed you at
> >>>
> >>> Jason
> >>> .
> >>>
> >>
> >> We optimized direct wqe based on different instructions for
> >> different CPUs, but the architecture of the CPUs is the same and
> >> supports both SVE and NEON instructions. We plan to use cpuid to
> >> distinguish between them. Is this more reasonable?
> >
> > Uhh, do you mean certain CPUs won't work with SVE and others won't
> > work with NEON?
> >
> > That is quite horrible
> >
> > Jason
> > .
> >
>
> No, actually for general scenarios, our CPU supports two types of instructions, SVE and NEON.
> However, for the CPU that requires high fp64 floating-point computing power, the SVE instruction is enhanced and the NEON instruction is weakened.

Ideally the decision of what CPU instruction to use will be made by
rdma-core, using the various schemes for dynamic link time selection.

It should apply universally to all providers.

Jason
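
For illustration only, here is a minimal sketch of the kind of userspace selection discussed above: the copy routine is picked once from the CPU feature bits the kernel already exports, rather than from a driver flag. The names wqe_copy, wqe_copy_sve, wqe_copy_generic and select_wqe_copy are hypothetical and are not rdma-core or hns code, and the memcpy() bodies are placeholders for real NEON/SVE MMIO stores. It assumes Linux, where getauxval(AT_HWCAP) and HWCAP_SVE are available on arm64; on any other target it falls back to the generic path. Note that HWCAP only reports whether SVE is present, not whether it is the faster choice on a given die.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__linux__)
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP */
#endif

#if defined(__aarch64__) && defined(__linux__)
#include <asm/hwcap.h>  /* HWCAP_SVE on arm64 */
#endif

/* Hypothetical signature for a "write one WQE" routine. */
typedef void (*wqe_copy_fn)(void *dst, const void *src, size_t len);

/* Placeholder for the baseline (e.g. NEON) path; a real provider
 * would issue the proper MMIO stores instead of memcpy(). */
static void wqe_copy_generic(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}

#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
/* Placeholder for an SVE path; only built where it can be selected. */
static void wqe_copy_sve(void *dst, const void *src, size_t len)
{
	memcpy(dst, src, len);
}
#endif

/* Pick an implementation from the CPU feature bits the kernel
 * exports to userspace; no device or driver flag is consulted. */
static wqe_copy_fn select_wqe_copy(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_SVE)
	if (getauxval(AT_HWCAP) & HWCAP_SVE)
		return wqe_copy_sve;
#endif
	return wqe_copy_generic;
}

/* Cached choice, resolved lazily on first use. */
static wqe_copy_fn wqe_copy_impl;

static void wqe_copy(void *dst, const void *src, size_t len)
{
	if (!wqe_copy_impl)
		wqe_copy_impl = select_wqe_copy();
	wqe_copy_impl(dst, src, len);
}
```

A caller would simply use wqe_copy(dst, wqe, 64) and never see which variant was chosen. The same selection could instead be done at dynamic link time with an ifunc resolver; the function pointer form is used here only because it is easier to read in a short example.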