On Sun, Mar 19, 2023 at 07:37:45AM +0530, Selvin Xavier wrote:
> On Fri, Mar 17, 2023 at 5:53 PM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
> >
> > On 2023/3/17 18:17, Selvin Xavier wrote:
> > > On Fri, Mar 17, 2023 at 2:40 PM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote:
> > >>
> > >> On 2023/3/15 16:16, Selvin Xavier wrote:
> > >>> Add resize_cq verb support for user space CQs. Resize operation for
> > >>> kernel CQs is not supported now.
> > >>>
> > >>> Driver should free the current CQ only after the user library polls
> > >>> for all the completions and switches to the new CQ. So after resize_cq
> > >>> is returned from the driver, the user libray polls for existing completions
> > >>
> > >> libray -> library
> > >>
> > >>> and stores them as temporary data. Once the library reaps all completions in the
> > >>> current CQ, it invokes ibv_cmd_poll_cq to inform the driver about
> > >>> the resize_cq completion. Add a check for user CQs in the driver's
> > >>> poll_cq and complete the resize operation for user CQs.
> > >>> Update uverbs_cmd_mask with poll_cq to support this.
> > >>>
> > >>> User library changes are available in this pull request:
> > >>> https://github.com/linux-rdma/rdma-core/pull/1315
> > >>>
> > >>> Signed-off-by: Selvin Xavier <selvin.xavier@xxxxxxxxxxxx>
> > >>> ---
> > >>>  drivers/infiniband/hw/bnxt_re/ib_verbs.c | 109 +++++++++++++++++++++++++++++++
> > >>>  drivers/infiniband/hw/bnxt_re/ib_verbs.h |   3 +
> > >>>  drivers/infiniband/hw/bnxt_re/main.c     |   2 +
> > >>>  drivers/infiniband/hw/bnxt_re/qplib_fp.c |  44 +++++++++++++
> > >>>  drivers/infiniband/hw/bnxt_re/qplib_fp.h |   5 ++
> > >>>  include/uapi/rdma/bnxt_re-abi.h          |   4 ++
> > >>>  6 files changed, 167 insertions(+)
> > >>>
> > >>> diff --git a/drivers/infiniband/hw/bnxt_re/ib_verbs.c b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > >>> index 989edc7..e86afec 100644
> > >>> --- a/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > >>> +++ b/drivers/infiniband/hw/bnxt_re/ib_verbs.c
> > >>> @@ -2912,6 +2912,106 @@ int bnxt_re_create_cq(struct ib_cq *ibcq, const struct ib_cq_init_attr *attr,
> > >>>       return rc;
> > >>>  }
> > >>>
> > >>> +static void bnxt_re_resize_cq_complete(struct bnxt_re_cq *cq)
> > >>> +{
> > >>> +     struct bnxt_re_dev *rdev = cq->rdev;
> > >>> +
> > >>> +     bnxt_qplib_resize_cq_complete(&rdev->qplib_res, &cq->qplib_cq);
> > >>> +
> > >>> +     cq->qplib_cq.max_wqe = cq->resize_cqe;
> > >>> +     if (cq->resize_umem) {
> > >>> +             ib_umem_release(cq->umem);
> > >>> +             cq->umem = cq->resize_umem;
> > >>> +             cq->resize_umem = NULL;
> > >>> +             cq->resize_cqe = 0;
> > >>> +     }
> > >>> +}
> > >>> +
> > >>> +int bnxt_re_resize_cq(struct ib_cq *ibcq, int cqe, struct ib_udata *udata)
> > >>> +{
> > >>> +     struct bnxt_qplib_sg_info sg_info = {};
> > >>> +     struct bnxt_qplib_dpi *orig_dpi = NULL;
> > >>> +     struct bnxt_qplib_dev_attr *dev_attr;
> > >>> +     struct bnxt_re_ucontext *uctx = NULL;
> > >>> +     struct bnxt_re_resize_cq_req req;
> > >>> +     struct bnxt_re_dev *rdev;
> > >>> +     struct bnxt_re_cq *cq;
> > >>> +     int rc, entries;
> > >>> +
> > >>> +     cq = container_of(ibcq, struct bnxt_re_cq, ib_cq);
> > >>> +     rdev = cq->rdev;
> > >>> +     dev_attr = &rdev->dev_attr;
> > >>> +     if (!ibcq->uobject) {
> > >>> +             ibdev_err(&rdev->ibdev, "Kernel CQ Resize not supported");
> > >>> +             return -EOPNOTSUPP;
> > >>> +     }
> > >>> +
> > >>> +     if (cq->resize_umem) {
> > >>> +             ibdev_err(&rdev->ibdev, "Resize CQ %#x failed - Busy",
> > >>> +                       cq->qplib_cq.id);
> > >>> +             return -EBUSY;
> > >>> +     }
> > >>
> > >> Does the above cq->resize_umem check have any concurrent
protection > > >> again the bnxt_re_resize_cq_complete() called by bnxt_re_poll_cq()? > > >> > > >> bnxt_re_resize_cq() seems like a control path operation, while > > >> bnxt_re_poll_cq() seems like a data path operation, I am not sure > > >> there is any conconcurrent protection between them. > > >> > > > The previous check is to prevent simultaneous resize_cq context from > > > the user application. > > > > > > if you see the library implementation (PR > > > https://github.com/linux-rdma/rdma-core/pull/1315), entire operation > > > is done in the single resize_cq context from application > > > i.e. > > > bnxt_re_resize_cq > > > -> ibv_cmd_resize_cq > > > call driver bnxt_re_resize_cq and return > > > -> poll out the current completions and store it in a user lib list > > > -> issue an ibv_cmd_poll_cq. > > > This will invoke bnxt_re_poll_cq in the kernel driver. We free > > > the previous cq resources. > > > > > > So the synchronization between resize_cq and poll_cq is happening in > > > the user lib. We can free the old CQ only after user land sees the > > > final Completion on the previous CQ. So we return from the driver's > > > bnxt_re_resize_cq and then poll for all completions and then use > > > ibv_cmd_poll_cq as a hook to reach the kernel driver and free the old > > > CQ's resource. > > > > > > Summarize, driver's bnxt_re_poll_cq for user space CQs will be called > > > only from resize_cq context and no concurrent protection required in > > > the driver. > > > > Is it acceptable to depend on the user space code to ensure the correct > > order in the kernel space? > > Isn't that may utilized by some malicious user to crash the kernel? > > > The user space code I mentioned is implemented in the provider/bnxt_re > user library just > like any other verb interface. Apps shall always go through this > library to send request > to the provider driver. And if I use my own version of provider/bnxt_re, will kernel crash? Thanks