On Thu, Aug 10, 2017 at 08:55:42PM +0000, Adit Ranadive wrote:
> On 8/10/17, 1:02 PM, "Yuval Shaia" <yuval.shaia@xxxxxxxxxx> wrote:
> > On Thu, Aug 10, 2017 at 12:05:02PM -0700, Adit Ranadive wrote:
> > > From: Bryan Tan <bryantan@xxxxxxxxxx>
> > >
> > > There is a chance of a race between arming the CQ and receiving
> > > completions. By reporting CQ missed events any ULPs should poll
> > > again to get the completions.
> > >
> > > Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver")
> > > Acked-by: Aditya Sarwade <asarwade@xxxxxxxxxx>
> > > Signed-off-by: Bryan Tan <bryantan@xxxxxxxxxx>
> > > Signed-off-by: Adit Ranadive <aditr@xxxxxxxxxx>
> > > ---
> > > v0 -> v1:
> > >  - Check for invalid ring index.
> > > ---
> > >  drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c | 17 ++++++++++++++++-
> > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
> > > index 69bda61..90aa326 100644
> > > --- a/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
> > > +++ b/drivers/infiniband/hw/vmw_pvrdma/pvrdma_cq.c
> > > @@ -65,13 +65,28 @@ int pvrdma_req_notify_cq(struct ib_cq *ibcq,
> > >  	struct pvrdma_dev *dev = to_vdev(ibcq->device);
> > >  	struct pvrdma_cq *cq = to_vcq(ibcq);
> > >  	u32 val = cq->cq_handle;
> > > +	unsigned long flags;
> > > +	int has_data = 0;
> > >
> > >  	val |= (notify_flags & IB_CQ_SOLICITED_MASK) == IB_CQ_SOLICITED ?
> > >  		PVRDMA_UAR_CQ_ARM_SOL : PVRDMA_UAR_CQ_ARM;
> > >
> > > +	spin_lock_irqsave(&cq->cq_lock, flags);
> > > +
> > >  	pvrdma_write_uar_cq(dev, val);
> > >
> > > -	return 0;
> > > +	if (notify_flags & IB_CQ_REPORT_MISSED_EVENTS) {
> > > +		unsigned int head;
> > > +
> > > +		has_data = pvrdma_idx_ring_has_data(&cq->ring_state->rx,
> > > +						    cq->ibcq.cqe, &head);
> > > +		if (unlikely(has_data == PVRDMA_INVALID_IDX))
> > > +			dev_err(&dev->pdev->dev, "CQ ring state invalid\n");
> >
> > I see the point of checking the return value, but per my understanding
> > (correct me if I'm wrong) this rare case points to a corrupted ring,
> > which can happen *only* in case of a bug, so it is not an "error" by
> > nature. If this is correct, then I don't see the point of asking this
> > "question" on every call to ib_req_notify_cq.
> >
> > Do you agree to move this check to pvrdma_idx_ring_has_data and even make
> > the function use BUG_ON?
>
> I'll concede that it points to a corrupted ring (through a device bug or
> memory corruption), but we want to report it as a device error to maintain
> consistency in our driver and give ULPs a chance to clean up. Also, the
> compiler optimization (the unlikely() hint) should help here.

Great, I understand that.
So, at the least, can you consider moving this dev_err into
pvrdma_idx_ring_has_data so callers do not need to handle errors?

By the way, the same applies to pvrdma_idx_ring_has_space.
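
To make the suggestion concrete, here is a minimal user-space sketch of the idea: the ring helper itself reports the invalid-index case, so the notify path only passes the result through. This is an illustration under stated assumptions, not the driver's code. The names mock_ring, mock_idx_valid, mock_ring_has_data, mock_req_notify and the ring_errors_logged counter are hypothetical; plain integers replace the driver's atomics, fprintf stands in for dev_err, and max_elems is assumed to be a power of two.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PVRDMA_INVALID_IDX (-1)

/* Hypothetical stand-in for the shared ring state (no atomics here). */
struct mock_ring {
	uint32_t prod_tail;
	uint32_t cons_head;
};

/* Hypothetical counter so the "logged once, in the helper" behaviour is observable. */
static int ring_errors_logged;

/* Valid indices lie in [0, 2 * max_elems); assumes max_elems is a power of two. */
static int mock_idx_valid(uint32_t idx, uint32_t max_elems)
{
	return (idx & ~((max_elems << 1) - 1)) == 0;
}

/*
 * Returns 1 if the ring has data, 0 if it is empty, and
 * PVRDMA_INVALID_IDX if either index is out of range. The error is
 * reported here, so callers do not need their own logging.
 */
static int mock_ring_has_data(const struct mock_ring *r, uint32_t max_elems,
			      uint32_t *out_head)
{
	assert(r != NULL && out_head != NULL);

	if (mock_idx_valid(r->prod_tail, max_elems) &&
	    mock_idx_valid(r->cons_head, max_elems)) {
		*out_head = r->cons_head & (max_elems - 1);
		return r->prod_tail != r->cons_head;
	}

	/* Stand-in for dev_err(): report the corrupted ring in one place. */
	ring_errors_logged++;
	fprintf(stderr, "ring state invalid (tail=%u head=%u)\n",
		(unsigned int)r->prod_tail, (unsigned int)r->cons_head);
	return PVRDMA_INVALID_IDX;
}

/* Caller side: no error handling needed, the helper's result is passed on. */
static int mock_req_notify(const struct mock_ring *r, uint32_t max_elems)
{
	uint32_t head;

	return mock_ring_has_data(r, max_elems, &head);
}
```

With this shape, a caller such as the notify path returns a positive value when completions may have been missed, zero when the ring is empty, and a negative value when the ring state is corrupted. The same pattern could presumably be applied to pvrdma_idx_ring_has_space.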