Re: null pointer in rxe_mr_copy()

On 4/10/22 22:34, Bob Pearson wrote:
> Zhu,
> 
> Since checking for mr == NULL in rxe_mr_copy fixes the problem you were seeing in rping,
> perhaps it would be a good idea to apply the following patch, which would tell us which of
> the three calls to rxe_mr_copy is failing. My suspicion is the one in read_reply() in rxe_resp.c.
> This could be caused by a race between shutting down the qp and finishing up an RDMA read.
> The responder resources state machine is completely unprotected from simultaneous access by
> verbs code and bh code in rxe_resp.c. rxe_resp is a tasklet so all the accesses from there are
> serialized but if anyone makes a verbs call that touches the responder resources it could
> cause problems. The most likely (only?) place this could happen is qp shutdown.

I have reproduced a failure in rping on the v13 patch series, so never mind. It's something else.
It runs for a couple of minutes on my system between a pair of VMs with

rping [-s | -c] -C 10000 -S 4096 -a 192.168.0.xx -d -V -p 1234

(-s on the server, -c on the client), and then the client hangs. There is nothing in dmesg, though.
The hang happens right after an RDMA read that reports success on the server. Possibly it is at
10000 packets, since the timing feels about right, but the job does not finish.
> 
> Bob
> 
> 
> 
> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
> index 60a31b718774..66184f5a4ddf 100644
> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
> @@ -489,6 +489,7 @@ int copy_data(
>  		if (bytes > 0) {
>  			iova = sge->addr + offset;
>  
> +			WARN_ON(!mr);
>  			err = rxe_mr_copy(mr, iova, addr, bytes, dir);
>  			if (err)
>  				goto err2;
> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
> index 1d95fab606da..6e3e86bdccd7 100644
> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
> @@ -536,6 +536,7 @@ static enum resp_states write_data_in(struct rxe_qp *qp,
>  	int	err;
>  	int data_len = payload_size(pkt);
>  
> +	WARN_ON(!qp->resp.mr);
>  	err = rxe_mr_copy(qp->resp.mr, qp->resp.va + qp->resp.offset,
>  			  payload_addr(pkt), data_len, RXE_TO_MR_OBJ);
>  	if (err) {
> @@ -772,6 +773,7 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>  	if (!skb)
>  		return RESPST_ERR_RNR;
>  
> +	WARN_ON(!mr);
>  	err = rxe_mr_copy(mr, res->read.va, payload_addr(&ack_pkt),
>  			  payload, RXE_FROM_MR_OBJ);
>  	if (err)
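
For anyone who wants to try the idea outside the kernel, here is a minimal userspace sketch of what the WARN_ON(!mr) lines in the quoted patch are after: catch a NULL mr before the copy and record which caller passed it. Everything in it is an assumption made for illustration (struct fake_mr, guarded_mr_copy(), the copy_site enum and the null_mr_hits counters are hypothetical names); it is not the rxe code and not a proposed fix.

```c
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for struct rxe_mr; not the kernel definition. */
struct fake_mr {
	unsigned char *buf;
	size_t len;
};

/* One tag per caller of the copy routine, mirroring the three call sites
 * named in the patch (copy_data, write_data_in, read_reply). */
enum copy_site {
	SITE_COPY_DATA,
	SITE_WRITE_DATA_IN,
	SITE_READ_REPLY,
	SITE_MAX
};

/* Per-site count of NULL mr pointers seen so far. */
static unsigned int null_mr_hits[SITE_MAX];

/* Copy len bytes from src into the region at offset. On a NULL mr, note
 * which call site passed it and fail instead of dereferencing it. */
static int guarded_mr_copy(struct fake_mr *mr, size_t offset,
			   const void *src, size_t len, enum copy_site site)
{
	if (!mr) {
		null_mr_hits[site]++;
		fprintf(stderr, "NULL mr from call site %d\n", (int)site);
		return -EINVAL;
	}

	/* Reject copies that would run past the end of the region. */
	if (offset > mr->len || len > mr->len - offset)
		return -EFAULT;

	memcpy(mr->buf + offset, src, len);
	return 0;
}
```

The per-site counter plays the role that the WARN_ON backtrace plays in the kernel: it says which caller handed in the NULL pointer, without changing what a valid copy does.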



