Re: [PATCH for-next] RDMA/rxe: Handle zero length cases correctly

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/19/23 19:38, Zhu Yanjun wrote:
> On Fri, Jan 20, 2023 at 3:09 AM Bob Pearson <rpearsonhpe@xxxxxxxxx> wrote:
>>
>> Currently the rxe driver, in rare situations, can respond incorrectly
>> to zero length operations which are retried. The client does not
>> have to provide an rkey for zero length RDMA operations so the rkey
>> may be invalid. The driver saves this rkey in the responder resources
>> to replay the rdma operation if a retry is required so the second pass
>> will use this (potentially) invalid rkey which may result in memory
>> faults.
>>
>> This patch corrects the driver to ignore the provided rkey if the
>> reth length is zero and make sure to set the mr to NULL. In read_reply()
>> if the length is zero the MR is set to NULL. Warnings are added in
>> the routines in rxe_mr.c to catch NULL MRs when the length is non-zero.
>>
> 
> There is a patch in the following link:
> 
> https://patchwork.kernel.org/project/linux-rdma/patch/20230113023527.728725-1-baijiaju1990@xxxxxxxxx/
> 
> Not sure whether it is similar or not.
> 
> Zhu Yanjun
> 
>> Signed-off-by: Bob Pearson <rpearsonhpe@xxxxxxxxx>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_mr.c   |  9 +++++++
>>  drivers/infiniband/sw/rxe/rxe_resp.c | 36 +++++++++++++++++++++-------
>>  2 files changed, 36 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_mr.c b/drivers/infiniband/sw/rxe/rxe_mr.c
>> index 072eac4b65d2..134a74f315c2 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_mr.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_mr.c
>> @@ -267,6 +267,9 @@ void *iova_to_vaddr(struct rxe_mr *mr, u64 iova, int length)
>>         int m, n;
>>         void *addr;
>>
>> +       if (WARN_ON(!mr))
>> +               return NULL;
>> +
>>         if (mr->state != RXE_MR_STATE_VALID) {
>>                 rxe_dbg_mr(mr, "Not in valid state\n");
>>                 addr = NULL;
>> @@ -305,6 +308,9 @@ int rxe_flush_pmem_iova(struct rxe_mr *mr, u64 iova, int length)
>>         if (length == 0)
>>                 return 0;
>>
>> +       if (WARN_ON(!mr))
>> +               return -EINVAL;
>> +
>>         if (mr->ibmr.type == IB_MR_TYPE_DMA)
>>                 return -EFAULT;
>>
>> @@ -349,6 +355,9 @@ int rxe_mr_copy(struct rxe_mr *mr, u64 iova, void *addr, int length,
>>         if (length == 0)
>>                 return 0;
>>
>> +       if (WARN_ON(!mr))
>> +               return -EINVAL;
>> +
>>         if (mr->ibmr.type == IB_MR_TYPE_DMA) {
>>                 u8 *src, *dest;
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_resp.c b/drivers/infiniband/sw/rxe/rxe_resp.c
>> index c74972244f08..a528dc25d389 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_resp.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_resp.c
>> @@ -457,13 +457,23 @@ static enum resp_states rxe_resp_check_length(struct rxe_qp *qp,
>>         return RESPST_CHK_RKEY;
>>  }
>>
>> +/* if the reth length field is zero we can assume nothing
>> + * about the rkey value and should not validate or use it.
>> + * Instead set qp->resp.rkey to 0 which is an invalid rkey
>> + * value since the minimum index part is 1.
>> + */
>>  static void qp_resp_from_reth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>>  {
>> +       unsigned int length = reth_len(pkt);
>> +
>>         qp->resp.va = reth_va(pkt);
>>         qp->resp.offset = 0;
>> -       qp->resp.rkey = reth_rkey(pkt);
>> -       qp->resp.resid = reth_len(pkt);
>> -       qp->resp.length = reth_len(pkt);
>> +       qp->resp.resid = length;
>> +       qp->resp.length = length;
>> +       if (length)
>> +               qp->resp.rkey = reth_rkey(pkt);
>> +       else
>> +               qp->resp.rkey = 0;
>>  }
>>
>>  static void qp_resp_from_atmeth(struct rxe_qp *qp, struct rxe_pkt_info *pkt)
>> @@ -512,8 +522,8 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>>
>>         /* A zero-byte op is not required to set an addr or rkey. See C9-88 */
>>         if ((pkt->mask & RXE_READ_OR_WRITE_MASK) &&
>> -           (pkt->mask & RXE_RETH_MASK) &&
>> -           reth_len(pkt) == 0) {
>> +           (pkt->mask & RXE_RETH_MASK) && reth_len(pkt) == 0) {
>> +               qp->resp.mr = NULL;
>>                 return RESPST_EXECUTE;
>>         }
>>
>> @@ -592,6 +602,7 @@ static enum resp_states check_rkey(struct rxe_qp *qp,
>>         return RESPST_EXECUTE;
>>
>>  err:
>> +       qp->resp.mr = NULL;
>>         if (mr)
>>                 rxe_put(mr);
>>         if (mw)
>> @@ -966,7 +977,10 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>         }
>>
>>         if (res->state == rdatm_res_state_new) {
>> -               if (!res->replay) {
>> +               if (qp->resp.length == 0) {
>> +                       mr = NULL;
>> +                       qp->resp.mr = NULL;
>> +               } else if (!res->replay) {
>>                         mr = qp->resp.mr;
>>                         qp->resp.mr = NULL;
>>                 } else {
>> @@ -980,9 +994,13 @@ static enum resp_states read_reply(struct rxe_qp *qp,
>>                 else
>>                         opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_FIRST;
>>         } else {
>> -               mr = rxe_recheck_mr(qp, res->read.rkey);
>> -               if (!mr)
>> -                       return RESPST_ERR_RKEY_VIOLATION;
>> +               if (qp->resp.length == 0) {
>> +                       mr = NULL;
>> +               } else {
>> +                       mr = rxe_recheck_mr(qp, res->read.rkey);
>> +                       if (!mr)
>> +                               return RESPST_ERR_RKEY_VIOLATION;
>> +               }
>>
>>                 if (res->read.resid > mtu)
>>                         opcode = IB_OPCODE_RC_RDMA_READ_RESPONSE_MIDDLE;
>> --
>> 2.37.2
>>

Zhu,

It relates since he is checking for NULL MRs. But I don't think it addresses the root
causes. The patch I sent should eliminate NULL MRs together with length != 0 in
the copy routines. I added WARN_ON's in case someone changes things later and
we hit this again. (A warning is more useful than a fault which can be very hard
to diagnose.)

The two changes I made that attack the cause of problems are
(1) clearing qp->resp.mr in check_rkey() in the alternate paths. The primary
path demands that it get set with a valid mr. But on the alternate paths it isn't
set at all and can leave with a stale, invalid or wrong mr value.
(2) in read_reply() there is an error path where a zero length read fails to get
acked and the requester retries the operation and sends a second request. This
will end up in read_reply and as currently written attempt to lookup the rkey and
turn it into an MR but no valid rkey is required in a zero length operation so this
is likely to fail. The fixes treats length == 0 as a special case and force a NULL mr.
This should not trigger a fault in the mr copy/etc. routines since they always
check for length == 0 and return or require a non zero length.

Thanks,

Bob





[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux