Re: [PATCH for-next v2] RDMA/rxe: fix regression caused by recent patch

On 10/30/20 12:36 PM, Jason Gunthorpe wrote:
> On Fri, Oct 30, 2020 at 12:11:07PM -0500, Bob Pearson wrote:
>> The commit referenced below performs additional checking on
>> devices used for DMA. Specifically it checks that
>>
>> device->dma_mask != NULL
>>
>> Rdma_rxe uses this device when pinning MR memory but did not
>> set the value of dma_mask. In fact rdma_rxe does not perform
>> any DMA operations so the value is never used but is checked.
>>
>> This patch gives dma_mask a valid value extracted from the device
>> backing the ndev used by rxe.
>>
>> Without this patch rdma_rxe does not function.
>>
>> N.B. This patch needs to be applied before the recent fix to add back
>> IB_USER_VERBS_CMD_POST_SEND to uverbs_cmd_mask.
>>
>> Dennis Dallesandro reported that Parav's similar patch did not apply
>> cleanly to rxe. This one applies to the for-next head of tree as of yesterday.
>>
>> Fixes: f959dcd6ddfd2 ("dma-direct: Fix potential NULL pointer dereference")
>> Signed-off-by: Bob Pearson <rpearson@xxxxxxx>
>> ---
>>  drivers/infiniband/sw/rxe/rxe_verbs.c | 18 ++++++++++++++++--
>>  1 file changed, 16 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/infiniband/sw/rxe/rxe_verbs.c b/drivers/infiniband/sw/rxe/rxe_verbs.c
>> index 7652d53af2c1..c857e83323ed 100644
>> --- a/drivers/infiniband/sw/rxe/rxe_verbs.c
>> +++ b/drivers/infiniband/sw/rxe/rxe_verbs.c
>> @@ -1128,19 +1128,32 @@ int rxe_register_device(struct rxe_dev *rxe, const char *ibdev_name)
>>  	int err;
>>  	struct ib_device *dev = &rxe->ib_dev;
>>  	struct crypto_shash *tfm;
>> +	u64 dma_mask;
>>  
>>  	strlcpy(dev->node_desc, "rxe", sizeof(dev->node_desc));
>>  
>>  	dev->node_type = RDMA_NODE_IB_CA;
>>  	dev->phys_port_cnt = 1;
>>  	dev->num_comp_vectors = num_possible_cpus();
>> -	dev->dev.parent = rxe_dma_device(rxe);
>>  	dev->local_dma_lkey = 0;
>>  	addrconf_addr_eui48((unsigned char *)&dev->node_guid,
>>  			    rxe->ndev->dev_addr);
>>  	dev->dev.dma_parms = &rxe->dma_parms;
>>  	dma_set_max_seg_size(&dev->dev, UINT_MAX);
>> -	dma_set_coherent_mask(&dev->dev, dma_get_required_mask(&dev->dev));
>> +
>> +	/* rdma_rxe never does real DMA but does rely on
>> +	 * pinning user memory in MRs to avoid page faults
>> +	 * in responder and completer tasklets. This code
>> +	 * supplies a valid dma_mask from the underlying
>> +	 * network device. It is never used but is checked.
>> +	 */
>> +	dev->dev.parent = rxe_dma_device(rxe);
> 
> Oh! This is another bug, the parent of an ib_device should never be
> set to a net_device!! This is probably why we get all those mysterious
> syzkaller faults :| Just leave it NULL
> 
>> +	dma_mask = *(dev->dev.parent->dma_mask);
>> +	err = dma_coerce_mask_and_coherent(&dev->dev, dma_mask);
> 
> Why not use Parav's logic?
> 
> Jason
> 

It's not the network device; it is the parent of the network device.
On 64-bit machines it gives 0xffffffffffffffff, i.e. DMA_BIT_MASK(64), as
the dma_mask.

struct device *rxe_dma_device(struct rxe_dev *rxe)
{
        struct net_device *ndev;

        ndev = rxe->ndev;

        if (is_vlan_dev(ndev))
                ndev = vlan_dev_real_dev(ndev);

        return ndev->dev.parent;
}
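The following is a userspace sketch, not kernel code, that shows which device rxe_dma_device() hands back. The structures are hypothetical, stripped-down stand-ins for struct device and struct net_device. The 64-bit mask on the parent is an assumption about what a typical NIC driver sets. Only DMA_BIT_MASK() matches the kernel's definition. The point is that the returned device is the net_device's parent, not the net_device's own struct device. An all-ones mask is simply DMA_BIT_MASK(64).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct device {
	const char *name;
	struct device *parent;
	uint64_t *dma_mask;          /* NULL until some driver sets it */
	uint64_t coherent_dma_mask;
};

struct net_device {
	struct device dev;           /* the net_device's own struct device */
	bool is_vlan;
	struct net_device *real_dev; /* lower device, only used when is_vlan */
};

/* Mirrors the logic of rxe_dma_device(): step from a VLAN device down to
 * its real device, then return that net_device's parent (for a physical
 * NIC, typically its bus device), never the net_device's own struct device.
 * Returns NULL if the net_device has no parent. */
static struct device *model_rxe_dma_device(struct net_device *ndev)
{
	if (ndev->is_vlan) {
		assert(ndev->real_dev != NULL);
		ndev = ndev->real_dev;
	}
	return ndev->dev.parent;
}
```

Assume a NIC whose driver set a 64-bit mask on its bus device. In that case model_rxe_dma_device() returns the bus device both for the NIC's interface and for a VLAN on top of it, and reading that device's mask yields 0xffffffffffffffff. In the model, an interface whose dev.parent is NULL makes the function return NULL, so a caller has to be prepared for that.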

Parav's patch should work too; at the end of the day both behave the same.
I don't really know what rxe_dma_device() was originally meant to do, so
I didn't change it, but it was a handy place to get a dma_mask that should
work on any architecture. If there is no reason to set dev.parent, I can
get rid of rxe_dma_device().
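The failure and the fix can be modeled in userspace. This is a minimal sketch, not the kernel implementation, and several parts are assumptions:

- struct device is a hypothetical stand-in for the kernel structure.
- model_dma_map_check() only mimics the "dma_mask != NULL" check described in the commit message. The warning text and the -EIO return value are assumptions.
- model_dma_coerce_mask_and_coherent() follows what dma_coerce_mask_and_coherent() does: it points dma_mask at the device's own coherent_dma_mask and then sets both. The real helper's dma_supported() query is omitted.
- The NULL check on the source device in model_rxe_set_mask() is an extra guard that the patch itself does not have.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Same definition as the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical, simplified stand-in for the kernel's struct device. */
struct device {
	struct device *parent;
	uint64_t *dma_mask;          /* NULL unless someone sets it */
	uint64_t coherent_dma_mask;
};

/* Mimics the check described in the commit message: mapping is refused
 * when dev->dma_mask is NULL. The message and -EIO are assumptions. */
static int model_dma_map_check(const struct device *dev)
{
	if (dev->dma_mask == NULL) {
		fprintf(stderr, "model: refusing to map, dma_mask is NULL\n");
		return -EIO;
	}
	return 0;
}

/* Follows dma_coerce_mask_and_coherent(): make dma_mask point at the
 * device's own coherent_dma_mask, then set both masks to the same value
 * (after the first line they share storage). The real helper also asks
 * dma_supported() whether the mask is usable; that step is omitted. */
static int model_dma_coerce_mask_and_coherent(struct device *dev, uint64_t mask)
{
	dev->dma_mask = &dev->coherent_dma_mask;
	*dev->dma_mask = mask;          /* the dma_set_mask() step */
	dev->coherent_dma_mask = mask;  /* the dma_set_coherent_mask() step */
	return 0;
}

/* The patch's idea: copy the mask from the device rxe_dma_device()
 * returned. The NULL checks are an extra guard the patch does not have;
 * -ENODEV is an arbitrary choice for this sketch. */
static int model_rxe_set_mask(struct device *ibdev, const struct device *src)
{
	assert(ibdev != NULL);
	if (src == NULL || src->dma_mask == NULL)
		return -ENODEV;
	return model_dma_coerce_mask_and_coherent(ibdev, *src->dma_mask);
}
```

After model_rxe_set_mask() succeeds, ibdev.dma_mask points at ibdev's own coherent_dma_mask. In this model, the mask therefore no longer depends on ibdev.parent, and the source device is only needed to read the value once.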

Bob


