Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()

----- Original Message -----
> From: "Laurence Oberman" <loberman@xxxxxxxxxx>
> To: leon@xxxxxxxxxx
> Cc: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>, "Yishai Hadas" <yishaih@xxxxxxxxxxxx>, linux-rdma@xxxxxxxxxxxxxxx
> Sent: Monday, June 13, 2016 10:19:57 AM
> Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in swiotlb_alloc_coherent()
> 
> 
> 
> ----- Original Message -----
> > From: "Leon Romanovsky" <leon@xxxxxxxxxx>
> > To: "Bart Van Assche" <bart.vanassche@xxxxxxxxxxx>
> > Cc: "Yishai Hadas" <yishaih@xxxxxxxxxxxx>, "Laurence Oberman"
> > <loberman@xxxxxxxxxx>, linux-rdma@xxxxxxxxxxxxxxx
> > Sent: Monday, June 13, 2016 10:07:47 AM
> > Subject: Re: multipath IB/srp fail-over testing lands up in dump stack in
> > swiotlb_alloc_coherent()
> > 
> > On Sun, Jun 12, 2016 at 11:32:53PM -0700, Bart Van Assche wrote:
> > > On 06/12/2016 03:40 PM, Laurence Oberman wrote:
> > > > >Jun  8 10:12:52 jumpclient kernel: mlx5_core 0000:08:00.1: swiotlb buffer is full (sz: 266240 bytes)
> > > > >Jun  8 10:12:52 jumpclient kernel: swiotlb: coherent allocation failed for device 0000:08:00.1 size=266240
> > > 
> > > Hello,
> > > 
> > > I think the above means that the coherent memory allocation succeeded but
> > > that the test dev_addr + size - 1 <= DMA_BIT_MASK(32) failed. Can someone
> > > from Mellanox tell us whether or not it would be safe to set
> > > coherent_dma_mask to DMA_BIT_MASK(64) for the mlx4 and mlx5 drivers?
> > 
> > Bart and Laurence,
> > > We are actually doing it for the mlx5 driver.
> > 
> > > static int mlx5_pci_init(struct mlx5_core_dev *dev, struct mlx5_priv *priv)
> > > <...>
> > >         err = set_dma_caps(pdev);
> > > 
> > > static int set_dma_caps(struct pci_dev *pdev)
> > > <...>
> > >         err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(64));
> > >         if (err) {
> > >                 dev_warn(&pdev->dev,
> > >                          "Warning: couldn't set 64-bit consistent PCI DMA mask\n");
> > >                 err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(32));
> > >                 if (err) {
> > >                         dev_err(&pdev->dev,
> > >                                 "Can't set consistent PCI DMA mask, aborting\n");
> > >                         return err;
> > >                 }
> > >         }
> > > 
> > > static inline int pci_set_consistent_dma_mask(struct pci_dev *dev, u64 mask)
> > > {
> > >         return dma_set_coherent_mask(&dev->dev, mask);
> > > }
> > 
> > > 
> > > Thanks,
> > > 
> > > Bart.
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> Hi Leon,
> 
> OK I see it now
> 
> static int set_dma_caps(struct pci_dev *pdev)
> {
>         int err;
> 
>         err = pci_set_dma_mask(pdev, DMA_BIT_MASK(64));
>         if (err) {
> 
> Thanks
> Laurence
> 

Replying to my own email.
Leon, what is the implication of the mapping failure?
It only shows up in the reconnect path, when I am restarting controllers. With the messaging and stack dump masked I still see the failure, but it seems transparent in that all the paths come back.

[ 1595.167812] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
[ 1595.379133] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
[ 1595.460627] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
[ 1598.121096] scsi host1: reconnect attempt 3 failed (-48)
[ 1608.187869] mlx5_core 0000:08:00.0: swiotlb buffer is full (sz: 266240 bytes)
[ 1615.911705] scsi host1: reconnect attempt 4 failed (-12)
[ 1641.446017] scsi host1: ib_srp: Got failed path rec status -110
[ 1641.482947] scsi host1: ib_srp: Path record query failed
[ 1641.513454] scsi host1: reconnect attempt 5 failed (-110)
[ 1662.330883] scsi host1: ib_srp: Got failed path rec status -110
[ 1662.361224] scsi host1: ib_srp: Path record query failed
[ 1662.390768] scsi host1: reconnect attempt 6 failed (-110)
[ 1683.892311] scsi host1: ib_srp: Got failed path rec status -110
[ 1683.922653] scsi host1: ib_srp: Path record query failed
[ 1683.952717] scsi host1: reconnect attempt 7 failed (-110)
SM port is up

Entering MASTER state

[ 1705.254048] scsi host1:   REJ reason 0x8
[ 1705.274869] scsi host1: reconnect attempt 8 failed (-104)
[ 1723.264914] scsi host1:   REJ reason 0x8
[ 1723.285193] scsi host1: reconnect attempt 9 failed (-104)
[ 1743.658091] scsi host1:   REJ reason 0x8
[ 1743.678562] scsi host1: reconnect attempt 10 failed (-104)
[ 1761.911512] scsi host1:   REJ reason 0x8
[ 1761.932006] scsi host1: reconnect attempt 11 failed (-104)
[ 1782.209020] scsi host1: ib_srp: reconnect succeeded
