RE: [PATCH net-next 3/3] net: stmmac: Introducing support for Page Pool

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



From: Robin Murphy <robin.murphy@xxxxxxx>
Date: Jul/29/2019, 12:52:02 (UTC+00:00)

> On 29/07/2019 12:29, Jose Abreu wrote:
> > ++ Catalin, Will (ARM64 Maintainers)
> > 
> > From: Jon Hunter <jonathanh@xxxxxxxxxx>
> > Date: Jul/29/2019, 11:55:18 (UTC+00:00)
> > 
> >>
> >> On 29/07/2019 09:16, Jose Abreu wrote:
> >>> From: Jose Abreu <joabreu@xxxxxxxxxxxx>
> >>> Date: Jul/27/2019, 16:56:37 (UTC+00:00)
> >>>
> >>>> From: Jon Hunter <jonathanh@xxxxxxxxxx>
> >>>> Date: Jul/26/2019, 15:11:00 (UTC+00:00)
> >>>>
> >>>>>
> >>>>> On 25/07/2019 16:12, Jose Abreu wrote:
> >>>>>> From: Jon Hunter <jonathanh@xxxxxxxxxx>
> >>>>>> Date: Jul/25/2019, 15:25:59 (UTC+00:00)
> >>>>>>
> >>>>>>>
> >>>>>>> On 25/07/2019 14:26, Jose Abreu wrote:
> >>>>>>>
> >>>>>>> ...
> >>>>>>>
> >>>>>>>> Well, I wasn't expecting that :/
> >>>>>>>>
> >>>>>>>> Per documentation of barriers I think we should set descriptor fields
> >>>>>>>> and then barrier and finally ownership to HW so that remaining fields
> >>>>>>>> are coherent before owner is set.
> >>>>>>>>
> >>>>>>>> Anyway, can you also add a dma_rmb() after the call to
> >>>>>>>> stmmac_rx_status() ?
> >>>>>>>
> >>>>>>> Yes. I removed the debug print added the barrier, but that did not help.
> >>>>>>
> >>>>>> So, I was finally able to setup NFS using your replicated setup and I
> >>>>>> can't see the issue :(
> >>>>>>
> >>>>>> The only difference I have from yours is that I'm using TCP in NFS
> >>>>>> whilst you (I believe from the logs), use UDP.
> >>>>>
> >>>>> So I tried TCP by setting the kernel boot params to 'nfsvers=3' and
> >>>>> 'proto=tcp' and this does appear to be more stable, but not 100% stable.
> >>>>> It still appears to fail in the same place about 50% of the time.
> >>>>>
> >>>>>> You do have flow control active right ? And your HW FIFO size is >= 4k ?
> >>>>>
> >>>>> How can I verify if flow control is active?
> >>>>
> >>>> You can check it by dumping register MTL_RxQ_Operation_Mode (0xd30).
> >>
> >> Where would be the appropriate place to dump this? After probe? Maybe
> >> best if you can share a code snippet of where to dump this.
> >>
> >>>> Can you also add IOMMU debug in file "drivers/iommu/iommu.c" ?
> >>
> >> You can find a boot log here:
> >>
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.ubuntu.com_p_qtRqtYKHGF_&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=NrxsR2etpZHGb7HkN4XdgaGmKM1XYyldihNPL6qVSv0&s=CMATEcHVoqZw4sIrNOXc7SFE_kV_5CO5EU21-yJez6c&e=
> >>
> >>> And, please try attached debug patch.
> >>
> >> With this patch it appears to boot fine. So far no issues seen.
> > 
> > Thank you for testing.
> > 
> > Hi Catalin and Will,
> > 
> > Sorry to add you in such a long thread but we are seeing a DMA issue
> > with stmmac driver in an ARM64 platform with IOMMU enabled.
> > 
> > The issue seems to be solved when buffers allocation for DMA based
> > transfers are *not* mapped with the DMA_ATTR_SKIP_CPU_SYNC flag *OR*
> > when IOMMU is disabled.
> > 
> > Notice that after transfer is done we do use
> > dma_sync_single_for_{cpu,device} and then we reuse *the same* page for
> > another transfer.
> > 
> > Can you please comment on whether DMA_ATTR_SKIP_CPU_SYNC can not be used
> > in ARM64 platforms with IOMMU ?
> 
> In terms of what they do, there should be no difference on arm64 between:
> 
> dma_map_page(..., dir);
> ...
> dma_unmap_page(..., dir);
> 
> and:
> 
> dma_map_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC);
> dma_sync_single_for_device(..., dir);
> ...
> dma_sync_single_for_cpu(..., dir);
> dma_unmap_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC);
> 
> provided that the first sync covers the whole buffer and any subsequent 
> ones cover at least the parts of the buffer which may have changed. Plus 
> for coherent hardware it's entirely moot either way.

Thanks for confirming. That's indeed what stmmac is doing when buffer is 
received by syncing the packet size to CPU.

> 
> Given Jon's previous findings, I would lean towards the idea that 
> performing the extra (redundant) cache maintenance plus barrier in 
> dma_unmap is mostly just perturbing timing in the same way as the debug 
> print which also made things seem OK.

Mikko said that Tegra186 is not coherent so we have to explicit flush 
pipeline but I don't understand why sync_single() is not doing it ...

Jon, can you please remove *all* debug prints, hacks, etc ... and test 
this one in attach with plain -net tree ?

---
Thanks,
Jose Miguel Abreu

Attachment: 0001-net-stmmac-Flush-all-data-cache-in-RX-path.patch
Description: 0001-net-stmmac-Flush-all-data-cache-in-RX-path.patch


[Index of Archives]     [ARM Kernel]     [Linux ARM]     [Linux ARM MSM]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux