From: Robin Murphy <robin.murphy@xxxxxxx> Date: Jul/29/2019, 12:52:02 (UTC+00:00) > On 29/07/2019 12:29, Jose Abreu wrote: > > ++ Catalin, Will (ARM64 Maintainers) > > > > From: Jon Hunter <jonathanh@xxxxxxxxxx> > > Date: Jul/29/2019, 11:55:18 (UTC+00:00) > > > >> > >> On 29/07/2019 09:16, Jose Abreu wrote: > >>> From: Jose Abreu <joabreu@xxxxxxxxxxxx> > >>> Date: Jul/27/2019, 16:56:37 (UTC+00:00) > >>> > >>>> From: Jon Hunter <jonathanh@xxxxxxxxxx> > >>>> Date: Jul/26/2019, 15:11:00 (UTC+00:00) > >>>> > >>>>> > >>>>> On 25/07/2019 16:12, Jose Abreu wrote: > >>>>>> From: Jon Hunter <jonathanh@xxxxxxxxxx> > >>>>>> Date: Jul/25/2019, 15:25:59 (UTC+00:00) > >>>>>> > >>>>>>> > >>>>>>> On 25/07/2019 14:26, Jose Abreu wrote: > >>>>>>> > >>>>>>> ... > >>>>>>> > >>>>>>>> Well, I wasn't expecting that :/ > >>>>>>>> > >>>>>>>> Per documentation of barriers I think we should set descriptor fields > >>>>>>>> and then barrier and finally ownership to HW so that remaining fields > >>>>>>>> are coherent before owner is set. > >>>>>>>> > >>>>>>>> Anyway, can you also add a dma_rmb() after the call to > >>>>>>>> stmmac_rx_status() ? > >>>>>>> > >>>>>>> Yes. I removed the debug print added the barrier, but that did not help. > >>>>>> > >>>>>> So, I was finally able to setup NFS using your replicated setup and I > >>>>>> can't see the issue :( > >>>>>> > >>>>>> The only difference I have from yours is that I'm using TCP in NFS > >>>>>> whilst you (I believe from the logs), use UDP. > >>>>> > >>>>> So I tried TCP by setting the kernel boot params to 'nfsvers=3' and > >>>>> 'proto=tcp' and this does appear to be more stable, but not 100% stable. > >>>>> It still appears to fail in the same place about 50% of the time. > >>>>> > >>>>>> You do have flow control active right ? And your HW FIFO size is >= 4k ? > >>>>> > >>>>> How can I verify if flow control is active? > >>>> > >>>> You can check it by dumping register MTL_RxQ_Operation_Mode (0xd30). > >> > >> Where would be the appropriate place to dump this? After probe? Maybe > >> best if you can share a code snippet of where to dump this. > >> > >>>> Can you also add IOMMU debug in file "drivers/iommu/iommu.c" ? > >> > >> You can find a boot log here: > >> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__paste.ubuntu.com_p_qtRqtYKHGF_&d=DwICaQ&c=DPL6_X_6JkXFx7AXWqB0tg&r=WHDsc6kcWAl4i96Vm5hJ_19IJiuxx_p_Rzo2g-uHDKw&m=NrxsR2etpZHGb7HkN4XdgaGmKM1XYyldihNPL6qVSv0&s=CMATEcHVoqZw4sIrNOXc7SFE_kV_5CO5EU21-yJez6c&e= > >> > >>> And, please try attached debug patch. > >> > >> With this patch it appears to boot fine. So far no issues seen. > > > > Thank you for testing. > > > > Hi Catalin and Will, > > > > Sorry to add you in such a long thread but we are seeing a DMA issue > > with stmmac driver in an ARM64 platform with IOMMU enabled. > > > > The issue seems to be solved when buffers allocation for DMA based > > transfers are *not* mapped with the DMA_ATTR_SKIP_CPU_SYNC flag *OR* > > when IOMMU is disabled. > > > > Notice that after transfer is done we do use > > dma_sync_single_for_{cpu,device} and then we reuse *the same* page for > > another transfer. > > > > Can you please comment on whether DMA_ATTR_SKIP_CPU_SYNC can not be used > > in ARM64 platforms with IOMMU ? > > In terms of what they do, there should be no difference on arm64 between: > > dma_map_page(..., dir); > ... > dma_unmap_page(..., dir); > > and: > > dma_map_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC); > dma_sync_single_for_device(..., dir); > ... > dma_sync_single_for_cpu(..., dir); > dma_unmap_page_attrs(..., dir, DMA_ATTR_SKIP_CPU_SYNC); > > provided that the first sync covers the whole buffer and any subsequent > ones cover at least the parts of the buffer which may have changed. Plus > for coherent hardware it's entirely moot either way. Thanks for confirming. That's indeed what stmmac is doing when buffer is received by syncing the packet size to CPU. > > Given Jon's previous findings, I would lean towards the idea that > performing the extra (redundant) cache maintenance plus barrier in > dma_unmap is mostly just perturbing timing in the same way as the debug > print which also made things seem OK. Mikko said that Tegra186 is not coherent so we have to explicit flush pipeline but I don't understand why sync_single() is not doing it ... Jon, can you please remove *all* debug prints, hacks, etc ... and test this one in attach with plain -net tree ? --- Thanks, Jose Miguel Abreu
Attachment:
0001-net-stmmac-Flush-all-data-cache-in-RX-path.patch
Description: 0001-net-stmmac-Flush-all-data-cache-in-RX-path.patch