On Wed, Dec 4, 2013 at 6:56 PM, Alex Deucher <alexdeucher@xxxxxxxxx> wrote: > On Wed, Dec 4, 2013 at 5:16 PM, Kleber Sacilotto de Souza > <klebers@xxxxxxxxxxxxxxxxxx> wrote: >> On 11/25/2013 10:11 PM, Kleber Sacilotto de Souza wrote: >>> >>> On 11/24/2013 09:15 PM, Benjamin Herrenschmidt wrote: >>>> >>>> On Fri, 2013-11-08 at 11:43 -0200, Kleber Sacilotto de Souza wrote: >>>>> >>>>> On 11/07/2013 08:29 PM, Benjamin Herrenschmidt wrote: >>>>>> >>>>>> On Mon, 2013-06-17 at 18:57 -0400, Alex Deucher wrote: >>>>>> >>>>>>> Weird. I wonder if there is an issue with cache snoops on PPC. We >>>>>>> currently use the gart in cached mode (GPU snoops CPU cache) with >>>>>>> cached pages. I wonder if we need to use uncached pages on PPC. >>>>>> >>>>>> There is no such issue and no known bugs with DMA writes on those >>>>>> PCIe host bridges (and they do get hammered pretty bad here). >>>>>> >>>>>> This needs further investigation by the lab/hw guys to find out what's >>>>>> actually happening on the bus and the host bridge. >>>>>> >>>>>> Thadeu, Kleber: Jerome suggested writing a test case in userspace that >>>>>> continuously writes to a spare scratch register (thus triggering the >>>>>> corresponding writeback DMA) and checks the memory location to compare >>>>>> the writeback value (using a debugfs file for example, or mmap). >> >> >> I was not able to reproduce the issue with this method, even after a weekend >> run. >> >> However, doing some more investigation it seems the problem is here, where >> we read the ring rptr: >> >> u32 radeon_ring_generic_get_rptr(struct radeon_device *rdev, >> struct radeon_ring *ring) >> { >> u32 rptr; >> >> if (rdev->wb.enabled) >> rptr = le32_to_cpu(rdev->wb.wb[ring->rptr_offs/4]); >> else >> rptr = RREG32(ring->rptr_reg); >> >> return rptr; >> } >> >> I realized that the DMA'ed rptr value has always the opposite byte order >> from the MMIO value. Since RREG32 already returns the register value on the >> CPU byte order, it seems we don't need to byte-swap the DMA'ed value. If I >> remove the le32_to_cpu() call and use the DMA'ed value directly, I don't get >> the IB scheduling failures and piglit results are the same as with writeback >> disabled. >> >> Is the adapter chipset swapping the bytes before doing the DMA to a >> big-endian host? > > Setting CP_RB_CNTL.BUF_SWAP causes the CP to use the selected byte > swapping for just about everything accessed by the CP (rptr writeback, > indirect buffers, etc.). Looks like the DMA ring supports and enables > rptr writeback as well (DMA_RB_CNTL.DMA_RPTR_WRITEBACK_SWAP_ENABLE) so > I think we can drop the swapping of the rptr writeback. > Obvious patch attached. Alex > Alex > >> >> >> >> -- >> Kleber Sacilotto de Souza >> IBM Linux Technology Center >>
From 0f7705c378a14060552df19c0e724e47632da4d8 Mon Sep 17 00:00:00 2001 From: Alex Deucher <alexander.deucher@xxxxxxx> Date: Wed, 4 Dec 2013 19:02:08 -0500 Subject: [PATCH] drm/radeon: don't byteswap readback of rptr writeback We already enable byteswapping of the rptr writeback in the hw. Fixes incorrect rptr readback on BE systems. Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> Cc: stable@xxxxxxxxxxxxxxx --- drivers/gpu/drm/radeon/radeon_ring.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/radeon/radeon_ring.c b/drivers/gpu/drm/radeon/radeon_ring.c index 9214403..56ff0cd 100644 --- a/drivers/gpu/drm/radeon/radeon_ring.c +++ b/drivers/gpu/drm/radeon/radeon_ring.c @@ -338,7 +338,7 @@ u32 radeon_ring_generic_get_rptr(struct radeon_device *rdev, u32 rptr; if (rdev->wb.enabled) - rptr = le32_to_cpu(rdev->wb.wb[ring->rptr_offs/4]); + rptr = rdev->wb.wb[ring->rptr_offs/4]; else rptr = RREG32(ring->rptr_reg); -- 1.8.3.1
_______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel