On Thu, May 16, 2024 at 08:20:05PM GMT, Akhil P Oommen wrote: > On Thu, May 16, 2024 at 08:15:34AM -0500, Andrew Halaney wrote: > > On Wed, May 15, 2024 at 12:08:49AM GMT, Akhil P Oommen wrote: > > > On Wed, May 08, 2024 at 07:46:31PM +0200, Konrad Dybcio wrote: > > > > Memory barriers help ensure instruction ordering, NOT time and order > > > > of actual write arrival at other observers (e.g. memory-mapped IP). > > > > On architectures employing weak memory ordering, the latter can be a > > > > giant pain point, and it has been as part of this driver. > > > > > > > > Moreover, the gpu_/gmu_ accessors already use non-relaxed versions of > > > > readl/writel, which include r/w (respectively) barriers. > > > > > > > > Replace the barriers with a readback that ensures the previous writes > > > > have exited the write buffer (as the CPU must flush the write to the > > > > register it's trying to read back) and subsequently remove the hack > > > > introduced in commit b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt > > > > status in hw_init"). > > > > For what its worth, I've been eyeing (but haven't tested) sending some > > patches to clean up dsi_phy_write_udelay/ndelay(). There's no ordering > > guarantee between a writel() and a delay(), so the expected "write then > > delay" sequence might not be happening.. you need to write, read, delay. > > > > memory-barriers.txt: > > > > 5. A readX() by a CPU thread from the peripheral will complete before > > any subsequent delay() loop can begin execution on the same thread. > > This ensures that two MMIO register writes by the CPU to a peripheral > > will arrive at least 1us apart if the first write is immediately read > > back with readX() and udelay(1) is called prior to the second > > writeX(): > > > > writel(42, DEVICE_REGISTER_0); // Arrives at the device... > > readl(DEVICE_REGISTER_0); > > udelay(1); > > writel(42, DEVICE_REGISTER_1); // ...at least 1us before this. > > Yes, udelay orders only with readl(). I saw a patch from Will Deacon > which fixes this for arm64 few years back: > https://lore.kernel.org/all/1543251228-30001-1-git-send-email-will.deacon@xxxxxxx/T/ > > But this is needed only when you write io and do cpuside wait , not when > you poll io to check status. Sure, I'm just highlighting the bug in dsi_phy_write_*delay() functions here, which match the udelay case you mention. > > > > > > > > > > > Fixes: b77532803d11 ("drm/msm/a6xx: Poll for GBIF unhalt status in hw_init") > > > > Signed-off-by: Konrad Dybcio <konrad.dybcio@xxxxxxxxxx> > > > > --- > > > > drivers/gpu/drm/msm/adreno/a6xx_gmu.c | 5 ++--- > > > > drivers/gpu/drm/msm/adreno/a6xx_gpu.c | 14 ++++---------- > > > > 2 files changed, 6 insertions(+), 13 deletions(-) > > > > > > I prefer this version compared to the v2. A helper routine is > > > unnecessary here because: > > > 1. there are very few scenarios where we have to read back the same > > > register. > > > 2. we may accidently readback a write only register. > > > > > > > > > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > > > > index 0e3dfd4c2bc8..4135a53b55a7 100644 > > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gmu.c > > > > @@ -466,9 +466,8 @@ static int a6xx_rpmh_start(struct a6xx_gmu *gmu) > > > > int ret; > > > > u32 val; > > > > > > > > - gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, 1 << 1); > > > > - /* Wait for the register to finish posting */ > > > > - wmb(); > > > > + gmu_write(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ, BIT(1)); > > > > + gmu_read(gmu, REG_A6XX_GMU_RSCC_CONTROL_REQ); > > > > > > This is unnecessary because we are polling on a register on the same port below. But I think we > > > can replace "wmb()" above with "mb()" to avoid reordering between read > > > and write IO instructions. > > > > If I understand correctly, you don't need any memory barrier. > > writel()/readl()'s are ordered to the same endpoint. That goes for all > > the reordering/barrier comments mentioned below too. > > > > device-io.rst: > > > > The read and write functions are defined to be ordered. That is the > > compiler is not permitted to reorder the I/O sequence. When the ordering > > can be compiler optimised, you can use __readb() and friends to > > indicate the relaxed ordering. Use this with care. > > > > memory-barriers.txt: > > > > (*) readX(), writeX(): > > > > The readX() and writeX() MMIO accessors take a pointer to the > > peripheral being accessed as an __iomem * parameter. For pointers > > mapped with the default I/O attributes (e.g. those returned by > > ioremap()), the ordering guarantees are as follows: > > > > 1. All readX() and writeX() accesses to the same peripheral are ordered > > with respect to each other. This ensures that MMIO register accesses > > by the same CPU thread to a particular device will arrive in program > > order. > > > > In arm64, a writel followed by readl translates to roughly the following > sequence: dmb_wmb(), __raw_writel(), __raw_readl(), dmb_rmb(). I am not > sure what is stopping compiler from reordering __raw_writel() and __raw_readl() > above? I am assuming iomem cookie is ignored during compilation. It seems to me that is due to some usage of volatile there in __raw_writel() etc, but to be honest after reading about volatile and some threads from gcc mailing lists, I don't have a confident answer :) > > Added Will to this thread if he can throw some light on this. Hopefully Will can school us. > > -Akhil > > > > > > > > > > > > > > ret = gmu_poll_timeout(gmu, REG_A6XX_GMU_RSCC_CONTROL_ACK, val, > > > > val & (1 << 1), 100, 10000); > > > > diff --git a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > > > > index 973872ad0474..0acbc38b8e70 100644 > > > > --- a/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > > > > +++ b/drivers/gpu/drm/msm/adreno/a6xx_gpu.c > > > > @@ -1713,22 +1713,16 @@ static int hw_init(struct msm_gpu *gpu) > > > > } > > > > > > > > /* Clear GBIF halt in case GX domain was not collapsed */ > > > > + gpu_write(gpu, REG_A6XX_GBIF_HALT, 0); > > > > > > We need a full barrier here to avoid reordering. Also, lets add a > > > comment about why we are doing this odd looking sequence. > > > > > > > + gpu_read(gpu, REG_A6XX_GBIF_HALT); > > > > if (adreno_is_a619_holi(adreno_gpu)) { > > > > - gpu_write(gpu, REG_A6XX_GBIF_HALT, 0); > > > > gpu_write(gpu, REG_A6XX_RBBM_GPR0_CNTL, 0); > > > > - /* Let's make extra sure that the GPU can access the memory.. */ > > > > - mb(); > > > > > > We need a full barrier here. > > > > > > > + gpu_read(gpu, REG_A6XX_RBBM_GPR0_CNTL); > > > > } else if (a6xx_has_gbif(adreno_gpu)) { > > > > - gpu_write(gpu, REG_A6XX_GBIF_HALT, 0); > > > > gpu_write(gpu, REG_A6XX_RBBM_GBIF_HALT, 0); > > > > - /* Let's make extra sure that the GPU can access the memory.. */ > > > > - mb(); > > > > > > We need a full barrier here. > > > > > > > + gpu_read(gpu, REG_A6XX_RBBM_GBIF_HALT); > > > > } > > > > > > > > - /* Some GPUs are stubborn and take their sweet time to unhalt GBIF! */ > > > > - if (adreno_is_a7xx(adreno_gpu) && a6xx_has_gbif(adreno_gpu)) > > > > - spin_until(!gpu_read(gpu, REG_A6XX_GBIF_HALT_ACK)); > > > > - > > > > > > Why is this removed? > > > > > > -Akhil > > > > > > > gpu_write(gpu, REG_A6XX_RBBM_SECVID_TSB_CNTL, 0); > > > > > > > > if (adreno_is_a619_holi(adreno_gpu)) > > > > > > > > --- > > > > base-commit: 93a39e4766083050ca0ecd6a3548093a3b9eb60c > > > > change-id: 20240508-topic-adreno-a2d199cd4152 > > > > > > > > Best regards, > > > > -- > > > > Konrad Dybcio <konrad.dybcio@xxxxxxxxxx> > > > > > > > > > >