Re: [PATCH 2/2] mmiowb: Hook up mmiowb helpers to mutexes as well as spinlocks

Huacai Chen <chenhuacai@xxxxxxxxxx> · Fri, 12 Apr 2024 22:35:28 +0800

Hi, Will,

On Fri, Apr 12, 2024 at 9:32 PM Will Deacon <will@xxxxxxxxxx> wrote:
>
> On Fri, Mar 01, 2024 at 09:05:32PM +0800, Huacai Chen wrote:
> > Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of
> > mmiowb()") remove all mmiowb() in drivers, but it says:
> >
> > "NOTE: mmiowb() has only ever guaranteed ordering in conjunction with
> > spin_unlock(). However, pairing each mmiowb() removal in this patch with
> > the corresponding call to spin_unlock() is not at all trivial, so there
> > is a small chance that this change may regress any drivers incorrectly
> > relying on mmiowb() to order MMIO writes between CPUs using lock-free
> > synchronisation."
> >
> > The mmio in radeon_ring_commit() is protected by a mutex rather than a
> > spinlock, but in the mutex fastpath it behaves similar to spinlock. We
> > can add mmiowb() calls in the radeon driver but the maintainer says he
> > doesn't like such a workaround, and radeon is not the only example of
> > mutex protected mmio.
>
> Oh no! Ostrich programming is real!
>
> https://lore.kernel.org/lkml/CAHk-=wgbnn7x+i72NqnvXotbxjsk2Ag56Q5YP0OSvhY9sUk7QA@xxxxxxxxxxxxxx/
Yes, you are probably right, so we solved it in the architectural code
like this finally.

https://lore.kernel.org/loongarch/20240305143958.1752241-1-chenhuacai@xxxxxxxxxxx/T/#u

Huacai

>
> > So we extend the mmiowb tracking system from spinlock to mutex, hook up
> > mmiowb helpers to mutexes as well as spinlocks.
> >
> > Without this, we get such an error when run 'glxgears' on weak ordering
> > architectures such as LoongArch:
> >
> > radeon 0000:04:00.0: ring 0 stalled for more than 10324msec
> > radeon 0000:04:00.0: ring 3 stalled for more than 10240msec
> > radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3)
> > radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> > radeon 0000:04:00.0: scheduling IB failed (-35).
> > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35)
> >
> > Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@xxxxxxx/T/#t
>
> Hmm. It's hard to tell from that thread whether you're really falling
> afoul of the problems that mmiowb() tries to solve or whether Loongson
> simply doesn't have enough ordering in its MMIO accessors.
>
> The code you proposed has:
>
>         mmiowb(); /* Make sure wptr is up-to-date for hw */
>
> but mmiowb() is really about ensuring order with MMIO accesses from a
> different CPU that subsequently takes the lock.
>
> So please can you explain a bit more about the failing case? I'm worried
> that mmiowb() may be the wrong tool for the job here.
>
> Thanks,
>
> Will