Hi, Will, On Fri, Apr 12, 2024 at 9:32 PM Will Deacon <will@xxxxxxxxxx> wrote: > > On Fri, Mar 01, 2024 at 09:05:32PM +0800, Huacai Chen wrote: > > Commit fb24ea52f78e0d595852e ("drivers: Remove explicit invocations of > > mmiowb()") remove all mmiowb() in drivers, but it says: > > > > "NOTE: mmiowb() has only ever guaranteed ordering in conjunction with > > spin_unlock(). However, pairing each mmiowb() removal in this patch with > > the corresponding call to spin_unlock() is not at all trivial, so there > > is a small chance that this change may regress any drivers incorrectly > > relying on mmiowb() to order MMIO writes between CPUs using lock-free > > synchronisation." > > > > The mmio in radeon_ring_commit() is protected by a mutex rather than a > > spinlock, but in the mutex fastpath it behaves similar to spinlock. We > > can add mmiowb() calls in the radeon driver but the maintainer says he > > doesn't like such a workaround, and radeon is not the only example of > > mutex protected mmio. > > Oh no! Ostrich programming is real! > > https://lore.kernel.org/lkml/CAHk-=wgbnn7x+i72NqnvXotbxjsk2Ag56Q5YP0OSvhY9sUk7QA@xxxxxxxxxxxxxx/ Yes, you are probably right, so we solved it in the architectural code like this finally. https://lore.kernel.org/loongarch/20240305143958.1752241-1-chenhuacai@xxxxxxxxxxx/T/#u Huacai > > > So we extend the mmiowb tracking system from spinlock to mutex, hook up > > mmiowb helpers to mutexes as well as spinlocks. > > > > Without this, we get such an error when run 'glxgears' on weak ordering > > architectures such as LoongArch: > > > > radeon 0000:04:00.0: ring 0 stalled for more than 10324msec > > radeon 0000:04:00.0: ring 3 stalled for more than 10240msec > > radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000001f412 last fence id 0x000000000001f414 on ring 3) > > radeon 0000:04:00.0: GPU lockup (current fence id 0x000000000000f940 last fence id 0x000000000000f941 on ring 0) > > radeon 0000:04:00.0: scheduling IB failed (-35). > > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35) > > radeon 0000:04:00.0: scheduling IB failed (-35). > > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35) > > radeon 0000:04:00.0: scheduling IB failed (-35). > > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35) > > radeon 0000:04:00.0: scheduling IB failed (-35). > > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35) > > radeon 0000:04:00.0: scheduling IB failed (-35). > > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35) > > radeon 0000:04:00.0: scheduling IB failed (-35). > > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35) > > radeon 0000:04:00.0: scheduling IB failed (-35). > > [drm:radeon_gem_va_ioctl [radeon]] *ERROR* Couldn't update BO_VA (-35) > > > > Link: https://lore.kernel.org/dri-devel/29df7e26-d7a8-4f67-b988-44353c4270ac@xxxxxxx/T/#t > > Hmm. It's hard to tell from that thread whether you're really falling > afoul of the problems that mmiowb() tries to solve or whether Loongson > simply doesn't have enough ordering in its MMIO accessors. > > The code you proposed has: > > mmiowb(); /* Make sure wptr is up-to-date for hw */ > > but mmiowb() is really about ensuring order with MMIO accesses from a > different CPU that subsequently takes the lock. > > So please can you explain a bit more about the failing case? I'm worried > that mmiowb() may be the wrong tool for the job here. > > Thanks, > > Will