On Tue, Jan 07, 2025 at 03:58:46PM +1000, Dave Airlie wrote: > From: Dave Airlie <airlied@xxxxxxxxxx> > > If we have two nouveau controlled devices and one passes a dma-fence > to the other, when we hit the sync path it can cause the second device > to try and put a sync wait in it's pushbuf for the seqno of the context > on the first device. > > Since fence contexts are vmm bound, check the if vmm's match between > both users, this should ensure that fence seqnos don't get used wrongly > on incorrect channels. The fence sequence number is global, i.e. per device, hence checking the vmm context seems too restrictive. Wouldn't it be better to ensure that `prev->cli->drm == chan->cli->drm`? This way we can still optimize where dependencies are between different applications, but on the same device. > > This seems to happen fairly spuriously and I found it tracking down > a multi-card regression report, that seems to work by luck before this. > > Signed-off-by: Dave Airlie <airlied@xxxxxxxxxx> > Cc: stable@xxxxxxxxxxxxxxx > --- > drivers/gpu/drm/nouveau/nouveau_fence.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/drivers/gpu/drm/nouveau/nouveau_fence.c b/drivers/gpu/drm/nouveau/nouveau_fence.c > index ee5e9d40c166f..5743c82f4094b 100644 > --- a/drivers/gpu/drm/nouveau/nouveau_fence.c > +++ b/drivers/gpu/drm/nouveau/nouveau_fence.c > @@ -370,7 +370,8 @@ nouveau_fence_sync(struct nouveau_bo *nvbo, struct nouveau_channel *chan, > > rcu_read_lock(); > prev = rcu_dereference(f->channel); > - if (prev && (prev == chan || > + if (prev && (prev->vmm == chan->vmm) && > + (prev == chan || Maybe better break it down a bit, e.g. bool local = prev && (prev->... == chan->...); if (local && ...) { ... } > fctx->sync(f, prev, chan) == 0)) > must_wait = false; > rcu_read_unlock(); > -- > 2.43.0 >