On Wed, Sep 4, 2013 at 10:37 PM, Maarten Lankhorst
<maarten.lankhorst@xxxxxxxxxxxxx> wrote:
> On 04-09-13 05:21, Ben Skeggs wrote:
>> On Tue, Sep 3, 2013 at 12:31 AM, Maarten Lankhorst
>> <maarten.lankhorst@xxxxxxxxxxxxx> wrote:
>>> This increases the chance slightly that recovery from lockup can happen
>>> successfully.
>> I'd *really* love to see proof of this.  When channels die, all
>> outstanding fences are marked as signalled.  This should do absolutely
>> nothing...
> nv84+ heavily rely on fences though, and a race like this is possible:
> - channel 0 uses a bo from channel 1, queues a wait somewhere in the
>   command stream for it.
> - channel 1 dies cleanly, but userspace creates a new channel in its
>   place, fence counter is reset to 0.
> - channel 0 reaches the NV84_SUBCHAN_SEMAPHORE_TRIGGER.ACQUIRE_GEQUAL op,
>   waits on fence in channel 1 to signal forever.

Ok, this isn't exactly the issue you implied in the commit message.
But yes, this could possibly be an issue for sure.  I don't think this
is the right way to fix it however.  I'll have a bit of a think on the
problem and see what I can come up with.

Thanks,
Ben.

>
> Channel 0 could be the global drm channel used for buffer moves, which
> would result in a hang. This may seem unlikely, but I believe that
> parallel piglit runs could trigger it.
>
> If not, simply creating an operation that takes a few seconds in
> channel 0 and then queuing a command that uses a bo from channel 1
> while chan1 is still busy, then deleting/recreating chan1 could
> trigger it.
>
> ~Maarten
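
For readers following the thread, the three-step race Maarten describes can
be sketched as a toy model in plain C. This is not nouveau code: every name
below (toy_channel, toy_fence_emit, toy_acquire_gequal, and so on) is
hypothetical, the "semaphore slot" is just a struct field, and the wait is
bounded so that the program terminates where the hardware would keep
waiting. It also assumes a plain >= comparison with no wraparound handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for one channel's fence state (hypothetical layout). */
struct toy_channel {
    uint32_t emitted;   /* last sequence number handed out */
    uint32_t signalled; /* value other channels poll in the semaphore slot */
};

/* Hand out the next fence sequence number on this channel. */
static uint32_t toy_fence_emit(struct toy_channel *chan)
{
    return ++chan->emitted;
}

/* Publish that work up to 'seq' has completed on this channel. */
static void toy_fence_signal(struct toy_channel *chan, uint32_t seq)
{
    chan->signalled = seq;
}

/* The channel is destroyed and a new one takes its place: counter restarts. */
static void toy_channel_recreate(struct toy_channel *chan)
{
    chan->emitted = 0;
    chan->signalled = 0;
}

/*
 * Models an ACQUIRE_GEQUAL-style wait: succeed once the polled value is
 * >= wanted. The poll count is bounded only so the model can report a
 * "would hang" result instead of spinning forever.
 */
static bool toy_acquire_gequal(const struct toy_channel *chan,
                               uint32_t wanted, unsigned max_polls)
{
    assert(max_polls > 0);
    for (unsigned i = 0; i < max_polls; i++) {
        if (chan->signalled >= wanted)
            return true;
    }
    return false;
}

/* Replays the scenario; returns true if channel 0's wait completes. */
static bool toy_replay_race(bool recreate_chan1)
{
    struct toy_channel chan1 = { 0, 0 };
    uint32_t wanted = 0;

    /* Give channel 1 some history so the awaited value is well above 0. */
    for (int i = 0; i < 100; i++)
        wanted = toy_fence_emit(&chan1);
    toy_fence_signal(&chan1, wanted - 1);

    /* Step 1: channel 0 has queued a wait for 'wanted' on channel 1. */
    if (recreate_chan1) {
        /* Step 2: channel 1 goes away, its replacement starts from 0. */
        toy_channel_recreate(&chan1);
        toy_fence_signal(&chan1, toy_fence_emit(&chan1));
    } else {
        /* No race: channel 1 simply finishes the awaited work. */
        toy_fence_signal(&chan1, wanted);
    }

    /* Step 3: channel 0 reaches the acquire and polls channel 1's value. */
    return toy_acquire_gequal(&chan1, wanted, 1000);
}
```

Calling toy_replay_race(false) models the normal case, where the awaited
value is reached and the wait completes. Calling toy_replay_race(true)
models the recreated channel: the polled value restarts near 0, stays below
the value channel 0 is waiting for, and the bounded wait gives up. Whether
the real driver can reach this state depends on details the thread leaves
open, so treat this only as an illustration of the argument.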