On Thu, Sep 5, 2013 at 6:30 PM, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:
> It's also a confusion that the kernel can't prevent.
>
> lock_mutex
> A checks bo, reports it idle
> unlock_mutex
>
> lock_mutex
> B renders to bo
> unlock_mutex
>
> lock_mutex
> A uses bo, stalls
> unlock_mutex
>
> Whether or not the checking of the bo is locked is irrelevant, as it can
> be gazumped at any time between the check and the use.

Nope, I'm talking about a different kind of confusion.

Object A is busy on the RCS with seqno 1. Some other guy submits a bit of
work to the blitter with seqno 2. The blitter finishes that work, so its
signalled seqno is 2, while the RCS is still busy.

The dri client sends a buffer swap request to the display server with
object A. The display server does a pageflip/blit/whatever, just something
which forces the kernel to move object A to the blitter ring. After that,
object A is busy on the BLT with seqno 3.

Concurrently, our dri client runs the busy ioctl and reads ring == blitter,
seqno == 1, and concludes that the object is not busy. And this can happen
while the RCS hasn't even finished rendering the original request from the
client for object A. That's broken, and it will be prevented by locking.

So the scenario I'm talking about is not the client racing its busy check
against _new_ command submission, but the kernel lying to the client about
the completion of old commands which were all submitted from the same
thread context. We've already had a similar bug for the last_write_seqno,
where we updated the ring but used the old seqno, resulting in mayhem.
(A standalone sketch of that window follows below.)

Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
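
To make the window concrete, here is a minimal userspace sketch of the same
pattern: a writer moves an object to a new ring and bumps its seqno as two
separate stores, while an unlocked reader samples both fields and can pair
the new ring with the stale seqno. This is plain C11 + pthreads, not actual
i915 code; all names (obj, ring, seqno, signalled) are made up for
illustration, and atomics are used only to keep the example free of C-level
data races while preserving the logical race.

    /* Analogue of the busy-ioctl race: reader may see new ring, old seqno. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>
    #include <unistd.h>

    struct obj {
    	atomic_int ring;	/* 0 = RCS, 1 = BLT */
    	atomic_uint seqno;	/* last request queued for this object */
    };

    static struct obj o = { 0, 1 };	/* busy on the RCS with seqno 1 */

    /* Per-ring last-completed seqno: RCS hasn't finished 1, BLT finished 2. */
    static atomic_uint signalled[2] = { 0, 2 };

    static void *writer(void *arg)
    {
    	(void)arg;
    	/* Kernel moves the object to the blitter for the server's blit... */
    	atomic_store(&o.ring, 1);
    	usleep(1000);	/* widen the ring-new/seqno-stale window for the demo */
    	/* ...and only then records the new request's seqno. */
    	atomic_store(&o.seqno, 3);
    	return NULL;
    }

    static void *reader(void *arg)
    {
    	(void)arg;
    	/* Unlocked busy check: the two loads are not atomic as a pair. */
    	int ring = atomic_load(&o.ring);
    	unsigned int seqno = atomic_load(&o.seqno);

    	if (atomic_load(&signalled[ring]) >= seqno)
    		printf("object reported idle (ring %d, seqno %u)\n", ring, seqno);
    	else
    		printf("object reported busy (ring %d, seqno %u)\n", ring, seqno);
    	return NULL;
    }

    int main(void)
    {
    	pthread_t w, r;

    	pthread_create(&w, NULL, writer, NULL);
    	pthread_create(&r, NULL, reader, NULL);
    	pthread_join(w, NULL);
    	pthread_join(r, NULL);
    	return 0;
    }

Build with `cc -pthread race.c`. With the widened window the reader will
typically see ring == 1 (BLT) paired with seqno == 1, compare it against
the blitter's signalled seqno of 2, and print "idle", even though ring 0
never completed seqno 1. That is exactly the lie described above, and
sampling both fields under one lock (the struct_mutex in the kernel case)
closes it.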