On Sun, Sep 18, 2011 at 03:59:50PM +0200, Daniel Vetter wrote:
> On Sun, Sep 18, 2011 at 03:18:57PM +0200, Marcin Slusarz wrote:
> > Currently the DRM_NOUVEAU_GEM_CPU_PREP ioctl is broken WRT handling of
> > signals.
> >
> > nouveau_gem_ioctl_cpu_prep calls ttm_bo_wait, which waits for the fence
> > to "signal" or for a 3 second timeout to pass.
> > But if it detects a pending signal, it returns ERESTARTSYS and goes back
> > to userspace. After the signal handler runs, userspace repeats the same
> > ioctl, which starts a _new 3 second loop_.
> > So when the application relies on signals, some ioctls may never finish
> > from the application's POV.
> >
> > There is one important application which does this - Xorg. It uses SIGIO
> > (for input handling) and SIGALRM.
> >
> > GPU lockups lead to an endless ioctl loop, which eventually manifests as
> > a crash with the "[mi] EQ overflowing. The server is probably stuck in an
> > infinite loop." message instead of being propagated to the DDX.
> >
> > The solution is to add a new ioctl, NOUVEAU_GEM_CPU_PREP_TIMEOUT, with a
> > timeout parameter, and decrease it on every signal.
>
> Just fyi: We handle that issue in i915 by returning -EIO when the kernel
> decides that the gpu has died for good and that resetting doesn't help.
> Until then we rely on the ioctl restarting to kick everyone out of kernel
> mode so the reset handler can do its business. If the reset is
> successful, userspace continues (due to the ioctl being restarted)
> hopefully mostly undisturbed. While the gpu is hung, but not yet reset, we
> stall all ioctls before taking the struct_mutex (see i915_gem_wait_error
> in i915_mutex_lock_interruptible).
>
> Imo the advantage of that approach is that the kernel ultimately decides
> when the gpu is gone, and userspace (lacking much of the required
> information) must not engage in such guessing games either.

This approach would be preferable, but we don't know yet how to reset
nvidia's gpu. Fixing this API bug could at least let us degrade to noaccel.
And I believe there are cases where ttm_bo_wait can fail with EBUSY and it
doesn't mean GPU locked up...

Marcin
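
To make the difference between the two behaviours concrete, here is a rough userspace simulation in C. It is only a sketch of the idea, not the patch: fake_gpu, fake_cpu_prep_timeout, wait_fresh_timeout, wait_with_budget and FAKE_ERESTARTSYS are all made-up names, the clock is simulated, and nothing here calls real nouveau or TTM code. It assumes a hung GPU can be modelled as a fence that never signals, and that signals arrive at a fixed period.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel-internal ERESTARTSYS code,
 * which userspace <errno.h> does not define. */
#define FAKE_ERESTARTSYS 512

/* Simulated device state; not a real nouveau or TTM structure. */
struct fake_gpu {
    uint64_t now_ms;           /* simulated clock */
    uint64_t fence_done_ms;    /* when the fence signals; UINT64_MAX = hung */
    uint64_t signal_period_ms; /* a signal interrupts the wait this often; 0 = never */
};

/* Wait for the fence for at most *timeout_ms. If a signal interrupts the
 * wait, write the unused part of the budget back to *timeout_ms and ask
 * the caller to restart. */
static int fake_cpu_prep_timeout(struct fake_gpu *gpu, uint64_t *timeout_ms)
{
    assert(gpu && timeout_ms);

    uint64_t deadline = gpu->now_ms + *timeout_ms;
    uint64_t next_signal = gpu->signal_period_ms
        ? gpu->now_ms + gpu->signal_period_ms : UINT64_MAX;

    /* Fence signals before both the deadline and the next signal. */
    if (gpu->fence_done_ms <= deadline && gpu->fence_done_ms <= next_signal) {
        if (gpu->fence_done_ms > gpu->now_ms)
            gpu->now_ms = gpu->fence_done_ms;
        return 0;
    }
    /* A signal arrives first: hand the remaining budget back. */
    if (next_signal < deadline) {
        gpu->now_ms = next_signal;
        *timeout_ms = deadline - next_signal;
        return -FAKE_ERESTARTSYS;
    }
    /* Budget exhausted. */
    gpu->now_ms = deadline;
    *timeout_ms = 0;
    return -EBUSY;
}

/* Behaviour described in the thread: every restart begins a fresh timeout.
 * max_restarts only exists so the simulation terminates. */
static int wait_fresh_timeout(struct fake_gpu *gpu, uint64_t timeout_ms,
                              unsigned max_restarts)
{
    for (unsigned i = 0; i < max_restarts; i++) {
        uint64_t t = timeout_ms;
        int ret = fake_cpu_prep_timeout(gpu, &t);
        if (ret != -FAKE_ERESTARTSYS)
            return ret;
    }
    return -FAKE_ERESTARTSYS; /* still looping */
}

/* Proposed behaviour: the remaining budget is carried across restarts. */
static int wait_with_budget(struct fake_gpu *gpu, uint64_t budget_ms,
                            unsigned *restarts)
{
    uint64_t remaining = budget_ms;

    *restarts = 0;
    for (;;) {
        int ret = fake_cpu_prep_timeout(gpu, &remaining);
        if (ret != -FAKE_ERESTARTSYS)
            return ret;
        ++*restarts;
    }
}
```

With a hung fence, a 3000 ms timeout and a signal every 1000 ms, wait_fresh_timeout keeps returning the restart code for as long as it is allowed to loop, while wait_with_budget gives up with -EBUSY once the original 3000 ms have elapsed.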