On Sun, Sep 18, 2011 at 03:18:57PM +0200, Marcin Slusarz wrote:
> Currently DRM_NOUVEAU_GEM_CPU_PREP ioctl is broken WRT handling of signals.
>
> nouveau_gem_ioctl_cpu_prep calls ttm_bo_wait which waits for fence to
> "signal" or 3 seconds timeout pass.
> But if it detects pending signal, it returns ERESTARTSYS and goes back
> to userspace. After signal handler, userspace repeats the same ioctl which
> starts _new 3 seconds loop_.
> So when the application relies on signals, some ioctls may never finish
> from application POV.
>
> There is one important application which does this - Xorg. It uses SIGIO
> (for input handling) and SIGALRM.
>
> GPU lockups lead to endless ioctl loop which eventually manifests in crash
> with "[mi] EQ overflowing. The server is probably stuck in an infinite loop."
> message instead of being propagated to DDX.
>
> The solution is to add new ioctl NOUVEAU_GEM_CPU_PREP_TIMEOUT with
> timeout parameter and decrease it on every signal.

Just fyi: We handle that issue in i915 by returning -EIO when the kernel
decides that the GPU has died for good and that resetting doesn't help.
Until then we rely on the ioctl restarting to kick everyone out of kernel
mode so the reset handler can do its business. If the reset is successful,
userspace continues (due to the ioctl being restarted), hopefully mostly
undisturbed.

While the GPU is hung, but not yet reset, we stall all ioctls before taking
the struct_mutex (see i915_gem_wait_error in i915_mutex_lock_interruptible).

IMO the advantage of that approach is that the kernel ultimately decides
when the GPU is gone, and userspace (lacking much of the required
information) need not engage in such guessing games as well.
-Daniel
-- 
Daniel Vetter
Mail: daniel@xxxxxxxx
Mobile: +41 (0)79 365 57 48
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel
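To make the restart problem from the quoted report concrete, here is a small userspace C simulation. It is a sketch under stated assumptions, not nouveau, TTM or i915 code: it assumes a fence that never signals (a hung GPU) and a signal arriving at a fixed period, and sim_wait, restarts_fixed_timeout and restarts_carried_timeout are hypothetical names. With a fresh 3 second budget on every call, any signal period shorter than 3 seconds means the wait never times out; carrying the remaining budget across restarts, roughly what the proposed NOUVEAU_GEM_CPU_PREP_TIMEOUT aims at, bounds the total wait.

```c
#include <assert.h>

/* Hypothetical result codes for this simulation (not real kernel values). */
enum wait_result { WAIT_INTERRUPTED, WAIT_TIMEOUT };

/*
 * Simulated fence wait on a hung GPU: the fence never signals, and a
 * signal is assumed to arrive every signal_period_ms. If the timeout
 * expires first, report a timeout; otherwise the "signal" interrupts the
 * wait and the remaining budget is written back through *timeout_ms.
 */
static enum wait_result sim_wait(long *timeout_ms, long signal_period_ms)
{
    assert(signal_period_ms > 0);
    if (*timeout_ms <= signal_period_ms) {
        *timeout_ms = 0;
        return WAIT_TIMEOUT;
    }
    *timeout_ms -= signal_period_ms;
    return WAIT_INTERRUPTED;
}

/* Old behaviour: every restarted ioctl starts a fresh 3 s budget. */
static int restarts_fixed_timeout(long signal_period_ms, int max_restarts)
{
    for (int i = 0; i < max_restarts; i++) {
        long budget = 3000; /* reset on every restart */
        if (sim_wait(&budget, signal_period_ms) == WAIT_TIMEOUT)
            return i; /* restarts needed before the timeout fired */
    }
    return -1; /* never timed out within max_restarts */
}

/* Proposed behaviour: the remaining timeout survives each restart. */
static int restarts_carried_timeout(long signal_period_ms, int max_restarts)
{
    long budget = 3000; /* set once, decreased on every signal */
    for (int i = 0; i < max_restarts; i++) {
        if (sim_wait(&budget, signal_period_ms) == WAIT_TIMEOUT)
            return i;
    }
    return -1;
}
```

For example, with a signal every 500 ms, restarts_fixed_timeout(500, 100) returns -1 (the wait is restarted forever), while restarts_carried_timeout(500, 100) returns 5. The i915 approach sidesteps the question differently: the kernel itself decides the GPU is dead and fails the ioctl with -EIO, so userspace never has to pick a timeout.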