On Wed, Apr 04, 2018 at 11:32:54AM +0200, Daniel Vetter wrote: > So we've done some experiments for the case where the fault originated > from kernel context (copy_to|from_user and friends). The fixup code seems > to retry the copy once after the fault (in copy_user_handle_tail), if that > fails again we get a short read/write. This might result in an -EFAULT, > short read()/write() or anything else really, depending upon the syscall > api. > > Except in some code paths in gpu drivers where we convert anything into > -ERESTARTSYS/EINTR if there's a signal pending it won't ever result in the > syscall getting restarted (well except maybe short read/writes if > userspace bothers with that). > > So I guess gpu fault handlers indeed break the kernel's expectations, but > then I think we're getting away with that because the inner workings of > gpu memory objects is all heavily abstracted away by opengl/vulkan and > friends. > > I guess what we could do is try to only do killable sleeps if it's a > kernel fault, but that means wiring a flag through all the callchains. Not > pretty. Except when there's a magic set of functions that would convert > all interruptible sleeps to killable ones only for us. I actually have plans to allow mutex_lock_{interruptible,killable} to return -EWOULDBLOCK if a flag is set. So this doesn't seem entirely unrelated. Something like this perhaps: struct task_struct { + unsigned int sleep_state; }; static noinline int __sched -__mutex_lock_interruptible_slowpath(struct mutex *lock) +__mutex_lock_slowpath(struct mutex *lock, long state) { - return __mutex_lock(lock, TASK_INTERRUPTIBLE, 0, NULL, _RET_IP_); + if (state == TASK_NOBLOCK) + return -EWOULDBLOCK; + return __mutex_lock(lock, state, 0, NULL, _RET_IP_); } +int __sched mutex_lock_state(struct mutex *lock, long state) +{ + might_sleep(); + + if (__mutex_trylock_fast(lock)) + return 0; + + return __mutex_lock_slowpath(lock, state); +} +EXPORT_SYMBOL(mutex_lock_state); Then the page fault handler can do something like: old_state = current->sleep_state; current->sleep_state = TASK_INTERRUPTIBLE; ... current->sleep_state = old_state; This has the page-fault-in-a-signal-handler problem. I don't know if there's a way to determine if we're already in a signal handler and use a different sleep_state ...?