On 25 May 2017 at 06:13, Andy Lutomirski <luto@xxxxxxxxxx> wrote: >>> >>> Can you give a simple example of what's going on and why it matters? >>> Here is the use case in which I ran into this problem. I have a driver which does request_firmware() when a write() is done to a sysfs file. The write() was being done by an android init script (with the init interpreter "write" command). init, of course, forks lots of processes and some of the children die. So the scenario was the following: 1) Android init calls write() on the sysfs file 2) The sysfs .store() callback registered by a driver is called 3) The driver calls request_firmware() 4) request_firmware() sends the firmware load request to userspace and calls wait_for_completion_interruptible() 5) A child dies and raises SIGCHLD 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal 7) request_firmware() [before this patch] translated that to -EAGAIN 8) The driver (in my case) ignored this [because the firmware was not critical - it was for checking if a microcontroler was up to date] (but it could have returned it to userspace, same problem) The point being that, due to a signal (SIGCHLD) which has nothing to do with the firmware loading process, the firmware load was not done. Also EAGAIN is the same error used if the load request times out so it was impossible to distinguish the two cases. ERESTARTSYS is an internal error and is not returned to userspace. Instead it is handled by the linux syscall machinery which, after processing the signal either restarts (transpently to userspace) the syscall or returns EINTR to userspace (depending if the signal handler users SA_RESTART - see man 7 signal) With this patch here is what happens: 1) Android init calls write() on the sysfs file 2) The sysfs .store() callback registered by a driver is called 3) The driver calls request_firmware() 4) request_firmware() sends the firmware load request to userspace and calls wait_for_completion_interruptible() 5) A child dies and raises SIGCHLD 6) wait_for_completion_interruptible() returns -ERESTARTSYS due to the signal 7) request_firmware() [with this patch] returns -ERESTARTSYS 8) The driver returns -ERSTARTSYS from its sysfs .store method 9) The system call machinery invokes the signal handler 10) The signal handler does its stuff 11) Because SA_RESTART was set the system call is restarted (calling the sysfs .store) and we try it all again from step 2 Note that, on the the userspace side write() is only called once (the restart is transparent to userspace which is oblivious to all this) The kernel side write() (which calls .store() is called multiple times (so that code does need to know about this) >>> ERESTARTSYS and friends are highly magical, and I'm not convinced that >>> allowing _request_firmware_load to return -ERESTARTSYS is actually a >>> good idea. What if there are system calls that can't handle this >>> style of restart that start being restarted as a result? >> If the caller is unable to restart (for example if the driver's .store() callback had already done lots of stuff that couldn't be undone) it is free to translate -ERSTARTSYS to -EINTR before returning. But request_frimware() can't know about that. >>> Maybe SIGCHLD shouldn't interrupt firmware loading? I don't think there's a way of doing that without disabling all signals (ie using the non interruptible wait variants). It used to be that way (which is why I only ran into this after updating from an ancient 3.16 kernel to a slightly less ancient 4.4) But there are valid reasons for wanting to be able to interrupt firmware loading (like being able to kill the userspace helper) Hope this explains it better, Regads, Martin -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html