On Thu, Jun 08 2017, Mikulas Patocka wrote: > On Thu, 8 Jun 2017, Shaohua Li wrote: > >> On Thu, Jun 08, 2017 at 04:59:03PM +1000, Neil Brown wrote: >> > On Wed, Jun 07 2017, Mikulas Patocka wrote: >> > >> > > The function flush_signals clears all pending signals for the process. It >> > > may be used by kernel threads when we need to prepare a kernel thread for >> > > responding to signals. However using this function for an userspaces >> > > processes is incorrect - clearing signals without the program expecting it >> > > can cause misbehavior. >> > > >> > > The raid1 and raid5 code uses flush_signals in its request routine because >> > > it wants to prepare for an interruptible wait. This patch drops >> > > flush_signals and uses sigprocmask instead to block all signals (including >> > > SIGKILL) around the schedule() call. The signals are not lost, but the >> > > schedule() call won't respond to them. >> > > >> > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx> >> > > Cc: stable@xxxxxxxxxxxxxxx >> > >> > Thanks for catching that! >> > >> > Acked-by: NeilBrown <neilb@xxxxxxxx> >> >> Applied, thanks! >> >> Neil, >> Not about the patch itself. I had question about that part of code. Dropped >> others since this is raid related. I didn't get the point why it's a >> TASK_INTERRUPTIBLE sleep. It seems suggesting the thread will bail out if a >> signal is sent. But I didn't see we check the signal and exit the loop. What's >> the correct behavior here? Since the suspend range is controlled by userspace, > > As I understand the code - the purpose is to have an UNINTERRUPTIBLE sleep > that isn't accounted in load average and that doesn't trigger the hung > task warning. Exactly my reason - yes. > > There should really be something like TASK_UNINTERRUPTIBLE_LONG for this > purpose. That would be nice. > >> I think the correct behavior is if user kills the thread, we exit the loop. So >> it seems like to be we check if there is fatal signal pending, exit the loop, >> and return IO error. Not sure if we should return IO error though. > > No, this is not correct - if we report an I/O error for the affected bio, > it could corrupt filesystem or confuse other device mapper targets that > could be on the top of MD. It is not right to corrupt filesystem if the > user kills a process. Yes, we are too deep to even return something like ERESTARTSYS. Blocking is the only option. Thanks, NeilBrown > >> Thanks, >> Shaohua > > Mikulas
Attachment:
signature.asc
Description: PGP signature