On Thu, Jun 08 2017, Shaohua Li wrote:

> On Fri, Jun 09, 2017 at 07:24:29AM +1000, Neil Brown wrote:
>> On Thu, Jun 08 2017, Mikulas Patocka wrote:
>>
>> > On Thu, 8 Jun 2017, Shaohua Li wrote:
>> >
>> >> On Thu, Jun 08, 2017 at 04:59:03PM +1000, Neil Brown wrote:
>> >> > On Wed, Jun 07 2017, Mikulas Patocka wrote:
>> >> >
>> >> > > The function flush_signals clears all pending signals for the
>> >> > > process. It may be used by kernel threads when we need to prepare
>> >> > > a kernel thread for responding to signals. However, using this
>> >> > > function for userspace processes is incorrect - clearing signals
>> >> > > without the program expecting it can cause misbehavior.
>> >> > >
>> >> > > The raid1 and raid5 code uses flush_signals in its request routine
>> >> > > because it wants to prepare for an interruptible wait. This patch
>> >> > > drops flush_signals and uses sigprocmask instead to block all
>> >> > > signals (including SIGKILL) around the schedule() call. The
>> >> > > signals are not lost, but the schedule() call won't respond to
>> >> > > them.
>> >> > >
>> >> > > Signed-off-by: Mikulas Patocka <mpatocka@xxxxxxxxxx>
>> >> > > Cc: stable@xxxxxxxxxxxxxxx
>> >> >
>> >> > Thanks for catching that!
>> >> >
>> >> > Acked-by: NeilBrown <neilb@xxxxxxxx>
>> >>
>> >> Applied, thanks!
>> >>
>> >> Neil,
>> >> Not about the patch itself. I had a question about that part of the
>> >> code. Dropped the others since this is raid-related. I didn't get why
>> >> it's a TASK_INTERRUPTIBLE sleep. It seems to suggest the thread will
>> >> bail out if a signal is sent, but I didn't see us check the signal
>> >> and exit the loop. What's the correct behavior here? Since the
>> >> suspend range is controlled by userspace,
>> >
>> > As I understand the code - the purpose is to have an UNINTERRUPTIBLE
>> > sleep that isn't accounted in the load average and that doesn't
>> > trigger the hung task warning.
>>
>> Exactly my reason - yes.
>>
>> > There should really be something like TASK_UNINTERRUPTIBLE_LONG for
>> > this purpose.
>>
>> That would be nice.
>>
>> >> I think the correct behavior is that if the user kills the thread,
>> >> we exit the loop. So it seems to me we should check whether a fatal
>> >> signal is pending, exit the loop, and return an IO error. Not sure
>> >> if we should return an IO error though.
>> >
>> > No, this is not correct - if we report an I/O error for the affected
>> > bio, it could corrupt the filesystem or confuse other device-mapper
>> > targets that could be stacked on top of MD. It is not right to
>> > corrupt the filesystem because the user kills a process.
>>
>> Yes, we are too deep to even return something like ERESTARTSYS.
>> Blocking is the only option.
>
> My concern is that if the app controlling the suspend range dies, other
> threads will block on the kernel side forever. We can't even force-kill
> them. This is unfortunate behavior. Would adding a timeout here make
> sense? The app controlling the suspend range effectively looks like
> part of the disk firmware now. If the firmware doesn't respond,
> returning an IO timeout is normal.

Yes, this happens. You can write to /sys/block/mdXX/md/suspend_lo to
unblock it, but most people don't know this.

A timeout might be appropriate, but I would want it to be quite a long
one. Several minutes at least...

Though if I found I wanted to do some careful raid6 repair surgery on an
array, and so suspended the problematic region and started work, I might
get annoyed if I/O started getting errors instead of just waiting like I
asked.... but maybe that is a rather far-fetched scenario.

Thanks,
NeilBrown

>
> Thanks,
> Shaohua
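For readers following along, here is a minimal sketch of the pattern the
patch describes: block every signal (including SIGKILL) around a
TASK_INTERRUPTIBLE schedule(), then restore the caller's mask, so pending
signals are neither consumed (as flush_signals would do) nor able to cut
the wait short. The demo_wq wait queue and demo_done condition below are
placeholders for illustration, not the actual raid1/raid5 code:

/*
 * Sketch only: wait for a condition without consuming the caller's
 * pending signals.  All signals (including SIGKILL) are blocked around
 * the TASK_INTERRUPTIBLE schedule(), so the sleep is not cut short by a
 * signal and flush_signals() is not needed; the signals stay pending
 * and take effect once the original mask is restored.
 */
#include <linux/sched.h>
#include <linux/signal.h>
#include <linux/wait.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_wq);	/* placeholder wait queue */
static bool demo_done;				/* placeholder condition  */

static void wait_without_eating_signals(void)
{
	sigset_t full, old;
	DEFINE_WAIT(wait);

	sigfillset(&full);
	sigprocmask(SIG_BLOCK, &full, &old);	/* block everything */

	for (;;) {
		prepare_to_wait(&demo_wq, &wait, TASK_INTERRUPTIBLE);
		if (READ_ONCE(demo_done))
			break;
		schedule();	/* interruptible state, but no signal can wake us */
	}
	finish_wait(&demo_wq, &wait);

	sigprocmask(SIG_SETMASK, &old, NULL);	/* restore caller's mask */
}

Because the original mask is restored afterwards, a pending SIGKILL still
takes effect once the task heads back toward userspace; it just cannot
interrupt this particular wait, which is the behavior discussed above.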