On Mon, Jun 19, 2017 at 07:30:50AM +1000, Neil Brown wrote: > On Fri, Jun 16 2017, Shaohua Li wrote: > > > On Fri, Jun 16, 2017 at 01:26:00PM +1000, Neil Brown wrote: > >> On Fri, Jun 09 2017, Shaohua Li wrote: > >> > >> > From: Shaohua Li <shli@xxxxxx> > >> > > >> > suspend range is controlled by userspace. If userspace doesn't clear suspend > >> > range, it's possible a thread will wait for the range forever, we can't even > >> > kill it. This is bad behavior. Add a timeout in the wait. If timeout happens, > >> > we return IO error. The app controlling suspend range looks like part of disk > >> > firmware, if disk isn't responded for a long time, timed out IO error is > >> > returned. > >> > > >> > A simple search in SCSI code shows maximum IO timeout is 120s, so I use this > >> > value here too. > >> > >> I really don't like this. It is an API change with no really > >> justification. Has the current behavior caused a problem? > > > > This centainly causes problem. Set the suspend range will make application > > stall for ever, don't you think this is a problem? > > I agree that it could cause a problem. I'm asking it is actually, in > practice, causes a problem. Do you have reports from people saying "the > IO to my RAID array is hanging, what is wrong?" and you look into it and > find out that suspend_hi is larger than suspend_lo? > > And if that does happen, is this really the best way to fix it? I don't have reports about this issue. Just think the behavior is bad. > > > >> Both md and dm export APIs which allow IO to be suspended while > >> changes are made. Changing that to a timed-out period needs, I think, > >> to be clearly justified. > >> > >> If it is changed to a timed-out period, then that should be explicit, > >> rather than having each request independently time out. > >> i.e. when the suspend is initiated, the end-time should be computed, and > >> any IO would block until that time, not block for some number of > >> seconds. > > > > Ok, this makes sense. We can add a timeout. If it's expired, we clear suspend > > range. Userspace should know what they are doing. > > > >> > >> If an md device is left suspended, then the current code will block IO > >> indefinitely. This patch will at a 20minute times to every single > >> request, which will mean IO proceeds, but extremely slowly. I don't see > >> that as a useful improvement. > > > > It returns error, so application will not dispatch more IO. But I agree a > > timeout to clear the suspend looks a better policy. > > Write errors only get back to the application if it calls fsync(), and > many don't do that. Write errors can easily cause a filesystem to go > read-only, and require an fsck. I think we should be very cautious > about triggering write errors. > > NFS will hang indefinitely rather then return an error if the server is > not available. That can certainly be annoying, but the alternative has > been tried, and it leads to random data corruption. > The two cases are only comparable at a very high level, but I think > this result should encourage substantial caution. It's hard to say if an IO error or an infinite wait is better, but since there is better option in this case, I don't want to argue. I'll repost a patch to reset suspend range after a timeout, assume this is your suggestion. Thanks, Shaohua -- To unsubscribe from this list: send the line "unsubscribe linux-raid" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html