On Mon, Oct 15, 2012 at 03:51:34PM +0900, Fernando Luis Vazquez Cao wrote:
> On 2012-10-15 15:36, Dave Chinner wrote:
> >On Mon, Oct 15, 2012 at 12:24:59PM +0900, Fernando Luis Vazquez Cao wrote:
> >>On 2012/10/13 10:06, Dave Chinner wrote:
> >>>On Fri, Oct 12, 2012 at 06:47:32PM +0900, Fernando Luis Vázquez Cao wrote:
> >>>>Any process attempting to write to a frozen filesystem uninterruptibly and
> >>>>unkillably waits for the filesystem to be thawed. This wait is of unbounded
> >>>>length. Ignore such waits in the hung_task detector.
> >>>Filesystems should not be frozen for long enough to trigger the hung
> >>>task detector under normal usage. IMO, if you are freezing a
> >>>filesystem for that long, then you're either doing something wrong
> >>>or something has gone wrong, and in either case I think we should be
> >>>emitting warnings...
> >>The problem is that in production systems situations where
> >>a filesystem remains frozen for long periods are not uncommon.
> >>A typical example is as follows: the control daemon or script that
> >>controls the freeze/thaw using the fsfreeze ioctls dies, the next
> >There's your problem. Fix that, don't turn off useful warnings that
> >indicate something has gone wrong.
>
> It is not my problem. It is the enterprise distro's user's problem.

It's your problem because you are trying to change the code :)

> As I mentioned in my previous email if you want to emit a
> warning do it in the right place and make sure that it is
> something informative. hung_check certainly isn't the
> right place to do it.

So, how do we now know when a freeze fails to complete, as opposed
to a thaw that hasn't occurred? We won't get any reports from
threads that are stuck waiting for the freeze to complete, and so
we'll end up with a silent hang. This is *exactly* what the hung
task messages are supposed to avoid by being verbose - we know what
hung rather than having stuff just silently stop.
If you want to remove verbose warnings, replace them with concise,
targeted and *equivalent* warnings before removing the only warnings
we currently have that indicate a problem...

> A failure in a user space script should not lead to a kernel
> panic or to a flood of process stack dumps in the system log

An administrator can cause that to happen in many, many ways by
having a script or a daemon fail to do the right thing. freeze/thaw
is not unique in that respect. Removing an entire class of warnings
because something is broken in userspace and fails to be handled
correctly is not the right solution.

Indeed, if you have a daemon that freezes the filesystem, and you
haven't architected it with a watchdog to handle restarts due to
failures, then you don't have a resilient system at all, regardless
of these warnings. If it's an HA daemon/agent that doesn't get
restarted and clean up its mess automatically, then IMO it is
fundamentally broken and that's the problem that needs fixing.
Removing kernel warnings doesn't change the fact that the
application doing freeze/thaw is broken by design...

> administrators cannot interpret (a common complaint from
> our customers). This is the behaviour this patch is trying to
> fix.

Educate your customers through documentation then - FAQs exist for a
reason. Removing warnings that we (developers) rely on for debugging
issues with freeze/thaw because customers don't understand what they
are is a terrible solution. It means we don't hear about problems
(because there are no warnings), and when we do hear about silent
hangs, we can't diagnose them (because there are no warnings). It's a
lose/lose situation.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
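Editorial illustration, not part of the original message: the point
about giving a freezing daemon a watchdog can be sketched in userspace
roughly as follows. This is a minimal sketch under stated assumptions.
The names frozen(), ioctl_freeze() and ioctl_thaw() are hypothetical
helpers invented here; only the FIFREEZE and FITHAW ioctl numbers come
from <linux/fs.h>. The sketch bounds the freeze with a timer and thaws
on any exit path, including an exception.

```python
import fcntl
import threading
from contextlib import contextmanager

# ioctl request numbers from <linux/fs.h>:
# FIFREEZE = _IOWR('X', 119, int), FITHAW = _IOWR('X', 120, int)
FIFREEZE = 0xC0045877
FITHAW = 0xC0045878


def ioctl_freeze(fd):
    """Hypothetical helper: freeze the filesystem that fd lives on."""
    fcntl.ioctl(fd, FIFREEZE, 0)


def ioctl_thaw(fd):
    """Hypothetical helper: thaw the filesystem that fd lives on."""
    fcntl.ioctl(fd, FITHAW, 0)


@contextmanager
def frozen(fd, max_seconds, freeze=ioctl_freeze, thaw=ioctl_thaw):
    """Freeze on entry; thaw on exit or after max_seconds, whichever is first.

    freeze and thaw are injectable so the logic can be exercised
    without root privileges or a real filesystem.
    """
    lock = threading.Lock()
    state = {"thawed": False}

    def thaw_once():
        # Ensure the thaw is issued exactly once, whether the timer
        # or the normal exit path gets there first.
        with lock:
            if not state["thawed"]:
                state["thawed"] = True
                thaw(fd)

    freeze(fd)
    timer = threading.Timer(max_seconds, thaw_once)
    timer.daemon = True
    timer.start()
    try:
        yield
    finally:
        timer.cancel()
        thaw_once()
```

A caller would wrap the snapshot or backup step in
"with frozen(fd, 30): ...", where fd is an open descriptor on the mount
point and 30 is an assumed upper bound in seconds. An in-process timer
like this cannot help if the process itself is killed with SIGKILL; that
case still needs an external supervisor that restarts the agent and
thaws the filesystem, which is the design gap the message above
describes.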