On Tue, Oct 16, 2012 at 11:30:48AM +0900, Fernando Luis Vazquez Cao wrote: > On 2012/10/16 06:02, Dave Chinner wrote: > >On Mon, Oct 15, 2012 at 03:51:34PM +0900, Fernando Luis Vazquez Cao wrote: > >>As I mentioned in my previous email if you want to emit a > >>warning do it in the right place and make sure that it is > >>something informative. hung_check certainly isn't the > >>right place to do it. > >So, how do we now know when a freeze fails to complete, as opposed > >to a thaw that hasn't occurred? We won't get any reports from > >threads that are stuck waiting for the freeze to complete, and so > >we'll end up with a silent hang. > > Are you referring to a situation where freeze_super() fails > to complete? Yes. > If so hung_check will detect that the task > that called the freeze ioctl is stuck and will dump its > stack's contents, which is *precisely* the information we > want. I think you are wrong there - we won't get the freeze ioctl triggering the hang check. sb_write_wait() only has a single wait queue for all levels of freeze, and any sb_write_end() call will cause it to wake and recheck the level count it is waiting for. e.g. we have a SB_FREEZE_WRITE level leak, so freeze is waiting for that count to go to zero. page faults and transactions can still go ahead while in this state, so sb_start_write/sb_end_write will be called any number of times while sb_write_wait() is waiting for SB_FREEZE_WRITE write count to go to zero. Any call to sb_end_write() will issue a wakeup to the writer wait queue if there is a task sleeping on it, so the freeze process stuck in sb_write_wait() will not trigger the hung task timer because it gets woken by other external events and so reset the hung task timer regularly. Hence we can't rely on the hung task timer to detect freeze failures caused by start/end accounting leaks. Having the threads already frozen be detected as hung in this situationis the only sane thing to do because, well, they've hung.... > Absolutely. By the way, to handle restarts properly > we need check ioctls or a sysfs/procfs equivalent for > fsfreeze, which my previous patch set implements. sysfs, not ioctls. > >Removing kernel warnings doesn't change the fact that the > >application doing freeze/thaw is broken by design... > > It is precisely because we want to handle things > in user space that we need to get hung_task > related panics and unneeded warnings out of > the way. If you handle failures in userspace, then the kernel warnings won't occur If you fail to handle it in userspace correctly, then we need the warnings at the kernel level. Besides, there are more users of freeze/thaw than just whatever your application is. We cannot guarantee that all users Do The Right Thing, so even if your application works correctly we still need the warnings for all those that don't. e.g. what happens if dm-snapshot goes AWOL in the middle of a snapshot? Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html