Re: [PATCH] fsfreeze: tell hung_task about processes put to sleep

Dave Chinner <david@xxxxxxxxxxxxx> · Tue, 16 Oct 2012 15:52:47 +1100

On Tue, Oct 16, 2012 at 11:30:48AM +0900, Fernando Luis Vazquez Cao wrote:
> On 2012/10/16 06:02, Dave Chinner wrote:
> >On Mon, Oct 15, 2012 at 03:51:34PM +0900, Fernando Luis Vazquez Cao wrote:
> >>As I mentioned in my previous email if you want to emit a
> >>warning do it in the right place and make sure that it is
> >>something informative. hung_check certainly isn't the
> >>right place to do it.
> >So, how do we now know when a freeze fails to complete, as opposed
> >to a thaw that hasn't occurred? We won't get any reports from
> >threads that are stuck waiting for the freeze to complete, and so
> >we'll end up with a silent hang.
> 
> Are you referring to a situation where freeze_super() fails
> to complete?

Yes.

> If so hung_check will detect that the task
> that called the freeze ioctl is stuck and will dump its
> stack's contents, which is *precisely* the information we
> want.

I think you are wrong there - we won't get the freeze ioctl
triggering the hang check. sb_write_wait() only has a single wait
queue for all levels of freeze, and any sb_write_end() call will
cause it to wake and recheck the level count it is waiting for.

e.g. we have a SB_FREEZE_WRITE level leak, so freeze is waiting for
that count to go to zero. page faults and transactions can still go
ahead while in this state, so sb_start_write/sb_end_write will be
called any number of times while sb_write_wait() is waiting for
SB_FREEZE_WRITE write count to go to zero. Any call to
sb_end_write() will issue a wakeup to the writer wait queue if there
is a task sleeping on it, so the freeze process stuck in
sb_write_wait() will not trigger the hung task timer because it gets
woken by other external events and so reset the hung task timer
regularly.

Hence we can't rely on the hung task timer to detect freeze failures
caused by start/end accounting leaks. Having the threads already
frozen be detected as hung in this situationis the only sane thing
to do because, well, they've hung....

> Absolutely. By the way, to handle restarts properly
> we need check ioctls or a sysfs/procfs equivalent for
> fsfreeze, which my previous patch set implements.

sysfs, not ioctls.

> >Removing kernel warnings doesn't change the fact that the
> >application doing freeze/thaw is broken by design...
> 
> It is precisely because we want to handle things
> in user space that we need to get hung_task
> related panics and unneeded warnings out of
> the way.

If you handle failures in userspace, then the kernel warnings won't
occur  If you fail to handle it in userspace correctly, then we
need the warnings at the kernel level.

Besides, there are more users of freeze/thaw than just whatever your
application is. We cannot guarantee that all users Do The Right
Thing, so even if your application works correctly we still need the
warnings for all those that don't. e.g. what happens if dm-snapshot
goes AWOL in the middle of a snapshot?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html