Re: [PATCH] xfs: prevent spurious "head behind tail" warnings

Dave Chinner <david@xxxxxxxxxxxxx> · Thu, 21 Nov 2013 13:19:33 +1100

On Tue, Nov 19, 2013 at 05:44:50PM -0600, Mark Tinguely wrote:
> On 11/19/13 17:24, Eric Sandeen wrote:
> >On 11/19/13, 5:08 PM, Mark Tinguely wrote:
> >>On 11/19/13 16:37, Dave Chinner wrote:
> >>>From: Dave Chinner<dchinner@xxxxxxxxxx>
> >>>
> >>>When xlog_space_left() cracks the grant head and the log tail, it
> >>>does so without locking to synchronise the sampling of the
> >>>variables. It samples the grant head first, so if there is a delay
> >>>before it smaples the log tail, there is a window where the log tail
> >>>could have moved onwards and be moved past the sampled value of the
> >>>grant head. This then leads to the "xlog_space_left: head behind
> >>>tail" warning message.
> >>>
> >>>To avoid spurious output in this situation, swap the order in which
> >>>the variables are cracked. This means that the head may grant head
> >>>may move if there is a delay, but the log tail will be stable, hence
> >>>ensure the tail does not jump the head accidentally.
> >>>
> >>>While this avoids the spurious head behind tail problem, it
> >>>introduces the opposite problem - the head can move more than a full
> >>>cycle past the tail. The code already handles this case by
> >>>indicating that the log is full (i.e. zero space available) but
> >>>that's still (generally) a spurious situation.
> >>>
> >>>Hence, if we detect that the head is more than a cycle ahead of the
> >>>tail or the head is behind the tail, start the calculation again by
> >>>resampling the variables and trying again. If we get too many
> >>>resamples, then throw a warning and return a full or empty log
> >>>appropriately.
> >>>
> >>>Signed-off-by: Dave Chinner<dchinner@xxxxxxxxxx>
> >>>---
> >>
> >>I am still getting the debug message:
> >>
> >>   xlog_verify_grant_tail: space>  BBTOB(tail_blocks)
> >>
> >>This is a real over grant. It has been a while since I did all the tests, but basically the only way to stop it is to have a lock between checking for xlog_space_left() and actually reserving the space.
> >>
> >>I am not a fan of another band-aid on a problem that is caused because we are granting space without locks.
> >
> >Mark, can you remind us of your testcase that produces this?
> >(sorry, I guess I should search for that old thread...)
> >
> >Thanks,
> >-Eric
> >
> >>--Mark.
> 
> xfstest 273 hits it 100% of the time for me, as does 32+ process
> fsstress, pretty much any high log usage test.

Yet I do those sorts of tests all the time on lots of different
machines and I rarely see either of the warnings that we are talking
about here. I sent the xlog_space_left() patch because a user
reported their log being spammed, not because they had a problem
with a filesystem hanging on log space.

i.e. spam == false positive, log space hang == real problem.

> I know Brian hit this with xfstest 273 when he was testing for
> commit 9a3a5dab.
> 
> Using xfstest 273, I was seeing ten of thousand of bytes of over
> commit.

That doesn't mean it's real.

By definition, a false positive shows exactly the same numbers as a
real problem. The only way to determine it is a false positive is to
use some other method of verifying the numbers. That could be
checking the state of other related variables or simply resampling
the variables multiple times to reduce the probability that the
problem was caused by a temporary condition (i.e. a race).

In this case, resampling them from their source is used to reverify
them, and if we get continual failures being detected, then the
likelihood of them being a false positive is very low.

> From what I recall, I tried a separate lock for the
> write/reserve grant heads,

We already have a per-grant head lock, and most of the calls to
xlog_space_left() are called under it. Hence the problem I'm
addressing is not a result of the grant head changing - the problem
is the log tail is updated concurrently. No amount of locking being
added to the grant heads will fix that problem.

And regardless of the fact that xlog_verify_grant_tail() samples the
write grant head with locks (it's a atomic64_t to do this safely),
the problem is that log tail can then change before it is sampled.
No amount of locking the grant heads will fix that problem, either.

> put locks to make sure the verifier was
> not getting stale information, ordered the write/reserve ungrants
> relative to the grants, put in cache smp_mb() call. Some attempts
> were more successful than others, but the only way I could prevent
> the overgrant completely was to put back the global lock between the
> checking for space and the granting of space.

Which won't have solved the problem of concurrent log tail updates
occurring after the grant head has been sampled. All this change
does is modify the concurrency patterns and hence made the race
condition much harder to hit by disabling pre-emption across
functions.

IOWs, we can probably get exactly the same result from adding
"preempt_disable(); ....  preempt_enable();" around the variable
sampling so that we cannot be pre-empted at all between the two
samples being taken.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs