Re: [PATCH 1/3] xfs: don't shutdown log recovery on validation errors

Mark Tinguely <tinguely@xxxxxxx> · Fri, 14 Jun 2013 07:55:01 -0500

On 06/13/13 19:13, Dave Chinner wrote:
On Thu, Jun 13, 2013 at 05:09:03PM -0500, Ben Myers wrote:
Hi Dave,

On Thu, Jun 13, 2013 at 12:08:27PM +1000, Dave Chinner wrote:
On Wed, Jun 12, 2013 at 08:04:41PM -0500, Ben Myers wrote:
On Wed, Jun 12, 2013 at 12:19:06PM +1000, Dave Chinner wrote:
From: Dave Chinner<dchinner@xxxxxxxxxx>

Unfortunately, we cannot guarantee that items logged multiple times
and replayed by log recovery do not take objects back in time. When
theya re taken back in time, the go into an intermediate state which
is corrupt, and hence verification that occurs on this intermediate
state causes log recovery to abort with a corruption shutdown.

Instead of causing a shutdown and unmountable filesystem, don't
verify post-recovery items before they are written to disk. This is
less than optimal, but there is no way to detect this issue for
non-CRC filesystems If log recovery successfully completes, this
will be undone and the object will be consistent by subsequent
transactions that are replayed, so in most cases we don't need to
take drastic action.

For CRC enabled filesystems, leave the verifiers in place - we need
to call them to recalculate the CRCs on the objects anyway. This
recovery problem canbe solved for such filesystems - we have a LSN
stamped in all metadata at writeback time that we can to determine
whether the item should be replayed or not. This is a separate piece
of work, so is not addressed by this patch.

Is there a test case for this one?  How are you reproducing this?

The test case was Dave Jones running sysrq-b on a hung test machine.
The machine would occasionally end up with a corrupt home directory.

http://oss.sgi.com/pipermail/xfs/2013-May/026759.html

Analysis from a metdadump provided by Dave:

http://oss.sgi.com/pipermail/xfs/2013-June/026965.html

And Cai also appeared to be hitting this after a crash on 3.10-rc4,
as it's giving exactly the same "verifier failed during log recovery"
stack trace:

http://oss.sgi.com/pipermail/xfs/2013-June/026889.html

Thanks.  It appears that the verifiers have found corruption due to a
flaw in log recovery, and the fix you are proposing is to stop using
them.  If we do that, we'll have no way of detecting the corruption and
will end up hanging users of older kernels out to dry.

We've never detected it before, and it's causing regressions for
multiple people. We *can't fix it* because we can't detect the
situation sanely, and we are not leaving people with old kernels
hanging out to dry. The opposite is true: we are fucking over
current users by preventing log recovery on filesystems that will
recovery perfectly OK and have almost always recovered just fine in
the past.

I think your suggestion that non-debug systems could warn instead of
fail is a good one, but removing the verifier altogether is
inappropriate.

Changing every single verifier in a non-trivial way is not something
I'm about to do for a -rc6 kernel. Removing the verifiers from log
recovery just reverts to the pre-3.8 situation, so is perfectly
acceptable short term solution while we do the more invasive verify
changes.

Can you make the metadump available?  I need to understand this better
before I can sign off.  Also:  Any idea how far back this one goes?

No, I can't make the metadump available to you - it was provided
privately and not obfuscated and so you'd have to ask Dave for it.

As to how long this problem has existed? It's a zero-day bug. Like I
said, I've suspected for years that this can happen, and only now do
we have proof of it...

Cheers,

Dave.

My gut feeling for the patch was the same as Ben's, but thinking this 
over, I have to take back all the eloquent curse works. IMO, we have to 
bring the patch in because the goal for Linux 3.10 is to have a stable 
environment for the non-CRC case and the verifier is breaking XFS log 
recovery for the common non-CRC case.
                  ---
In the common case, the verifier is tripping over the fundemental 
difference between how the AIL works (consolidate buffer writes into one 
and it can write anything that made it to log up to l_last_sync_lsn) and 
the log recovery, which works on each modification.

If there is another unknown kind of future write, then it would be nice 
to know and having a warning message would help. Unfortunately, the 
warning may make recovery noisy and falsely concern the users, but we 
are in uncharted waters.

I will blame and put a reviewed-by on the patch.

--Mark.

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs