On Tue, Nov 06, 2012 at 04:13:11PM +1100, Dave Chinner wrote:
> Hi folks,
>
> Fourth version of the buffer verifier series. The read verifier
> infrastructure is described here:
>
> http://oss.sgi.com/archives/xfs/2012-10/msg00146.html
>
> The second version with write verifiers is described here:
>
> http://oss.sgi.com/archives/xfs/2012-10/msg00280.html
>
> This version adds write verifiers to all buffers that aren't directly
> read (i.e. via xfs_buf_get*() interfaces), and drops the log
> recovery verifiers from the series as it really needs more buffer
> item format flags to do reliably.
>
> The series is just about ready to go - it passes all of xfstests here
> except for 070. With the addition of the getbuf write verifiers,
> this series is now detecting a corrupt xfs_da_node buffer being
> written to disk. It appears to be a new symptom of a known problem,
> as tracing indicates that the test is triggering the same double
> split/join pattern as described here:
>
> http://oss.sgi.com/archives/xfs/2012-03/msg00347.html

So, 070 isn't hitting this exact problem - I think I have a handle on
the cause of the problem in the link now (i.e. I have a fix that
passes all of xfstests without any other problems arising), but the
reproducer is also causing the same write verifier failures as 070
and 117. However, all three do a double leaf split operation, so
that's going to be the underlying cause of the verifier failure.
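
For anyone who hasn't followed the earlier threads, the idea of a
write verifier can be sketched in a few lines of userspace C. This is
only an illustration under assumptions: the toy_* names, the magic
value and the entry limit are all made up here and are not the XFS
interfaces or on-disk constants. It shows a callback attached to a
buffer that checks the block's structure just before it is written,
so a buffer corrupted in memory is caught before it reaches disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values for illustration, not real on-disk constants. */
#define TOY_NODE_MAGIC   0x7a7au
#define TOY_NODE_MAX_ENT 64u

/* Toy stand-in for a node block header. */
struct toy_node_hdr {
	uint16_t magic;	/* identifies the block type */
	uint16_t count;	/* number of entries in use */
	uint16_t level;	/* height in the tree, at least 1 for a node */
};

struct toy_buf;
typedef bool (*toy_verify_fn)(const struct toy_buf *bp);

/* Toy buffer: block number, contents and the attached write verifier. */
struct toy_buf {
	uint64_t		blkno;
	struct toy_node_hdr	hdr;
	toy_verify_fn		write_verify;
	int			error;	/* set to -1 once a verifier fails */
};

/* Structural checks only, cheap enough to run on every write. */
static bool toy_node_verify(const struct toy_buf *bp)
{
	if (bp->hdr.magic != TOY_NODE_MAGIC)
		return false;
	if (bp->hdr.level == 0)
		return false;
	if (bp->hdr.count > TOY_NODE_MAX_ENT)
		return false;
	return true;
}

/*
 * Pretend to submit the buffer for write. The verifier runs first;
 * returns 0 if the buffer may go to disk, -1 if it was rejected.
 */
static int toy_buf_write(struct toy_buf *bp)
{
	assert(bp != NULL);
	if (bp->write_verify && !bp->write_verify(bp)) {
		bp->error = -1;
		return -1;
	}
	return 0;
}
```

A caller would attach toy_node_verify to the buffer and check the
return value of toy_buf_write(). A header whose count exceeds
TOY_NODE_MAX_ENT is rejected and the buffer is marked with an error
instead of being written.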

This tracepoint list is the first half of an attribute add operation:

	xfs_attr_node_addname
	xfs_buf_init
	xfs_attr_leaf_lookup
	xfs_attr_node_replace
	xfs_attr_leaf_add
	xfs_da_split
	xfs_attr_leaf_split
	xfs_da_grow_inode
	xfs_attr_leaf_create
	xfs_attr_leaf_rebalance
	xfs_trans_log_buf
	xfs_da_link_after
	xfs_trans_log_buf
	xfs_attr_leaf_add_old
	xfs_attr_leaf_add
	xfs_attr_leaf_compact
	xfs_trans_log_buf
	xfs_attr_leaf_split_before
	xfs_attr_leaf_split
	xfs_da_grow_inode
	xfs_attr_leaf_create
	xfs_attr_leaf_rebalance
	xfs_da_link_after
	xfs_trans_log_buf
	xfs_attr_leaf_add_new
	xfs_attr_leaf_add
	xfs_attr_leaf_add_work
	xfs_da_fixhashpath
	xfs_da_node_split
	xfs_da_node_add
	xfs_da_node_add
	xfs_da_fixhashpath
	xfs_attr_leaf_flipflags

(that double leaf split makes it nice and complex, doesn't it?)

One of these operations is resulting in the buffer at block number
0xc8 being corrupted in memory. The xfs_trans_log_buf() calls above
are the places where that buffer is logged.

Prior to fixing the corruption problem, the code would assert fail in
xfs_attr_leaf_flipflags() (part of the atomic rename sequence), then a
couple of seconds later dump a write verifier failure.

Now I've just got to work out where in this maze the buffer gets
corrupted, and then I might start to understand why it doesn't appear
to cause detectable on-disk corruption...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx