Re: [PATCH 03/18] xfs: Do background CIL flushes via a workqueue

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 30 Apr 2012 16:09:13 +1000

On Mon, Apr 30, 2012 at 11:24:32AM +1000, Dave Chinner wrote:
> On Wed, Apr 18, 2012 at 11:47:06AM +1000, Dave Chinner wrote:
> > On Tue, Apr 17, 2012 at 04:49:56PM -0500, Ben Myers wrote:
> > > On Tue, Apr 17, 2012 at 04:21:55PM -0500, Ben Myers wrote:
> > > > Gah! I just hit this assert.  
> > > > 
> > > > v3.4-rc2-3-g8a00ebe with:
> > > > Christoph's ilock series
> > > > Christoph's xfsbufd series
> > > > Jan's freeze series
> > > > Dave's queue.
> > > > 
> > > > nfs7 login: [ 1175.172406] XFS: Assertion failed: push_seq > 0 && push_seq <= ctx->sequence, file: /root/xfs/fs/xfs/xfs_log_cil.c, line: 406
> > 
> > which has probably resulted in a log corruption.
> > 
> > > [  390.097633] ---[ end trace 33a9795b638157b9 ]---
> > > 
> > > Here it is from the system log
> > > 
> > > Apr 17 16:35:12 linux kernel: [  389.637543] XFS (264=211361P۔300): Mounting Filesystem
> > > Apr 17 16:35:12 linux kernel: [  389.912402] XFS (264=211361P۔300): Starting recovery (logdev: internal)
> > > Apr 17 16:35:12 linux kernel: [  390.054357] XFS (264=211361P۔300): xlog_recover_inode_pass2: Bad inode magic number, dip = 0xf0ffd800, dino bp = 0xef89c480, ino = 25541592
> > > Apr 17 16:35:12 linux kernel: [  390.066290] XFS (264=211361P۔300): Internal error xlog_recover_inode_pass2(1) at line 2248 of file /root/xfs/fs/xfs/xfs_log_recover.c.  Caller 0xfaa08ffa
> > 
> > Because log recovery has read a bad inode from disk, which means an
> > allocation transaction has probably not been replayed.
> 
> Just as a note - I'm getting itest 121 reliably tripping over this
> exact problem as a result of the discontiguous buffer item support
> patch. Basically the problem is that an inode modification
> transaction is being replayed before the inode buffer allocation
> transaction, so the read of the inode buffer returns an
> uninitialised data area, and hence the assert failure.
> 
> At this point I can't see why that patch would cause the problem to
> occur and this tends to imply that it isn't that patch that has
> caused the bug.

The bug was in my code - the calculation for the map size was wrong
so only the first half of the modifications to the buffer was being
replayed. So, this seems unrelated.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs