On Thu, May 19, 2011 at 03:35:04PM -0700, Phil Karn wrote:
> I just got the following on my console each time I invoked xfs_fsr on a XFS
> file system. The file system resides on a OCZ SSD that I've been having
> problems with. This morning my system deadlocked while running a program
> that created and deleted many small files on the SSD (a Perl script feeding
> a large number of email messages one at a time to procmail). I suspect bad
> garbage collection algorithms in the SSD; I recovered by booting into single
> user and running wiper.sh on the file system to replenish the drive's pool
> of erased pages. Since then I've been running wiper.sh regularly to ensure a
> sufficient erased page pool in the SSD. I had just run it when I ran
> xfs_fsr.
>
> So it's possible that my file system data structures are messed up. However,
> the system otherwise seems normal, and I've been routinely tagging my files
> with extended attributes containing their SHA-1 hashes so I can check their
> integrity. So far my checks haven't found any corrupted files.
>
> Here is the relevant output from my kernel log. Is this a XFS bug, or does
> it simply indicate a corrupted file system due to my earlier crash?
>
> [29847.045684] BUG: unable to handle kernel NULL pointer dereference at
> 0000000000000018

Dereferencing an offset of 24 bytes from the start of a structure.

> [29847.045690] IP: [<ffffffffa033c11b>] xfs_trans_log_inode+0xb/0x30 [xfs]

Three structures possible: xfs_inode, xfs_trans, xfs_inode_log_item:

	xfs_trans_log_inode(
		xfs_trans_t	*tp,
		xfs_inode_t	*ip,
		uint		flags)
	{
		ASSERT(ip->i_transp == tp);
		ASSERT(ip->i_itemp != NULL);
		ASSERT(xfs_isilocked(ip, XFS_ILOCK_EXCL));

		tp->t_flags |= XFS_TRANS_DIRTY;
		ip->i_itemp->ili_item.li_desc->lid_flags |= XFS_LID_DIRTY;

And the situation is that ip->i_itemp->ili_item.li_desc == NULL:

	typedef struct xfs_log_item {
		struct list_head		li_ail;		/* AIL pointers */
		xfs_lsn_t			li_lsn;		/* last on-disk lsn */
		struct xfs_log_item_desc	*li_desc;	/* ptr to current desc*/
		.....

That should not happen - the inode should be linked into the
transaction (tp), and li_desc should never be NULL here.

Are you running with CONFIG_XFS_DEBUG=y? If not, it is probably
worthwhile as it should catch the problems more precisely before a
NULL pointer dereference occurs.

> and so on...it repeats a few times because I issued the xfs_fsr command a
> few times.

So it is reproducible? Can you turn on the xfs_swapext tracepoints
and gather the output over a failure, as well as using xfs_fsr -v -d
and capturing that output? That might indicate that there is a
specific inode extent swap configuration that triggers this problem
that I haven't realised exists.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs
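
For reference, the 24-byte offset arithmetic can be sanity-checked against
the xfs_log_item fields quoted above. What follows is only a rough
user-space sketch, not kernel code: the mock_* names are hypothetical
stand-ins, it mirrors just the three leading fields shown in the excerpt,
and it assumes a 64-bit (LP64) build with a 64-bit LSN type. Real layouts
depend on the kernel version and configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the kernel's struct list_head. */
struct mock_list_head {
	struct mock_list_head *next;
	struct mock_list_head *prev;
};

/* Assumption: the LSN type is a 64-bit integer. */
typedef int64_t mock_lsn_t;

/* Opaque here; the sketch only needs a pointer to it. */
struct mock_log_item_desc;

/* Mirrors only the three leading fields quoted in the message. */
struct mock_log_item {
	struct mock_list_head		li_ail;		/* two pointers */
	mock_lsn_t			li_lsn;		/* 64-bit LSN */
	struct mock_log_item_desc	*li_desc;	/* field of interest */
};

/* Byte offset of li_desc from the start of the mock structure. */
size_t mock_li_desc_offset(void)
{
	return offsetof(struct mock_log_item, li_desc);
}
```

On an LP64 build, mock_li_desc_offset() returns 24 (0x18): two 8-byte
pointers followed by one 8-byte LSN. This illustrates the arithmetic
only and is no substitute for inspecting the running kernel's actual
structure layout.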