On 4/11/13 6:26 PM, Brian Foster wrote:
> On 04/11/2013 03:11 PM, 符永涛 wrote:
>> It happens tonight again on one of our servers, how to debug the root
>> cause? Thank you.
>>
>
> Hi,
>
> I've attached a system tap script (stap -v xfs.stp) that should
> hopefully print out a bit more data should the issue happen again. Do
> you have a small enough number of nodes (or predictable enough pattern)
> that you could run this on the nodes that tend to fail and collect the
> output?
>
> Also, could you collect an xfs_metadump of the filesystem in question
> and make it available for download and analysis somewhere? I believe the
> ideal approach is to mount/umount the filesystem first to replay the log
> before collecting a metadump, but somebody could correct me on that (to
> be safe, you could collect multiple dumps: pre-mount and post-mount).

Dave suggested yesterday that this would be best: metadump right after
unmounting post-failure, then mount/umount & generate another metadump.

-Eric

> Could you also describe your workload a little bit? Thanks.
>
> Brian
>
>> Apr 12 02:32:10 cqdx kernel: XFS (sdb): xfs_iunlink_remove:
>> xfs_inotobp() returned error 22.
>> Apr 12 02:32:10 cqdx kernel: XFS (sdb): xfs_inactive: xfs_ifree returned
>> error 22
>> Apr 12 02:32:10 cqdx kernel: XFS (sdb): xfs_do_force_shutdown(0x1)
>> called from line 1184 of file fs/xfs/xfs_vnodeops.c. Return address =
>> 0xffffffffa02ee20a
>> Apr 12 02:32:10 cqdx kernel: XFS (sdb): I/O Error Detected. Shutting
>> down filesystem
>> Apr 12 02:32:10 cqdx kernel: XFS (sdb): Please umount the filesystem and
>> rectify the problem(s)
>> Apr 12 02:32:19 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
>> Apr 12 02:32:49 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
>> Apr 12 02:33:19 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
>> Apr 12 02:33:49 cqdx kernel: XFS (sdb): xfs_log_force: error 5 returned.
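
For reference, the two-dump sequence suggested above (metadump right after unmounting post-failure, then mount/umount and a second metadump) could be laid out roughly as in the sketch below. It only builds and prints the commands; nothing is executed. The device /dev/sdb is taken from the log above, while the mount point /mnt/brick, the helper name metadump_plan, and the output file names are assumptions made for illustration, so adjust them to the affected node.

```python
import shlex
from typing import List


def metadump_plan(device: str, mountpoint: str, prefix: str) -> List[List[str]]:
    """Return the commands for the two-dump procedure, in order.

    Hypothetical helper: it only describes the steps and runs nothing.
    """
    pre = f"{prefix}-pre-replay.metadump"    # assumed file name
    post = f"{prefix}-post-replay.metadump"  # assumed file name
    return [
        # 1. Unmount the shut-down filesystem after the failure.
        ["umount", mountpoint],
        # 2. First metadump, taken before any log replay.
        ["xfs_metadump", device, pre],
        # 3. Mount and unmount once so the log is replayed.
        ["mount", device, mountpoint],
        ["umount", mountpoint],
        # 4. Second metadump, taken after log replay.
        ["xfs_metadump", device, post],
    ]


if __name__ == "__main__":
    # /mnt/brick is an assumed mount point, not taken from the thread.
    for argv in metadump_plan("/dev/sdb", "/mnt/brick", "sdb"):
        print(shlex.join(argv))
```

Printing the plan first makes it easy to review the order on each node before running anything by hand as root.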