On Mon, Dec 17, 2012 at 12:22:51AM +0100, Matthias Schniedermeyer wrote: > On 17.12.2012 07:48, Dave Chinner wrote: > > Can you provide the information I asked - what was there before the > > unmount, > > With time reduced to 2 minutes and everything put in a script (see below). > > After about 2 minutes: > ls -lR /mnt > /mnt: > total 0 > drwxr-xr-x 2 root root 105 Dec 17 00:13 a > > /mnt/a: > total 11186176 > -rw-r--r-- 1 root root 943718400 Dec 17 00:11 1 > -rw-r--r-- 1 root root 943718400 Dec 17 00:13 10 > -rw-r--r-- 1 root root 943718400 Dec 17 00:13 11 > -rw-r--r-- 1 root root 891450368 Dec 17 00:13 12 > -rw-r--r-- 1 root root 943718400 Dec 17 00:11 2 > -rw-r--r-- 1 root root 943718400 Dec 17 00:11 3 > -rw-r--r-- 1 root root 943718400 Dec 17 00:12 4 > -rw-r--r-- 1 root root 943718400 Dec 17 00:12 5 > -rw-r--r-- 1 root root 943718400 Dec 17 00:12 6 > -rw-r--r-- 1 root root 943718400 Dec 17 00:12 7 > -rw-r--r-- 1 root root 943718400 Dec 17 00:12 8 > -rw-r--r-- 1 root root 943718400 Dec 17 00:12 9 > > I yanked out the cable immediatly after the ls. > > > what is there before recovery > > xfs_logprint: > data device: 0x811 > log device: 0x811 daddr: 104857632 length: 102400 > > cycle: 1 version: 2 lsn: 1,0 tail_lsn: 1,0 > length of Log Record: 20 prev offset: -1 num ops: 1 > uuid: b22d154a-b580-4d3a-bc09-ac0c62d39d39 format: little endian linux > h_size: 32768 > ---------------------------------------------------------------------------- > Oper (0): tid: b0c0d0d0 len: 8 clientid: LOG flags: UNMOUNT > Unmount filesystem > > ============================================================================ > xfs_logprint: skipped 4096 cleared blocks in range: 2 - 4097 > xfs_logprint: skipped 98302 zeroed blocks in range: 4098 - 102399 > xfs_logprint: physical end of log > ============================================================================ > xfs_logprint: logical end of log > ============================================================================ Not what I was after (i.e. the raw dump of the log to a file via xfs_logprint -C), but in this case serves to indicate that nothing ever got written to the log. That's not exactly what I was expecting, given that you've been reporting that some files were there after remount. > mount /dev/sdb1 /mnt -oro,norecovery > ls -lR /mnt > /mnt: > total 0 > > So nothing. If nothing got written to the log, then no metadata got written to disk because it is pinned in memory until the log is written. > > And depending on what occurred, send me the log file so I > > can see what was actually supposed to be replayed by log recovery? > > Regarding to the 'event trace' from your other mail, i have no idea what > or how to do that. http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F > Dec 17 00:13:28 localhost kernel: [ 325.562642] lost page write due to I/O error on sdb1 > Dec 17 00:13:28 localhost kernel: [ 325.562643] Buffer I/O error on device sdb1, logical block 2613285 > Dec 17 00:13:28 localhost kernel: [ 325.562643] lost page write due to I/O error on sdb1 > Dec 17 00:13:28 localhost kernel: [ 325.562644] Buffer I/O error on device sdb1, logical block 2613286 > Dec 17 00:13:28 localhost kernel: [ 325.562644] lost page write due to I/O error on sdb1 > Dec 17 00:13:28 localhost kernel: [ 325.562645] Buffer I/O error on device sdb1, logical block 2613287 > Dec 17 00:13:28 localhost kernel: [ 325.562645] lost page write due to I/O error on sdb1 > Dec 17 00:13:28 localhost kernel: [ 325.562646] Buffer I/O error on device sdb1, logical block 2613288 > Dec 17 00:13:28 localhost kernel: [ 325.562646] lost page write due to I/O error on sdb1 > Dec 17 00:13:28 localhost kernel: [ 325.647364] XFS (sdb1): metadata I/O error: block 0x6400022 ("xlog_iodone") error 5 numblks 64 > Dec 17 00:13:28 localhost kernel: [ 325.648559] XFS (sdb1): xfs_do_force_shutdown(0x2) called from line 1074 of file /xssd/usr_src/linux/fs/xfs/xfs_log.c. Return address = 0xffffffff812a1e49 > Dec 17 00:13:28 localhost kernel: [ 325.649804] XFS (sdb1): Log I/O Error Detected. Shutting down filesystem And there's why nothing got written to the log - you pulled the plug before it was flushed. But that also implies it wasn't flushed for a couple of minutes, which is not correct. They should be occurring every 30s when the filesystem is only slightly busy like this. Hmmmm, that rigns a bell. I think I came across this when doing a RHEL6 backport recently, and decided that nothing needed doing upstream because there were already patches pending to fix it. That is, there is a logic bug is in xfssyncd_worker(), and that was fixed by the xfssyncd-die-die-die patchset that is now in 3.8-rc1. So, two things: firstly, can you check that the oldest file that is lost on 3.8-rc1 is less than 30s old, and secondly, whether this patch gives the same result as 3.8-rc1 on 3.6/3.7: diff --git a/fs/xfs/xfs_sync.c b/fs/xfs/xfs_sync.c index 9500caf..7bf85e8 100644 --- a/fs/xfs/xfs_sync.c +++ b/fs/xfs/xfs_sync.c @@ -400,7 +400,7 @@ xfs_sync_worker( * cancel_delayed_work_sync on this work queue before tearing down * the ail and the log in xfs_log_unmount. */ - if (!(mp->m_super->s_flags & MS_ACTIVE) && + if ((mp->m_super->s_flags & MS_ACTIVE) && !(mp->m_flags & XFS_MOUNT_RDONLY)) { /* dgc: errors ignored here */ if (mp->m_super->s_writers.frozen == SB_UNFROZEN && Cheers, Dave. -- Dave Chinner david@xxxxxxxxxxxxx _______________________________________________ xfs mailing list xfs@xxxxxxxxxxx http://oss.sgi.com/mailman/listinfo/xfs