Re: [linux-lvm] 2.6.22-rc5 XFS fails after hibernate/resume

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



David Greaves wrote:
I'm going to have to do some more testing...
done


David Chinner wrote:
On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
David Greaves wrote:
So doing:
xfs_freeze -f /scratch
sync
echo platform > /sys/power/disk
echo disk > /sys/power/state
# resume
xfs_freeze -u /scratch

Works (for now - more usage testing tonight)

Verrry interesting.
Good :)
Now, not so good :)


What you were seeing was an XFS shutdown occurring because the free space
btree was corrupted. IOWs, the process of suspend/resume has resulted
in either bad data being written to disk, the correct data not being
written to disk or the cached block being corrupted in memory.
That's the kind of thing I was suspecting, yes.

If you run xfs_check on the filesystem after it has shut down after a resume, can you tell us if it reports on-disk corruption? Note: do not run xfs_repair to check this - it does not check the free space btrees; instead it simply rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
to fix it up.
OK, I can try this tonight...


This is on 2.6.22-rc5

So I hibernated last night and resumed this morning.
Before hibernating I froze and sync'ed. After resume I thawed it. (Sorry Dave)

Here are some photos of the screen during resume. This is not 100% reproducable - it seems to occur only if the system is shutdown for 30mins or so.

Tejun, I wonder if error handling during resume is problematic? I got the same errors in 2.6.21. I have never seen these (or any other libata) errors other than during resume.

http://www.dgreaves.com/pub/2.6.22-rc5-resume-failure.jpg
(hard to read, here's one from 2.6.21
http://www.dgreaves.com/pub/2.6.21-resume-failure.jpg

I _think_ I've only seen the xfs problem when a resume shows these errors.


Ok, to try and cause a problem I ran a make and got this back at once:
make: stat: Makefile: Input/output error
make: stat: clean: Input/output error
make: *** No rule to make target `clean'.  Stop.
make: stat: GNUmakefile: Input/output error
make: stat: makefile: Input/output error


I caught the first dmesg this time:

Filesystem "dm-0": XFS internal error xfs_btree_check_sblock at line 334 of file fs/xfs/xfs_btree.c. Caller 0xc01b58e1
 [<c0104f6a>] show_trace_log_lvl+0x1a/0x30
 [<c0105c52>] show_trace+0x12/0x20
 [<c0105d15>] dump_stack+0x15/0x20
 [<c01daddf>] xfs_error_report+0x4f/0x60
 [<c01cd736>] xfs_btree_check_sblock+0x56/0xd0
 [<c01b58e1>] xfs_alloc_lookup+0x181/0x390
 [<c01b5b06>] xfs_alloc_lookup_le+0x16/0x20
 [<c01b30c1>] xfs_free_ag_extent+0x51/0x690
 [<c01b4ea4>] xfs_free_extent+0xa4/0xc0
 [<c01bf739>] xfs_bmap_finish+0x119/0x170
 [<c01e3f4a>] xfs_itruncate_finish+0x23a/0x3a0
 [<c02046a2>] xfs_inactive+0x482/0x500
 [<c0210ad4>] xfs_fs_clear_inode+0x34/0xa0
 [<c017d777>] clear_inode+0x57/0xe0
 [<c017d8e5>] generic_delete_inode+0xe5/0x110
 [<c017da77>] generic_drop_inode+0x167/0x1b0
 [<c017cedf>] iput+0x5f/0x70
 [<c01735cf>] do_unlinkat+0xdf/0x140
 [<c0173640>] sys_unlink+0x10/0x20
 [<c01040a4>] syscall_call+0x7/0xb
 =======================
xfs_force_shutdown(dm-0,0x8) called from line 4258 of file fs/xfs/xfs_bmap.c. Return address = 0xc021101e Filesystem "dm-0": Corruption of in-memory data detected. Shutting down filesystem: dm-0
Please umount the filesystem, and rectify the problem(s)

so I cd'ed out of /scratch and umounted.

I then tried the xfs_check.

haze:~# xfs_check /dev/video_vg/video_lv
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_check.  If you are unable to mount the filesystem, then use
the xfs_repair -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.
haze:~# mount /scratch/
haze:~# umount /scratch/
haze:~# xfs_check /dev/video_vg/video_lv

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'xfs_db'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767bc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Bad page state in process 'syslogd'

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: page:c1767cc0 flags:0x80010008 mapping:00000000 mapcount:-64 count:0

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Trying to fix it up, but a reboot is needed

Message from syslogd@haze at Tue Jun 19 08:47:30 2007 ...
haze kernel: Backtrace:

ugh. Try again
haze:~# xfs_check /dev/video_vg/video_lv
haze:~#

whilst running a top reported this as roughly the peak memory usage:
 8759 root      18   0  479m 474m  876 R  2.0 46.9   0:02.49 xfs_db
so it looks like it didn't run out of memory (machine has 1Gb).

Dave, I ran xfs_check -v... but I got bored when it reached 122M of bz2 compressed output with no sign of stopping... still got it if it's any use...

lots of:
setting block 0/0 to sb
setting block 0/1 to freelist
setting block 0/2 to freelist
setting block 0/3 to freelist
setting block 0/4 to freelist
setting block 0/75 to btbno
setting block 0/346901 to free1
setting block 0/346903 to free1
setting block 0/346904 to free1
setting block 0/346905 to free1
  and stuff like this
inode 128 mode 040777 fmt extents afmt extents nex 1 anex 0 nblk 1 sz 4096
inode 128 nlink 39 is dir
inode 128 extent [0,7,1,0]

I then rebooted and ran a repair which didn't show any damage.

David

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux