Re: Corruption of root fs during git bisect of drm system hang

Dave Chinner <david@xxxxxxxxxxxxx> · Mon, 15 Jul 2013 12:28:41 +1000

On Fri, Jul 12, 2013 at 09:07:21AM +0200, Markus Trippelsdorf wrote:
> On 2013.07.12 at 12:17 +1000, Dave Chinner wrote:
> > On Thu, Jul 11, 2013 at 11:07:55AM +0200, Markus Trippelsdorf wrote:
> > > On 2013.07.10 at 23:12 -0500, Stan Hoeppner wrote:
> > > > On 7/10/2013 10:58 PM, Dave Chinner wrote:
> > > > > On Thu, Jul 11, 2013 at 05:36:21AM +0200, Markus Trippelsdorf wrote:
> > > > 
> > > > >> I was losing my KDE settings bit by bit with every reboot during the
> > > > >> bisection. First my window-rules disappeared, then my desktop background
> > > > >> changed to default, then my taskbar moved from top to the bottom, etc.
> > > > >> In the end I had to restore all my .files from backup. 
> > > > > 
> > > > > That's not filesystem corruption. That sounds more like someone not
> > > > > using fsync in the appropriate place when overwriting a file....
> > > > 
> > > > From Sandeen's blog, March 2009:
> > > > 
> > > > "I dunno how to resolve this right now.  I talked to some nice KDE folks
> > > > on irc; they basically want atomic writes, either you get your old file
> > > > or your new file post-crash; and tempfile/sync/rename does this – but
> > > > the fsync hurts on 78% of the Linux filesystems out there.  So their
> > > > KSaveFile class doesn’t fsync.  So what to do, what to do.."
> > > > 
> > > > That's 4 years ago.  Is it possible the KDE devs are still not using
> > > > fsync?  Sure seems likely given Markus' problem.
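The tempfile/sync/rename pattern described in that quote can be sketched in C with plain POSIX calls. This is a minimal illustration, not KSaveFile's actual code; the function name atomic_save and the ".tmp" suffix are hypothetical. The point is the ordering: the data is fsync()ed before rename() makes the new name visible, so after a crash a reader should find either the complete old file or the complete new one.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write buf to "<path>.tmp", fsync it, then rename()
 * it over path. The ".tmp" naming is an assumption of this sketch. */
int atomic_save(const char *path, const void *buf, size_t len)
{
    char tmp[4096];
    int r = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (r < 0 || (size_t)r >= sizeof tmp)
        return -1;

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* write() may be partial, so loop until everything is written. */
    const char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) { close(fd); unlink(tmp); return -1; }
        p += n;
        len -= (size_t)n;
    }

    /* The step a save routine without fsync skips: force the new data to
     * stable storage before the rename can reach the disk. */
    if (fsync(fd) < 0) { close(fd); unlink(tmp); return -1; }
    if (close(fd) < 0) { unlink(tmp); return -1; }

    /* rename() atomically replaces path with the fully written file. */
    if (rename(tmp, path) < 0) { unlink(tmp); return -1; }
    return 0;
}
```

Calling fsync() on the parent directory after the rename would additionally make the rename itself durable; it is not needed for the old-or-new guarantee.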
> > > 
> > > Looking at the source:
> > > http://api.kde.org/4.10-api/kdelibs-apidocs/kdecore/html/ksavefile_8cpp_source.html#l00219
> > > it appears that one can set an environment variable KDE_EXTRA_FSYNC to
> > > address this issue.
> > > 
> > > However in my case it doesn't help. Even with KDE_EXTRA_FSYNC=1 I still
> > > lose my KDE settings in case of a crash. So the whole fsync thing might
> > > be a red herring.
> > > 
> > > What's more, this time I ended up with undeletable files in /tmp (for
> > > example .X0-lock) after the crash:
> > > 
> > > (/dev/sdb was mounted and unmounted normally before I ran xfs_repair)
> > > 
> > > t@ubunt:~# xfs_repair /dev/sdb
> > > Phase 1 - find and verify superblock...
> > > Phase 2 - using internal log
> > >         - zero log...
> > >         - scan filesystem freespace and inode maps...
> > > agi unlinked bucket 0 is 683435008 in ag 2 (inode=4978402304)
> > > agi unlinked bucket 1 is 683435009 in ag 2 (inode=4978402305)
> > >         - found root inode chunk
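The two "agi unlinked bucket" lines above decode with simple arithmetic. XFS keeps 64 unlinked-inode hash buckets in each AG's AGI header and hashes an AG-relative inode number (agino) into bucket agino % 64, which is why 683435008 and 683435009 sit in buckets 0 and 1. The absolute inode number in parentheses places the AG number above the AG-relative bits. In XFS that shift is sb_agblklog + sb_inopblog. The 31 used in this sketch is inferred from the output itself (4978402304 - 683435008 = 2^32 = 2 << 31), not read from the superblock, and the helper names are hypothetical.

```c
#include <stdint.h>

/* Number of unlinked-inode hash buckets per AGI in XFS. */
#define AGI_UNLINKED_BUCKETS 64u

/* Hypothetical helper: bucket an AG-relative inode number hashes to. */
static unsigned unlinked_bucket(uint32_t agino)
{
    return agino % AGI_UNLINKED_BUCKETS;
}

/* Hypothetical helper: absolute inode number from AG number and agino.
 * agino_bits stands in for sb_agblklog + sb_inopblog; 31 is only inferred
 * from the repair output, not taken from this filesystem's superblock. */
static uint64_t agino_to_ino(uint32_t agno, uint32_t agino, unsigned agino_bits)
{
    return ((uint64_t)agno << agino_bits) | agino;
}
```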
> > 
> > Again, these are signs that log recovery has not completed
> > successfully or that for some reason it thought the log was clean.
> > Can you please post the dmesg output after the crash when you go
> > through the mount/unmount process before you run xfs_repair?
> 
> Sure.
> First boot after crash:
>  XFS (sdb2): Mounting Filesystem
>  XFS (sdb2): Starting recovery (logdev: internal)
>  XFS (sdb2): Ending recovery (logdev: internal)
> 
> Second boot after crash:
>  XFS (sdb2): Mounting Filesystem
>  XFS (sdb2): Ending clean mount 
> 
> I then boot Ubuntu from another disc to run xfs_repair.

That's what should have been in the initial description of your
problem.

> And looking through my logs I see this WARNING:
> 
> ------------[ cut here ]------------
> WARNING: CPU: 0 PID: 439 at fs/inode.c:280 drop_nlink+0x33/0x40()
> CPU: 0 PID: 439 Comm: gconfd-2 Not tainted 3.10.0-08982-g6d128e1-dirty #42
> Hardware name: System manufacturer System Product Name/M4A78T-E, BIOS 3503    04/13/2011
>  0000000000000009 ffffffff8157d030 0000000000000000 ffffffff81060788
>  ffff8801f8608cc8 ffff880205998230 ffff8801f7bede58 0000000000000000
>  ffff8801f86083c0 ffffffff8110ce93 ffff8801f8608b40 ffffffff811b7104
> Call Trace:
>  [<ffffffff8157d030>] ? dump_stack+0x41/0x51
>  [<ffffffff81060788>] ? warn_slowpath_common+0x68/0x80
>  [<ffffffff8110ce93>] ? drop_nlink+0x33/0x40
>  [<ffffffff811b7104>] ? xfs_droplink+0x24/0x60
>  [<ffffffff811b84ed>] ? xfs_remove+0x24d/0x380
>  [<ffffffff811b1657>] ? xfs_vn_unlink+0x37/0x80
>  [<ffffffff8110414e>] ? vfs_unlink+0x6e/0xe0
>  [<ffffffff8110432a>] ? do_unlinkat+0x16a/0x220
>  [<ffffffff810f4fa9>] ? SyS_faccessat+0x149/0x200
>  [<ffffffff81583292>] ? system_call_fastpath+0x16/0x1b

When did that occur? Before the crash, after the first or second
mount, or after you ran repair?
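For reference, the warning at fs/inode.c:280 is raised by the VFS helper drop_nlink(), which does WARN_ON(inode->i_nlink == 0) before decrementing. In this trace it was reached from an unlink through xfs_remove() and xfs_droplink(). In other words, gconfd-2 removed a name whose inode already had an in-core link count of zero. Below is a userspace model of just that check. struct model_inode and model_drop_nlink are hypothetical names. Unlike the kernel, the model does not let the count go below zero.

```c
#include <stdio.h>

/* Hypothetical stand-in for the link-count part of struct inode. */
struct model_inode {
    unsigned int nlink;
    int warned;
};

/* Models the check in drop_nlink(): warn if the count is already zero.
 * This model then stops at zero instead of decrementing further. */
static void model_drop_nlink(struct model_inode *ip)
{
    if (ip->nlink == 0) {
        /* Where the kernel emits the "WARNING ... drop_nlink" splat. */
        fprintf(stderr, "WARNING: drop_nlink on inode with nlink == 0\n");
        ip->warned = 1;
        return;
    }
    ip->nlink--;
}
```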

> Some further observations:
> 
> When I boot 3.2.0 after the crash, log recovery works fine.
> 
> When I boot 3.9.0 after the crash, I get the following:
> 
> [    2.332989] XFS (sdc2): Mounting Filesystem
> [    2.406206] XFS (sdc2): Starting recovery (logdev: internal)
> [    2.418147] XFS (sdc2): log record CRC mismatch: found 0xdbcaef48, expected 0x69e7934e.

Just informational: it indicates that the log records don't have
valid CRCs in them because 3.2 didn't calculate them. If you are
getting them after a crash on a 3.9+ kernel, then there's a
problem writing to the log....
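XFS's CRC-enabled metadata, including log records, is checksummed with CRC32C, and the message above is a stored-versus-computed comparison. Below is a minimal, generic CRC32C and check. It is not the kernel's log checksum code, which covers specific regions of the log record header and data that this sketch does not model. The helper record_crc_ok is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78,
 * initial value and final XOR of 0xFFFFFFFF. */
static uint32_t crc32c(const void *buf, size_t len)
{
    const uint8_t *p = buf;
    uint32_t crc = 0xFFFFFFFFu;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical check: does the checksum stored with a record match one
 * computed over its payload? A record written by a kernel that never
 * computed the checksum will generally fail this comparison. */
static int record_crc_ok(const void *payload, size_t len, uint32_t stored)
{
    return crc32c(payload, len) == stored;
}
```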

> When I boot the current Linus tree after the crash, log recovery fails silently.

dmesg output, please. Indeed, what does "fails silently" mean? That
the filesystem doesn't mount, but no error is given?

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs