On 2013.07.19 at 14:23 -0500, Eric Sandeen wrote:
> On 7/19/13 11:32 AM, Markus Trippelsdorf wrote:
> > On 2013.07.19 at 11:02 -0500, Eric Sandeen wrote:
> >> On 7/19/13 7:51 AM, Markus Trippelsdorf wrote:
> >>> On 2013.07.19 at 14:41 +0200, Stefan Ring wrote:
> >>>>> I've bisected this issue to the following commit:
> >>>>>
> >>>>> commit cca9f93a52d2ead50b5da59ca83d5f469ee4be5f
> >>>>> Author: Dave Chinner <dchinner@xxxxxxxxxx>
> >>>>> Date:   Thu Jun 27 16:04:49 2013 +1000
> >>>>>
> >>>>>     xfs: don't do IO when creating an new inode
> >>>>>
> >>>>> Reverting this commit on top of the Linus tree "solves" all problems
> >>>>> for me. IOW I no longer lose my KDE and LibreOffice config files
> >>>>> during a crash. Log recovery now works fine and xfs_repair shows no
> >>>>> issues.
> >>>>>
> >>>>> So users of 3.11.0-rc1 beware. Only run this version if you have
> >>>>> up-to-date backups handy.
> >>
> >> Are you certain about that bisection point? All that does is say:
> >> when we allocate a new inode, assign it a random generation number,
> >> rather than reading it from disk & incrementing the older generation
> >> number, AFAICS. So it simply avoids a read IO.
> >
> > Yes, I'm sure.
> > As I wrote above, I also double-checked by reverting the commit on top
> > of the current Linus tree.
> >
> >> I wonder if simply changing IO patterns on the SSD changes how
> >> it's doing caching & destaging <handwave>.
> >
> > No. The corruption also happens on my conventional (spinning) drives.
> >
> >>>> What I miss in this thread is a distinction between filesystem
> >>>> corruption on the one hand and a few zeroed files on the other. The
> >>>> latter may be a nuisance, but it is expected behavior, while the
> >>>> former should never happen, period, if I'm not mistaken.
> >>>
> >>> Well, it is natural that fs developers at first try to blame userspace.
> >>
> >> I disagree with that; we just need to be clear about your scenarios,
> >> and what integrity guarantees should apply.
> >>
> >>> Unfortunately it turned out that in this case there is filesystem
> >>> corruption. (Fortunately this normally happens only very rarely on
> >>> rc1 kernels.)
> >>
> >> Corruption is when you get back data that you did not write,
> >> or metadata which is inconsistent or unreadable even after a proper
> >> log replay.
> >>
> >> Corruption is _not_ unsynced, buffered data that was lost on a
> >> crash or poweroff.
> >>
> >> But I might not have followed the thread properly, and I might
> >> misunderstand your situation.
> >>
> >> When you experience this lost file [data] scenario, was it after an
> >> orderly reboot, or after a crash and/or system reset?
> >
> > To reproduce this issue simply boot into your desktop and then hit
> > sysrq-c and reboot.
>
> Ok, a crash, so at a minimum, some buffered data loss is 100% expected.

Sure.

> > After log replay without error messages, the
> > filesystem is in an inconsistent state
>
> What exactly do you mean by inconsistent state? Sorry to be pedantic here.

By inconsistent state I mean a filesystem state that forces you to run
xfs_repair to get back to normal.

> > and many small config files are
> > lost.
>
> Written how long ago? Were they fsynced?
> I suppose you are unsure about that, if they're app-written.

I hit sysrq-c ~10 seconds after the KDE session is fully functional.
As I wrote above, I added an fsync to the KDE config file handler, so
the files should be fsynced.
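For reference, what I added amounts to the usual write-to-a-temp-file,
fsync, rename sequence. A minimal sketch of that pattern (not the
actual KDE code; save_config() and the temp-file naming are made up
for illustration):

/* Crash-safe config update: write a temp file, fsync it, then
 * atomically rename it over the old file. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int save_config(const char *path, const char *data, size_t len)
{
	char tmp[4096];
	snprintf(tmp, sizeof(tmp), "%s.tmp", path);

	int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (fd < 0)
		return -1;

	if (write(fd, data, len) != (ssize_t)len ||	/* write new contents */
	    fsync(fd) != 0) {				/* force them to disk */
		close(fd);
		unlink(tmp);
		return -1;
	}
	close(fd);

	/* rename() is atomic: after a crash you see either the old
	 * contents or the new ones, never a zero-length file. */
	return rename(tmp, path);
}

(Strictly speaking you would also fsync the parent directory after the
rename to make the new directory entry durable, but even without that
the old contents should survive a crash.)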
> > There are also undeletable files.
>
> What happens when you try to delete them?

They show up as "?????? ??????" in "ls -l" (stat() on them evidently
fails, so ls cannot print any metadata) and I get an error when I try
to delete them. (I don't recall the exact error message.) See for
example the /tmp/.X0-lock file that I mentioned earlier in this thread.

> > You need to run xfs_repair
> > manually to bring the filesystem back to normal.
>
> And what is the repair output?

See the outputs I've posted in this thread before. It's always a
variation thereof.

> Can you show an exact sequence of events, capturing all relevant
> output from repair and/or dmesg, etc, just so we see exactly what you
> see?

I already did that.
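For completeness, the trigger itself is nothing special; this is all
that sysrq-c does from userspace (a sketch, equivalent to
"echo c > /proc/sysrq-trigger"; it must run as root, and it crashes
the machine on the spot):

/* Force an immediate kernel crash via magic sysrq. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/sysrq-trigger", O_WRONLY);
	if (fd < 0)
		return 1;
	write(fd, "c", 1);	/* 'c' triggers a forced crash (NULL deref) */
	close(fd);		/* in practice never reached */
	return 0;
}

--
Markus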