On 10.12.2012 11:58, Dave Chinner wrote:
> On Sat, Dec 08, 2012 at 08:29:27PM +0100, Matthias Schniedermeyer wrote:
> > On 06.12.2012 09:51, Lin Li wrote:
> > > Hi guys. I recently suffered a huge data loss on a power cut on an XFS
> > > partition. I copied a lot of files (roughly 20GB) to an XFS
> > > partition, and 10 hours later I got an unexpected power cut. As a
> > > result, all these newly copied files disappeared as if they had never
> > > been copied. I tried to check and repair the partition, but xfs_check
> > > reports no error at all. So I guess the problem is that the metadata
> > > for these files was all kept in the cache (64MB) and was never
> > > committed to the hard disk.
> > >
> > > What is the cache flush policy for XFS? Does it always reserve some
> > > fixed space in the cache for metadata? I ask because I thought that,
> > > since I copied such a huge amount of data, at least some of these
> > > files must have been fully committed to the hard disk; the cache is
> > > only 64MB, after all. But in reality all of them were lost. The only
> > > possibility I can think of is that some part of the cache is reserved
> > > for metadata, so even when the cache is full, this part is not
> > > written to the disk. Am I right?
> >
> > I have had the same problem, several times.
> >
> > The latest was just an hour ago.
> > I'm copying one HDD onto another with a plain "rsync -a /src/ /tgt/".
> > Both HDDs are 3TB SATA drives in a USB3 enclosure with a dm-crypt
> > layer in between. About 45 minutes into copying, the target HDD
> > disconnected for a moment. 45 minutes means something over 200GB was
> > copied; each file is about 900MB.
> > After remounting the filesystem there were exactly 0 files.
>
> This sounds like an entirely different problem to what the OP
> reported.

To me it sounds like only the timing is different. Otherwise I don't see
much difference between files vanishing after a few hours (of inactivity)
and after a few minutes (while still being active).
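One way to narrow the window in which freshly copied files exist only in memory is to force a flush after every file instead of leaving it to background writeback. The following is a minimal sketch under stated assumptions: copy_each_synced is a hypothetical helper made up for illustration (not an existing tool), it assumes Linux, where sync(2) waits for the writes to complete before returning, and it only copies regular files at the top level of SRC. Whether the flushed data then survives a disconnect still depends on the drive and every layer in between (USB bridge, dm-crypt) honoring cache flushes.

```shell
#!/bin/sh
# Hypothetical helper (illustration only): copy the regular files at the
# top level of SRC into TGT one at a time, running sync(1) after each copy
# so that at most the file currently being copied exists only in memory.
# Subdirectories are skipped, and dotfiles are not matched by the glob.
copy_each_synced() {
    src=$1
    tgt=$2
    for f in "$src"/*; do
        # Skip anything that is not a regular file (this also skips the
        # literal, unexpanded pattern when SRC is empty).
        [ -f "$f" ] || continue
        cp -p "$f" "$tgt"/ || return 1
        # On Linux, sync waits for the writes to complete before returning.
        sync
    done
}

# Example usage (paths are placeholders):
# copy_each_synced /src /tgt
```

Compared with running sync in an endless loop, this ties each flush to a completed file. The cost is that every sync flushes all mounted filesystems, so a large copy may run noticeably slower.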
> Did the filesystem have an error returned?

No.

> i.e. did it shut down (what's in dmesg)?

There's not much XFS could have done after the block device vanished. A
disappearing/reappearing block device gets a new name because the old name
is still "in use"; the old block device gets cleaned up after unmounting
and closing the dm-crypt device. When the USB3 HDD disconnected, it
reappeared a moment later under a new name: it bounced between sdc <-> sdf.
In syslog there's a plain "USB disconnect, device number XX" message,
followed by the standard new-device-found message bombardment. In between
there are some error messages, but as it's practically a yanked-out and
replugged cable, a little complaining by the kernel is to be expected.

> Did you run repair in between the shutdown and remount?

No.

XFS (dm-3): Mounting Filesystem
XFS (dm-3): Starting recovery (logdev: internal)
XFS (dm-3): Ending recovery (logdev: internal)

> How many files in that 200GB of data?

At 0.9GB/file, at least 220.

> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
>
> Basically, you have an IO error situation, and you have dm-crypt
> in between buffering an unknown amount of changes. In my experience,
> data loss events are rarely filesystem problems when USB drives or
> dm-crypt is involved...

I don't know the inner workings of dm-*, but shouldn't it behave
transparently and rely on the block layer for buffering?

> > After that I started a "while true; do sync ; done" loop in the
> > background.
> > And just while I was writing this email the HDD disconnected a second
> > time. But this time the files up until the last 'sync' were retained.
>
> Exactly as I'd expect.
>
> > And something like this has happened to me at least a half dozen times
> > in the last few months.
> > I think the first time was with kernel 3.5.X, when I was actually
> > booting into 3.6 with a plain "reboot" (the filesystem might not have
> > been unmounted cleanly). After the reboot, the changes from about the
> > last half hour were gone, e.g. I had renamed a directory about 15
> > minutes before I rebooted, and after the reboot the directory had its
> > old name back.
> >
> > The kernel in all but (maybe) one case is between 3.6 and 3.6.2
> > (currently); the first time MIGHT have been something around 3.5.8,
> > but I'm not sure. The HDDs were connected either by plain SATA (AHCI)
> > or by a USB3 enclosure. All affected filesystems were/are on a
> > dm-crypt layer.
>
> Given that dm-crypt is the common factor here, I'd start by ruling
> that out, i.e. reproduce the problem without dm-crypt being used.

That's a slight problem for me: practically everything I have is
encrypted.

Now that I think about it, maybe dm-crypt really is to blame. Up until a
few months ago I was using loop-AES. After dm-crypt got the capability to
emulate it, I moved over to dm-crypt because the loop-AES support in
Debian got worse over time. I didn't have any problems until after I
moved to dm-crypt, but OTOH I'm not the only one using dm-crypt. But
OTOOH maybe not so many people use the loop-AES compatibility mode.

--
Matthias

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs