On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
> On Wednesday, 2014-07-02 at 08:04 -0400, Brian Foster wrote:
> >On Wed, Jul 02, 2014 at 11:57:25AM +0200, Carlos E. R. wrote:
> >
> > ...
> >
> >This is the background eofblocks scanner attempting to free preallocated
> >space on a file. The scanner looks for files that have been recently
> >grown and since been flushed to disk (i.e., no longer concurrently being
> >written to) and trims the post-eof preallocation that comes along with
> >growing files.
> >
> >The corruption errors at xfs_alloc.c:1602,1629 on v3.11 fire if the
> >extent we are attempting to free is already accounted for in the
> >by-block allocation btree. IOW, this is attempting to free an extent
> >that the allocation metadata thinks is already free.
> >
> >>Brief description:
> >>
> >> * It happens only on restore from hibernation.
> >
> >Interesting, could you elaborate a bit more on the behavior this system
> >is typically subjected to? i.e., is this a server that sees a constant
> >workload that is also frequently hibernated/awakened?
> 
> It is a desktop machine I use for work at home. I typically have many
> applications open on different workspaces in XFCE. Say one has terminals,
> another has Thunderbird/Pine, another Firefox, another LibreOffice; another
> may have gimp, another may be kbabel or lokalize, another may have vmplayer,
> etc, whatever. When I go out or go to sleep, I hibernate the machine instead
> of powering down, because it is much faster than reboot, login, and starting
> the wanted applications, and I want to conserve some electricity.
> 
> I also use the machine for testing configurations, but those I try to do on
> virtual machines instead of my work partition.
> 
> The machine may be used anywhere from 4 to 16 hours a day, and hibernated at
> least once a day, perhaps three times if I have to go out several times. It
> makes no sense to me to leave the machine powered on doing nothing, if
> hibernating is so easy and reliable - till now. If I have to leave for more
> than a week, I tend to do a full "halt".
> 
> By the way, this started happening when I replaced an old 500 GB hard disk
> (Seagate ST3500418AS) with a 2 TB new unit (Seagate ST2000DM001-1CH164). The
> smartctl long test says it is fine (and SeaTools from Windows, too).
> 

Ok, so there's a lot going on. I was mainly curious to see what was causing
lingering preallocations, but it could be anything extending a file multiple
times.
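If it helps to see the mechanism in action, speculative preallocation is easy
enough to observe directly. A rough sketch (the mount point, file name and
write sizes below are arbitrary, and how much preallocation you actually get
depends on kernel version and heuristics):

    # Two extending buffered writes plus an fsync, all in one xfs_io
    # invocation so the file is still open when we inspect it.
    # "bmap -vp" should show the last extent running past EOF, and
    # "stat" should report more blocks allocated than the file size
    # accounts for -- that post-eof space is what the background
    # eofblocks scanner later trims off.
    xfs_io -f \
        -c "pwrite 0 32m" \
        -c "pwrite 32m 32m" \
        -c fsync \
        -c "bmap -vp" \
        -c stat \
        /mnt/scratch/prealloc-test

Nothing in that is specific to your workload, of course - it's just a quick
way to watch where the post-eof blocks come from.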
> >>I do not have more info than what appears on the logs, but four times (two
> >>different kernels):
> >>
> >>cer@Telcontar:~> zgrep XFS_WANT_CORRUPTED_GOTO /var/log/messages*xz
> >>/var/log/messages-20140402.xz:<0.1> 2014-03-15 03:35:17 Telcontar kernel - - - [37685.111787] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1629 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140402.xz:<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140506.xz:<0.1> 2014-04-17 22:47:08 Telcontar kernel - - - [280271.851374] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c54fe9
> >>/var/log/messages-20140629.xz:<0.1> 2014-06-29 12:32:18 Telcontar kernel - - - [212890.626346] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_alloc.c. Caller 0xffffffffa0c39fe9
> >>cer@Telcontar:~>
> 
> >So you have reproduced this, reformatted with mkfs, restored from
> >backups and continued to reproduce the problem? And still only on this
> >particular partition?
> 
> Right. Exactly that.
> 
> Only I cannot reproduce the issue at will; it happens about once a month,
> at random.
> 
> AFAIK, xfsdump cannot carry over a filesystem corruption, right?
> 

I think that's accurate, though it might complain/fail in the act of dumping
an fs that is corrupted. The behavior here suggests there might not be
on-disk corruption, however.

> 
> **** LONG DESCRIPTION and LOGS start here ********
> ...
> <5.6> 2014-06-29 12:48:34 Telcontar rsyslogd - - - [origin software="rsyslogd" swVersion="7.4.7" x-pid="1111" x-info="http://www.rsyslog.com"] exiting on signal 15.
> 2014-06-29 12:48:35+02:00 - Halting the system now ===========================================
> uptime: 12:48pm up 4 days 8:43, 33 users, load average: 1.40, 0.53, 0.67
> 2014-06-29 12:57:41+02:00 - Booting the system now ================================================================================
> Linux Telcontar 3.11.10-11-desktop #1 SMP PREEMPT Mon May 12 13:37:06 UTC 2014 (3d22b5f) x86_64 x86_64 x86_64 GNU/Linux
> 
> (It does not show in the log, but I had to hit the hardware reset button;
> the machine refused to reboot normally, apparently.)
> 
> (If you ask why I took so long to notice the problem after thawing, my
> routine is to power up the machine, then go prepare tea. :-) When I come
> back with the mug, I'm dismayed to see I cannot start working; and this
> day I was in a hurry.)
> 
> So I reboot (text mode, level 3), umount home, run xfs_repair, mount again,
> do an xfsdump and simultaneously an rsync (a file-by-file copy, in case of
> problems with dump), umount, use YaST in text mode to reformat the
> partition, mount, and then xfsrestore. It did not occur to me to make a
> 'dd' photo this time: I was tired and busy.
> 
> Maybe next time I can take the photo with dd before doing anything else (it
> takes about 80 minutes), or simply do an "xfs_metadump", which should be
> faster. And I might not have 500 GiB of free space for a dd copy then,
> anyway.
> 

xfs_metadump should be faster. It will grab the metadata only and obfuscate
filenames so as to hide sensitive information.
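To make that concrete, something along these lines should do from a rescue
environment, with the partition unmounted so the log is not replayed first
(the device name and output path here are just placeholders):

    # Metadata only, filenames obfuscated by default; -g prints progress.
    # The result is usually small and compresses well.
    xfs_metadump -g /dev/sdX2 /tmp/home.metadump
    xz -9 /tmp/home.metadump

Since it captures no file contents, it is both quick to produce and
reasonably safe to share.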
> 
> Question.
> 
> As this always happens on recovery from hibernation, and seeing the message
> "Corruption of in-memory data detected", could it be that thawing does a bad
> memory restore from the swap? I thought that the procedure includes some
> checksum, but I don't know for sure.
> 

Not sure, though if so I would think that might be a more common source of
problems.

> 
> >This is interesting because the corruption appears to be associated with
> >post-eof space, which is generally transient. The worst case is that
> >this space is trimmed off files when they are evicted from cache, such
> >as during a umount. To me, that seems to correlate with a more
> >recent/runtime problem rather than something that might be lingering on
> >disk, but we don't really know for sure.
> 
> Dunno.
> 
> To me, there are two problems:
> 
> 1) The corruption itself.
> 2) That xfs_repair fails to repair the filesystem. In fact, I believe it
>    does not detect it!
> 
> To me, #2 is the worst, and it is what makes me do the backup, format,
> restore cycle for recovery. An occasional kernel crash is somewhat
> acceptable :-}
> 

Well, it could be that the "corruption" is gone at the point of a remount.
E.g., something becomes inconsistent in memory, the fs detects it and shuts
down before going any further. That's actually a positive. ;) It also means
it's probably not necessary to do a full backup, reformat and restore
sequence as part of your routine here. xfs_repair should scour through all
of the allocation metadata and yell if it finds something like free blocks
allocated to a file.

> 
> >>Wait! I have a "dd" copy of the entire partition (500 GB), made on March
> >>16th, 5 AM, so hard data could be obtained from there. I had forgotten.
> >>I'll get something for you now:
> 
> ...
> 
> >>I could do a "xfs_metadump" on it - just tell me what options to use, and
> >>where the result can be uploaded to, if big.
> 
> >A metadump would be helpful, though that only gives us the on-disk
> >state. What was the state of this fs at the time the dd image was
> >created?
> 
> I'm sorry, I'm not absolutely sure. I believe it is corrupted, but I cannot
> vouch for it.
> 
> >I'm curious if something like an 'rm -rf *' on the metadump
> >would catch any other corruptions or if this is indeed limited to
> >something associated with recent (pre)allocations.
> 
> Sorry, run 'rm -rf *' where???
> 

On the metadump... mainly just to see whether freeing all of the used blocks
in the fs triggers any other errors (i.e., a brute force way to check for
further corruptions).
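In case that sounds cryptic, roughly what I have in mind is something like
the following (the image file and mount point names are just placeholders):

    # Restore the metadump into a sparse image file.
    xfs_mdrestore home.metadump home.img

    # Sanity check the image, then mount it loopback. File contents read
    # back as zeroes, since a metadump carries metadata only.
    xfs_repair -n home.img
    mount -o loop home.img /mnt/image

    # Brute force check: freeing every used block exercises the free
    # space btrees, so further inconsistencies tend to show up as XFS
    # errors in dmesg.
    rm -rf /mnt/image/*
    umount /mnt/image
    dmesg | tail -50

That's not something you need to run yourself - it's the kind of poking
around I'd do on the metadump once it's uploaded.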
> 
> >Run 'xfs_metadump <src> <tgtfile>' to create a metadump that will
> >obfuscate filenames by default. It should also be compressible. In the
> >future, it's probably worth grabbing a metadump as a first step (before
> >repair, zeroing the log, etc.) so we can look at the fs in the state
> >most recent to the crash.
> 
> I will take that photo next time, using a rescue system in order to prevent
> the system from mounting the partition and replaying the log. Dunno how long
> that will take to happen, though... usually a month - but at least now I
> know how to do it.
> 
> Meanwhile, I have done an xfs_metadump of the image, and compressed it with
> xz. It has 10834536 bytes. What do I do with it? I'm not sure I can email
> that, much less to a mailing list.
> 
> Do you still have a bugzilla system where I can upload it? I had an account
> at <http://oss.sgi.com/bugzilla/>, created in 2010. I don't know if it still
> runs :-?
> 

I think http://bugzilla.redhat.com should allow you to file a bug and attach
the file.

Brian

> If you don't, I can try to create a bugzilla on openSUSE instead, and tell
> you the number... but I don't know if it takes files that big. If it
> doesn't, I'll fragment the file. You need to have an account there, I think,
> to retrieve the attachment, and I would prefer to mark the bug private, or
> at least the attachment.
> 
> I did the following.
> 
> First I made a copy, with "dd", of the partition image, all 489G of it. On
> this copy I ran "xfs_check", "xfs_repair -n", and "xfs_repair", with these
> results:
> 
> Telcontar:/data/storage_d/old_backup # xfs_check xfs_copy_home_workonit
> xfs_check is deprecated and scheduled for removal in June 2014.
> Please use xfs_repair -n <dev> instead.
> Telcontar:/data/storage_d/old_backup # xfs_repair -n xfs_copy_home_workonit
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> No modify flag set, skipping phase 5
> Phase 6 - check inode connectivity...
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify link counts...
> No modify flag set, skipping filesystem flush and exiting.
> Telcontar:/data/storage_d/old_backup # time xfs_repair xfs_copy_home_workonit
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - zero log...
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan and clear agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
>         - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
>         - setting up duplicate extent list...
>         - check for inodes claiming duplicate blocks...
>         - agno = 0
>         - agno = 1
>         - agno = 2
>         - agno = 3
> Phase 5 - rebuild AG headers and trees...
>         - reset superblock...
> Phase 6 - check inode connectivity...
>         - resetting contents of realtime bitmap and summary inodes
>         - traversing filesystem ...
>         - traversal finished ...
>         - moving disconnected inodes to lost+found ...
> Phase 7 - verify and correct link counts...
> done
> 
> real    0m28.058s
> user    0m1.692s
> sys     0m2.265s
> Telcontar:/data/storage_d/old_backup #
> 
> Maybe the image was made after repair, or maybe xfs_repair doesn't detect
> anything, which, as far as I remember, was the case.
> 
> I recreated the copy, to try "mount" on an unaltered copy.
> 
> Telcontar:/data/storage_d/old_backup # time dd if=xfs_copy_home of=xfs_copy_home_workonit && mount -v xfs_copy_home_workonit mount/
> 1024000000+0 records in
> 1024000000+0 records out
> 524288000000 bytes (524 GB) copied, 4662.7 s, 112 MB/s
> 
> real    77m43.697s
> user    3m1.420s
> sys     28m41.958s
> mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
> Telcontar:/data/storage_d/old_backup #
> 
> So it mounts...
> 
> --
> Cheers,
> Carlos E. R.
> (from 13.1 x86_64 "Bottle" at Telcontar)

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs