-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
[This email has been delayed while I thought about where to upload the
metadata file - see near the end]
On Thursday, 2014-07-03 at 13:39 -0400, Brian Foster wrote:
On Thu, Jul 03, 2014 at 05:00:47AM +0200, Carlos E. R. wrote:
Ok, so there's a lot going on. I was mainly curious to see what was
causing lingering preallocations, but it could be anything extending a
file multiple times.
Right.
AFAIK, xfsdump can not carry over a filesystem corruption, right?
I think that's accurate, though it might complain/fail in the act of
dumping an fs that is corrupted. The behavior here suggests there might
not be on disk corruption, however.
At least, not a detectable one.
If I don't do that backup-format-restore, I get issues soon, and it
crashes within a day - this is what I got after booting (the first event):
<0.1> 2014-03-15 03:53:47 Telcontar kernel - - - [ 301.857523] XFS: Internal error XFS_WANT_CORRUPTED_RETURN at line 350 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_all
And some hours later:
<0.1> 2014-03-15 22:20:34 Telcontar kernel - - - [20151.298345] XFS: Internal error XFS_WANT_CORRUPTED_GOTO at line 1602 of file /home/abuild/rpmbuild/BUILD/kernel-desktop-3.11.10/linux-3.11/fs/xfs/xfs_allo
It was here that I decided to backup-format-restore instead.
Maybe next time I can take an image with dd before doing anything else (it
takes about 80 minutes), or simply do an "xfs_metadump", which should be
faster. And I might not then have 500 GiB of free space to make a dd copy,
anyway.
xfs_metadump should be faster. It will grab the metadata only and
obfuscate filenames so as to hide sensitive information.
Ok, I have a post-it label on the monitor so that I remember - my notes
are typically stored in the home partition :-)
But the obfuscation is not complete, I can recognize file names:
00008DC0 .leeme.kfPTgt . ....... .2aujzfJ.%;u. . .0...
00008DF0 .pepe_after_gnome.tar.bz2.vcTJ8c.@.. . .......
00008E20 .amyN3xYjaldFXYpeUry. 3;&.K.. .. .0... !.pepe_j
00008E50 ust_created.tar.bz2.JlyD0W .. .@....... .NGb0URO
00008E80 C0Bh9cHwp-hBh.6wMS .. .p . ... ..registro.0DPzS
00008EB0 G .. . ....... .8n-.w$.9. .. . .8... +.suse_u
00008EE0 pgrade_to_102_pkglist-bis.txt.tcFUKq. . .......
00008F10 #B-XqcrWP4cqsw77yv8UsYbcCa-D76q..(#.. .. .8...
00008F40 '.suse_upgrade_to_102_pkglist.txt.0KTuDa 7.. .8
I just had a quick look with 'mc'; the dump is too large to inspect it
all.
Question.
As this always happens on recovery from hibernation, and seeing the message
"Corruption of in-memory data detected", could it be that thawing does a bad
memory recovery from the swap? I thought that the procedure includes some
checksum, but I don't know for sure.
Not sure, though if so I would think that might be a more common source
of problems.
And it only affects my /home partition - although it may be the busiest
one.
To me, there are two problems:
1) The corruption itself.
2) That xfs_repair fails to repair the filesystem. In fact, I believe
it does not detect it!
To me, #2 is the worst, and it is what makes me do the backup, format,
restore cycle for recovery. An occasional kernel crash is somewhat
acceptable :-}
Well it could be that the "corruption" is gone at the point of a
remount. E.g., something becomes inconsistent in memory, the fs detects
it and shuts down before going any further. That's actually a positive.
;)
That also means it's probably not necessary to do a full backup,
reformat and restore sequence as part of your routine here. xfs_repair
should scour through all of the allocation metadata and yell if it finds
something like free blocks allocated to a file.
No, if I don't backup-format-restore it happens again within a day. There
is something lingering. Unless that was just chance... :-?
It is true that during that day I hibernated several times more than
needed to see if it happened again - and it did.
I'm curious if something like an 'rm -rf *' on the metadump
would catch any other corruptions or if this is indeed limited to
something associated with recent (pre)allocations.
Sorry, run 'rm -rf *' where???
On the metadump... mainly just to see whether freeing all of the used
blocks in the fs triggered any other errors (i.e., a brute force way to
check for further corruptions).
Sorry, but I fail to see how to do it. I may be thick, or I lack the context.
If I run:
Telcontar:/data/storage_d/old_backup # ls -lh
total 604G
drwxr-xr-x 22 root root 4.0K Mar 8 20:30 home
drwxr-xr-x 3 root root 16 Sep 25 2010 home1
drwxr-xr-x 2 root root 6 Jul 3 02:36 mount
-rw-r--r-- 1 root root 45 Jul 3 04:25 procedure
-rw-r--r-- 1 root root 388M Jul 3 02:42 tgtfile
-rw-r--r-- 1 root root 11M Jul 3 02:50 tgtfile2.xz
-rw-r--r-- 1 root users 489G Mar 16 05:42 xfs_copy_home
-rw-r--r-- 1 root root 489G Jul 3 04:40 xfs_copy_home_workonit
-rw-r--r-- 1 root users 39G Mar 16 05:49 xfsdump__home
-rw-r--r-- 1 root users 39G Mar 16 05:57 xfsdump__home1
Telcontar:/data/storage_d/old_backup # rm -rf *
that would destroy my entire backup!
If you mean:
rm -rf tgtfile
I fail to see what that would accomplish, except to remove a file that is actually on a different partition, not home.
However, I can do:
Telcontar:/data/storage_d/old_backup # mount -v xfs_copy_home_workonit mount/
mount: /dev/loop0 mounted on /data/storage_d/old_backup/mount.
Telcontar:/data/storage_d/old_backup # cd mount
Telcontar:/data/storage_d/old_backup/mount # time rm -r /data/storage_d/old_backup/mount/*
real 2m45.380s
user 0m0.265s
sys 0m6.878s
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # ls -la
total 4
drwxr-xr-x 2 root root 6 Jul 4 01:56 .
drwxr-xr-x 5 root root 4096 Jul 3 04:25 ..
Telcontar:/data/storage_d/old_backup/mount #
Telcontar:/data/storage_d/old_backup/mount # df -h .
Filesystem Size Used Avail Use% Mounted on
/dev/loop0 489G 33M 489G 1% /data/storage_d/old_backup/mount
Telcontar:/data/storage_d/old_backup/mount #
And I do not see anything on the log, only that it mounted cleanly.
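For reference, a rough sketch of how the same brute-force check could be run
against a restored metadump rather than the full 489G copy. The function name,
the RUN switch and all paths are made up for illustration; the only real
commands used are xfs_mdrestore, mount -o loop, find, umount and
xfs_repair -f -n. By default it only prints the steps instead of running them.

```shell
# Sketch only, under the assumptions stated above.
# metadump_check DUMP IMG MNT
#   DUMP: an uncompressed xfs_metadump file (run "xz -d" on it first if needed)
#   IMG:  scratch image file that xfs_mdrestore will create
#   MNT:  empty directory to use as a mount point
# RUN is a hypothetical switch: unset means "echo" (dry run, prints the
# commands); exporting RUN as an empty string executes them (needs root).
metadump_check() {
    dump=$1
    img=$2
    mnt=$3
    run=${RUN-echo}

    # Rebuild a sparse filesystem image from the metadata dump.
    $run xfs_mdrestore "$dump" "$img"
    # Loop-mount the image; it holds metadata only, file contents are not there.
    $run mount -o loop "$img" "$mnt"
    # Free every inode and block; any shutdown would show up in the kernel log.
    $run find "$mnt" -mindepth 1 -delete
    $run umount "$mnt"
    # No-modify check of the now empty image; -f says it is a regular file.
    $run xfs_repair -f -n "$img"
}
```

Called as "metadump_check home.metadump home.img /mnt/scratch" it prints the
five commands in order. With RUN exported as an empty string it would execute
them, which should only ever be done on a scratch image, never on the real
device.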
Meanwhile, I have done an xfs_metadump of the image, and compressed it with
xz. It is 10834536 bytes. What do I do with it? I'm not sure I can email
that, and even less to a mailing list.
Do you still have a bugzilla system where I can upload it? I had an account
at <http://oss.sgi.com/bugzilla/>, created in 2010. I don't know if it still
runs :-?
I have an active bugzilla account at <http://oss.sgi.com/bugzilla/>, I'm
logged in there now. I haven't checked whether I can create a bug, not being
sure what parameters to use (product, component, whom to assign to). I
think that would be the most appropriate place.
Meanwhile, I have uploaded the file to my google drive account, so I can
share it with anybody on request - ie, it is not public, I need to add a
gmail address to the list of people that can read the file.
Alternatively, I could just email the file to people asking for it,
offlist, but not in a single email, in chunks limited to 1.5 MB per
email.
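A sketch of how the chunking could be done, in case anybody wants it that way.
The two function names and the 1100k default are my own choices for
illustration: base64 encoding adds roughly a third to an attachment, so a
piece of about 1100k should stay under 1.5 MB once encoded. It relies on
split, cat and sha256sum from coreutils.

```shell
# Sketch only; function names and the default piece size are assumptions.

# split_for_mail FILE [SIZE]
# Cut FILE into pieces FILE.part.aa, FILE.part.ab, ... and record a checksum
# of the whole file so the receiver can verify the reassembled copy.
split_for_mail() {
    file=$1
    size=${2:-1100k}
    split -b "$size" "$file" "$file.part."
    sha256sum "$file" > "$file.sha256"
}

# join_from_mail FILE
# Run in the directory holding all the pieces and FILE.sha256.
# The shell expands the glob in sorted order, which matches split's suffixes.
join_from_mail() {
    file=$1
    cat "$file".part.* > "$file"
    sha256sum -c "$file.sha256"
}
```

The sender would run "split_for_mail tgtfile2.xz" and mail each piece plus the
.sha256 file; the receiver saves them all into one directory and runs
"join_from_mail tgtfile2.xz" there, which fails loudly if a piece is missing
or damaged.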
I think http://bugzilla.redhat.com should allow you to file a bug and
attach the file.
Sorry, I don't have an account there...
I do have one at openSUSE, though, and it does allow me to attach files, up
to a limit. If the file is too big, it can be split into pieces. But I
will not use it unless you people say that you have an account there.
For using a bugzilla, the most appropriate one would be at SGI, IMHO, if
they are still supporting this project.
- --
Cheers,
Carlos E. R.
(from 13.1 x86_64 "Bottle" at Telcontar)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.22 (GNU/Linux)
iEYEARECAAYFAlO3HXUACgkQtTMYHG2NR9VndgCgillZYmQCvUynytO/7YALlUyv
c9gAnj8GmFfnMHGd+P9GaWm9ScVVTH81
=GEXl
-----END PGP SIGNATURE-----