OK, last post for this week, I promise.
Are the timestamps in the backing store supposed to be the same on all
nodes after syncing? Files were just synced from the master node
(primary lock server). On the master the files have a timestamp of
1-Jan-1970. On the secondary node that just synced them, the files have
the current timestamp (17-Feb-2009).
The gluster metadata doesn't match, either:
primary:
# file: spreadsheet.ods
trusted.glusterfs.afr.data-pending=0sAAAAAAAAAAAAAAAA
trusted.glusterfs.afr.metadata-pending=0sAAAAAAAAAAAAAAAA
trusted.glusterfs.createtime="1218717518"
trusted.glusterfs.version="11"
secondary:
# getfattr -m "" -d spreadsheet.ods
# file: spreadsheet.ods
trusted.glusterfs.afr.data-pending=0sAAAAAAAAAAA=
md5 checksums of the files match, so the contents are the same.
When both servers are up, ls lists the timestamp correctly (1-Jan-1970).
When the primary gets shut down, ls lists the timestamp of the file on
the secondary server (17-Feb-2009).
When the primary server returns, the secondary starts listing the
timestamp correctly again. cat-ing all the files doesn't "heal" the
discrepancy.
The file sync was done by running:
# ls -laR /mount/path; find /mount/path -type f -exec head -c1 '{}' \;
The really weird thing is that this doesn't seem to happen to all the
files, but it does appear to happen to a very large number of them. The
master seems to have valid xattr metadata, but the newly synced mirror
doesn't, even though the content of the files in the backing store is
correct.
It is also worth noting that the timestamp discrepancy seems to affect
nearly 100% of the files. The metadata discrepancy seems to be occurring
considerably less often.
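To put actual numbers on this, the mtimes in the backing store could be
dumped on each node and diffed. A rough sketch, assuming GNU find and
stat; list_mtimes and the /backing/store path are placeholder names I
made up for illustration, not anything from my config:

```shell
# Print "<mtime as epoch seconds> <relative path>" for every regular
# file under the directory given as $1, sorted by path so that the
# output from two nodes can be diffed line by line.
list_mtimes() {
    ( cd "$1" && find . -type f -exec stat -c '%Y %n' '{}' + | sort -k2 )
}

# Hypothetical usage, run directly against the backing store on each node:
#   list_mtimes /backing/store > /tmp/mtimes.$(hostname)
# then copy one listing across and compare:
#   diff /tmp/mtimes.primary /tmp/mtimes.secondary
```

Every line that diff prints is a file whose timestamp differs between
the two backing stores, so counting the lines gives the incidence.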
Similarly, aren't timestamps and creation time always supposed to be
added to a file when it gets created/modified?
If I create a new file:
echo "test" > /home/gordan/test
cd /gluster/home/gordan
primary:
# getfattr -m "" -d test
# file: test
trusted.glusterfs.afr.data-pending=0sAAAAAAAAAAA=
secondary:
# getfattr -m "" -d test
# file: test
trusted.glusterfs.afr.data-pending=0sAAAAAAAAAAA=
Is this normal? Shouldn't there be
trusted.glusterfs.createtime and trusted.glusterfs.version xattrs set on
that file?
If I then delete the file from the primary, I get this in the logs,
twice, ONLY on the primary:
2009-02-17 21:31:46 E [posix.c:2434:posix_xattrop] home-store: /gordan:
Numerical result out of range
2009-02-17 21:31:46 E [posix.c:2434:posix_xattrop] home-store: /gordan:
Numerical result out of range
Nothing gets logged on the secondary.
xattrs on my home directory are:
# getfattr -m "" -d gordan
# file: gordan
trusted.glusterfs.afr.entry-pending=0sAAAAEgAAAAAAAAAA
trusted.glusterfs.afr.metadata-pending=0sAAAAAAAAAAAAAAAA
trusted.glusterfs.createtime="1209929991"
trusted.glusterfs.version="9190"
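The 0s prefix in the getfattr output means the value is base64-encoded,
which makes the pending values hard to compare by eye. A minimal sketch
for turning them into hex, assuming GNU base64 and od are available
(decode_xattr is just a helper name made up for this):

```shell
# Decode a getfattr value of the form 0s<base64> and print it as a
# plain hex string, two hex digits per byte.
decode_xattr() {
    # Strip the leading "0s", base64-decode, dump the bytes as hex,
    # then remove the spaces and newlines that od adds.
    printf '%s' "${1#0s}" | base64 -d | od -An -v -tx1 | tr -d ' \n'
    echo
}

# Example with the entry-pending value from the directory above:
#   decode_xattr 0sAAAAEgAAAAAAAAAA
```

Run on the values above, the entry-pending value on gordan decodes to
12 bytes with 0x12 in the fourth byte and zeros elsewhere. The
data-pending value for spreadsheet.ods decodes to 12 zero bytes on the
primary but only 8 zero bytes on the secondary. I don't know whether
that size difference is significant.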
I'm pretty sure this used to work correctly (or maybe I'm misguided in
the hope that I would have noticed by now if it didn't). The only thing
that I've changed on my setup recently is the gluster-patched fuse
module, which I updated. Actually, thinking about it, and more
importantly, looking at the glusterfs logs, the time these "Numerical
result out of range" errors started appearing all over the place is
about the same time I upgraded fuse to 2.7.4glfs11. That may be more
than just a coincidence.
I'll try to re-create this with a couple of minimalist virtual machines
with minimal data in the data stores so that I can provide root access
to them for testing, if that is of interest.
Gordan