Hi gurus,

I've been banging my head against a test volume for about a month and a half now, and I'm having serious trouble figuring out what's going on.

My setup:

- OS: Ubuntu 12.04 amd64
- Gluster: 3.4.0final-ubuntu1~precise1
- Cluster: four machines, each with two 4TB HDDs (ext4), with replication
- Test client: an HDD holding 913GB of test data in 156,544 files

Forgive the weird path names, but I wanted to use a setup akin to the real data that I'd be using, and in production there will be weird path names aplenty. I include the path names here in case someone sees something obvious, like "You compared the wrong files" or "You can't use path names like that with gluster!" But for your reading pleasure, I also list output below with the path names removed so that you can clearly see similarities or differences from client to volume to brick.

Disclaimer: I have done some outage tests with this volume in the past by unplugging a drive, plugging it back in, and then doing a full heal. The volume currently shows 1023 failed heals on bkupc1-b:/export/b/ (brick #2), but that was before I started this particular test. For this test, all the old files and directories had been deleted from the volume beforehand so that I could start with an empty volume, and no outages -- simulated or otherwise -- have taken place. (I have confirmed that every file listed by gluster as heal-failed no longer exists. And yet, even though I have deleted the volume's contents, the failed-heals count remains.) I thought this might be important to disclose. If so desired I can repeat the test after deleting the volume and recreating it from scratch. However, once in production, doing this would be highly unfeasible as a solution to a problem, so if this is the cause of my angst, I'd rather know how to fix things as they sit now than scrap the volume and start anew.
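For what it's worth, the check that every heal-failed entry is really gone is easy to script. A minimal sketch of the idea, using a temporary directory and a stand-in path list rather than the live cluster (on the real setup, the list would come from the heal-failed report for the volume, and the mount point would be the fuse mount):

```shell
# Sketch: confirm that paths reported as heal-failed no longer exist on
# the mounted volume. The mount point and path list below are stand-ins;
# on the real cluster the list would come from gluster's heal-failed
# report for the volume, and $mount would be the fuse mount point.
mount=$(mktemp -d)           # stand-in for the real mount, e.g. /data/bkupc1

failed_list="/BACKUPS/gone-file-1
/BACKUPS/gone-file-2"

gone=0
total=0
while IFS= read -r p; do
  total=$((total + 1))
  [ -e "$mount$p" ] || gone=$((gone + 1))
done <<EOF
$failed_list
EOF

echo "$gone of $total heal-failed entries no longer exist on the volume"
```

If the two counts match, every reported entry is already gone, which is exactly the situation described above where the failed-heals counter nevertheless refuses to drop.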
Here's a detailed description of my latest test:

1) The client mounts the volume with fuse.glusterfs (rw,default_permissions,allow_other,max_read=131072) as /data/bkupc1

2) I perform an rsync of the data to the volume. I have the whole test scripted, and I'll list the juicy bits:

   cd /export/d/eraseme/
   if [ -d /data/bkupc1/BACKUPS/ ]; then
       mv /data/bkupc1/BACKUPS /data/bkupc1/BACKUPS.old
       ( /bin/rm -fr /data/bkupc1/BACKUPS.old & )
   fi
   mkdir /data/bkupc1/BACKUPS
   rsync \
       -a \
       -v \
       --delete \
       --delete-excluded \
       --force \
       --ignore-errors \
       --one-file-system \
       --progress \
       --stats \
       --exclude '/tmp' \
       --exclude '/var/tmp' \
       --exclude '**core' \
       --partial \
       --inplace \
       ./ \
       /data/bkupc1/BACKUPS/

   NOTE: If the directory /data/bkupc1/BACKUPS/ exists from a previous run of this test, then I move it aside and delete it in the background while rsync is running.

   Output:

   ...
   Number of files: 156554
   Number of files transferred: 147980
   Total file size: 886124490325 bytes
   Total transferred file size: 886124487184 bytes
   Literal data: 886124487184 bytes
   Matched data: 0 bytes
   File list size: 20189800
   File list generation time: 0.001 seconds
   File list transfer time: 0.000 seconds
   Total bytes sent: 886258975318
   Total bytes received: 2845881

   sent 886258975318 bytes  received 2845881 bytes  45981053.79 bytes/sec
   total size is 886124490325  speedup is 1.00

3) My client has md5 checksums for its files, so next my script checks the files on the volume:

   cd /data/bkupc1/BACKUPS/
   md5sum -c --quiet md5sums

   data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF: FAILED
   md5sum: WARNING: 1 computed checksum did NOT match

a) Taking a closer look at this file:

   On the client:

   root@client:/export/d/eraseme# ls -ald
data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   -rw-r--r-- 1 peek peek 646041328 Nov 13 2009 data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

   On the volume:

   root@bkupc1-a:/data/bkupc1/BACKUPS# ls -ald data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   -rw-r--r-- 1 peek peek 646041328 Nov 13 2009 data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

   On the raw bricks:

   root@bkupc1-a:/export# ls -ald ./*/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   -rw-r--r-- 2 peek peek 646041328 Nov 13 2009 ./a/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

   root@bkupc1-b:/export# ls -ald
./*/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   -rw-r--r-- 2 peek peek 646041328 Nov 13 2009 ./a/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

To make this more readable, here's the output with the path stripped off, listed in the order given above:

   -rw-r--r-- 1 peek peek 646041328 Nov 13 2009 <-- client
   -rw-r--r-- 1 peek peek 646041328 Nov 13 2009 <-- volume
   -rw-r--r-- 2 peek peek 646041328 Nov 13 2009 <-- brick #1
   -rw-r--r-- 2 peek peek 646041328 Nov 13 2009 <-- brick #2

Good: size, permissions, ownership, and time all match.

b) MD5 checksums:

   On the client:

   root@catus:/export/d/eraseme# md5sum data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   52b8f8166ef4303bd7b897e8cc6a86c0  data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

   On the volume:

   root@bkupc1-a:/data/bkupc1/BACKUPS# md5sum data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   90a767df080af25adbc3db4da8406072
data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

   On the bricks:

   root@bkupc1-a:/export# md5sum ./*/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   90a767df080af25adbc3db4da8406072  ./a/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

   root@bkupc1-b:/export# md5sum ./*/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF
   52b8f8166ef4303bd7b897e8cc6a86c0  ./a/glusterfs/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/87fdc790-0443-11e3-b8fb-f46d04e15793/87fb6cfc-0443-11e3-b8fb-f46d04e15793/gF-Eqm1GHw7NPNQOQoeJLNNlfL5ydR0FzVZDHdK9OShRHknwgkqCG0M1yWnryQ,cdfk6Ysdk99eoncEHxnDrEQZF

To make this more readable, here's the output with the path stripped off, listed in the order given above:

   52b8f8166ef4303bd7b897e8cc6a86c0 <-- client
   90a767df080af25adbc3db4da8406072 <-- volume
   90a767df080af25adbc3db4da8406072 <-- brick #1
   52b8f8166ef4303bd7b897e8cc6a86c0 <-- brick #2

AHA!!! The MD5 checksum is different on one of the bricks!
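In case anyone wants to reproduce the comparison, this strip-the-path-and-compare step is easy to automate. Here's a minimal sketch of the idea; the four roots here are local stand-ins populated by the script itself, whereas on the real setup they would be the client's copy, the fuse mount, and the raw brick directories:

```shell
# Sketch: print one checksum per vantage point for the same relative path,
# so a divergent replica stands out at a glance. The roots are stand-ins
# created below; on the cluster they would be e.g. /export/d/eraseme,
# /data/bkupc1/BACKUPS, and each brick's export directory.
work=$(mktemp -d)
rel="data/somefile"          # hypothetical relative path

# Populate four stand-in roots; brick2 deliberately diverges.
for root in client volume brick1 brick2; do
  mkdir -p "$work/$root/$(dirname "$rel")"
  printf 'expected contents\n' > "$work/$root/$rel"
done
printf 'diverged contents\n' > "$work/brick2/$rel"

sum_client=$(md5sum "$work/client/$rel" | cut -d' ' -f1)
sum_volume=$(md5sum "$work/volume/$rel" | cut -d' ' -f1)
sum_brick1=$(md5sum "$work/brick1/$rel" | cut -d' ' -f1)
sum_brick2=$(md5sum "$work/brick2/$rel" | cut -d' ' -f1)

printf '%s <-- client\n'   "$sum_client"
printf '%s <-- volume\n'   "$sum_volume"
printf '%s <-- brick #1\n' "$sum_brick1"
printf '%s <-- brick #2\n' "$sum_brick2"
```

Run against the real paths, the odd sum out identifies the divergent replica immediately, which is exactly the pattern in the table above.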
c) I also have SHA1 checksums of these files, and checking those I get the same pattern:

   a12cbec32cc8b02dd4dc5e53d017238756f2b182 <-- client
   4fbeacdac48f5a292bd5f0c9dfe1d073fd75354e <-- volume
   4fbeacdac48f5a292bd5f0c9dfe1d073fd75354e <-- brick #1
   a12cbec32cc8b02dd4dc5e53d017238756f2b182 <-- brick #2

4) Last but not least, just to make sure that the horse is good and dead, my script does a byte-by-byte comparison of every file with:

   /usr/bin/diff -r -q ./ /data/bkupc1/BACKUPS/

Diff reports a difference -- BUT -- it's with a *different* file, in a *different* directory:

   Files ./data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/8825c6c8-0443-11e3-b8fb-f46d04e15793/880f8f0c-0443-11e3-b8fb-f46d04e15793/iMmV,UqdiqZRie5QUu341iRS7s,-OK7PzXSuPgr0o30yNDXNG6uvqA0Wyr7RRR3MBE4 and /data/bkupc1/BACKUPS/data/884b9a38-0443-11e3-b8fb-f46d04e15793/884a7040-0443-11e3-b8fb-f46d04e15793/8825c6c8-0443-11e3-b8fb-f46d04e15793/880f8f0c-0443-11e3-b8fb-f46d04e15793/iMmV,UqdiqZRie5QUu341iRS7s,-OK7PzXSuPgr0o30yNDXNG6uvqA0Wyr7RRR3MBE4 differ

NOTE: Diff doesn't notice any difference at all in the first file -- the one examined in (3) above. I suppose this could be explained away depending on gluster's internal workings. IF gluster serves replicated data round-robin, then I could see how md5sum and sha1sum might wind up reading the file from brick #1 while diff reads it from brick #2 -- but that's an "if", not a "how". I don't actually know how gluster works under the hood; that's just the first possible explanation that came to my mind.
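One way to probe that round-robin theory from the client side would be to read the same file repeatedly through the mount and see whether the checksum is stable across reads. A minimal sketch of that check (run here against a local throwaway file rather than the real mount, so it only demonstrates the technique, not the replica behavior):

```shell
# Sketch: read one file several times and count the distinct checksums
# seen. Against a healthy volume (or this local stand-in) the count
# should be 1; if successive reads were served from divergent replicas,
# more than one checksum could appear.
f=$(mktemp)                  # stand-in for the suspect file on the mount
printf 'test data\n' > "$f"

distinct=$(for i in 1 2 3 4 5; do
             md5sum "$f" | cut -d' ' -f1
           done | sort -u | wc -l)

echo "distinct checksums over 5 reads: $distinct"
```

Caching on the client could mask the effect, so a count of 1 wouldn't disprove the theory, but a count above 1 would strongly support it.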
On closer look:

ls -ald:

   -rw-r--r-- 1 peek peek 527435808 Aug 5 2009 <-- client
   -rw-r--r-- 1 peek peek 527435808 Aug 5 2009 <-- volume
   -rw-r--r-- 2 peek peek 527435808 Aug 5 2009 <-- brick #1
   -rw-r--r-- 2 peek peek 527435808 Aug 5 2009 <-- brick #2

MD5:

   01eca86b5b48beb8f76204112dc69ac3 <-- client
   01eca86b5b48beb8f76204112dc69ac3 <-- volume
   01eca86b5b48beb8f76204112dc69ac3 <-- brick #1
   3c1c9eadc44a1e144a576d4b388a1c42 <-- brick #2

SHA1:

   a570dac34f820bc4973ea485b059429786068993 <-- client
   a570dac34f820bc4973ea485b059429786068993 <-- volume
   a570dac34f820bc4973ea485b059429786068993 <-- brick #1
   0b755903ac3ba1314fbba7a73ef0c5c6d6716ff1 <-- brick #2

Why is this happening? Did I do something wrong? Or is this a legitimate bug?

I have preserved the log files from each client, and I'll be poring over those next, but I'll be honest: I don't know what I'm looking for. Any help is greatly appreciated.

Michael Peek
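One more diagnostic I'm considering: cmp'ing the two brick copies of a suspect file directly, to find where they actually diverge (start of file vs. a region in the middle might hint at what went wrong during the write). A minimal sketch of that, with two throwaway stand-in files; on the servers the two arguments would be the file's path under each brick's export directory:

```shell
# Sketch: report the offset of the first differing byte between two
# copies of a file. The inputs are stand-ins created below; on the
# cluster they would be the suspect file on brick #1 and brick #2.
a=$(mktemp)
b=$(mktemp)
printf 'same same same\n' > "$a"
printf 'same same DIFF\n' > "$b"

if cmp -s "$a" "$b"; then
  offset=0   # identical copies
else
  # cmp -l prints "OFFSET OCTAL1 OCTAL2" for each differing byte (1-based)
  offset=$(cmp -l "$a" "$b" | head -n 1 | awk '{print $1}')
fi
echo "first differing byte: $offset"
```

An offset of 0 would mean the copies are identical; any other value is the 1-based position where the replicas part ways.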