So, I haven't heard anything back, so I just wanted to update this in case anyone else comes across it.

This was an old store that we created in 3.0.4, and it kept getting duplicate files. Basically we ran an update script that would use wget to try to download any files that were not present on the local box but were on the remote. Of course, if it downloaded a file we already had, it should have done one of three things: 1) skip it and not download it because it sees that we already have it (no-clobber), 2) overwrite that file (clobber) with a new version of that file, or 3) save the new copy as file.1 so as not to mess with the original one. In fact it did none of these, so instead we ended up with the bizarre feature of having multiple identical files with the same name in the same directory. Meanwhile we're also using far more space than we should (~70TB instead of ~40TB or so) thanks to having directories like this:

# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/
total 536436
drwxr-xr-x    2 www-data www-data    294912 Jan 13 10:05 .
drwx------ 1016 www-data www-data   3846144 Dec 12 11:10 ..
-rwxr-xr-x    1 www-data www-data   1151282 Jul 12  2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x    1 www-data www-data   1151282 Jul 12  2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x    1 www-data www-data  12078834 Jul 12  2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x    1 www-data www-data  12078834 Jul 12  2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x    1 www-data www-data    271733 Jul 12  2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x    1 www-data www-data    271733 Jul 12  2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x    1 www-data www-data 257779301 Jul 12  2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x    1 www-data www-data 257779301 Jul 12  2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x    1 www-data www-data      2278 Jul 12  2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x    1 www-data www-data      2278 Jul 12  2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x    1 www-data www-data       720 Jul 12  2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x    1 www-data www-data       720 Jul 12  2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x    1 www-data www-data    546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x    1 www-data www-data    546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x    1 www-data www-data       256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x    1 www-data www-data       256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x    1 www-data www-data    257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml
-rwxr-xr-x    1 www-data www-data    257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml

Ouch, right?

So, I installed 3.1.1, and that went well. I got it on all the drives and servers we had before, and we have a total capacity of 96TB again. Good, all seems to be working. I mounted the old directories, saw the same issue with the duplicate files, and let it sit overnight to see if it would notice this and try to fix things.
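For anyone wanting to measure how widespread this is before deciding what to do, here is a minimal sketch in Python. It assumes the mount hands the same name back twice when a directory is read (which is what ls shows above), so counting the names that os.listdir() returns should expose the duplicates. The function names find_duplicates and scan are made up for illustration, and I have not run this against a gluster mount.

```python
import os
from collections import Counter


def find_duplicates(names):
    """Return {name: count} for every name that appears more than once."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


def scan(root):
    """Yield (directory, duplicates) for each directory under root whose
    listing contains the same name more than once."""
    for dirpath, dirnames, _filenames in os.walk(root):
        # os.listdir() returns the raw directory entries, repeats included.
        dups = find_duplicates(os.listdir(dirpath))
        if dups:
            yield dirpath, dups
        # Descend into each subdirectory once, even if it is listed twice.
        dirnames[:] = sorted(set(dirnames))
```

Looping over scan("/mnt/glusterfs/www") and printing each result would list the affected directories. On a healthy filesystem it yields nothing, since a directory cannot normally hold two entries with the same name.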
Then we started seeing gluster logs saying things like:

==> glusterfs/mnt-glusterfs.log <==
[2011-01-13 11:46:23.2762] I [afr-common.c:662:afr_lookup_done] bhl-volume-replicate-55: entries are missing in lookup of /www/t/tijdschriftvoore1951nede.
[2011-01-13 11:46:23.2817] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-55: background meta-data data entry self-heal triggered. path: /www/t/tijdschriftvoore1951nede
[2011-01-13 11:46:23.5342] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-55: background meta-data data entry self-heal completed on /www/t/tijdschriftvoore1951nede

...so we think, hey, maybe we're all set here, it's fixing itself and removing those duplicate files, but no such luck:

# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/
total 536436
drwxr-xr-x    2 www-data www-data    294912 Jan 13 10:05 .
drwx------ 1016 www-data www-data   3846144 Dec 12 11:10 ..
-rwxr-xr-x    1 www-data www-data   1151282 Jul 12  2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x    1 www-data www-data   1151282 Jul 12  2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x    1 www-data www-data  12078834 Jul 12  2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x    1 www-data www-data  12078834 Jul 12  2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x    1 www-data www-data    271733 Jul 12  2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x    1 www-data www-data    271733 Jul 12  2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x    1 www-data www-data 257779301 Jul 12  2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x    1 www-data www-data 257779301 Jul 12  2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x    1 www-data www-data      2278 Jul 12  2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x    1 www-data www-data      2278 Jul 12  2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x    1 www-data www-data       720 Jul 12  2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x    1 www-data www-data       720 Jul 12  2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x    1 www-data www-data    546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x    1 www-data www-data    546411 Jul 12  2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x    1 www-data www-data       256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x    1 www-data www-data       256 Jul 12  2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x    1 www-data www-data    257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml
-rwxr-xr-x    1 www-data www-data    257556 Jul 13  2010 tijdschriftvoore1951nede_scandata.xml

But this allows us to do (in my opinion) scary things like this:

# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# rm /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml

Eek! So it only removed one of the files, even though they both had the same name.

At this point we're going to wipe all 70TB and re-transfer, hoping it stops when it has all the files and doesn't start writing files with the same names as before.

Anyone with advice or insight into this issue? I would love to learn why it did this, and REALLY hope it doesn't do it again.

Thanks

P

On Wed, Jan 12, 2011 at 2:37 PM, phil cryer <phil at cryer.us> wrote:
> I'm now running gluster 3.1.1 on Debian. A directory that was running
> under 3.0.4 had duplicate files, but I've remounted things now that
> we're running 3.1.1 in hopes it would fix things, but so far it has
> not:
>
> # ls -l /mnt/glusterfs/www/0/0descriptionofta581unit
> total 37992
> -rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
> -rwxr-xr-x 1 www-data www-data   796343 Jun 23  2010 0descriptionofta581unit_bw.pdf
> ---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
> ---------T 1 root     root         1497 Jun 24  2010 0descriptionofta581unit_dc.xml
> ---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
> ---------T 1 www-data www-data   577050 Jun 24  2010 0descriptionofta581unit.djvu
> -rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data    33272 Jun 22  2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data     4445 Jun 23  2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data     5011 Jun 22  2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data      360 Jun 23  2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data     2848 Jun 22  2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22  2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf
> -rwxr-xr-x 1 www-data www-data  1051810 Jun 22  2010 0descriptionofta581unit.pdf
>
> While running the latest, 3.1.1, I noticed some log files that said:
>
> [..]
> [2011-01-12 15:24:33.325546] I [afr-common.c:613:afr_lookup_self_heal_check] bhl-volume-replicate-69: size differs for /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.325558] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-69: background  meta-data data self-heal triggered. path: /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364501] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-66: background  meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364881] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-69: background  meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
>
> I assumed it was fixing that, but it didn't. Here are the full logs that
> include all the gluster.log work it did in this directory:
> http://pastebin.com/8X52Em7Y
>
> Question: how can I 'fix' this, or is the best bet to remove
> everything and start over? It's going to set us back, but I'd rather
> do it now than keep banging on this without any resolution.
>
> Thanks for the help, really like the new gluster command, very nice!
>
> P
> --
> http://philcryer.com
>

--
http://philcryer.com