Phil,

This sounds to me like a known issue affecting Gluster directories created under older versions, related to extended attributes that were set on those directories. I believe this issue is supposed to be fixed in 3.1.2.

I don't know how large your dataset is, but one way to fix it would be to:

1. Delete the Gluster volume.
2. On the back-end directories on your nodes, scrub the offending extended attribute with the command:

   find /back/end/dir -exec setfattr -x trusted.gfid {} \;

3. Create the Gluster volume again.
4. Mount the volume somewhere as GlusterFS (mount -t glusterfs....) and run:

   find /mnt/gluster -print0 | xargs --null stat

5. Enjoy.

Please let me know if that helps.

Thank you.

-Jacob

-----Original Message-----
From: gluster-users-bounces at gluster.org [mailto:gluster-users-bounces at gluster.org] On Behalf Of phil cryer
Sent: Thursday, January 13, 2011 9:07 AM
To: gluster-users at gluster.org
Subject: Re: Debian, 3.1.1, duplicate files

I haven't heard anything back, so I wanted to update this in case anyone else comes across it. This was an old store that we created in 3.0.4 and that kept getting duplicate files. We ran an update script that would use wget to download any files that were present on the remote box but not on the local one. If it fetched a file we already had, it should have done one of three things: 1) ignore it and not download it, because it would see that we already have it; 2) overwrite that file (clobber) with a new version of that file; or 3) write the file as file.1 so as not to mess with the original one (no-clobber). In fact it did none of these, so we ended up with the bizarre feature of having multiple identical files in the same directory.

Meanwhile we're also using far more space than we should have (~70TB instead of ~40TB or so) thanks to having directories like this:

# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/
total 536436
drwxr-xr-x 2 www-data www-data 294912 Jan 13 10:05 .
drwx------ 1016 www-data www-data 3846144 Dec 12 11:10 ..
-rwxr-xr-x 1 www-data www-data 1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data 1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data 12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data 12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data 271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data 271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data 2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data 2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data 720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data 720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data 256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data 257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml
-rwxr-xr-x 1 www-data www-data 257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml

Ouch, right?

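For anyone wanting to check how widespread this is, a listing like the one above can be scanned for repeated names. This is only a minimal sketch: the function names dup_names and dir_dup_names are made up for illustration, and it assumes file names contain no newlines.

```shell
#!/usr/bin/env bash
# Sketch: report entry names that a directory listing returns more than once.
# dup_names and dir_dup_names are illustrative names, not part of Gluster.

# Read one name per line on stdin; print each name that occurs more than once.
dup_names() {
    sort | uniq -d
}

# List every entry in directory $1 (dotfiles included) and report repeats.
dir_dup_names() {
    ls -1A -- "$1" | dup_names
}

# Example, using the path from the listing above:
#   dir_dup_names /mnt/glusterfs/www/t/tijdschriftvoore1951nede
```

On a healthy filesystem the function prints nothing, since a directory cannot normally hold two entries with the same name.
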
So I installed 3.1.1, and that went well. I got it on all the drives and servers we had before, and we have a total capacity of 96TB again. All seemed to be working, so I mounted the old directories, saw the same issue with the duplicate files, and let it sit overnight to see if it would notice this and try to fix things. Then we saw gluster logs saying things like:

==> glusterfs/mnt-glusterfs.log <==
[2011-01-13 11:46:23.2762] I [afr-common.c:662:afr_lookup_done] bhl-volume-replicate-55: entries are missing in lookup of /www/t/tijdschriftvoore1951nede.
[2011-01-13 11:46:23.2817] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-55: background meta-data data entry self-heal triggered. path: /www/t/tijdschriftvoore1951nede
[2011-01-13 11:46:23.5342] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-55: background meta-data data entry self-heal completed on /www/t/tijdschriftvoore1951nede

...so we think, hey, maybe we're all set here, it's fixing itself and removing those duplicate files, but no such luck:

# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/
total 536436
drwxr-xr-x 2 www-data www-data 294912 Jan 13 10:05 .
drwx------ 1016 www-data www-data 3846144 Dec 12 11:10 ..
-rwxr-xr-x 1 www-data www-data 1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data 1151282 Jul 12 2010 tijdschriftvoore1951nede_djvu.txt
-rwxr-xr-x 1 www-data www-data 12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data 12078834 Jul 12 2010 tijdschriftvoore1951nede_djvu.xml
-rwxr-xr-x 1 www-data www-data 271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data 271733 Jul 12 2010 tijdschriftvoore1951nede.gif
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data 257779301 Jul 12 2010 tijdschriftvoore1951nede_jp2.zip
-rwxr-xr-x 1 www-data www-data 2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data 2278 Jul 12 2010 tijdschriftvoore1951nede_marc.xml
-rwxr-xr-x 1 www-data www-data 720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data 720 Jul 12 2010 tijdschriftvoore1951nede_meta.mrc
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data 256 Jul 12 2010 tijdschriftvoore1951nede_names.xml_meta.txt
-rwxr-xr-x 1 www-data www-data 257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml
-rwxr-xr-x 1 www-data www-data 257556 Jul 13 2010 tijdschriftvoore1951nede_scandata.xml

but, this allows us to do (in my opinion) scary things like this:

# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# rm /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml
# ls -al /mnt/glusterfs//www/t/tijdschriftvoore1951nede/*_names.xml
-rwxr-xr-x 1 www-data www-data 546411 Jul 12 2010 /mnt/glusterfs//www/t/tijdschriftvoore1951nede/tijdschriftvoore1951nede_names.xml

eek! so it only removed one of the files, even though they both had the same name.

At this point we're going to wipe all 70TB and re-transfer, hoping it stops when it gets all the files and doesn't start writing the files with the same names as before.

Anyone with advice or insight into this issue? Would love to learn why it did this, and REALLY hope it doesn't do it again.

Thanks

P

On Wed, Jan 12, 2011 at 2:37 PM, phil cryer <phil at cryer.us> wrote:
> I'm now running gluster 3.1.1 on Debian. A directory that was running
> under 3.0.4 had duplicate files, but I've remounted things now that
> we're running 3.1.1 in hopes it would fix things, but so far it has
> not:
>
> # ls -l /mnt/glusterfs/www/0/0descriptionofta581unit
> total 37992
> -rwxr-xr-x 1 www-data www-data 796343 Jun 23 2010 0descriptionofta581unit_bw.pdf
> -rwxr-xr-x 1 www-data www-data 796343 Jun 23 2010 0descriptionofta581unit_bw.pdf
> ---------T 1 root root 1497 Jun 24 2010 0descriptionofta581unit_dc.xml
> ---------T 1 root root 1497 Jun 24 2010 0descriptionofta581unit_dc.xml
> ---------T 1 www-data www-data 577050 Jun 24 2010 0descriptionofta581unit.djvu
> ---------T 1 www-data www-data 577050 Jun 24 2010 0descriptionofta581unit.djvu
> -rwxr-xr-x 1 www-data www-data 33272 Jun 22 2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data 33272 Jun 22 2010 0descriptionofta581unit_djvu.txt
> -rwxr-xr-x 1 www-data www-data 4445 Jun 23 2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data 4445 Jun 23 2010 0descriptionofta581unit_files.xml
> -rwxr-xr-x 1 www-data www-data 5011 Jun 22 2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data 5011 Jun 22 2010 0descriptionofta581unit_marc.xml
> -rwxr-xr-x 1 www-data www-data 360 Jun 23 2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data 360 Jun 23 2010 0descriptionofta581unit_metasource.xml
> -rwxr-xr-x 1 www-data www-data 2848 Jun 22 2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data 2848 Jun 22 2010 0descriptionofta581unit_meta.xml
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22 2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data 16916480 Jun 22 2010 0descriptionofta581unit_orig_jp2.tar
> -rwxr-xr-x 1 www-data www-data 1051810 Jun 22 2010 0descriptionofta581unit.pdf
> -rwxr-xr-x 1 www-data www-data 1051810 Jun 22 2010 0descriptionofta581unit.pdf
>
> While running the latest, 3.1.1, I noticed some log files that said:
>
> [..]
> [2011-01-12 15:24:33.325546] I [afr-common.c:613:afr_lookup_self_heal_check] bhl-volume-replicate-69: size differs for /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.325558] I [afr-common.c:716:afr_lookup_done] bhl-volume-replicate-69: background meta-data data self-heal triggered. path: /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364501] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-66: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
> [2011-01-12 15:24:33.364881] I [afr-self-heal-common.c:1526:afr_self_heal_completion_cbk] bhl-volume-replicate-69: background meta-data data self-heal completed on /www/0/0descriptionofta581unit/0descriptionofta581unit.djvu
>
> I assumed it was fixing that, but it didn't. Here's the full logs that
> include all the gluster.log work it did in this directory:
> http://pastebin.com/8X52Em7Y
>
> Question: how can I 'fix' this, or is the best bet to remove
> everything and start over?
> It's going to set us back, but I'd rather
> do it now than keep banging on this without any resolution.
>
> Thanks for the help, really like the new gluster command, very nice!
>
> P
> --
> http://philcryer.com
>

--
http://philcryer.com
_______________________________________________
Gluster-users mailing list
Gluster-users at gluster.org
http://gluster.org/cgi-bin/mailman/listinfo/gluster-users
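
For anyone finding this thread later, step 2 of Jacob's reply at the top can be wrapped with a dry-run default so the affected paths can be reviewed before anything is changed. This is a hedged sketch only: scrub_gfid and DRY_RUN are names invented here, /back/end/dir is the placeholder path from the reply, and removing trusted.* attributes normally requires root.

```shell
#!/usr/bin/env bash
# Sketch of step 2 with a dry-run default. scrub_gfid and DRY_RUN are
# illustrative names, not Gluster tooling.

# Walk back-end directory $1. With DRY_RUN=1 (the default) only print the
# setfattr command for each path; with DRY_RUN=0 actually remove the attribute.
scrub_gfid() {
    local dir=$1
    if [ "${DRY_RUN:-1}" = "1" ]; then
        find "$dir" -exec echo setfattr -x trusted.gfid {} \;
    else
        find "$dir" -exec setfattr -x trusted.gfid {} \;
    fi
}

# Example: review the printed commands first, then rerun with DRY_RUN=0.
#   scrub_gfid /back/end/dir
#   DRY_RUN=0 scrub_gfid /back/end/dir
```

The dry run prints one setfattr line per file and directory under the given path, which also gives a rough idea of how long the real pass would take on a large store.
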