Oops. I didn't think of that with AFR. However, I think Lucene always creates new files when documents are flushed to disk, so on a per-commit basis the impact will be low. But the scenario you're talking about will most definitely kick in when the index is optimized: hundreds of smaller files are merged into bigger, more compact files. Since Lucene cannot hold all the smaller files in memory, it will flush parts of the merge to "log" files, which will trigger the case you're talking about.

So basically the absolute worst case for GlusterFS with AFR would be a webserver access log, right?

I think I will go for AFR for the billion small files, since they are almost never updated, but is there a smart way of updating big files in GlusterFS?

Perhaps Gluster is a bad choice for Lucene indexing and I really need to go for many cheap boxes with local disks instead.

Kindly

//Marcus

On Fri, May 9, 2008 at 10:37 AM, Daniel Maher <dma+gluster@xxxxxxxxx> wrote:

> On Wed, 7 May 2008 20:06:40 +0200 "Marcus Herou"
> <marcus.herou@xxxxxxxxxxxxx> wrote:
>
> > 1. Big index files ~x Gig each
> > 2. Many small files in a huge amount of directories.
>
> Do you plan to do any AFR (automatic file replication) ? If so,
> consider that even a one-byte change to your "big index files" will
> cause the /entire/ file to be AFR'd between all participating nodes.
>
> > Finally what tools would suite to test zillions of small files ?
> > Bonnie++ ? Fewer big files ? Still Bonnie++ or perhaps IOZone ?
>
> IOZone is an interesting tool, assuming you can interpret the
> results. :P I have been using Bonnie++ and FFSB extensively over the
> past couple of weeks to stresstest / benchmark Gluster. Both have the
> advantage of producing easily interpretable results, and FFSB is highly
> configurable, depending on what sort of tests you'd like to run (read /
> write / both, small / large files, lots / few files, etc..).
>
> The following page contains some sample FFSB configs to work from :
> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
> (see "Step 8".)
>
> Cheers !
>
> --
> Daniel Maher <dma AT witbe.net>
>

--
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@xxxxxxxxxxxxx
http://www.tailsweep.com/
http://blogg.tailsweep.com/
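
To put rough numbers on the whole-file replication point quoted above, here is a back-of-the-envelope sketch in Python. It assumes, as Daniel describes, that any change to a file causes the entire file to be copied to every other node in the AFR set. The function name, file sizes, update counts and node count are all made up for illustration and are not measurements from a real Gluster setup.

```python
GIB = 1024 ** 3


def afr_bytes_transferred(file_size_bytes, changes, replicas):
    """Bytes copied if every change re-replicates the whole file
    to each of the other (replicas - 1) nodes. Hypothetical helper."""
    return file_size_bytes * changes * (replicas - 1)


# Hypothetical: one 2 GiB index file, touched 10 times, 2 nodes in the AFR set.
big_index = afr_bytes_transferred(2 * GIB, changes=10, replicas=2)

# Hypothetical: 1,000,000 small files of 4 KiB each, written once, same 2 nodes.
small_files = afr_bytes_transferred(4 * 1024, changes=1, replicas=2) * 1_000_000

print("big index file: %.1f GiB replicated" % (big_index / GIB))
print("small files:    %.1f GiB replicated" % (small_files / GIB))
```

Under these assumptions, ten tiny updates to a single big file move several times more data than writing a million small files once, which is why write-once small files look like the better fit for AFR than files that are modified in place.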