On Fri, May 9, 2008 at 2:33 PM, Marcus Herou <marcus.herou@xxxxxxxxxxxxx> wrote:
> Oooops. Didn't think of that with AFR. However I think Lucene always creates
> new files when documents are flushed to disk so on commit basis there will
> be low impact. But the scenario you're talking about will most definitely
> kick in when optimization of the index occurs. Hundreds of smaller files
> aggregate into bigger more compact files. Since Lucene cannot hold all
> smaller files in memory it will flush parts of the merge in "log" files
> which will trigger the case you're talking about.
>
> So basically the absolute worst case possible using GlusterFS with AFR would
> be to use it with a webserver access log right ?
>
> I think I will go for AFR when it comes to the billion small files since
> they are almost never updated but is there a smart way of updating big files
> in GlusterFS ?

What do you mean by a smart way? Are you referring to the unsmart way
self-heal happens now, or just to write()s?

>> Do you plan to do any AFR (automatic file replication) ? If so,
>> consider that even a one-byte change to your "big index files" will
>> cause the /entire/ file to be AFR'd between all participating nodes.

Marcus, what do you mean by this?

Krishna

> Perhaps Gluster is a bad choice for Lucene indexing and I really need to go
> for having many cheap boxes with local disks instead.
>
> Kindly
>
> //Marcus
>
>
>
> On Fri, May 9, 2008 at 10:37 AM, Daniel Maher <dma+gluster@xxxxxxxxx>
> wrote:
>
>> On Wed, 7 May 2008 20:06:40 +0200 "Marcus Herou"
>> <marcus.herou@xxxxxxxxxxxxx> wrote:
>>
>> > 1. Big index files ~x Gig each
>> > 2. Many small files in a huge amount of directories.
>>
>> Do you plan to do any AFR (automatic file replication) ? If so,
>> consider that even a one-byte change to your "big index files" will
>> cause the /entire/ file to be AFR'd between all participating nodes.
>>
>> > Finally what tools would suit to test zillions of small files ?
>> > Bonnie++ ? Fewer big files ? Still Bonnie++ or perhaps IOZone ?
>>
>> IOZone is an interesting tool, assuming you can interpret the
>> results. :P I have been using Bonnie++ and FFSB extensively over the
>> past couple of weeks to stress-test / benchmark Gluster. Both have the
>> advantage of producing easily interpretable results, and FFSB is highly
>> configurable, depending on what sort of tests you'd like to run (read /
>> write / both, small / large files, lots / few files, etc.).
>>
>> The following page contains some sample FFSB configs to work from :
>> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
>> (see "Step 8".)
>>
>> Cheers !
>>
>> --
>> Daniel Maher <dma AT witbe.net>
>>
>
>
>
> --
> Marcus Herou CTO and co-founder Tailsweep AB
> +46702561312
> marcus.herou@xxxxxxxxxxxxx
> http://www.tailsweep.com/
> http://blogg.tailsweep.com/
> _______________________________________________
> Gluster-devel mailing list
> Gluster-devel@xxxxxxxxxx
> http://lists.nongnu.org/mailman/listinfo/gluster-devel
>
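
To put rough numbers on the two cases discussed above (copying the whole
file versus forwarding only the bytes written), here is a minimal sketch.
It is plain arithmetic, not GlusterFS code: the function name, the 2 GiB
file size and the replica count are illustrative assumptions, and which of
the two behaviours AFR actually shows on an ordinary write() is exactly the
open question in this thread.

```python
GIB = 1024 ** 3


def bytes_sent_to_peers(file_size, bytes_written, replicas, whole_file):
    """Bytes shipped to the other replicas for one update of one file.

    Hypothetical helper for illustration only; it models the two cases
    raised in the thread and says nothing about what AFR really does.
    """
    if replicas < 1:
        raise ValueError("need at least one replica")
    # Assumption: one replica already holds the new data, so only the
    # remaining replicas need to receive anything.
    peers = replicas - 1
    per_peer = file_size if whole_file else bytes_written
    return per_peer * peers


# Assumed example: a 2 GiB index file, a one-byte change, two replicas.
whole = bytes_sent_to_peers(2 * GIB, 1, replicas=2, whole_file=True)
delta = bytes_sent_to_peers(2 * GIB, 1, replicas=2, whole_file=False)
print(f"whole-file copy: {whole} bytes, write-only: {delta} bytes")
```

If whole-file copying only happens during self-heal, the first figure would
apply once per heal rather than once per write, which changes the picture
for the access-log and index-merge cases considerably.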