Re: Crawling and indexing hardware

"Marcus Herou" <marcus.herou@xxxxxxxxxxxxx> · Fri, 9 May 2008 11:03:55 +0200

Oops, I hadn't thought of that with AFR. However, I think Lucene always creates
new files when documents are flushed to disk, so on a per-commit basis the
impact will be low. But the scenario you're talking about will most definitely
kick in when the index is optimized: hundreds of smaller files are merged into
bigger, more compact files. Since Lucene cannot hold all the smaller files in
memory, it will flush parts of the merge to "log" files, which will trigger
the case you're talking about.

So basically the absolute worst case for GlusterFS with AFR would be a
webserver access log, right?
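
To put rough numbers on that worst case, here is a minimal Python sketch. It
assumes, as Daniel describes, that any change to a file causes the whole file
to be copied again to each replica. The function names, the 200-byte line size
and the 1000 appends are made-up illustration values, not measurements of
GlusterFS.

```python
def whole_file_replication_bytes(append_size: int, appends: int, replicas: int = 1) -> int:
    """Bytes sent if every append re-copies the entire file to each replica.

    Assumption: this models the whole-file behaviour described in the thread,
    it does not call or measure GlusterFS.
    """
    total = 0
    file_size = 0
    for _ in range(appends):
        file_size += append_size          # the log grows by one line
        total += file_size * replicas     # the whole file is copied again
    return total


def delta_replication_bytes(append_size: int, appends: int, replicas: int = 1) -> int:
    """Bytes sent if only the newly appended data were copied (for comparison)."""
    return append_size * appends * replicas


if __name__ == "__main__":
    line = 200      # hypothetical size of one access log line, in bytes
    hits = 1000     # hypothetical number of appended lines
    whole = whole_file_replication_bytes(line, hits)
    delta = delta_replication_bytes(line, hits)
    print(whole, delta, whole / delta)    # 100100000 200000 500.5
```

Under that assumption the traffic grows with the square of the number of
appends, while a file that is written once and never touched again is copied
only once. That is why the small, almost never updated files look like a much
better fit than an append-heavy log.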

I think I will go for AFR for the billion small files, since they are almost
never updated. But is there a smart way of updating big files in GlusterFS?

Perhaps Gluster is a bad choice for Lucene indexing and I really need to go
for many cheap boxes with local disks instead.

Kindly

//Marcus

On Fri, May 9, 2008 at 10:37 AM, Daniel Maher <dma+gluster@xxxxxxxxx>
wrote:

> On Wed, 7 May 2008 20:06:40 +0200 "Marcus Herou"
> <marcus.herou@xxxxxxxxxxxxx> wrote:
>
> > 1.  Big index files ~x Gig each
> > 2.  Many small files in a huge amount of directories.
>
> Do you plan to do any AFR (automatic file replication) ?  If so,
> consider that even a one-byte change to your "big index files" will
> cause the /entire/ file to be AFR'd between all participating nodes.
>
> > Finally, what tools would suit testing zillions of small files ?
> > Bonnie++ ? Fewer big files ? Still Bonnie++ or perhaps IOZone ?
>
> IOZone is an interesting tool, assuming you can interpret the
> results. :P  I have been using Bonnie++ and FFSB extensively over the
> past couple of weeks to stresstest / benchmark Gluster.  Both have the
> advantage of producing easily interpretable results, and FFSB is highly
> configurable, depending on what sort of tests you'd like to run (read /
> write / both, small / large files, lots / few files, etc..).
>
> The following page contains some sample FFSB configs to work from :
> http://tastic.brillig.org/~jwb/zfs-xfs-ext4.html
> (see "Step 8".)
>
> Cheers !
>
> --
> Daniel Maher <dma AT witbe.net>
>

-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.herou@xxxxxxxxxxxxx
http://www.tailsweep.com/
http://blogg.tailsweep.com/