Benjamin Arai wrote:
> Hi,
>
> I have been struggling with getting fulltext searching for very large
> databases. I can fulltext index 10s of gigs without any problem, but
> when I start getting to hundreds of gigs it becomes slow. My current
> system is a quad core with 8GB of memory. I have the resources to
> throw more hardware at it, but realistically it is not cost effective
> to buy a system with 128GB of memory. Are there any solutions that
> people have come up with for indexing very large text databases?

GiST indexes are very large.

> Essentially I have several terabytes of text that I need to index.
> Each record is about 5 paragraphs of text. I am currently using
> TSearch2 (stemming, etc.) and getting sub-optimal results. Queries
> take more than a second to execute.

You are complaining about more than a second with a terabyte of text?

> Has anybody implemented such a database using multiple systems or
> some special add-on to TSearch2 to make things faster? I want to do
> something like partitioning the data into multiple systems and
> merging the ranked results at some master node. Is something like
> this possible for PostgreSQL, or must it be a software solution?
>
> Benjamin

See the sketch below the sig for one way the partition-and-merge part
can be phrased in plain SQL.

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
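For what it's worth, within a single cluster the merge-at-a-master idea
can be prototyped before it becomes a software project. Here is a
minimal sketch, assuming a hypothetical "docs" table split into child
tables docs_p0 and docs_p1, each with its own tsvector column (fti) and
its own full-text index. The to_tsquery(), rank(), @@, and tsearch2()
pieces are the stock contrib/tsearch2 ones; the table names, column
names, and example query are made up for illustration:

    -- One partition; repeat for docs_p1 ... docs_pN.
    CREATE TABLE docs_p0 (
        id   serial PRIMARY KEY,
        body text,
        fti  tsvector
    );

    -- Keep fti current with the stock tsearch2 trigger, and index it.
    CREATE TRIGGER docs_p0_fti_update
        BEFORE INSERT OR UPDATE ON docs_p0
        FOR EACH ROW EXECUTE PROCEDURE tsearch2(fti, body);
    CREATE INDEX docs_p0_fti_idx ON docs_p0 USING gist (fti);

    -- Search every partition through its own index and merge the
    -- ranked results at the top level:
    SELECT id, rank(fti, to_tsquery('large & database')) AS score
      FROM (SELECT id, fti FROM docs_p0
            UNION ALL
            SELECT id, fti FROM docs_p1) AS parts
     WHERE fti @@ to_tsquery('large & database')
     ORDER BY score DESC
     LIMIT 10;

The planner can usually push the @@ condition down into each UNION ALL
arm, so every partition is probed through its own, much smaller, index,
which is the point of the exercise. The same shape stretches across
machines if the arms become contrib/dblink calls against remote nodes,
though then the master has to pull the matching rows over the wire. And
if you can move to 8.2, it may be worth trying a GIN index in place of
GiST on the fti column; for mostly-read data the lookups are
considerably faster.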