Search Postgresql Archives

Re: multi terabyte fulltext searching

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Benjamin Arai wrote:
> True, but what happens when my database reaches 100 terabytes? Is 5
> seconds ok? How about 10?  My problem is that I do not believe the
> performance loss I am experiencing as the data becomes large is (log the
> # of records).  This worries me because I could be doing something
> wrong.  Or I might be able to do something better.

Well a couple of things you could do, especially if you have the ability
to throw hardware at it.

How many spindles do you have?

J


> 
> Benjamin
> 
> On Mar 21, 2007, at 8:49 AM, Joshua D. Drake wrote:
> 
>> Benjamin Arai wrote:
>>> Hi,
>>>
>>> I have been struggling with getting fulltext searching for very large
>>> databases.  I can fulltext index 10s if gigs without any problem but
>>> when I start geting to hundreds of gigs it becomes slow.  My current
>>> system is a quad core with 8GB of memory.  I have the resource to throw
>>> more hardware at it but realistically it is not cost effective to buy a
>>> system with 128GB of memory.  Is there any solutions that people have
>>> come up with for indexing very large text databases?
>>
>> GIST indexes are very large.
>>
>>> Essentially I have several terabytes of text that I need to index.  Each
>>> record is about 5 paragraphs of text.  I am currently using TSearch2
>>> (stemming and etc) and getting sub-optimal results.  Queries take more
>>> than a second to execute.
>>
>> you are complaining about more than a second with a terabyte of text?
>>
>>
>>>  Has anybody implemented such a database using
>>> multiple systems or some special add-on to TSearch2 to make things
>>> faster?  I want to do something like partitioning the data into multiple
>>> systems and merging the ranked results at some master node.  Is
>>> something like this possible for PostgreSQL or must it be a software
>>> solution?
>>>
>>> Benjamin
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 9: In versions below 8.0, the planner will ignore your desire to
>>>       choose an index scan if your joining column's datatypes do not
>>>       match
>>>
>>
>>
>> -- 
>>       === The PostgreSQL Company: Command Prompt, Inc. ===
>> Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
>> Providing the most comprehensive  PostgreSQL solutions since 1997
>>              http://www.commandprompt.com/
>>
>> Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
>> PostgreSQL Replication: http://www.commandprompt.com/products/
>>
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
> 


-- 

      === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
             http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux