On Wed, 21 Mar 2007 08:57:39 -0700, Benjamin Arai wrote:
> Hi Oleg,
>
> I am currently using GiST indexes because I receive about 10GB of new data
> a week (then again, I am not deleting any information). I do not expect
> to be able to stop receiving text for about 5 years, so the data is not
> going to become static any time soon. The reason I am concerned with
> performance is that I am providing a search system for several newspapers
> going back essentially to the beginning of time. Many bibliographers etc.
> would like to use this utility, but if each search takes too long I am not
> going to be able to support many concurrent users.
>
> Benjamin

At a previous job, I built a system to do this. We had 3,000 publications
and approximately 70M newspaper articles. Total content size (postprocessed)
was over 100GB, IIRC. We used a proprietary (closed-source, not ours) search
engine. In order to reach subsecond response times, we needed to scale
horizontally to about 50-70 machines, each a low-end Dell 1650. This was
after about 5 years of trying to scale vertically.

-arturo