On Thu, Jul 1, 2010 at 6:34 PM, Michal Fapso <michal.fapso@xxxxxxxxx> wrote:
> It took about 4.5 seconds. If I rerun it, it takes less than
> 2 milliseconds, but that is because of the cache. I need to
> optimize the first run.
>
> laptop ASUS, CPU dual core T2300 1.66GHz, 1.5G RAM
>
> EXPLAIN ANALYZE SELECT h1.docid
> FROM hyps AS h1
> WHERE h1.wordid=65658;
>
> Bitmap Heap Scan on hyps h1  (cost=10.97..677.09 rows=171 width=4)
> (actual time=62.106..4416.864 rows=343 loops=1)
>   Recheck Cond: (wordid = 65658)
>   ->  Bitmap Index Scan on hyps_wordid_index  (cost=0.00..10.92
> rows=171 width=0) (actual time=42.969..42.969 rows=343 loops=1)
>         Index Cond: (wordid = 65658)
> Total runtime: 4432.015 ms
>
> If I run the same query in Lucene search engine, it takes 0.105
> seconds on the same data, which is quite a huge difference.

So PostgreSQL is reading 343 rows from disk in 4432 ms, or about 13 ms
per row. I'm not an expert on seek times, but that might not really be
that unreasonable, considering that those rows may be scattered all
over the table, so fetching them is basically random I/O. Have you
tried clustering hyps on hyps_wordid_index? If you had a more
sophisticated disk subsystem you could try increasing
effective_io_concurrency, but that's not going to help with only one
spindle.

If you run the same query in Lucene and it takes only 0.105 s, then
Lucene is obviously doing a lot less I/O. I doubt that any amount of
tuning of your existing schema is going to produce that kind of result
on PostgreSQL. Using the full-text search stuff, or a GIN index of
some kind, might get you closer, but it's hard to beat a
special-purpose engine that implements exactly the right algorithm for
your use case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
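
[Editor's note: a minimal sketch of the clustering suggestion above,
using the table and index names from the thread. CLUSTER ... USING
requires PostgreSQL 8.3 or later, takes an exclusive lock, and rewrites
the whole table, so it is not free:

    -- Physically rewrite the heap so rows sharing a wordid are stored
    -- together; the scattered per-row fetches then become mostly
    -- sequential I/O.
    CLUSTER hyps USING hyps_wordid_index;

    -- Refresh planner statistics after the rewrite.
    ANALYZE hyps;

CLUSTER is a one-time reordering: rows inserted afterwards are not kept
in index order, so the command may need to be rerun periodically.]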
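
[Editor's note: a hedged sketch of the GIN-index idea. It assumes
reorganizing the data into a hypothetical doc_words table with one row
per document and an integer array of word ids -- not the poster's
existing schema -- so that GIN's built-in array support acts much like
an inverted index:

    -- One row per document, holding all of its word ids.
    CREATE TABLE doc_words (
        docid   integer PRIMARY KEY,
        wordids integer[] NOT NULL
    );

    -- GIN indexes the individual array elements.
    CREATE INDEX doc_words_wordids_gin ON doc_words USING gin (wordids);

    -- "Which documents contain word 65658?" becomes a containment test
    -- that the GIN index can answer directly:
    SELECT docid FROM doc_words WHERE wordids @> ARRAY[65658];

Whether this beats Lucene is doubtful, for the reasons given above, but
it stores the whole posting list for a document in one row instead of
one heap row per (wordid, docid) pair, which cuts the random I/O.]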