Re: Slow count(*) again...

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




I ran into a fine example of this when I was searching this mailing list,
"Searching in 856,646 pages took 13.48202 seconds. Site search powered by
PostgreSQL 8.3." Obviously at some point count(*) came into play here

Well, tsearch full text search is excellent, but it has to work inside the limits of the postgres database itself, which means row visibility checks, and therefore, yes, extremely slow count(*) on large result sets when the tables are not cached in RAM.

Also, if you want to use custom sorting (like by date, thread, etc) possibly all the matching rows will have to be read and sorted.

Consider, for example, the Xapian full text search engine. It is not MVCC (it is single writer, multiple reader, so only one process can update the index at a time, but readers are not locked out during writes). Of course, you would never want something like that for your main database ! However, in its particular application, which is multi-criteria full text search (and flexible sorting of results), it just nukes tsearch2 on datasets not cached in RAM, simply because everything in it including disk layout etc, has been optimized for the application. Lucene is similar (but I have not benchmarked it versus tsearch2, so I can't tell).

So, if your full text search is a problem, just use Xapian. You can update the Xapian index from a postgres trigger (using an independent process, or simply, a plpython trigger using the python Xapian bindings). You can query it using an extra process acting as a server, or you can write a set-returning plpython function which performs Xapian searches, and you can join the results to your tables.

Pg will never have such a fast count() as MyISAM does or the same insanely fast read performance,

Benchmark it you'll see, MyISAM is faster than postgres for "small simple selects", only if : - pg doesn't use prepared queries (planning time takes longer than a really simple select)
- myisam can use index-only access
- noone is writing to the myisam table at the moment, obviously

On equal grounds (ie, SELECT * FROM table WHERE pk = value) there is no difference. The TCP/IP overhead is larger than the query anyway, you have to use unix sockets on both to get valid timings. Since by default on localhost MySQL seems to use unix sockets and PG uses tcp/ip, PG seem 2x slower, which is in fact not true.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux