Re: Full text search with ORDER BY performance issue

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/21/2009 11:32, valgog wrote:
Hi,

There is a problem with GIN and GIST indexes, that they cannot be used
by the ORDER BY. Maybe it will be a nice idea to ask Oleg to make it
possible to use the b-tree columns in GIST or GIN to make the sort
easier, but I have no idea how difficult it will be to implement it in
current GIN or GIST structures. I think Oleg or even Tom will be the
right people to ask it :) But even if it is possible it will not be
implemented at least until 8.5 that will need a year to come, so until
then...
Unfortunately, it's not even just the lack of ORDER BY support, btree_gin indexes seem to be broken under some circumstances. So I can't even use my idea to limit searches to the last 10 days.

See this:
http://pgsql.privatepaste.com/5219TutUMk

The first query gives bogus results. It's not using the index correctly.

timestamp_comment_gin is a GIN index on timestamp, comment_tsv. The timestamp column is an integer. The queries work right if I drop the index. Is this a bug in btree_gin?
It is possible to strip your table in several smaller ones putting
them on different machines and then splitting your query with DBLINK.
This will distribute the burden of sorting to several machines that
will have to sort smaller parts as well. After you have your 25 ids
from each of the machines, you can merge them, sort again and limit as
you wish. Doing large offsets will be still problematic but faster
anyway in most reasonable offset ranges. (Load balancing tools like
pg_pool can automate this task, but I do not have practical experience
using them for that purposes)

Yet another very interesting technology -- sphinx search (http://
www.sphinxsearch.com/). It can distribute data on several machines
automatically, but it will be probably too expensive to start using
(if your task is not your main one :)) as they do not have standard
automation scripts, it does not support live updates (so you will
always have some minutes delay), and this is a standalone service,
that needs to be maintained and configured and synchronized with our
main database separately (though you can use pg/python to access it
from postgres).

Good luck with your task :)
Yeah, I don't really have that sort of resources. This is a small hobby project (ie: no budget) that is growing a bit too large. I might just have to do text searches without time ordering.

On 7/21/2009 5:06, Scott Marlowe wrote:
Couldn't you do tge second query as a with query then run another
query to limit that result to everything greater than now()-xdays ?
I suppose I could, but I have no way to do a fast query that does both a full text match and a < or > in the same WHERE due to the issue I described above, so my original plan won't work. A separate BTREE timestamp index obviously does nothing.

And again, thank you for all the help.

--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux