On Fri, Nov 16, 2007 at 07:56:45PM -0800, adrobj wrote: > > Hello, > > I have a moderately large (~10-20GB) table: > > CREATE TABLE msgs ( > msg varchar(2048), > msg_tsv tsvector, > posted timestamp > ); > > CREATE INDEX msgs_i ON msgs USING gin(msg_tsv); > > The table never gets updated (more specifically, it gets re-created > once a day with no updates in between). > > I want to run queries of the following form: > > SELECT msg, posted FROM msgs WHERE 'blah blah'::tsquery @@ msg_tsv > ORDERED BY posted DESC; (with various LIMIT/OFFSET) > > Which obviously may get too expensive, for it will cause reading and > sorting of all rows meeting the condition, i.e. too many disk reads. > > On the other hand, (as far as I understand) GIN always produces hits > already sorted in the insertion order. > > So - what if I just populate my table in the order of decreasing > 'posted', remove the "ORDERED BY" clause and just hope for the best? > Will the correct ordering be guaranteed? Ordering is never guaranteed without an ORDER BY, except in the time between a CLUSTER and the first write operation after it. > If not, are there any other ideas around? Rather than assuming you know where problems will arise, do some profiling and find out where they actually do :) Cheers, David. -- David Fetter <david@xxxxxxxxxx> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@xxxxxxxxx Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate ---------------------------(end of broadcast)--------------------------- TIP 5: don't forget to increase your free space map settings