Jesper Krogh wrote:
> I have a table that consists of somewhere in the magnitude of
> 100.000.000 rows, and all rows are tuples of this form:
>
>   (id1, id2, evalue);
>
> Then I'd like to speed up a query like this:
>
>   explain analyze select id from table where id1 = 2067 or id2 = 2067
>   order by evalue asc limit 100;
>
> ... The inner sets are on average 3.000 for both id1 and id2 and a
> typical limit would be 100, so if I could convince postgresql to not
> fetch all of them then I would reduce the set retrieved by around 60.
> The dataset is quite large, so the random query is not very likely to
> be hitting the same part of the dataset again, so there is going to be
> a fair amount of going to disk.

If disk seeks are killing you, a kinda crazy idea would be to duplicate
the table: cluster one copy by an index on (id1) and the other by an
index on (id2), and union the results of each.

Since each of these duplicates of the table will be clustered by the
column you're querying it on, it should take just one seek in each
table.

Then your query could be something like:

   select * from (
       select * from t1 where id1 = 2067 order by evalue limit 100
       union
       select * from t2 where id2 = 2067 order by evalue limit 100
   ) as foo
   order by evalue limit 100;

Hmm.. and I wonder, if you put evalue into the criteria you cluster the
tables by as well (i.e. cluster on (id1, evalue)), whether you could
make it so the limit finds the right 100 evalues first for each
table....
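
A minimal sketch of that setup, assuming PostgreSQL 8.4+ for the
CLUSTER ... USING syntax; the names pairs, pairs_by_id1, pairs_by_id2
and the index names are made-up stand-ins for Jesper's table and its
two copies, and the id column is assumed from his original query:

   -- Two physical copies of the (hypothetical) "pairs" table, each
   -- clustered on the column it will be searched by, plus evalue so
   -- the limit can stop after the first 100 matching rows.
   create table pairs_by_id1 as select * from pairs;
   create index pairs_by_id1_idx on pairs_by_id1 (id1, evalue);
   cluster pairs_by_id1 using pairs_by_id1_idx;

   create table pairs_by_id2 as select * from pairs;
   create index pairs_by_id2_idx on pairs_by_id2 (id2, evalue);
   cluster pairs_by_id2 using pairs_by_id2_idx;

   analyze pairs_by_id1;
   analyze pairs_by_id2;

   -- Each branch should become an index scan that reads at most 100
   -- physically adjacent rows; the outer sort then merges 200 rows.
   select id from (
       select id, evalue from pairs_by_id1
        where id1 = 2067 order by evalue limit 100
       union
       select id, evalue from pairs_by_id2
        where id2 = 2067 order by evalue limit 100
   ) as foo
   order by evalue
   limit 100;

Keep in mind that CLUSTER rewrites the table under an exclusive lock,
and the physical ordering isn't maintained for rows inserted later, so
the copies would have to be re-clustered (or rebuilt) periodically as
the data changes.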