Igor Lobanov wrote:
Richard Huxton wrote:
I have rather large table with about 5 millions of rows and a dozen
of columns. Let's suppose that columns are named 'a', 'b', 'c' etc. I
need to query distinct pairs of ('a';'b') from this table.
>
What version of PostgreSQL is it?
8.1.4
Current release is 8.1.6 - probably worth upgrading when you've got
time. It should be a simple matter of replacing the binaries but do
check the release notes.
How many distinct values are you getting back from your 5 million
rows? If there are too many, an index isn't going to help.
No more than 10,000.
OK. Should be possible to do something then.
Can you share the EXPLAIN ANALYSE output? You might want to try
increasing work_mem for this one query to speed any sorting.
Real table and colum names are obfuscated because of NDA, sorry.
Fair enough.
explain analyze select distinct a, b from tbl
EXPLAIN ANALYZE output is:
Unique (cost=500327.32..525646.88 rows=1848 width=6) (actual
time=52719.868..56126.356 rows=5390 loops=1)
-> Sort (cost=500327.32..508767.17 rows=3375941 width=6) (actual
time=52719.865..54919.989 rows=3378864 loops=1)
Sort Key: a, b
-> Seq Scan on tbl (cost=0.00..101216.41 rows=3375941
width=6) (actual time=16.643..20652.610 rows=3378864 loops=1)
Total runtime: 57307.394 ms
Hmm - am I right in thinking (a,b) are the only two columns on this
table? That means you'll have a lot of rows per page and an index scan
could end up fetching lots of pages to check the rows are visible. Still
- I'd expect it to be better than a seq scan.
The first thing to try is to put the index back and run "SET
enable_seqscan=off" before the explain analyse. That should force it to
use the index. Then we'll see what costs it's expecting.
How often is the table updated? Clustering might buy you some
improvements (but not a huge amount I suspect).
It is updated once per 3-5 seconds.
OK - forget clustering then.
And one more thing. I don't know if it helps, but column 'a' can have
value from a limited set: 0, 1 or 2. Column 'b' is also an integer
(foreign key, actually).
Hmm - might be worth trying distinct on (b,a) with an index that way
around - that would give you greater selectivity at the top-level of the
btree. Can you repeat the EXPLAIN ANALYSE with that too please.
--
Richard Huxton
Archonet Ltd