Search Postgresql Archives

Re: Non-unique index performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Sezai YILMAZ wrote:
Hello!

I have a table eith name person as described below. It has an unique index for id column (it is also primary key) and has an index for parent column.

If I run a query with where clause on id column it uses the index (look at the first explain analyze result; it says "Index Scan using...") and the query for 582856 rows table results in 225,893 ms.

But, if I run another query with where clause on parent column it does not use the index (look at the second explain analyze result; it says "Seq Scan using...") and the query for 582856 rows table results in 11192.913 ms.

Why the difference of both queries is so dramatical for unique and non-unique indexed columns? Why PostgreSQL does not use the non-unique indexes (it says that it does sequential scan)?

Because it thinks it will be faster/cheaper.

I have to use an index on non-unique column. What is the solution for that? Is there a way to speed up non-unique indexes?

You don't want to force it to use an index, you want it to make better estimates. Let's have a look...

*****************************************************************
test=> \d person
          Table "public.person"
Column  |         Type          | Modifiers
---------+-----------------------+-----------
name    | character varying(30) |
surname | character varying(30) |
id      | integer               | not null
parent  | integer               |
Indexes:
   "person_pkey" primary key, btree (id)
   "parent_ndx" btree (parent)

OK - all very simple. And you've said there are about 580,000 rows.

test=> explain analyze select id,name from person where id in ('17201', '338191', '244319', '515209', '20415');

Why are you quoting integers?

test=> explain analyze select * from person where parent in ('17201', '338191', '244319', '515209', '20415');
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------- Seq Scan on person (cost=0.00..35871.26 rows=14427 width=32) (actual time=0.063..11192.809 rows=5 loops=1) Filter: ((parent = 17201) OR (parent = 338191) OR (parent = 244319) OR (parent = 515209) OR (parent = 20415))
Total runtime: 11192.913 ms

Hmm - for some reason it's expecting 14427 rows to be returned. If there were that many matches, then it might well be a better choice than going back and fore between the index and the table all the time.

So - we need to find out why it thinks there will be so many rows returned.

1. VACUUM FULL ANALYSE person;
2. re-run the explain

That will make sure that the table's statistics are up-to-date. If that hasn't helped, then perhaps we need to educate PG about the distribution of values in "parent".

2. ALTER TABLE person ALTER COLUMN parent SET STATISTICS 100;
3. ANALYSE person;
4. re-run the explain

If that still doesn't work, keep increasing the statistics (max. value 1000).

Let us know how that works for you.
--
  Richard Huxton
  Archonet Ltd

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@xxxxxxxxxxxxxx

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux