Hi, I am trying to understand how indexes works to get the most of them. First I would like to know if there is more advantage than overhead to split an index in several ones using conditions e.g. doing : CREATE INDEX directory_id_user_0_btree_idx ON mike.directory USING btree (id_user) WHERE id_user < 250000; CREATE INDEX directory_id_user_250000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 250000 AND id_user < 500000; CREATE INDEX directory_id_user_500000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 500000 AND id_user < 750000; CREATE INDEX directory_id_user_750000_btree_idx ON mike.directory USING btree (id_user) WHERE id_user >= 750000 AND id_user < 1000000; instead of having only one index for all the id_user. the forecasts for the table directory are +500 millions records and something like 1 million distinct id_user. If there is my idea was to do a repartition in the indexes using a consistent hash algorithm in order to fill the indexes in parallel instead of successively : CREATE OR REPLACE FUNCTION mike.__mod_cons_hash( IN in_dividend bigint, IN in_divisor integer, OUT remainder integer ) AS $__$ BEGIN SELECT in_dividend % in_divisor INTO remainder; END; $__$ LANGUAGE plpgsql IMMUTABLE COST 10; CREATE INDEX directory_id_user_mod_cons_hash_0_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 0; CREATE INDEX directory_id_user_mod_cons_hash_1_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 1; CREATE INDEX directory_id_user_mod_cons_hash_2_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 2; CREATE INDEX directory_id_user_mod_cons_hash_3_btree_idx ON mike.directory USING btree (id_user) WHERE __mod_cons_hash(id_user, 4) = 3; But the thing is the indexes are not used : mike=# SELECT version(); version ------------------------------------------------------------------------------------------------------------------- PostgreSQL 8.4.7 on i686-pc-linux-gnu, compiled by GCC gcc-4.4.real (Ubuntu/Linaro 4.4.4-14ubuntu5) 4.4.5, 32-bit (1 row) mike=# REINDEX INDEX directory_id_user_mod_cons_hash_0_btree_idx; LOG: duration: 14644.160 ms statement: REINDEX INDEX directory_id_user_mod_cons_hash_0_btree_idx; REINDEX mike=# EXPLAIN ANALYZE SELECT * FROM directory WHERE id_user = 4; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual time=0.077..348.211 rows=10303 loops=1) Filter: (id_user = 4) Total runtime: 351.114 ms (3 rows) So I also did this test : mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 3 and id_user < 5; CREATE INDEX mike=# EXPLAIN ANALYZE select * from directory where id_user = 4; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using directory_id_user_4_btree_idx on directory (cost=0.00..10.58 rows=67 width=148) (actual time=0.169..7.753 rows=10303 loops=1) Index Cond: (id_user = 4) Total runtime: 10.973 ms (3 rows) mike=# DROP INDEX directory_id_user_4_btree_idx; DROP INDEX mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user - 1 > 2 and id_user + 1 < 6; CREATE INDEX mike=# EXPLAIN ANALYZE select * from directory where id_user = 4; QUERY PLAN ---------------------------------------------------------------------------------------------------------------- Seq Scan on directory (cost=0.00..38140.66 rows=67 width=148) (actual time=0.153..360.020 rows=10303 loops=1) Filter: (id_user = 4) Total runtime: 363.106 ms (3 rows) mike=# DROP INDEX directory_id_user_4_btree_idx; DROP INDEX mike=# CREATE INDEX directory_id_user_4_btree_idx ON mike.directory USING btree (id_user) WHERE id_user > 2 + 1 and id_user < 6 - 1; CREATE INDEX mike=# EXPLAIN ANALYZE select * from directory where id_user = 4; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------- Index Scan using directory_id_user_4_btree_idx on directory (cost=0.00..10.58 rows=67 width=148) (actual time=0.245..8.262 rows=10303 loops=1) Index Cond: (id_user = 4) Total runtime: 11.110 ms (3 rows) As you see the index condition although, differently written, is the same but the second index is not used apparently because the immutable function is applied on the column. So do you know the reason why the planner is not able to use indexes which have immutable functions applied to the column in their condition ? Regards. -- Sylvain Rabot <sylvain@xxxxxxxxxxxxxx>
Attachment:
signature.asc
Description: This is a digitally signed message part