Олег Самойлов wrote:
> \set table_size 1000000
> begin;
> create table gender (gender varchar);
>
> insert into gender (gender)
>   select case when random < 0.50 then 'female'
>               when random < 0.99 then 'male'
>               else 'other' end
>   from (select random() as random, generate_series(1, :table_size)) as subselect;
>
> create index gender_btree on gender using btree (gender);
> create index gender_hash on gender using hash (gender);
> commit;
> vacuum full analyze;
>
> The vacuum full is not necessary here, just a little voodoo programming. I expected the hash index to be much smaller and quicker than the btree index, because it keeps only hashes, not the values themselves. But:
>
> => \d+
>                     List of relations
>  Schema |  Name  | Type  | Owner | Size  | Description
> --------+--------+-------+-------+-------+-------------
>  public | gender | table | olleg | 35 MB |
> (1 row)
>
> => \di+
>                           List of relations
>  Schema |     Name     | Type  | Owner | Table  | Size  | Description
> --------+--------------+-------+-------+--------+-------+-------------
>  public | gender_btree | index | olleg | gender | 21 MB |
>  public | gender_hash  | index | olleg | gender | 47 MB |
> (2 rows)
>
> The hash index is not only larger than the btree index, but also larger than the table itself. What is wrong with the hash index?

I guess the problem here is that there are so few distinct values that all the index entries end up in only three hash buckets, forming long linked lists. I can't tell off-hand why that would make the index so large, though. Anyway, indexes are pretty useless in such a case.

Is the behavior the same if you have many distinct values?

Yours,
Laurenz Albe

-- 
Cybertec | https://www.cybertec-postgresql.com
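[Editor's note: the bucket-skew explanation above can be illustrated with a short Python sketch. The bucket count and Python's built-in `hash()` are stand-ins for illustration only, not PostgreSQL's actual hash function or bucket layout; the point is that one entry is stored per row, so a million entries collapse into at most three buckets.]

```python
import random
from collections import Counter

NUM_ROWS = 1_000_000
NUM_BUCKETS = 1024  # hypothetical bucket count, for illustration only

# Generate a gender column with the same skew as the original test case:
# ~50% 'female', ~49% 'male', ~1% 'other'.
rows = [
    "female" if r < 0.50 else ("male" if r < 0.99 else "other")
    for r in (random.random() for _ in range(NUM_ROWS))
]

# A hash index stores one entry per heap row, placed by hash(value).
# With only three distinct values, at most three buckets receive entries,
# and each of those buckets must chain ~hundreds of thousands of entries.
buckets = Counter(hash(v) % NUM_BUCKETS for v in rows)

print("buckets used:", len(buckets))          # at most 3 of 1024
print("largest bucket:", max(buckets.values()))
```

Under these assumptions, roughly 99% of all entries pile into just two buckets, which matches the "large linked lists" (overflow-page chains) described above.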