On 19.09.11 18:19, Robert Klemme wrote:
On Mon, Sep 19, 2011 at 4:04 PM, Merlin Moncure <mmoncure@xxxxxxxxx> wrote:
Postgres's hash index implementation used to be pretty horrible -- it
stored the pre-hashed datum in the index which, while making it easier
to do certain things, made it horribly slow, and, for all intents and
purposes, useless. Somewhat recently, a lot of work was put in to fix
that -- the index now stores only the hash code, which made it
competitive with btree and superior for larger keys. However, certain
technical limitations like lack of WAL logging and uniqueness hold
hash indexing back from being used like it really should be. In cases
where I really *do* need hash indexing, I do it in userland.
create table foo
(
  a_long_field text
);
create index on foo(hash(a_long_field));
select * from foo where hash(a_long_field) = hash(some_value) and
a_long_field = some_value;
This technique works fine -- the main disadvantage is that enforcing
uniqueness is a PITA but since the standard index doesn't support it
either it's no great loss. I also have the option of getting
'uniqueness' and being able to skip the equality operation if I
sacrifice some performance and choose a strong digest. Until the hash
index issues are worked out, I submit that this remains the go-to
method to do this.
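For illustration, the 'strong digest' variant mentioned above might look
like this -- a sketch, assuming a table foo as before and using the
built-in md5() function as the digest (the table and index names are
just placeholders):

```sql
-- md5() is built in, so no custom hash() wrapper is needed; the unique
-- index is over the 32-character hex digest, not the long text itself
create table foo (a_long_field text);
create unique index foo_md5_idx on foo (md5(a_long_field));

-- with a strong digest you can compare digests only and skip the
-- equality check on the long field itself
select * from foo where md5(a_long_field) = md5('some value');
```

The unique index on the digest gives you 'uniqueness' on the long field
(up to digest collisions), which the plain hash(...) index cannot.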
Is this approach (storing the hash code in a btree) really faster than
a regular btree index on "a_long_field"? And if so, for which kind of
data and load?
Actually, sometimes the field is [potentially] so long that you can't use
a regular b-tree index because the value won't fit in a page. Say the
column is of type "text". If you create a regular index, you effectively
limit the column value size to a few KB. I use md5(text) indexes in this
case, coupled with rather ugly queries (see above). Native support would
be nice.
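That size limit is easy to reproduce -- a sketch, with a hypothetical
table "docs"; the exact byte limit depends on the page size and server
version:

```sql
create table docs (body text);
create index docs_body_idx on docs (body);

-- a b-tree entry must fit in roughly a third of an 8 kB page, so
-- inserting a long, poorly compressible value fails with an
-- "index row size exceeds maximum" error:
insert into docs
select string_agg(md5(random()::text), '') from generate_series(1, 300);
```

An index on md5(body) instead of body sidesteps this entirely, since the
indexed expression is always 32 characters.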
Best regards, Vitalii Tymchyshyn.
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance