I had skimmed the presentation slides, but I hadn't looked that closely
because it appeared to be using cosine similarity metrics, and only
operated on matrices, neither of which are what I wanted.
On closer examination, I think I could explode my packed hash values to
a matrix. I'm not sure how the cosine similarity metric would work given
that the arrays would only contain the values either 0 or 1 (as my hash
value is fundamentally a integer of configurable length (you can alter
the hash size, I'm using 64 bits)).
I'll check out the code and poke it a bit.
I'll probably also move this discussion to the hackers mailing list (the
instructions say to ask here first, but Oleg suggested I go there, and I
generally agree).
Connor
On 10/9/2014 8:19 AM, Sergey Konoplev wrote:
On Thu, Oct 9, 2014 at 12:09 AM, Connor Wolf
<wolf@xxxxxxxxxxxxxxxxxxxxxxx> wrote:
I'm trying to implement a custom indexing system using the GiST index
framework, and I have a few questions.
Basically, I'm trying to implement a system that allows me to search across
a set of 64 bit integers by hamming distance. This is intended to be used in
searching for similar images, where the 64 bit integer is actually a phash
of an image, and similarity between two images is reflected in the hamming
distance between two integers.
Have you seen the smlar extension?
http://www.pgcon.org/2012/schedule/events/443.en.html
http://sigaev.ru/git/gitweb.cgi?p=smlar.git;a=blob;hb=HEAD;f=README
http://railsware.com/blog/2012/05/10/effective-similarity-search-in-postgresql/
Anyways, The appropriate approach here is to use something called a BK tree,
for which I've written some test code and I think I have a decent grip of
(my test code seems to work, in any event).
That said:
Is there a simple piece of example-code for implementing a GiST index for a
single index? I've been looking through the /contrib/ directory,
particularly the /contrib/btree_gist/ files, but it looks like 90% of the
complexity there is related to supporting all the column types Postgres has,
rather then actually tied to producing a functional index.
Once I have something compiling, how can I check to be sure that I'm
actually using the indexing module I created? I can enable my compiled
extension using `CREATE EXTENSION`, but is there an option for `CREATE INDEX
test_index ON testing USING gist (val);` that lets me specify *which* GiST
index is actually employed? Is this even a valid question?
This is probably something that's available in one of the system tables
somewhere, but I haven't had much luck with google finding out where.
Testing: What's the appropriate way to examine the generated tree structure
of the index? I certainly went through a few bugs with my test tree system
(and that was in python!). Is there any way to examine the index structure
for debugging purposes?
Also, it looks like `ereport()` is the proper way to emit debugging
information. Is this correct?
In that vein, is there any way to have information that is on the operations
of an entire query? Looking at the number of tree nodes touched for a scan
would be nice (and I would not be surprised if there is already a facility
for it).
Project code is here if anyone is interested, any help would be great. I
have very little idea what I'm doing.
Thanks,
Connor
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general