Re: scoring differences between bitmasks

Ben <bench@xxxxxxxxxxxxxxx> · Sun, 2 Oct 2005 11:49:03 -0700

Just the number of bits, not which ones. Basically, the Hamming
distance.
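
For reference, the Hamming distance between two bitmasks is the number of set bits in their XOR. A minimal sketch in Python, assuming the masks are held as plain integers (the function name is only illustrative and does not come from the thread):

```python
def hamming_distance(a: int, b: int) -> int:
    """Return the number of bit positions in which a and b differ."""
    # XOR leaves a 1 exactly where the two masks disagree,
    # so counting the 1 bits gives the distance.
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    # 0b1011 and 0b0010 differ in the highest and lowest of the four bits.
    print(hamming_distance(0b1011, 0b0010))
```

This counts how many bits differ without recording which ones, which matches the definition of closeness given above.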

On Oct 2, 2005, at 11:44 AM, Todd A. Cook wrote:

Hi,

It may be that I don't understand your problem. :)

Are you searching the table for the closest vector?  If so, is
"closeness" defined only as the number of bits that are different?
Or, do you need to know which bits as well?

-- todd

Ben wrote:

Hrm, I don't understand. Can you give me an example with some
reasonably sized vectors?

On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote:

Hi,

Try breaking the vector into 4 bigint columns and building a
multi-column index, with index columns going from the most evenly
distributed to the least.  Depending on the distribution of your data,
you may only need 2 or 3 columns in the index.  If you can cluster the
table in that order, it should be really fast.  (This structure is a
tabular form of a linked trie.)
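
The splitting step could look roughly like the sketch below. It assumes the 256-bit vector is held as a Python integer and that each 64-bit word has to fit PostgreSQL's signed bigint range. The function names are hypothetical, and the word order shown (most significant first) is only a placeholder: the order of the index columns should follow the data distribution as described above.

```python
MASK64 = (1 << 64) - 1


def split_into_bigints(vec: int) -> tuple:
    """Split a 256-bit integer into four signed 64-bit words, high word first."""
    words = []
    for shift in (192, 128, 64, 0):
        word = (vec >> shift) & MASK64
        # bigint is signed, so map values >= 2**63 onto the negative range.
        if word >= 1 << 63:
            word -= 1 << 64
        words.append(word)
    return tuple(words)


def join_bigints(words) -> int:
    """Inverse of split_into_bigints: rebuild the 256-bit integer."""
    vec = 0
    for word in words:
        # Masking turns a negative word back into its unsigned bit pattern.
        vec = (vec << 64) | (word & MASK64)
    return vec
```

Each of the four values would go into its own bigint column. The sketch covers only the conversion, not the index or the clustering.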

-- todd

Ben wrote:

Yes, that's the straightforward way to do it. But given that my
vectors are 256 bits in length, and that I'm going to eventually have
about 4 million of them to search through, I was hoping greater minds
than mine had figured out how to do it faster, or how to compute some
kind of indexing... somehow.
