
Re: scoring differences between bitmasks


 



Just the number of bits that differ, not which ones. Basically, the Hamming distance.
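
For reference, a minimal sketch of that measure, assuming each vector is held as a plain Python integer (the function name hamming_distance is only illustrative, not something from this thread): XOR the two vectors and count the set bits in the result.

```python
def hamming_distance(a: int, b: int) -> int:
    """Return the number of bit positions at which a and b differ."""
    # XOR leaves a 1 exactly where the two inputs disagree;
    # counting those 1s gives the Hamming distance.
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    x = 0b10110010
    y = 0b00110110
    print(hamming_distance(x, y))  # the two values differ in 2 bit positions
```

The same function works unchanged for 256-bit values, since Python integers are arbitrary precision.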

On Oct 2, 2005, at 11:44 AM, Todd A. Cook wrote:

Hi,

It may be that I don't understand your problem. :)

Are you searching the table for the closest vector?  If so, is
"closeness" defined only as the number of bits that are different?
Or, do you need to know which bits as well?

-- todd


Ben wrote:

Hrm, I don't understand. Can you give me an example with some reasonably sized vectors?

On Oct 2, 2005, at 10:59 AM, Todd A. Cook wrote:

Hi,

Try breaking the vector into 4 bigint columns and building a multi-column index, with index columns going from the most evenly distributed to the least. Depending on the distribution of your data, you may only need 2 or 3 columns in the index. If you can cluster the table in that order, it should be really fast. (This structure is a tabular form of a linked trie.)
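
As a rough sketch of the splitting step only, assuming the 256-bit vector is available client-side as a Python integer: each 64-bit word is converted to a signed value so it fits in a bigint column, which is a signed 64-bit type. The helper names split_vector and join_vector are hypothetical, and the most-significant-first word order is arbitrary; the column order for the index would still have to be chosen from the actual data distribution.

```python
MASK64 = (1 << 64) - 1


def to_signed64(word: int) -> int:
    """Reinterpret an unsigned 64-bit word as a signed 64-bit value."""
    return word - (1 << 64) if word >= (1 << 63) else word


def split_vector(v: int) -> tuple:
    """Split a 256-bit integer into four signed 64-bit words, most significant first."""
    if not 0 <= v < (1 << 256):
        raise ValueError("vector must fit in 256 bits")
    return tuple(to_signed64((v >> shift) & MASK64) for shift in (192, 128, 64, 0))


def join_vector(words) -> int:
    """Rebuild the 256-bit integer from four signed 64-bit words."""
    v = 0
    for w in words:
        # Masking maps a negative word back to its unsigned 64-bit pattern.
        v = (v << 64) | (w & MASK64)
    return v


if __name__ == "__main__":
    vec = (1 << 255) | 0xDEADBEEF
    words = split_vector(vec)
    print(words)
    print(join_vector(words) == vec)
```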

-- todd


Ben wrote:


Yes, that's the straightforward way to do it. But given that my vectors are 256 bits in length, and that I'm going to eventually have about 4 million of them to search through, I was hoping greater minds than mine had figured out how to do it faster, or how to compute some kind of index... somehow.





