On Apr 20, 12:07 pm, and...@xxxxxxxxxxxx (Andrew Lazarus) wrote: > I have a table with 2.5 million real[] arrays. (They are points in a > time series.) Given a new array X, I'd like to find, say, the 25 > closest to X in some sense--for simplification, let's just say in the > usualvectornorm. Speed is critical here, and everything I have tried > has been too slow. > > I imported the cube contrib package, and I tried creating an index on > a cube of the last 6 elements, which are the most important. Then I > tested the 2.5MM rows for being contained within a tolerance of the > last 6 elements of X, +/- 0.1 in each coordinate, figuring that would > be an indexed search (which I CLUSTERED on). I then ran the sort on > this smaller set. The index was used, but it was still too slow. I > also tried creating new columns with rounded int2 values of the last 6 > coordinates and made a multicolumn index. > > For each X the search is taking about 4-15 seconds which is above my > target at least one order of magnitude. Absolute numbers are dependent > on my hardware and settings, and some of this can be addressed with > configuration tweaks, etc., but first I think I need to know the > optimum data structure/indexing strategy. > > Is anyone on the list experienced with this sort of issue? > > Thanks. > Andrew Lazarus and...@xxxxxxxxxxxx > > ---------------------------(end of broadcast)--------------------------- > TIP 3: Have you checked our extensive FAQ? > > http://www.postgresql.org/docs/faq Having worked in high dimensional spaces a lot in my career I think you'll find that there are mathematical limits in terms of speed. In practical terms, a seq_scan will be unavoidable since on first approximation you are limited to doing an exhaustive search in 101-dimensional space unless you make approximations or dimensionality reductions of some kind. Read up on the Curse of Dimensionality: http://en.wikipedia.org/wiki/Curse_of_dimensionality Have you considered dimension reduction techniques such as Singular Value Decomposition, Principal Components Analysis, etc.?