Re: arrays of floating point numbers / linear algebra operations into the DB

Ron Mayer <rm_pg@xxxxxxxxxxxxxxxxxxxxxxx> · Fri, 01 Feb 2008 23:45:49 -0800

Ted Byers wrote:
> --- Webb Sprague <webb.sprague@xxxxxxxxx> wrote:
>>>>>>> ...linear algebra ...
>>>>>> ... matrices and vectors .
>>>>> ...Especially if some GIST or similar index could efficiently search
>>>>> for vectors "close" to other vectors...
> 
> I see a potential problem here, in terms of how one
> defines "close" or similitude.  I think, though,
> practical answers can be found in examples of applying
> quantitative methods in some subdisciplines of
> biology.

Even if the best GiST can do is select vectors
constrained by an n-dimensional bounding box (in much
the same way it does for PostGIS), it can help a lot.

Then your application can select everything in a
conservatively large box, and use whatever its
favorite metric is to narrow the data further.
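A minimal sketch of that two-phase idea in Python, assuming plain in-memory tuples stand in for table rows. In the database the box step would be the index's job; here it is just a linear scan, and the function names are hypothetical. A box of half-width `radius` around the query contains every point within Euclidean distance `radius`, so the first phase never drops a true match, it only lets some extra corner points through for the exact metric to reject.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]

def in_box(v: Vector, lo: Vector, hi: Vector) -> bool:
    """True if every coordinate of v lies within [lo[i], hi[i]]."""
    return all(l <= x <= h for x, l, h in zip(v, lo, hi))

def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def box_then_refine(rows: Sequence[Tuple[float, ...]],
                    query: Vector,
                    radius: float,
                    metric: Callable[[Vector, Vector], float] = euclidean
                    ) -> List[Tuple[float, ...]]:
    # Phase 1 (what an index could do): conservative n-dimensional box.
    lo = [q - radius for q in query]
    hi = [q + radius for q in query]
    candidates = [r for r in rows if in_box(r, lo, hi)]
    # Phase 2 (application logic): exact metric on the survivors only.
    return [r for r in candidates if metric(r, query) <= radius]
```

For example, `box_then_refine([(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (0.5, 0.5)], (0.0, 0.0), 1.2)` lets `(1.0, 1.0)` through the box but the distance check removes it, leaving `[(0.0, 0.0), (0.5, 0.5)]`. Swapping in a different `metric` only changes phase 2.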

> Someone pointed to the potential utility of pl/R.  I
> would be interested at least in learning about your
> assessment of the two (postgis and pl/r).

I think they'd be complementary.

IMHO, if a native PostgreSQL datatype could allow indexes
to narrow the amount of data that needs to be processed,
it'd be great to do the rest of the work using R (though
we're perhaps foolishly using something else in our
application).

> If you're looking at similarity, and some practicality
> in the USE of quantitative procedures, you may want to
> look into the biogeography and numerical taxonomy
> literature, and to a lesser extent quantitative plant
> ecology.

Indeed.  Though the current literature I wade through
is crime analysis.   Ideally our software tries to
match witness descriptions of "black SUV" as substantially
similar to "dark green 4runner" - especially when seen
at night; and understand that a "Glock 19" is a
"9mm handgun" and that a "9mm handgun" might be but
isn't necessarily a "Glock 19".  Also, two person
records that each have a tattoo of a cross might contribute
a little similarity, but two records with tattoos of Darth
Maul or Maori face art contribute much more to the
similarity scores because they're so much more distinctive.

And of course there are entire companies focused on similarity
metrics for fingerprints, DNA, and names (Bill is arguably more
similar to William than to Bell).
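The Bill/William/Bell case shows why a plain string metric isn't enough. In the Python sketch below, ordinary Levenshtein edit distance puts "bill" at distance 1 from "bell" but 4 from "william". Mapping known nicknames to a canonical form first reverses that. The `NICKNAMES` table is a hypothetical stand-in; real name-matching products rely on much larger curated lists and more elaborate scoring.

```python
from typing import Dict

# Hypothetical nickname table, for illustration only.
NICKNAMES: Dict[str, str] = {"bill": "william", "billy": "william",
                             "will": "william", "bob": "robert"}

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (unit-cost insert,
    delete, substitute), keeping only the previous row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute
        prev = cur
    return prev[-1]

def canonical(name: str) -> str:
    """Lower-case the name and replace a known nickname by its full form."""
    n = name.lower()
    return NICKNAMES.get(n, n)

def name_distance(a: str, b: str) -> int:
    """Edit distance computed after nickname canonicalization."""
    return levenshtein(canonical(a), canonical(b))
```

Here `name_distance("Bill", "William")` is 0 while `name_distance("Bill", "Bell")` is larger, even though the raw edit distances point the other way.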

Any magic indexes to help such queries would be
very cool, but so far we do most of it in our
application logic.

> A good linear algebra library would be useful, but
> there are a lot of nonlinear analyses that would be of
> interest; and there are nonparametric, yet
> quantitative approaches that are of considerable
> interest in assessing similarity.

True, many things don't map cleanly to linear algebra,
but they would be quite useful.
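As one concrete example of a nonparametric similarity measure (my pick, not one named above), Spearman's rank correlation compares the orderings of two vectors instead of their raw values, so it ignores any monotone rescaling. A minimal Python sketch computes it as the Pearson correlation of the rank vectors, with tied values getting the average of their ranks. It is undefined, and here raises `ZeroDivisionError`, if either input is constant.

```python
from typing import List, Sequence

def ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(ranks(a), ranks(b))
```

For instance, `[1, 2, 3, 4]` and `[10, 100, 1000, 10000]` are perfectly rank-correlated (rho = 1) even though their linear relationship is poor.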

> If I can manage the time, I hope to start a project
> where I can store description data for specimens of
> plants and animals, use analyses including but not
> limited to ordination, clustering, discriminant
> functions, canonical correlation, to create a
> structure for comparing them, and for identifying new
> specimens, or at a minimum, if the specimen is truly
> something unknown, learn what known specimens or
> groups thereof it is most similar to, and how it is
> different.

Very cool.  Sounds like a somewhat similar issue to mine.
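The "identify it, or tell me what it's closest to and how it differs" step Ted describes can be roughed out with a nearest-centroid rule. This is a much cruder stand-in for discriminant functions, and I've chosen it only because it fits in a few lines. The Python sketch assumes each specimen is already a numeric feature vector. `identify` and the `max_dist` cutoff for calling something "unknown" are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

def centroid(rows: Sequence[Sequence[float]]) -> List[float]:
    """Per-feature mean of a group's specimens."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def identify(groups: Dict[str, Sequence[Sequence[float]]],
             specimen: Sequence[float],
             max_dist: float
             ) -> Tuple[Optional[str], str, float, List[float]]:
    """Return (label or None, nearest group, distance, per-feature diff).
    label is None when even the nearest group centroid is farther than
    max_dist, i.e. the specimen may be something unknown; the nearest
    group and the per-feature differences are reported either way."""
    cents = {g: centroid(rows) for g, rows in groups.items()}
    nearest = min(cents, key=lambda g: math.dist(cents[g], specimen))
    d = math.dist(cents[nearest], specimen)
    diff = [s - c for s, c in zip(specimen, cents[nearest])]
    return (nearest if d <= max_dist else None), nearest, d, diff
```

With groups `{"A": [(0, 0), (2, 0)], "B": [(10, 10), (12, 10)]}`, the specimen `(1, 1)` is labeled `"A"` at distance 1.0. The specimen `(5, 5)` is still nearest to `"A"`, but with `max_dist=3.0` it comes back unlabeled. (`math.dist` needs Python 3.8 or later.)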
