--- Webb Sprague <webb.sprague@xxxxxxxxx> wrote:

> > >>>> ...linear algebra ...
> > >>> ... matrices and vectors .
> > >> ...Especially if some GIST or similar index could efficiently search
> > >> for vectors "close" to other vectors...

I see a potential problem here, in terms of how one defines "close" or similitude. I think, though, that practical answers can be found in examples of applying quantitative methods in some subdisciplines of biology.

> > > Hmm. If I get some more interest on this list (I need just one LAPACK
> > > / BLAS hacker...), I will apply for a pgFoundry project and appoint
> > > myself head of the peanut gallery...

Someone pointed to the potential utility of pl/R. I would be interested at least in learning about your assessment of the two (postgis and pl/R). Alas, I don't have decent data I could use to experiment with either (except possibly for time series analysis, which is a completely different kettle of fish).

> > and deal with a big database doing lots of similarity-based searches (a
> > 6'2" guy with light brown hair being similar to a 6'1" guy with dark
> > blond hair) - and am experimenting with modeling some of the data as
> > vectors in postgres.
>
> Well, I bet a good linear algebra library would help. A lot. :)

If you're looking at similarity, and at some practicality in the USE of quantitative procedures, you may want to look into the biogeography and numerical taxonomy literature, and to a lesser extent quantitative plant ecology. All three subdisciplines of biology have decades of experience, and a copious literature, on similarity measures. In my experience that literature is much more practical or pragmatic than the 'pure' biostatistics literature, and infinitely more practical than any theoretical statistical or mathematical literature I have seen (trust me, I have a bookcase full of this "stuff").
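To make the question of defining "close" a little more concrete, here is a minimal sketch in Python of a Gower-style distance, a common way to compare records that mix numeric and categorical attributes. It is only an illustration under stated assumptions, not anything taken from the thread: the attribute names, the 0-10 `hair_shade` scale, and the value ranges are all hypothetical.

```python
from typing import Dict, Union

Record = Dict[str, Union[float, str]]


def gower_distance(a: Record, b: Record, ranges: Dict[str, float]) -> float:
    """Mean per-attribute dissimilarity in [0, 1]; 0 means identical.

    Attributes listed in `ranges` are treated as numeric and contribute
    |x - y| / range (capped at 1). Every other attribute is treated as
    categorical and contributes 0 on a match and 1 on a mismatch.
    """
    if not a or a.keys() != b.keys():
        raise ValueError("records must share the same non-empty attribute set")
    total = 0.0
    for key in a:
        if key in ranges:
            diff = abs(float(a[key]) - float(b[key])) / ranges[key]
            total += min(diff, 1.0)
        else:
            total += 0.0 if a[key] == b[key] else 1.0
    return total / len(a)


# Hypothetical data: heights in inches, hair shade on an invented
# 0 (black) to 10 (lightest blond) scale, eye colour as a category.
RANGES = {"height_in": 24.0, "hair_shade": 10.0}
tall_brown = {"height_in": 74, "hair_shade": 5, "eyes": "blue"}
tall_blond = {"height_in": 73, "hair_shade": 6, "eyes": "blue"}
short_dark = {"height_in": 62, "hair_shade": 1, "eyes": "brown"}

d_close = gower_distance(tall_brown, tall_blond, RANGES)
d_far = gower_distance(tall_brown, short_dark, RANGES)
print(d_close < d_far)  # the two tall, fair-haired records are closer
```

Note that the result depends entirely on how attributes are encoded and scaled. Treating hair colour as an ordered shade rather than a plain category is exactly the kind of choice that the definition of "close" hinges on.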
A good linear algebra library would be useful, but there are a lot of nonlinear analyses that would be of interest, and there are nonparametric, yet quantitative, approaches that are of considerable interest in assessing similarity. I don't know of work that applies things like discriminant function analysis, cluster analysis, or any of the many ordination analyses to searches in a database, but then I haven't looked at the question since I graduated. I am interested in the question, though, and would like to hear about your experience with it.

If I can manage the time, I hope to start a project where I can store description data for specimens of plants and animals, and use analyses including but not limited to ordination, clustering, discriminant functions, and canonical correlation to create a structure for comparing them and for identifying new specimens. At a minimum, if a specimen is truly something unknown, I want to learn which known specimens or groups of specimens it is most similar to, and how it differs from them.

I have managed to install pl/R, but I haven't had the time to figure out how best to use it to analyze data stored in the database. The data I do have changes daily, and some of the tables are well over 100MB, so I am a bit worried about how well pl/R can handle that amount of data, and how long it would take.

Cheers,

Ted