On Wed, Apr 13, 2011 at 10:16 PM, Tomas Vondra <tv@xxxxxxxx> wrote: > You mean Support Vector Classifiers? Interesting idea, although I don't > see how to apply that to query planning, especially with non-numeric > inputs. Could you share more details on that statistical system and how > do you think it could be applied in the pgsql world? Well, in my case, the data was a simple list of attributes. You either had them or not, and data was very sparse, so the task was to fill the missing bits. For that, what I did is take a training set of data, and each time I wanted to know the likelihood of having a certain attribute I would compute the conditional probability given the training data - conditional on a set of other data. So, for postgres, if I had an index over a few columns of booleans (yea, bare with me) (a,b,c,d), and I wanted to know the selectivity of "where a", IF i already accounted for "where b" then I'd pick my training data and count how many of those that have b have also a. So P(a if b). Of course, my application had to handle thousands of attributes, so I couldn't apply conditional distributions on everything, I'd pick the conditional part (if b) to something that selected an "appropriately sized" sample from my training data. All that's very expensive. So I thought... what about replacing that with an SVC - train an SVC or SVR model for a, taking b, c, d as parameters. I never had the oportunity to test the idea, but the SVC variant would probably be usable by postgres, since all you need to know is b, c, d, they don't need to be booleans, or scalars in fact, SVCs are very flexible. Unknown values could easily be compensated for, with some cleverness. The tough part is, of course, training the SVC, picking the kind of SVC to use, and storing it into stats tables during analyze. Oh, and hoping it doesn't make fatal mistakes. I know, the idea is very green, but it would be a fun project - cough GSoC ;-) -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance