Robert Haas wrote:
The individual boolean fields don't kill me and in terms of some of the application issues they're actually rather easy to code for.On Mon, Sep 7, 2009 at 8:51 PM, Karl Denninger<karl@xxxxxxxxxxxxx> wrote:Robert Haas wrote: On Sat, Sep 5, 2009 at 8:19 PM, Karl Denninger<karl@xxxxxxxxxxxxx> wrote: There was a previous thread and I referenced it. I don't have the other one in my email system any more to follow up to it. I give up; the attack-dog crowd has successfully driven me off. Ciao. Perhaps I'm biased by knowing some of the people involved, but I don't think anyone on this thread has been anything but polite. It would certainly be great if PostgreSQL could properly estimate the selectivity of expressions like this without resorting to nasty hacks, but it can't, and unfortunately, there's really no possibility of that changing any time soon. Even if someone implements a fix today, the soonest it will appear in a production release is June 2010. So, any suggestion for improvement is going to be in the form of suggesting that you modify the schema in some way. I know that's not really what you're looking for, but unfortunately it's the best we can do. As far as I can tell, it is not correct to say that you referenced the previous thread. I do not see any such reference. ...Robert I was asking about modifying the schema. The current schema is an integer being used as a bitmask. If the planner knows how to handle a type of "bit(X)" (and will at least FILTER rather than NESTED LOOP it on a select, as happens for an Integer used in this fashion), that change is easier than splitting it into individual boolean fields.Well, the first several replies seem to address that question - I think we all agree that won't help. I'm not sure what you mean by "at least FILTER rather than NESTED LOOP it on a select". However, typically, the time when you get a nested loop is when the planner believes that the loop will be executed very few times (in other words, the outer side will return very few rows). It probably isn't the case that the planner COULDN'T choose to execute the query in some other way; rather, the planner believes that the nested loop is faster because of a (mistaken) belief about how many rows the bitmap-criterion will actually match. All the suggestions you've gotten upthread are tricks to enable the planner to make a better estimate, which will hopefully cause it to choose a better plan. As a general statement, selectivity estimation problems are very painful to work around and often involve substantial application redesign. In all honesty, I think you've run across one of the easier variants. As painful as it is to hear the word easy applied to a problem that's killing you, there actually IS a good solution to this problem: use individual boolean fields. I know that's not what you want to do, but it's better than "sorry, you're hosed, no matter how you do this it ain't gonna work". And I do think there are a few in the archives that fall into that category. Good luck, and sorry for the bad news. ...Robert The problem with re-coding for them is extensibility (by those who install and administer the package); a mask leaves open lots of extra bits for "site-specific" use, where hard-coding booleans does not, and since the executable is a binary it instantly becomes a huge problem for everyone but me. It does appear, however, that a bitfield doesn't evaluate any differently than does an integer used with a mask, so there you have it..... it is what it is, and if I want this sort of selectivity in the search I have no choice. -- Karl |
begin:vcard fn:Karl Denninger n:Denninger;Karl email;internet:karl@xxxxxxxxxxxxx x-mozilla-html:TRUE version:2.1 end:vcard
-- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance