Hello,
I've been using PostgreSQL for a few years and mostly love it. Aside
from a few (perceived, anyway) annoying limitations in PL/PGSQL (which I
almost exclusively am using for db interaction), I'm very satisfied with it.
I ran across this problem several months back and decided to blow it off
back then, but now I'm really wanting to understand why it's happening,
if there's a work around, if I'm doing something wrong, and if it's just
a shortcoming / bug if the developers are aware of it and might
implement a fix some day.
I've already done a fair amount of testing (a total of hours, I'm sure)
and Googling around on it. I also read this FAQ on indexes:
http://www.postgresql.org/docs/faqs.FAQ.html#item4.6
I think my issue is not explained by any of the resources I reviewed.
To simplify my actual problem, I conducted a very simple test. First I
defined a table:
create table t (
c1 int primary key,
c2 int
)
with the single index (primary key). Then I filled the table with
100,000 rows using a quick pl/pgsql function (c1 = 1, 2, 3... 100,000).
Then I ran these two tests (with EXPLAIN in pgAdmin):
select * from t where c1 = 75000;
select * from t where ((0 is null) OR (c1 = 75000));
The first one properly uses the index on c1, the second does not.
Obviously, a human looking at the second one would realize it's
essentially identical to the first and properly evaluate the (0 is null)
part once (since it's immutable) and then ignore it for the rest of the
searching.
Now, I'm sure some of you might ask "why the hell are you doing that in
the first place?" I have a good reason. I write a lot of pl/pgsql
functions that are search functions with a list of *optional*
parameters. I don't know ahead of time whether a user will include on
or not. In MSSQL, what I'm able to do (with no obvious index problems
that I've seen) is add those all to the WHERE clause like this:
((vC1 IS NULL) OR (C1 = vC1)) AND
((vC2 IS NULL) OR (C2 = vC2)) ...
(here vC1 and vC2 represent variables passed into the pl/pgsql function).
So my question were basically asked at the beginning of this post: is
there another way to get the optimizer to understand what I'm trying to
do here? Is this a known problem? Is it working as preferred and
unlikely to change any time soon? Is there some setting I can hit
somewhere to make it work like I want?
The only solution that I've come up with so far is making all of my
statements that otherwise wouldn't have to be dynamic and then only
including criteria for ones that I really need (based on the tested
nullness of the variables), but I find that highly annoying and have run
into other problems as a result as well.
I'd appreciate any suggestions you might have to help resolve this.
Thank,
John Lawler