Search Postgresql Archives

Re: [SQL] Curious problem of using BETWEEN with start and end being the same versus EQUALS '='

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks for the quick and detailed response, Tom. 

Yes, I did add a redundant where clause with a restriction on b.date_id on the range queries. This appears to speed things up since it does an index scan on the b table before the merge join. 

We will get more intelligent on query generation (our system generates queries on the fly) to work around this problem. 
________________________________________

Venky Kandaswamy

Principal Engineer, Adchemy Inc.

925-200-7124

________________________________________
From: Tom Lane [tgl@xxxxxxxxxxxxx]
Sent: Tuesday, January 15, 2013 2:30 PM
To: Venky Kandaswamy
Cc: pgsql-general@xxxxxxxxxxxxxx; pgsql-sql@xxxxxxxxxxxxxx
Subject: Re: [SQL] Curious problem of using BETWEEN with start and end being the same versus EQUALS '='

Venky Kandaswamy <venky@xxxxxxxxxxx> writes:
>    On 9.1, I am running into a curious issue.

It's not very curious at all, or at least people on pgsql-performance
(the right list for this sort of question) would have figured it out
quickly.  You're getting a crummy plan because of a crummy row estimate.
When you do this:

>  WHERE a.date_id = 20120228

you get this:

> "                    ->  Index Scan using alps_agg_date_id on bi2003.alps_agg a  (cost=0.00..17870.00 rows=26292 width=1350) (actual time=0.047..142.383 rows=36132 loops=1)"
> "                          Output: a.date_id, a.page_group, a.page, a.int_alloc_type, a.componentset, a.adc_visit, upper((a.adc_visit)::text)"
> "                          Index Cond: (a.date_id = 20120228)"
> "                          Filter: ((a.page)::text = 'ddi_671'::text)"

26K estimated rows versus 36K actual isn't the greatest estimate in the
world, but it's plenty good enough.  But when you do this:

>  WHERE a.date_id BETWEEN 20120228 AND 20120228

you get this:

> "        ->  Index Scan using alps_agg_date_id on bi2003.alps_agg a  (cost=0.00..10.12 rows=1 width=1350)"
> "              Output: a.date_id, a.adc_visit, a.page_group, a.page, a.int_alloc_type, a.componentset, a.variation_tagset, a.page_instance"
> "              Index Cond: ((a.date_id >= 20120228) AND (a.date_id <= 20120228))"
> "              Filter: ((a.page)::text = 'ddi_671'::text)"

so the bogus estimate of only one row causes the planner to pick an
entirely different plan, which would probably be a great choice if there
were indeed only one such row, but with 36000 of them it's horrid.

The reason the row estimate is so crummy is that a zero-width interval
is an edge case for range estimates.  We've seen this before, although
usually it's not quite this bad.

There's been some talk of making the estimate for "x >= a AND x <= b"
always be at least as much as the estimate for "x = a", but this would
increase the cost of making the estimate by quite a bit, and make things
actually worse in some cases (in particular, if a > b then a nil
estimate is indeed the right thing).

You might look into whether queries formed like "date_id >= 20120228 AND
date_id < 20120229" give you more robust estimates at the edge cases.

BTW, I notice in your EXPLAIN results that the same range restriction
has been propagated to b.date_id:

> "        ->  Index Scan using event_agg_date_id on bi2003.event_agg b  (cost=0.00..10.27 rows=1 width=1694)"
> "              Output: b.date_id, b.vcset, b.eventcountset, b.eventvalueset"
> "              Index Cond: ((b.date_id >= 20120228) AND (b.date_id <= 20120228))"

I'd expect that to happen automatically for a simple equality
constraint, but not for a range constraint.  Did you do that manually
and not tell us about it?

                        regards, tom lane




-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux