Re: slow bitmap heap scans on pg 9.2

On 13-04-12 09:20 PM, Jeff Janes wrote:
On Thursday, April 11, 2013, Steve Singer wrote:


    I think the reason why it is picking the hash join based plans is
    because of

    Index Scan using table_b_1_ptid_orgid_ym_unq on table_b_1 b  (cost=0.00..503.86 rows=1 width=10) (actual time=0.016..0.017 rows=1 loops=414249)
      Index Cond: ((a.id = a_id) AND (organization_id = 2) AND (year = 2013) AND (month = 3))
      Filter: (product_id = 1)



Trying to reason about how the planner estimates costs for the inner
side of nested loops makes my head hurt.
So before doing that, could you run explain (analyze,buffers) on both of
these much simpler (but hopefully morally equivalent to this planner
node) sql:

select * from table_b_1_b where a_id = <some plausible value> and
organization_id=2 and year=2013 and month=3

select * from table_b_1_b where a_id = <some plausible value> and
organization_id=2 and year=2013 and month=3 and product_id=1


table_b_1 is a partition of table_b on product_id, so when querying table_b_1 directly the two queries are equivalent.

explain (analyze,buffers) select * FROM table_b_1 where a_id=1128944 and organization_id=2 and year=2013 and month=3;

                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using table_b_1_ptid_orgid_ym_unq on table_b_1  (cost=0.00..50.73 rows=1 width=56) (actual time=60.328..60.330 rows=1 loops=1)
   Index Cond: ((a_id = 1128944) AND (organization_id = 2) AND (year = 2013) AND (month = 3))
   Buffers: shared hit=1 read=5
 Total runtime: 60.378 ms
(4 rows)


The plans are the same whether or not I specify product_id in the where clause (if I query the parent table and leave product_id out of the where clause it of course scans all the other partitions as well).





Of particular interest here is whether the estimate of 1 row is due to
the specificity of the filter, or if the index clauses alone are
specific enough to drive that estimate.  (If you get many rows without
the product_id filter, that would explain the high estimate.)

The index clauses alone; we normally expect 1 row back for a query like that.



Please run with the default cost parameters, or if you can't get the
right plan with the defaults, specify what parameters were used.

indexTotalCost += index->pages * spc_random_page_cost / 100000.0;

is what is driving my high costs on the inner loop. The index has 2-5 million pages depending on the partition. If I run this against 9.2.2, which uses / 10000.0, the estimate is even higher.

If I try this with the

indexTotalCost += log(1.0 + index->pages / 10000.0) * spc_random_page_cost;

fudge factor from 9.3 and play with the settings, I can get it to pick the nested-loop plan on some partitions (product_id=2) but not on others (product_id=1). If I remove the fudge-factor cost adjustment line entirely, I always get the nested-loop plan.
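
To put rough numbers on that (a back-of-the-envelope sketch only; I'm assuming 2,500,000 index pages, the middle of the range above, and spc_random_page_cost = 2 to match my random_page_cost setting), the variants of the fudge factor work out very differently:

-- hypothetical inputs: 2,500,000 index pages, spc_random_page_cost = 2
-- C's log() is the natural log, so ln() is the SQL equivalent
SELECT 2500000 * 2 / 100000.0          AS fudge_div_100000,  -- 50
       2500000 * 2 / 10000.0           AS fudge_div_10000,   -- 500
       ln(1.0 + 2500000 / 10000.0) * 2 AS fudge_log_9_3;     -- ~11

The linear forms grow with the index size, so a multi-million-page index adds tens to hundreds of cost units to every inner index scan, while the 9.3 log form keeps the penalty around a dozen.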

Breaking the index into smaller partial indexes for each year seems to be giving me the plans I want with random_page_cost=2 (I might also try partial indexes on the month).
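
For what it's worth, the per-year partial indexes look roughly like this (a sketch only; the index names are made up, and the column list is my guess at what table_b_1_ptid_orgid_ym_unq covers based on the index conditions above):

CREATE INDEX table_b_1_ptid_orgid_2013_idx
    ON table_b_1 (a_id, organization_id, month)
    WHERE year = 2013;

CREATE INDEX table_b_1_ptid_orgid_2012_idx
    ON table_b_1 (a_id, organization_id, month)
    WHERE year = 2012;

Each partial index covers only a fraction of the pages of the full index, so the pages-based fudge factor stays correspondingly small.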

Even with the 9.3 log-based fudge factor, we are seeing the fudge factor end up big enough that the planner picks a table scan over the index. A lot of the loop iterations can be satisfied by cached pages of the index, and the fudge factor doesn't really account for this.





Cheers,

Jeff


