Re: Planner not using column limit specified for one column for another column equal to first

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





16 квітня 2010 р. 11:31 Yeb Havinga <yebhavinga@xxxxxxxxx> написав:
Віталій Тимчишин wrote:
Hello.

I have a query that performs very poor because there is a limit on join column that is not applied to other columns:

select * from company this_ left outer join company_tag this_1_ on this_.id=this_1_.company_id left outer join company_measures companymea2_ on this_.id=companymea2_.company_id left outer join company_descr ces3_ on this_.id=ces3_.company_id where this_1_.tag_id = 7 and this_.id>50000000 and this_1_.company_id>50000000
order by this_.id asc limit 1000;

(plan1.txt)
Total runtime: 7794.692 ms

At the same time if I apply the limit (>50000000) to other columns in query itself it works like a charm:

select * from company this_ left outer join company_tag this_1_ on this_.id=this_1_.company_id left outer join company_measures companymea2_ on this_.id=companymea2_.company_id left outer join company_descr ces3_ on this_.id=ces3_.company_id where this_1_.tag_id = 7 and this_.id>50000000 and this_1_.company_id>50000000
and companymea2_.company_id>50000000 and ces3_.company_id>50000000
order by this_.id asc limit 1000;

(plan2.txt)
Total runtime: 27.547 ms

I've thought and someone in this list've told me that this should be done automatically.
Yes, if you have in a query a=b and b=c, then the optimizer figures out that a=c as well. (a,b and c are then member of the same equivalence class).

However both queries are not the same, since the joins you're using are outer joins. In the first it's possible that records are returned for company records with no matching ces3_ records, the ces3_ records is null in that case. In the second query no NULL ces3_ information may be returned.

OK, but when I move limit to join condition the query is still fast:

select * from company this_ left outer join company_tag this_1_ on this_.id=this_1_.company_id 
left outer join company_measures companymea2_ on this_.id=companymea2_.company_id and companymea2_.company_id>50000000
left outer join company_descr ces3_ on this_.id=ces3_.company_id and ces3_.company_id>50000000
where this_1_.tag_id = 7 and this_.id>50000000 
and this_1_.company_id>50000000
order by this_.id asc limit 1000;

(plan3.txt),
Total runtime: 26.327 ms
BTW: Changing slow query to inner joins do not make it fast
 

Another thing is it seems that the number of rows guessed is far off from the actual number of rows, is the number 5000000 artificial or are you're statistics old or too small histogram/mcv's?

Nope, I suppose this is because of limit. If I remove the limit, the estimations are quite correct. There are ~6 millions of row in each table.

Limit  (cost=293.40..1028.60 rows=1000 width=1209) (actual time=0.128..24.751 rows=1000 loops=1)
  ->  Merge Left Join  (cost=293.40..4197731.11 rows=5709243 width=1209) (actual time=0.124..21.968 rows=1000 loops=1)
        Merge Cond: (this_.id = (companymea2_.company_id)::bigint)
        ->  Merge Left Join  (cost=246.59..3681230.10 rows=5709243 width=1141) (actual time=0.099..15.284 rows=1000 loops=1)
              Merge Cond: (this_.id = (ces3_.company_id)::bigint)
              ->  Merge Join  (cost=37.87..2435536.00 rows=5709243 width=639) (actual time=0.074..8.487 rows=1000 loops=1)
                    Merge Cond: (this_.id = (this_1_.company_id)::bigint)
                    ->  Index Scan using pk_comp_m on company this_  (cost=0.00..2139324.53 rows=6886598 width=623) (actual time=0.037..1.860 rows=1054 loops=1)
                          Index Cond: (id > 50000000)
                    ->  Index Scan using company_tag_company_id_idx on company_tag this_1_  (cost=0.00..207678.85 rows=5709243 width=16) (actual time=0.027..1.758 rows=1000 loops=1)
                          Index Cond: ((this_1_.company_id)::bigint > 50000000)
                          Filter: ((this_1_.tag_id)::bigint = 7)
              ->  Index Scan using cws_company_descr_unique on company_descr ces3_  (cost=0.00..1169511.92 rows=5757068 width=502) (actual time=0.020..1.788 rows=1054 loops=1)
                    Index Cond: ((ces3_.company_id)::bigint > 50000000)
        ->  Index Scan using pk_comp_ms on company_measures companymea2_  (cost=0.00..440945.79 rows=5688885 width=68) (actual time=0.019..1.729 rows=1054 loops=1)
              Index Cond: ((companymea2_.company_id)::bigint > 50000000)
Total runtime: 26.327 ms

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux