Horribly slow query/ sequential scan

"Gregory S. Williamson" <gsw@xxxxxxxxxxxxxxxx> · Tue, 9 Jan 2007 03:55:56 -0800

This is a query migrated from postgres. In postgres it runs about 10,000 times *slower* than on informix on somewhat newer hardware. The problem is entirely due to the planner. This  PostgreSQL 8.1.4 on linux, 2 gigs of ram.

The table:
          Table "reporting.bill_rpt_work"
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 report_id     | integer               |
 client_id     | character varying(10) |
 contract_id   | integer               | not null
 rate          | numeric               | not null
 appid         | character varying(10) | not null
 userid        | text                  | not null
 collection_id | integer               | not null
 client_name   | character varying(60) |
 use_sius      | integer               | not null
 is_subscribed | integer               | not null
 hits          | numeric               | not null
 sius          | numeric               | not null
 total_amnt    | numeric               | not null
 royalty_total | numeric               |
Indexes:
    "billrptw_ndx" UNIQUE, btree (report_id, client_id, contract_id, rate, appid, userid, collection_id)
    "billrpt_cntrct_ndx" btree (report_id, contract_id, client_id)
    "billrpt_collid_ndx" btree (report_id, collection_id, client_id, contract_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (report_id) REFERENCES billing_reports(report_id)
    "$2" FOREIGN KEY (client_id) REFERENCES "work".clients(client_id)

The query:
explain analyze select
w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs,
sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w, billing_reports b
where w.report_id in
(select b.report_id from billing_reports where b.report_s_date = '2006-09-30')
and (w.client_id = '227400001' or w.client_id = '2274000010')
group by 1,2,3
order by 1,2,3;
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------
------------------------------
 GroupAggregate  (cost=298061335.44..298259321.62 rows=26 width=58) (actual time=372213.673..372213.689 rows=2 loops=1)
   ->  Sort  (cost=298061335.44..298083333.83 rows=8799354 width=58) (actual time=372213.489..372213.503 rows=37 loops=1)
         Sort Key: w.appid, w.rate, w.is_subscribed
         ->  Nested Loop  (cost=0.00..296121313.45 rows=8799354 width=58) (actual time=286628.486..372213.053 rows=37 loops=1)
               Join Filter: (subplan)
               ->  Seq Scan on bill_rpt_work w  (cost=0.00..85703.20 rows=11238 width=62) (actual time=1.239..1736.746 rows=61020 loops=1)
                     Filter: (((client_id)::text = '227400001'::text) OR ((client_id)::text = '2274000010'::text))
               ->  Seq Scan on billing_reports b  (cost=0.00..29.66 rows=1566 width=8) (actual time=0.001..0.879 rows=1566 loops=61020)
               SubPlan
                 ->  Result  (cost=0.00..29.66 rows=1566 width=0) (actual time=0.000..0.002 rows=1 loops=95557320)
                       One-Time Filter: ($1 = '2006-09-30'::date)
                       ->  Seq Scan on billing_reports  (cost=0.00..29.66 rows=1566 width=0) (actual time=0.001..0.863 rows=1565 loops=61020)
 Total runtime: 372214.085 ms

Informix uses report id/client id as an index, thus eliminating a huge number of rows. The table has 2280545 rows currently; slightly fewer when the above analyze was run. Informix has about 5 times as much data.

select count(*) from bill_rpt_work where report_id in (select report_id from billing_reports where report_s_date = '2006-09-30') and (client_id = '227400001' or client_id = '2274000010');
 count
-------
    37
(1 row)

So scanning everything seems particularly senseless.

I had some success adding client id and report id to the initial select list, but that causes all sorts of problems in calling procedures that expect different data grouping.

Any suggestion would be welcome because this is a horrible show stopper.

Thanks,

Greg Williamson
DBA
GlobeXplorer LLC