I created indexes on channel_id and data_id as you suggested.
-- Index: idx_d_channel_id2
-- DROP INDEX idx_d_channel_id2;
CREATE INDEX idx_d_channel_id2
ON sym_data
USING btree
(channel_id);
-- Index: idx_d_channel_id3
-- DROP INDEX idx_d_channel_id3;
CREATE INDEX idx_d_channel_id3
ON sym_data
USING btree
(data_id)
WHERE channel_id::text = 'sale_transaction'::text;
-- Index: idx_d_channel_id4
-- DROP INDEX idx_d_channel_id4;
CREATE INDEX idx_d_channel_id4
ON sym_data
USING btree
(data_id)
WHERE channel_id::text = 'item'::text;
Here is the new EXPLAIN ANALYZE:
explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;
Nested Loop (cost=0.00..1512979014.35 rows=26268463088 width=1401) (actual time=25741.704..7650979.311 rows=2764140 loops=1)
-> Index Scan using idx_d_channel_id3 on sym_data d (cost=0.00..1781979.40 rows=3117384 width=1401) (actual time=83.718..55126.002 rows=3124631 loops=1)
-> Index Scan using sym_data_gap_pkey on sym_data_gap g (cost=0.00..358.37 rows=8426 width=8) (actual time=2.428..2.429 rows=1 loops=3124631)
Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
Filter: (g.status = 'GP'::bpchar)
Total runtime: 7651803.073 ms
But the query performance hasn't changed.
Please help me.
Tuan Hoang Anh
On Sat, Aug 6, 2011 at 12:20 AM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
http://www.postgresql.org/support/versioning
> 6GB ram
> work_mem = 2097151
I think that has the potential to push you into swapping:
cc=> set work_mem = 2097151;
SET
cc=> show work_mem;
work_mem
-----------
2097151kB
(1 row)
That's 2GB, and that much can be allocated, potentially several
times, per connection.
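For example, a per-session value in the tens of megabytes is usually
much safer on a 6GB machine (the exact number below is a hypothetical
starting point, not a recommendation from measurements):

-- Sketch: a more conservative work_mem, set for one session.
-- Each sort or hash node in each backend may allocate up to this much.
SET work_mem = '64MB';
-- Or set it cluster-wide in postgresql.conf and reload:
--   work_mem = 64MB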
> -> Index Scan using sym_data_pkey on sym_data d
> (cost=0.00..637148.72 rows=3129103 width=1403)
> (actual time=71.989..55643.665 rows=3124631 loops=1)
> Filter: ((channel_id)::text = 'sale_transaction'::text)
This index scan is going to randomly access all tuples in the
table's heap. That is probably going to be much slower than a
sequential scan. It is apparently choosing this index to avoid a
sort, because of the mis-estimation on the number of rows. Is it
critical that the rows be returned in that order? If not, you might
see much faster performance by leaving off the ORDER BY clause so
that it can use the seqscan.
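For instance, the same query without the ORDER BY (column list
abbreviated here, just to show the shape of it):

-- Sketch: same join, but with no ORDER BY the planner is free to
-- pick a sequential scan on sym_data instead of the index scan.
SELECT d.data_id, d.table_name, d.event_type, d.row_data
FROM sym_data d
INNER JOIN sym_data_gap g
  ON g.status = 'GP'
 AND d.data_id BETWEEN g.start_id AND g.end_id
WHERE d.channel_id = 'sale_transaction';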
You could potentially make queries like this much faster by indexing
on channel_id, or by indexing on data_id WHERE channel_id =
'sale_transaction'.
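Something along these lines (index names here are arbitrary):

-- Full index on the filter column:
CREATE INDEX idx_sym_data_channel ON sym_data (channel_id);
-- Or a smaller partial index covering only this one channel:
CREATE INDEX idx_sym_data_sale ON sym_data (data_id)
WHERE channel_id = 'sale_transaction';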
You could also set up optimization barriers with clever use of a CTE
or an OFFSET 0 to force it to use a seqscan followed by a sort, but
I would look at the other options first.
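A sketch of the OFFSET 0 barrier, in case you want to try it (column
list abbreviated): the planner cannot flatten the subquery, so it
plans it on its own, typically as a seqscan, and sorts afterwards.

SELECT d.data_id, d.table_name, d.event_type, d.row_data
FROM (SELECT * FROM sym_data
      WHERE channel_id = 'sale_transaction'
      OFFSET 0) d
INNER JOIN sym_data_gap g
  ON g.status = 'GP'
 AND d.data_id BETWEEN g.start_id AND g.end_id
ORDER BY d.data_id;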
-Kevin