I created indexes on channel_id and data_id as you suggested.
-- Index: idx_d_channel_id2
-- DROP INDEX idx_d_channel_id2;
CREATE INDEX idx_d_channel_id2
ON sym_data
USING btree
(channel_id);
-- Index: idx_d_channel_id3
-- DROP INDEX idx_d_channel_id3;
CREATE INDEX idx_d_channel_id3
ON sym_data
USING btree
(data_id)
WHERE channel_id::text = 'sale_transaction'::text;
-- Index: idx_d_channel_id4
-- DROP INDEX idx_d_channel_id4;
CREATE INDEX idx_d_channel_id4
ON sym_data
USING btree
(data_id)
WHERE channel_id::text = 'item'::text;
Here is the new EXPLAIN ANALYZE:
explain analyze select d.data_id, d.table_name, d.event_type, d.row_data, d.pk_data, d.old_data, d.create_time, d.trigger_hist_id, d.channel_id, d.transaction_id, d.source_node_id, d.external_data, '' from sym_data d inner join sym_data_gap g on g.status='GP' and d.data_id between g.start_id and g.end_id where d.channel_id='sale_transaction' order by d.data_id asc;
Nested Loop (cost=0.00..1512979014.35 rows=26268463088 width=1401) (actual time=25741.704..7650979.311 rows=2764140 loops=1)
-> Index Scan using idx_d_channel_id3 on sym_data d (cost=0.00..1781979.40 rows=3117384 width=1401) (actual time=83.718..55126.002 rows=3124631 loops=1)
-> Index Scan using sym_data_gap_pkey on sym_data_gap g (cost=0.00..358.37 rows=8426 width=8) (actual time=2.428..2.429 rows=1 loops=3124631)
Index Cond: ((d.data_id >= g.start_id) AND (d.data_id <= g.end_id))
Filter: (g.status = 'GP'::bpchar)
Total runtime: 7651803.073 ms
But the query performance hasn't changed.
Please help me.
Tuan Hoang Anh
On Sat, Aug 6, 2011 at 12:20 AM, Kevin Grittner <Kevin.Grittner@xxxxxxxxxxxx> wrote:
http://www.postgresql.org/support/versioning
> 6GB ram
> work_mem = 2097151
I think that has the potential to push you into swapping:
cc=> set work_mem = 2097151;
SET
cc=> show work_mem;
work_mem
-----------
2097151kB
(1 row)
That's 2GB, and that much can be allocated, potentially several
times, per connection.
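For example, a per-session value in the tens of megabytes is usually
much safer on a 6GB machine (the exact number below is a hypothetical
starting point, not a recommendation from measurements):

-- Sketch: a more conservative work_mem, set for one session.
-- Each sort or hash node in each backend may allocate up to this much.
SET work_mem = '64MB';
-- Or set it cluster-wide in postgresql.conf and reload:
--   work_mem = 64MB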
> -> Index Scan using sym_data_pkey on sym_data d
> (cost=0.00..637148.72 rows=3129103 width=1403)
> (actual time=71.989..55643.665 rows=3124631 loops=1)
> Filter: ((channel_id)::text = 'sale_transaction'::text)
This index scan is going to randomly access all tuples in the
table's heap. That is probably going to be much slower than a
sequential scan. It is apparently choosing this index to avoid a
sort, because of the mis-estimation on the number of rows. Is it
critical that the rows be returned in that order? If not, you might
see much faster performance by leaving off the ORDER BY clause so
that it can use the seqscan.
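For instance, the same query without the ORDER BY (column list
abbreviated here, just to show the shape of it):

-- Sketch: same join, but with no ORDER BY the planner is free to
-- pick a sequential scan on sym_data instead of the index scan.
SELECT d.data_id, d.table_name, d.event_type, d.row_data
FROM sym_data d
INNER JOIN sym_data_gap g
  ON g.status = 'GP'
 AND d.data_id BETWEEN g.start_id AND g.end_id
WHERE d.channel_id = 'sale_transaction';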
You could potentially make queries like this much faster by indexing
on channel_id, or by indexing on data_id WHERE channel_id =
'sale_transaction'.
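Something along these lines (index names here are arbitrary):

-- Full index on the filter column:
CREATE INDEX idx_sym_data_channel ON sym_data (channel_id);
-- Or a smaller partial index covering only this one channel:
CREATE INDEX idx_sym_data_sale ON sym_data (data_id)
WHERE channel_id = 'sale_transaction';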
You could also set up optimization barriers with clever use of a CTE
or an OFFSET 0 to force it to use a seqscan followed by a sort, but
I would look at the other options first.
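A sketch of the OFFSET 0 barrier, in case you want to try it (column
list abbreviated): the planner cannot flatten the subquery, so it
plans it on its own, typically as a seqscan, and sorts afterwards.

SELECT d.data_id, d.table_name, d.event_type, d.row_data
FROM (SELECT * FROM sym_data
      WHERE channel_id = 'sale_transaction'
      OFFSET 0) d
INNER JOIN sym_data_gap g
  ON g.status = 'GP'
 AND d.data_id BETWEEN g.start_id AND g.end_id
ORDER BY d.data_id;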
-Kevin