Re: Can I do better than this heapscan and sort?

Merlin Moncure <mmoncure@xxxxxxxxx> · Wed, 27 Jun 2012 11:49:02 -0500

On Wed, Jun 27, 2012 at 10:40 AM, Andy Halsall <halsall_andy@xxxxxxxxxxx> wrote:
>
>
>> Date: Tue, 26 Jun 2012 08:42:34 -0500
>> Subject: Re:  Can I do better than this heapscan and sort?
>> From: mmoncure@xxxxxxxxx
>> To: halsall_andy@xxxxxxxxxxx
>> CC: pgsql-performance@xxxxxxxxxxxxxx
>
>>
>> On Tue, Jun 26, 2012 at 8:36 AM, Merlin Moncure <mmoncure@xxxxxxxxx>
>> wrote:
>> > On Thu, Jun 21, 2012 at 3:07 PM, Andy Halsall <halsall_andy@xxxxxxxxxxx>
>> > wrote:
>> >> I have two tables node and relationship. Each relationship record
>> >> connects
>> >> two nodes and has an application keys (unfortunately named) that can be
>> >> used
>> >> by the application to look-up a relationship and get from one node to
>> >> the
>> >> other.
>> >>
>> >> My query uses a node id and a description of a relationship from the
>> >> node,
>> >> and must find the "next" relationship that the node has. It does this
>> >> by
>> >> finding all the relationships that could be "next", ordering them and
>> >> then
>> >> getting the first.
>> >>
>> >> Details are below but I end up with 6896 candidates for "next".
>> >>
>> >> If I'm reading the output correctly it takes 13.509 ms to apply the
>> >> filter
>> >> and another 7 ms or so to do the sort of the remaining 6896 nodes.
>> >>
>> >> Have tried many index combinations to try and improve the results. I
>> >> suspect
>> >> that with so many nodes to sort, postgresql will opt for heap scan
>> >> rather
>> >> than index. But why does it not use the IDX_order_sort_down_2 index for
>> >> the
>> >> sort?
>> >>
>> >> Thanks,
>> >> Andy
>> >>
>> >>
>> >> Details..........
>> >>
>> >> Version
>> >> -------
>> >> PostgreSQL 9.1.2 on x86_64-unknown-linux-gnu, compiled by gcc (GCC)
>> >> 4.5.2,
>> >> 64-bit
>> >>
>> >> Tables
>> >> ------
>> >> CREATE TABLE node (
>> >>  node_id      bigint NOT NULL,
>> >>  node_type    int4 NOT NULL,
>> >>  c_state      int4 NOT NULL,
>> >>  d_state      int4 NOT NULL,
>> >>  sort_key     bigint NOT NULL,
>> >>  permissions  bytea NOT NULL,
>> >>  audit        bytea  NOT NULL,
>> >>  pkg_id       bytea NULL,
>> >>  created      timestamp NOT NULL
>> >> );
>> >>
>> >> CREATE TABLE relationship (
>> >>  rel_id           bigint NOT NULL,
>> >>  rel_type         integer NOT NULL,
>> >>  s_t_n            bigint NOT NULL,
>> >>  t_s_n            bigint NOT NULL,
>> >>  state            integer NOT NULL,
>> >>  control          integer NOT NULL,
>> >>  sort_key         bigint NOT NULL,
>> >>  prime_key        bytea NULL,
>> >>  prime_key_len    integer NOT NULL,
>> >>  sec_key          bytea NULL,
>> >>  sec_key_len      integer NOT NULL,
>> >>  up_sort_key      bigint NOT NULL,
>> >>  up_prime_key     bytea NULL,
>> >>  up_prime_key_len integer NOT NULL,
>> >>  up_sec_key       bytea NULL,
>> >>  up_sec_key_len   integer NOT NULL,
>> >>  permissions      bytea NOT NULL,
>> >>  t_s_n_type       integer NOT NULL,
>> >>  created          timestamp NOT NULL
>> >> );
>> >>
>> >> Constraints
>> >> -----------
>> >> -- Primary keys
>> >> ALTER TABLE node ADD CONSTRAINT PK_node PRIMARY KEY (node_id);
>> >>
>> >> ALTER TABLE relationship ADD CONSTRAINT PK_relationship PRIMARY KEY
>> >> (rel_id);
>> >>
>> >> -- Foreign keys
>> >> ALTER TABLE relationship ADD CONSTRAINT FK_node_s FOREIGN KEY (s_t_n)
>> >> REFERENCES node (node_id);
>> >>
>> >> ALTER TABLE relationship ADD CONSTRAINT FK_node_n FOREIGN KEY (t_s_n)
>> >> REFERENCES node (node_id);
>> >>
>> >>
>> >> Indexes
>> >> -------
>> >> CREATE INDEX IDX_node_type ON node (node_type ASC) TABLESPACE
>> >> ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_node_sort_key ON node (sort_key ASC) TABLESPACE
>> >> ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_relationship_s_t_n ON relationship (s_t_n ASC)
>> >> TABLESPACE
>> >> ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_relationship_t_s_n ON relationship (t_s_n ASC)
>> >> TABLESPACE
>> >> ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_relationship_type ON relationship (rel_type ASC)
>> >> TABLESPACE
>> >> ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_relationship_prime_key ON relationship (prime_key ASC)
>> >> TABLESPACE ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_relationship_u_prime_key ON relationship (up_prime_key
>> >> ASC)
>> >> TABLESPACE ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_relationship_sec_key ON relationship (sec_key ASC)
>> >> TABLESPACE ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_order_first ON node(sort_key DESC, node_id DESC)
>> >> TABLESPACE
>> >> ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_order_sort_down_1 ON relationship(sort_key DESC,
>> >> prime_key
>> >> ASC NULLS FIRST, sec_key ASC NULLS FIRST) TABLESPACE ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_order_sort_down_2 ON relationship(sort_key DESC,
>> >> prime_key
>> >> ASC NULLS FIRST, sec_key DESC NULLS FIRST) TABLESPACE ds_appex_ts_10
>> >> ;
>> >> CREATE INDEX IDX_order_sort_up ON relationship(up_sort_key DESC,
>> >> up_prime_key ASC NULLS FIRST, up_sec_key ASC NULLS FIRST) TABLESPACE
>> >> ds_appex_ts_10
>> >> ;
>> >>
>> >> Query
>> >> -----
>> >> CREATE OR REPLACE FUNCTION sp_get_rel_sort_dup_sec_desc(in_rel_type1
>> >> integer, in_rel_type2 integer, in_node_type integer, in_own_guid
>> >> bigint,
>> >> in_prev_prime_key bytea, in_prev_prime_key_len integer, in_prev_sec_key
>> >> bytea, in_prev_sec_key_len integer, in_prev_sort_key bigint, in_ctrl
>> >> integer) RETURNS select_rel_holder AS
>> >> '
>> >> declare
>> >> h select_rel_holder%rowtype;
>> >>
>> >> begin
>> >>        SELECT INTO h  r.rel_id, r.t_s_n, r.rel_type, r.sort_key,
>> >>                       r.state,r.permissions, r.control,
>> >>                       r.prime_key, r.prime_key_len, r.sec_key,
>> >> r.sec_key_len,
>> >>                       r.up_prime_key, r.up_prime_key_len, r.up_sec_key,
>> >> r.up_sec_key_len
>> >>         FROM relationship r
>> >>         WHERE r.s_t_n = in_own_guid AND (r.rel_type = in_rel_type1 OR
>> >> r.rel_type = in_rel_type2)
>> >>         AND
>> >>         (
>> >>             (
>> >>                (
>> >>                   r.prime_key > in_prev_prime_key
>> >>                   OR
>> >>                   ( r.prime_key = in_prev_prime_key AND r.sec_key <
>> >> in_prev_sec_key)
>> >>                )
>> >>                AND
>> >>                r.sort_key = in_prev_sort_key
>> >>             )
>> >>
>> >>             OR
>> >>             r.sort_key < in_prev_sort_key
>> >>         )
>> >>         AND t_s_n_type = in_node_type
>> >>         AND r.control >= in_ctrl
>> >>         ORDER BY sort_key DESC, prime_key ASC NULLS FIRST, sec_key DESC
>> >> NULLS FIRST LIMIT 1;
>> >>         RETURN h;
>> >> end
>> >> '
>> >> language 'plpgsql' STABLE;
>> >>
>> >>
>> >> EXPLAIN ANALYZE output
>> >> -------------------------------
>> >>         Limit  (cost=48.90..48.90 rows=1 width=89) (actual
>> >> time=21.480..21.480 rows=1 loops=1)
>> >>           Output: rel_id, t_s_n, rel_type, sort_key, state,
>> >> permissions,
>> >> control, prime_key, prime_key_len, sec_key, sec_key_len, up_prime_key,
>> >> up_prime_key_l
>> >> en, up_sec_key, up_sec_key_len
>> >>
>> >>           ->  Sort  (cost=48.90..48.90 rows=1 width=89) (actual
>> >> time=21.479..21.479 rows=1 loops=1)
>> >>                 Output: rel_id, t_s_n, rel_type, sort_key, state,
>> >> permissions, control, prime_key, prime_key_len, sec_key, sec_key_len,
>> >> up_prime_key, up_prime
>> >> _key_len, up_sec_key, up_sec_key_len
>> >>                 Sort Key: r.sort_key, r.prime_key, r.sec_key
>> >>                 Sort Method: top-N heapsort  Memory: 25kB
>> >>
>> >>                 ->  Bitmap Heap Scan on public.relationship r
>> >> (cost=3.39..48.89 rows=1 width=89) (actual time=1.034..13.509 rows=6986
>> >> loops=1)
>> >>                       Output: rel_id, t_s_n, rel_type, sort_key, state,
>> >> permissions, control, prime_key, prime_key_len, sec_key, sec_key_len,
>> >> up_prime_key, up
>> >> _prime_key_len, up_sec_key, up_sec_key_len
>> >>                       Recheck Cond: (r.s_t_n = $4)
>> >>                       Filter: ((r.control >= $10) AND (r.t_s_n_type =
>> >> $3)
>> >> AND ((r.rel_type = $1) OR (r.rel_type = $2)) AND ((((r.prime_key > $5)
>> >> OR
>> >> ((r.prime_
>> >> key = $5) AND (r.sec_key < $7))) AND (r.sort_key = $9)) OR (r.sort_key
>> >> <
>> >> $9)))
>> >>
>> >>                       ->  Bitmap Index Scan on idx_relationship_s_t_n
>> >> (cost=0.00..3.39 rows=18 width=0) (actual time=0.951..0.951 rows=6989
>> >> loops=1)
>> >>                             Index Cond: (r.s_t_n = $4)
>> >
>> > Absolutely.  You need to learn and master row-wise comparison.  It was
>> > added for exactly this purpose :-).
>> >
>> > SELECT * FROM foo WHERE (a,b,c) > (a1,b1,c1) ORDER BY a,b,c LIMIT k;
>> >
>> > will be fully optimized if you have an index on a,b,c (a1,b1,c1 are
>> > the last ones you read off).  Be advised that if there is not a lot of
>> > cardinality on 'a', you may need to disable certain index plans to get
>> > a good plan in some cases.
>>
>> hm, one more point: I notice you are mixing ASC/DESC in the index
>> definition. Try to avoid doing that: it will make index based paging
>> of the table more difficult. If you have to, try transforming the
>> values so that you can index all the fields ASC or DESC. This will
>> also fit easier into row-wise comparisons strategy although it will
>> slow down insertion a bit.
>>
>> merlin
>
> Thanks Merlin. I wasn't aware of the row-wise comparison stuff. It's tricky
> in the case above though as I want to do (a > b) OR (a=b AND b < c).
>
> As it happens, the application needs to iterate the set of "next"
> relationships, which as I had it meant establishing a new set on each
> iteration (where new set = old set minus its first entry). This is what the
> filter was doing in the above query. This is too expensive (as ~7000
> iterations) and so have modified solution to cache the ordered set when
> first established.
>
> Even without the filter I still need to order the return as above. Still
> don't understand why it doesn't use the index for this.

postgres isn't smart enough to convert the complex boolean expressions
for multiple field set ordering into a multi-column index lookup
unless the row-wise comparison syntax is used.

where (a,b) > (a1,b1)
can be written as boolean:
where (a>a1) OR (a=a1 AND b>b1)
alternate form is:
where (a>=a1) AND (a>a1 OR b>b1)

where (a,b,c) > (a1,b1,c1)
can be written as boolean:
where (a>a1) OR (a=a1 AND b>b1) OR (a = b1 AND b1 AND c>c1)
alternate form is:
where (a>=a1) AND (a>a1 OR b>=b1) AND (a>a1 OR b>b1 OR c>=c1)

in either case using boolean construction postgres will only use an
index on 'a' if it exists, but will fully optimize the row-wise case
if it matches the index.  therefore, making your query faster using
row-wise is going to be an exercise of making an index matching your
set browsing that contains either all ASC or all DESC -- you can't mix
and match.   in order to do that some transformation of your values is
in order -- either directly on the table itself or with functional
indexes doing the transformation during index build.  For integers we
can do that by multiplying by -1.  other datatypes might have more
complicated constructions but it's usually doable.

all this is assuming btw that you can logically order your table with
a consistent ordering and you are trying to index lookups between some
two points withing that ordering (perhaps sliding that windows as you
crawl the table)

merlin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance