Re: limit clause breaks query planner?

Guillaume Cottenceau <gc@xxxxxx> · Tue, 02 Sep 2008 15:55:33 +0200

"David West" <david.west 'at' cusppoint.com> writes:

> INFO:  "jbpm_taskinstance": moved 1374243 row versions, truncated 166156 to
> 140279 pages

nothing which would explain so much planning off :/

> Yep, the table is from the jboss jbpm (business process management) schema.

I've went to that kind of test then, but it didn't help much:

  create table foo ( bar character varying(255), baz character varying(255),
    id_ bigint NOT NULL,
    class_ character(1) NOT NULL,
    version_ integer NOT NULL,
    name_ character varying(255),
    description_ character varying(4000),
    create_ timestamp without time zone,
    start_ timestamp without time zone,
    end_ timestamp without time zone,
    duedate_ timestamp without time zone,
    priority_ integer,
    iscancelled_ boolean,
    issuspended_ boolean,
    isopen_ boolean,
    issignalling_ boolean,
    isblocking_ boolean,
    task_ bigint,
    token_ bigint,
    procinst_ bigint,
    swimlaninstance_ bigint,
    taskmgmtinstance_ bigint,
    processname_ character varying(255) );

  insert into foo ( select generate_series(0, 10000000) / 1000000, case when random() < 0.05 then 'Today Alcatel-Lucent has announced that Philippe Camus is appointed non-executive Chairman and Ben Verwaayen is appointed Chief Executive Officer.' else null end, 1, 'a', 1  );

  create index foobaz on foo(baz);
  create index foobar on foo(bar);
  analyze foo;

Estimated costs still look correct on my side:

  gc=# explain select * from foo where baz is null and bar in ('8') limit 15;
                                       QUERY PLAN                                     
  ------------------------------------------------------------------------------------
   Limit  (cost=0.00..0.46 rows=15 width=1795)
     ->  Index Scan using foobar on foo  (cost=0.00..26311.70 rows=860238 width=1795)
           Index Cond: ((bar)::text = '8'::text)
           Filter: (baz IS NULL)
  (4 rows)

  gc=# set enable_indexscan = off;
  SET
  gc=# explain select * from foo where baz is null and bar in ('8') limit 15;
                                QUERY PLAN                              
  ----------------------------------------------------------------------
   Limit  (cost=0.00..3.46 rows=15 width=1795)
     ->  Seq Scan on foo  (cost=0.00..198396.62 rows=860238 width=1795)
           Filter: ((baz IS NULL) AND ((bar)::text = '8'::text))
  (3 rows)

>>Btw, it would help if you could reproduce my test scenario and
>>see if PG uses "correctly" the indexscan. It is better to try on
>>your installation, to take care of any configuration/whatever
>>variation which may create your problem.
>
> I have tried your example and I get the same results as you.
>
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>
>                                     QUERY PLAN
>
> ----------------------------------------------------------------------------
> ----
> ---
>  Limit  (cost=0.00..0.53 rows=15 width=154)
>    ->  Index Scan using foobar on foo  (cost=0.00..33159.59 rows=934389
> width=15
> 4)
>          Index Cond: (bar = 8)
>          Filter: (baz IS NULL)
> (4 rows)
>
> db=# drop index foobar;
> DROP INDEX
> db=# explain select * from foo where baz is null and bar = '8' limit 15;
>
>                              QUERY PLAN
> ---------------------------------------------------------------------
>  Limit  (cost=0.00..2.87 rows=15 width=154)
>    ->  Seq Scan on foo  (cost=0.00..178593.35 rows=934389 width=154)
>          Filter: ((baz IS NULL) AND (bar = 8))
> (3 rows)
>
> It's choosing the index because of a cost of 0.53 vs a cost of 2.87 for
> sequential scan.  I wonder why in my real tables the index scan cost is
> higher than the sequential scan cost.  Perhaps because of the extra width of
> my rows?

You may try to crosscheck with the new test I've put upper, but
I'm skeptical :/

I think I've unfortunately more than reached my level of
incompetence on that subject, sorry I wasn't able to better
locate your problem :/

>>> From looking at the plans, it seems to be postgres is assuming it will 
>>> only
>>> have to sequentially scan 15 rows, which is not true in my case 
>>> because column B is not distributed randomly (nor will it be in 
>>> production).  Would
>>
>>Why do you say that? The explanation seems to rather tell that it
>>(correctly) assumes that the seqscan would bring up about 1M rows for the
> selected values of A and B, and then it will limit to 15 rows.
>
> I say that because the plan gives a really really low number (3.21) for the
> estimated cost after the limit on sequential scan:
>
> Select * from JBPM_TASKINSTANCE this_ where actorid_ is null and
> this_.POOLEDACTOR_ in ('21') limit 15
> "Limit  (cost=0.00..3.21 rows=15 width=128) (actual
> time=84133.211..84187.247 rows=15 loops=1)"
> "  ->  Seq Scan on jbpm_taskinstance this_  (cost=0.00..234725.85
> rows=1095365 width=128) (actual time=84133.205..84187.186 rows=15 loops=1)"
> "        Filter: ((actorid_ IS NULL) AND ((pooledactor_)::text =
> '21'::text))"
> "Total runtime: 84187.335 ms"
>
> It just seems to me it is not taking into account at all that it might have
> to scan thousands or millions of rows before it gets the 15 rows it needs.

Well, if your have 95% of NULL actorid_ and 10% for each value of
pooledactor_, then it makes sense to assume it will have to fetch
about 150 rows to find the 15 awaited ones...

In the end, if PG doesn't know about data distribution, its
behavior makes total sense to me: 150 rows of width=128 bytes
need only 3 disk pages, so it shouldn't be faster than with a
seqscan, theoretically; however, I am not sure then why on my
simple "foo" test it isn't using the same decision..

Btw, that should not solve your problem, but normally, to help PG
choose indexscan often enough, it's good to reduce
random_page_cost which is 4 by default (a high value for nowadays
servers), increase effective_cache_size to what's available on
your machine, and potentially the shared_buffers which normally
helps for a good deal of matters, performance-wise.

-- 
Guillaume Cottenceau, MNC Mobile News Channel SA, an Alcatel-Lucent Company
Av. de la Gare 10, 1003 Lausanne, Switzerland - direct +41 21 317 50 36