Re: Query planner unaware of possibly best plan

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tom Lane <tgl@xxxxxxxxxxxxx> wrote:

> The point here is that you've repeated the same example N times
> without actually making a case that it's interesting to support.  We
> have to think about the intellectual complexity that would be added
> to the planner to support this case, and the cycles that would be
> expended on every query (and wasted, for most queries) on trying to
> detect whether the case applies.  If it were simple and cheap to do,
> these arguments wouldn't hold much weight, but it doesn't look to
> me like either is the case.
> 
> Another problem is that it's not clear there's much to be gained.
> Avoiding the sort step is only interesting if the query produces so
> many rows that a sort would be expensive ... but if that's the case, it
> seems unlikely that a nestloop indexscan plan would be the best
> choice anyway.
> 
> So basically this looks like a lot of work for a narrow and questionable
> gain.  If you want it to happen you need to convince people that it's
> easier and more useful than it looks.
> 
> 			regards, tom lane




Well, I probably won't tell you that it's easy to do, because you know 
the planner far far better than I do -- I'm just a user who knows almost 
nothing about what's happening inside it. I just thought that if the 
planner is examining a plan, where the outer scan is using a unique 
index with all it's columns AND doing a nestloop join with the inner scan 
returning the rows in some order, then the planner could say "this is 
already ordered by (outer.unique, inner.some_order)" and wouldn't 
make a sort at the end. Of course this is far from perfect, because what 
about three/four/more table joins... maybe that can be derived from 
this, I don't really know now.
Another situation that could be optimized this way is if I write "ORDER 
BY outer.idxcol1, outer.idxcol2, outer.id, inner.something" where
 + there is a non-unique index on (outer.idxcol1, outer.idxcol2),
 + outer.id is a PRI KEY,
 + there is an index on (inner.outer_id, inner.something) too
but that's getting really complicated. Of course if the simpler case 
would be working, then I could create a unique index on (outer.idxcol1, 
outer.idxcol2, outer.id) and this would run optimized too.

I think what's common is:
 + outer scan produces uniquely sorted rows
 + nested loop
 + inner scan produces sorted rows
 + ORDER BY says outer.unique_sort, inner.sort
If this is the situation, the sort could be done separately. Even like:

->  Nested Loop
  -> Sort by outer.unique
     ->  Scan producing the required rows from outer
  -> Sort by inner.something
     ->  Scan producing the required rows from inner
         Filter or Index Cond for the join

And this is good for any query that needs data from two tables, but 
wants to get the joined rows so that all rows for a single outer-row 
are "packed together" (I mean they do not mix).


Now about if it's worth it. I tried a query on real data, with lots of rows.
Tables involved:
- threads: ~200K rows
- msgs: ~8M rows
I wanted to see the last msgs from the last threads, but did not want 
to mix them. It's like PgSQL's mailing archive viewed by threads. I 
wanted them paged, not all 8M msgs at once (LIMIT/OFFSET). Query is:

SELECT *
FROM threads AS thr JOIN msgs AS msg ON msg.thrid = thr.id
WHERE thr.forid = 1
ORDER BY thr.id DESC, msg.msgid DESC
LIMIT 100

thr.forid is a FKEY to forums. Say forum 1 is the pgsql-performance list.
Table msgs has only a PRI KEY: (thrid, msgid).
Table threads has a PRI KEY: (id) and a unique index: (forid, id).
First time I ran the query with seqscan, bitmapscan, hashjoin, 
mergejoin disabled (just to get the nested loop plan with the needless 
sort). Second time I ran it without disabling anything, but I modified 
ORDER BY to only "thr.id DESC" (deleted "msg.msgid DESC"), to give me 
the same plan as before, but without the sort.
PSQL input and output attached.

I think the 5000x speed would be worth it.


Regards,
Denes Daniel
---------------------------------




Játssz a megújult Kvízparton! Válassz több mint 400 kvíz közül minden témában!
________________________________________________________
http://cthandler.adverticum.net/?cturl=http%3A%2F%2Fkvizpart.hu%2F?fmalja

-- =========== Test #1 =========== --

BEGIN;
SET enable_seqscan TO false; SET enable_bitmapscan TO false; SET enable_hashjoin TO false; SET enable_mergejoin TO false;
EXPLAIN ANALYZE SELECT * FROM threads AS thr JOIN msgs AS msg ON msg.thrid = thr.id WHERE thr.forid = 1 ORDER BY thr.id DESC, msg.msgid DESC LIMIT 100;
ROLLBACK;

                                                                            QUERY PLAN                                                                   
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=64537577.92..64537677.92 rows=100 width=447) (actual time=61923.813..61924.379 rows=100 loops=1)
   ->  Sort  (cost=64537577.92..64633839.92 rows=96262 width=447) (actual time=61923.804..61924.038 rows=100 loops=1)
         Sort Key: thr.id, msg.msgid
         ->  Nested Loop  (cost=0.00..61339121.37 rows=96262 width=447) (actual time=0.248..40366.113 rows=431350 loops=1)
               ->  Index Scan using threads_uni_forid_id on threads thr  (cost=0.00..14229.22 rows=2850 width=176) (actual time=0.127..5246.470 rows=8339 loops=1)
                     Index Cond: (forid = 1)
               ->  Index Scan using msgs_pkey on msgs msg  (cost=0.00..12595.51 rows=2974 width=271) (actual time=1.544..4.000 rows=52 loops=8339)
                     Index Cond: (msg.thrid = "outer".id)
 Total runtime: 62079.452 ms
(9 rows)






-- =========== Test #2 =========== --

EXPLAIN ANALYZE SELECT * FROM threads AS thr JOIN msgs AS msg ON msg.thrid = thr.id WHERE thr.forid = 1 ORDER BY thr.id DESC LIMIT 100;

                                                                             QUERY PLAN                                                                    
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..63721.01 rows=100 width=447) (actual time=9.579..10.775 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..61339121.37 rows=96262 width=447) (actual time=9.574..10.438 rows=100 loops=1)
         ->  Index Scan Backward using threads_uni_forid_id on threads thr  (cost=0.00..14229.22 rows=2850 width=176) (actual time=0.082..0.145 rows=4 loops=1)
               Index Cond: (forid = 1)
         ->  Index Scan using msgs_pkey on msgs msg  (cost=0.00..12595.51 rows=2974 width=271) (actual time=2.399..2.475 rows=25 loops=4)
               Index Cond: (msg.thrid = "outer".id)
 Total runtime: 11.071 ms
(7 rows)

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux