Re: Query optimization using order by and limit

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



* Tom Lane (tgl@xxxxxxxxxxxxx) wrote:
> Yeah, it's spending quite a lot of time finding the first matching row
> in each child table.  I'm curious why that is though; are the child
> tables not set up with nonoverlapping firstloadtime ranges?

They are set up w/ nonoverlapping firstloadtime ranges, using CHECK
constraints such as:

"osmoduleloads_2011_09_14_firstloadtime_check" CHECK (firstloadtime >=
129604464000000000::bigint::numeric AND firstloadtime <
129605328000000000::bigint::numeric)

The issue here is that the query is saying "Give me the first 150
records with this host_id in this week-long range".  PG happily
eliminates all the tables that are outside of the week-long range during
constraint exclusion.  After that, however, it hunts down the earliest
records (which matches 'host_id') from each child table.  Sure, from
each table there's a record in the week-long range with the host_id that
matches.  What PG doesn't realize is that it can stop after pulling the
150 records from the most recent table (and flipping the direction of
the query or the tables doesn't help- PG still pulls a record from each
table).

> > What would be great is if PG would realize that the CHECK constraints
> > prevent earlier records from being in these earlier tables,
> 
> The explain shows that that isn't the case, because it *is* finding at
> least one candidate row in each table.  It's just running quite far into
> the firstloadtime sequence to do it.

My point above is that the CHECK constraints ensure an ordering which
could be leveraged to use the latest table first and then stop if enough
tuples are returned (or immediately go to the next table), without ever
considering the other tables.  I'm not looking for PG to eliminate those
other tables for consideration in all cases- if the limit is large
enough, it may get all the way down to them.  I'm pretty sure this isn't
something which PG does today and I don't expect teaching it to do this
to be trivial, but it certainly would be nice as this strikes me as a
very common use-case.

> If you're stuck with this table arrangement, one thing that would help
> is a two-column index on (host_guid, firstloadtime) on each child table.

Agreed, I mentioned this to the OP previously and it's on his list of
things to try.

	Thanks,
		
		Stephen

Attachment: signature.asc
Description: Digital signature


[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux