On Tue, 7 Jul 2020 at 09:03, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 6 Jul 2020 at 21:49, David Rowley <dgrowleyml@xxxxxxxxx> wrote:
>>
>> On Tue, 7 Jul 2020 at 00:43, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>> >
>> > On Mon, 6 Jul 2020 at 12:37, Robins Tharakan <tharakan@xxxxxxxxx> wrote:
>> >> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
>> >> vs.
>> >> (SELECT 1 FROM pg_class LIMIT 1) UNION (SELECT 1 FROM pg_class LIMIT 1); -- Faster
>> >
>> > Those two queries aren't logically equivalent, so you can't apply the LIMIT 1 as an optimization.
>> >
>> > The first query returns lots of random rows; the second query returns just one random row.
>>
>> I think the idea here is that because the target list contains only constants, pulling additional rows from the query after the first one will just produce duplicate rows and never add any rows after the UNION is processed.
>
> OK, I see. Are you saying you think it's a worthwhile optimization to autodetect?

I guess it's all about how much effort would be involved to detect such cases vs how likely it is that we're going to speed up someone's query. I imagine it's not much effort to detect this, but also, this is the first time I recall seeing this mentioned, so perhaps that means not many people would be rewarded by making such a change. (It does seem like quite a strange way to express the query.)

There is currently a patch floating around that implements UniqueKeys, which allows RelOptInfos to be tagged with the properties they're unique on. With the current design of that patch, there is no way to say "this relation *only* has duplicate rows". Perhaps some design tweaks there could make detecting this case cheaper in terms of CPU effort during planning, and perhaps also in terms of how much code it would take to make it work. I'll try to keep this in mind when I review that work soon.
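The equivalence being discussed — that with an all-constant target list, every row a branch produces beyond its first is a duplicate that UNION's dedup step discards anyway — can be sanity-checked with a quick sketch. This uses Python's built-in sqlite3 purely as a stand-in for PostgreSQL (table `t` stands in for pg_class), and the LIMIT-ed branches are wrapped in subqueries since SQLite doesn't accept LIMIT directly on compound-select arms:

```python
import sqlite3

# In-memory table with many rows, standing in for pg_class.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# UNION deduplicates, so constant-only branches collapse to one row...
full = conn.execute(
    "SELECT 1 FROM t UNION SELECT 1 FROM t"
).fetchall()

# ...which is why adding LIMIT 1 to each branch cannot change the result,
# only how many rows each branch produces before deduplication.
limited = conn.execute(
    "SELECT * FROM (SELECT 1 FROM t LIMIT 1) "
    "UNION "
    "SELECT * FROM (SELECT 1 FROM t LIMIT 1)"
).fetchall()

print(full, limited)  # both [(1,)]
```

This only demonstrates result-set equivalence, of course; whether the planner should detect and exploit the pattern automatically is the question at hand.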
If we were to start adding optimisations for cases such as this, I'd rather they were done in some general-purpose way that makes them "just work", rather than as special cases scattered around the codebase that trigger special behaviour. Likely in this case it would still take at least some code when planning setops.

Additionally, the setops code is badly in need of a rewrite, so the bar is likely pretty high on adding too many smarts there that we'd need to migrate forwards after a rewrite. The setops planning code still doesn't make use of the upper-planner pathification work that Tom did years ago.

In many cases a UNION would be far more optimal if it were implemented as Index Scan -> MergeAppend -> Unique. I think right now we only consider Append -> Sort -> Unique and Append -> Hash Aggregate.

David