On Tue, 7 Jul 2020 at 09:03, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>
> On Mon, 6 Jul 2020 at 21:49, David Rowley <dgrowleyml@xxxxxxxxx> wrote:
>>
>> On Tue, 7 Jul 2020 at 00:43, Simon Riggs <simon@xxxxxxxxxxxxxxx> wrote:
>> >
>> > On Mon, 6 Jul 2020 at 12:37, Robins Tharakan <tharakan@xxxxxxxxx> wrote:
>> >> (SELECT 1 FROM pg_class) UNION (SELECT 1 FROM pg_class);
>> >> vs.
>> >> (SELECT 1 FROM pg_class LIMIT 1) UNION (SELECT 1 FROM pg_class LIMIT 1); -- Faster
>> >
>> > Those two queries aren't logically equivalent, so you can't apply the LIMIT 1 as an optimization.
>> >
>> > The first query returns lots of random rows; the second query returns just one random row.
>>
>> I think the idea here is that because the target list contains only constants, pulling additional rows from the query after the first one will just produce duplicate rows and never add any rows after the UNION is processed.
>
> OK, I see. Are you saying you think it's a worthwhile optimization to autodetect?

I guess it's all about how much effort would be involved to detect such cases vs how likely it is that we're going to speed up someone's query. I imagine it's not much effort to detect this, but also, this is the first time I recall seeing this mentioned, so perhaps that means not many people would be rewarded by making such a change. (It does seem like quite a strange way to express the query.)

There is currently a patch floating around that implements UniqueKeys, which allows RelOptInfos to be tagged with the properties they're unique on. With the current design of that patch, there is no way to say "this relation *only* has duplicate rows". Perhaps some design tweaks there could make detecting this case cheaper in terms of CPU effort during planning, and perhaps also in terms of how much code it would take to make it work. I'll try to keep this in mind when I review that work soon.
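The equivalence being discussed — that with an all-constant target list, every row a branch produces beyond its first is a duplicate that UNION's dedup step discards anyway — can be sanity-checked with a quick sketch. This uses Python's built-in sqlite3 purely as a stand-in for PostgreSQL (table `t` stands in for pg_class), and the LIMIT-ed branches are wrapped in subqueries since SQLite doesn't accept LIMIT directly on compound-select arms:

```python
import sqlite3

# In-memory table with many rows, standing in for pg_class.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(i,) for i in range(1000)])

# UNION deduplicates, so constant-only branches collapse to one row...
full = conn.execute(
    "SELECT 1 FROM t UNION SELECT 1 FROM t"
).fetchall()

# ...which is why adding LIMIT 1 to each branch cannot change the result,
# only how many rows each branch produces before deduplication.
limited = conn.execute(
    "SELECT * FROM (SELECT 1 FROM t LIMIT 1) "
    "UNION "
    "SELECT * FROM (SELECT 1 FROM t LIMIT 1)"
).fetchall()

print(full, limited)  # both [(1,)]
```

This only demonstrates result-set equivalence, of course; whether the planner should detect and exploit the pattern automatically is the question at hand.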
If we were to start adding optimisations for cases such as this, I'd rather they were done in some general-purpose way that makes them "just work", rather than as special cases scattered around the codebase that trigger special behaviour. Likely in this case it would still take at least some code when planning setops.

Additionally, the setops code is badly in need of a rewrite, so the bar is likely pretty high on adding too many smarts there that we'd need to migrate forwards after a rewrite. The setops planning code still doesn't make use of the upper-planner pathification work that Tom did years ago.

In many cases a UNION would be far more optimal if it were implemented as Index Scan -> MergeAppend -> Unique. I think right now we only consider Append -> Sort -> Unique and Append -> Hash Aggregate.

David