On Tue, 2 Mar 2021 at 21:53, Liu, Xinyu <liuxy@xxxxxxxxxx> wrote: > *Expected Behavior > > Since these two queries are semantically equivalent, we were hoping that PostgreSQL would evaluate them in roughly the same amount of time. > It looks to me that there is a missing optimization rule related to pushing the sort operator (i.e., order and limit) through the left join. > Given the significant query execution time difference, I was wondering if it is worth adding such a rule to make the system evaluate the first query more efficiently. > It would also be helpful if you could comment on if there is a standard practice to evaluate the tradeoff associated with adding such a rule in Postgresql. We currently don't attempt to push down LIMIT clauses into subqueries. Before we did that we'd need to get much better at figuring out how joins duplicate rows so that we could be sure that we're not limiting the subquery more than the number of records that the outer query will need to reach its limit. If you want some advice, you're likely to get more people on your side and possible support for making improvements to the query planner if you provide examples that look remotely like real-world queries. In the other emails that I've read from you on this list [1], it seems you're example queries are all completely bogus. I suspect that the queries are generated by some fuzz testing tool. I very much imagine that really don't need help with these at all. With respect, it seems to me that there's about zero chance that you genuinely need the results of this query more quickly and you've come for help with that. Because PostgreSQL does not proactively cache query plans, ad-hoc queries are always parsed, planned then executed. This means that it's often not practical to spend excessive amounts of time planning a query that gets executed just once. Adding new optimisations to the query planner means they either have to be very cheap to detect, or they must pay off in many cases. If you happen to think there's a genuine case for having the query planner do a better job of doing LIMIT pushdowns into subqueries, then you're welcome to submit a patch to implement that. You'll also need to carefully document exactly which cases the LIMIT can be pushed down and when it cannot. That's the hard part. The actual pushing down of the clause is dead easy. If you're going to do that, then I'd suggest you come up with better examples than this one. I don't think many people will get on board with your newly proposed optimisations when the queries are obviously not real. It's hard to imagine the optimisation being useful to any queries with a query that's so obviously not a real one. David [1] https://www.postgresql.org/message-id/BN7PR07MB52024B973EAB075F4DF6C19ACD999%40BN7PR07MB5202.namprd07.prod.outlook.com