Search Postgresql Archives

Re: Why doesn't Postgres apply limit on groups when retrieving N results per group using WHERE IN + ORDER BY

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 6 Feb 2024 at 01:23, Sean v <sean@xxxxxxxxxxxxxx> wrote:
> SELECT "orders".*
> FROM "orders"
> WHERE (user_id IN ?, ?, ?)
> ORDER BY "orders"."created_at" LIMIT 50
>
> I have two indexes - `(user_id)` and `(user_id, created_at)`. Only the first index is ever used with this query.

> I imagined that it would be clever enough to determine that only 50 results are needed, and that it could use the `(user_id, created_at)` index to get 50 orders for each user. Then sort and filter those few hundred results in memory.

It's as simple as the planner currently does not consider fetching 50
rows per user and doing a final sort before applying an overall LIMIT
50.

> I have found that I can speed this up significantly using CROSS JOIN LATERAL and it will use the composite index, but I'm struggling to understand WHY the CROSS JOIN LATERAL is needed here for it to use the index.

I'm afraid that's the best workaround until someone submits a patch to
have the planner consider doing this optimisation automatically.

> I've tried tweaking costs, disabling bitmap scans, etc, so it seems like this is a functional limitation rather than something to do with cost/statistics.

No amount of that will get you the plan you want without the LATERAL JOIN query.

> So my question is twofold:
> - why doesn't Postgres use the composite index, and then retrieve only the minimum necessary amount of rows (50 per user) using the query I posted above?
>
>  - If it is a functional limitation, is it lack of implementation, or is there a deeper incompatibility with how the query planner works that would prevent it from being able to do this?

Likely it wouldn't be too difficult to make the planner consider this
optimisation. However, for it to be valid, the ORDER BY clause would
have to contain only columns from the column(s) on the left side of
the IN clause.  I think likely this could be done by having the
planner consider performing a Nested Loop with an outer VALUES scan
and an inner parameterized index scan with a Limit node above it as a
path for scanning the base relation. The tricky part would be
adjusting the planner so it didn't needlessly leave the IN clause in
the WHERE clause when the chosen plan is the Nested Loop with the
values scan.  The current planner data structures are not really
geared up for optional base quals right now.  Something would need to
be done to make that work and off the top of my head, I don't know
what that would be.

David





[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux