Re: How do I bump a row to the front of sort efficiently

Sam Saffron <sam.saffron@xxxxxxxxx> · Wed, 4 Feb 2015 21:53:52 +1100

The union hack may be able to work, but what I want is slightly more complex

I want this to work efficiently, even without the case it chokes:

explain select * from testing order by id in (100,2,-1) desc, id limit 30;
                                QUERY PLAN
---------------------------------------------------------------------------
 Limit  (cost=4869.45..4869.52 rows=30 width=9)
   ->  Sort  (cost=4869.45..5119.45 rows=100000 width=9)
         Sort Key: ((id = ANY ('{100,2,-1}'::integer[]))), id
         ->  Seq Scan on testing  (cost=0.00..1916.00 rows=100000 width=9)
(4 rows)

I need to be able to offset and limit the union hack in a view, which
is proving very tricky.

On Wed, Feb 4, 2015 at 9:15 PM, BladeOfLight16 <bladeoflight16@xxxxxxxxx> wrote:
> On Tue, Feb 3, 2015 at 11:28 PM, Sam Saffron <sam.saffron@xxxxxxxxx> wrote:
>>
>> Note: I still consider this a bug/missing feature of sorts since the
>> planner could do better here, and there is no real clean way of
>> structuring a query to perform efficiently here, which is why I
>> erroneously cross posted this to hacker initially:
>
>
> No, it should not be considered a bug or a deficiency. You're telling the
> system to use a computed value that uses an arbitrary logic construct for
> sorting, and that's before you actually give it a filter to work with.
> You're also trying to abuse ORDER BY to make LIMIT do filtering. How do you
> expect that to go? I would expect the planner to do exactly what you told it
> you wanted: sort the entire table by that computed value, and it would have
> to do a sequential scan to compute the value for every row before it knew
> which ones came first for the LIMIT.
>
> CASE is well known to cause optimization problems; arbitrary conditional
> logic isn't especially conducive to a planner optimizing things. In general,
> the planner has no idea what the logic really means, so the planner would
> have to have some kind of special logic trying to pick up on this case. Your
> particular use case is uncommon; why should the planner code be junked up
> with a thousand little optimizations for uncommon situations like this
> (which would make it unmaintainable) when you already have a reasonable
> alternative? PG is great for a reason: the devs have made a lot of fantastic
> choices in designing it. Trying to keep the planner relatively simple is one
> of them, if I recall correctly.
>
> What you want is well represented by a UNION query: you want it to fetch one
> particular row by ID, and then you want to tack on 29 other particular rows
> based on a particular sort order and possibly an offset. These are two
> completely disparate ways of fetching data; of course the most optimal way
> is going to be to essentially write two queries and put the results
> together. That's also going to be the clearest way of writing the query, so
> the next person to work on it knows what you were doing.
>
> And that aside, you don't even need to use CASE. You could've just used (id
> = 1), which would give you a boolean result. You can most certainly sort by
> a boolean. (I believe PG sorts TRUE first.) You should see if that gets
> optimized at all. (If it doesn't, I won't be surprised and everything else I
> said still holds.)
>
> By the way, you can do better on your UNION query:
>
> select * from topic
> where id = 1000
> union all
> (select * from topic
> where id <> 1000
> order by bumped_at desc
> limit 29)
>
> I imagine your original would be at risk of LIMITing out the very row you
> seek to get at the "top", since you don't have an ORDER BY to tell it which
> ones to keep during the outer LIMIT.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general