Re: SQL select query becomes slow when using limit (with no offset)

Thanks for your response.

I think your analysis is correct. When there are more than 100 rows that match this query, LIMIT 100 is fast.

However, we often have fewer than a hundred matching rows, so this is not sufficient for us.

The suggested 'OFFSET 0' trick made no difference in response time (the query still runs in 947 ms).

One thing that does help is bounding the set by adding this to the WHERE clause:
and events_events.dateTime > '2009-07-24'
(the query then runs in approx. 500 ms).
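Spelled out, the bounded variant is the original query with one extra range predicate (the date literal is just an illustrative cut-off for our data, not a magic value):

```sql
SELECT events_events.id
FROM events_events
LEFT JOIN events_event_types
       ON events_events.eventType_id = events_event_types.id
WHERE events_event_types.severity = 70
  AND NOT events_events.cleared
  -- extra bound that lets the backward dateTime index scan stop early:
  AND events_events.dateTime > '2009-07-24'
ORDER BY events_events.dateTime DESC
LIMIT 100;
```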

The workaround we implemented is query caching in our application (Java, with the JPA / Hibernate second-level query cache). This actually solves the problem for us, but I'd prefer to have this query perform well in PostgreSQL itself. I'd think the PostgreSQL query planner should be smarter in handling LIMIT clauses.
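For completeness, the caching workaround boils down to something like the following Hibernate configuration (a sketch only: the property names and the "org.hibernate.cacheable" hint are standard Hibernate, but the exact setup in our application differs):

```
# hibernate.properties (sketch)
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
# Each query must additionally be marked cacheable, e.g. by setting the
# JPA query hint "org.hibernate.cacheable" to true.
```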

Would it get attention if I submitted this to http://www.postgresql.org/support/submitbug ? (In fact it is not really a bug, but an improvement request.)

Best regards,

Kees


2009/8/5 Russell Smith <mr-russ@xxxxxxxxxx>
Kees van Dieren wrote:
> Hi Folks,
>
> Thanks for your response.
>
> I have added the following index (suggested by other post):
>
> CREATE INDEX events_events_cleared_eventtype
>   ON events_events
>   USING btree
>   (eventtype_id, cleared)
>   WHERE cleared = false;
>
> Also with columns in reversed order.
>
> No changes in response time noticed.
>
> Index on cleared column already is there (indices are in sql file
> attached to initial post.). eventtype_id has a foreign key constraint,
> which adds an index automatically I believe?
>
> The explain analyze results for both queries:
> explain analyze select events_events.id FROM
> events_events
> left join events_event_types on
> events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC LIMIT 100
> >>>
> "Limit  (cost=0.00..125.03 rows=100 width=16) (actual
> time=0.046..3897.094 rows=77 loops=1)"
> "  ->  Nested Loop  (cost=0.00..120361.40 rows=96269 width=16) (actual
> time=0.042..3896.881 rows=77 loops=1)"
> "        ->  Index Scan Backward using events_events_datetime_ind on
> events_events  (cost=0.00..18335.76 rows=361008 width=24) (actual
> time=0.025..720.345 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Index Scan using events_event_types_pkey on
> events_event_types  (cost=0.00..0.27 rows=1 width=8) (actual
> time=0.003..0.003 rows=0 loops=360637)"
> "              Index Cond: (events_event_types.id =
> events_events.eventtype_id)"
> "              Filter: (events_event_types.severity = 70)"
> "Total runtime: 3897.268 ms"
>
The plan here is guessing that it will find the 100 rows it wants quickly
by scanning the dateTime index backward, because it doesn't expect to
look through many rows before finding 100 that match the criteria.  With
no cross-column statistics it's more a guess than a good calculation, so
the guess is bad and we end up scanning 360k rows from the index before
we find what we want.  My skills are not up to giving specific advice on
how to avert this problem.  Maybe somebody else can help there.
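One generic knob worth trying for bad row estimates (a hedged suggestion: it sharpens per-column statistics but cannot capture the cross-column correlation described above, so it may not help here):

```sql
-- Raise the statistics target for the column driving the estimate,
-- then re-gather statistics (1000 is just an example value):
ALTER TABLE events_events ALTER COLUMN eventtype_id SET STATISTICS 1000;
ANALYZE events_events;
```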
> explain analyze select events_events.id FROM
> events_events
> left join events_event_types on
> events_events.eventType_id=events_event_types.id
> where events_event_types.severity=70
> and not events_events.cleared
> order by events_events.dateTime DESC
> >>>
> "Sort  (cost=20255.18..20495.85 rows=96269 width=16) (actual
> time=1084.842..1084.951 rows=77 loops=1)"
> "  Sort Key: events_events.datetime"
> "  Sort Method:  quicksort  Memory: 20kB"
> "  ->  Hash Join  (cost=2.09..12286.62 rows=96269 width=16) (actual
> time=1080.789..1084.696 rows=77 loops=1)"
> "        Hash Cond: (events_events.eventtype_id =
> events_event_types.id)"
> "        ->  Seq Scan on events_events  (cost=0.00..9968.06
> rows=361008 width=24) (actual time=0.010..542.946 rows=360637 loops=1)"
> "              Filter: (NOT cleared)"
> "        ->  Hash  (cost=1.89..1.89 rows=16 width=8) (actual
> time=0.077..0.077 rows=16 loops=1)"
> "              ->  Seq Scan on events_event_types  (cost=0.00..1.89
> rows=16 width=8) (actual time=0.010..0.046 rows=16 loops=1)"
> "                    Filter: (severity = 70)"
> "Total runtime: 1085.145 ms"
>
> Any suggestions?
This plan is faster because you avoid the index scan: the planner
prefers a table scan to find what it needs, which is much faster than
360k random-I/O index lookups.  You can force this type of plan with a
subquery and the OFFSET 0 trick, but I'm not sure it's the best
solution.

e.g.

explain analyze SELECT * FROM
   (SELECT events_events.id FROM events_events
        LEFT JOIN events_event_types ON
            events_events.eventType_id = events_event_types.id
        WHERE events_event_types.severity = 70
          AND NOT events_events.cleared
        ORDER BY events_events.dateTime DESC OFFSET 0) AS a
LIMIT 100;

Regards

Russell



--
Squins | IT, Honestly
Oranjestraat 23
2983 HL Ridderkerk
The Netherlands
Phone: +31 (0)180 414520
Mobile: +31 (0)6 30413841
www.squins.com
Chamber of commerce Rotterdam: 22048547
