
Re: Slow first query despite LIMIT and OFFSET clause


 



On Jan 29, 2009, at 1:35 AM, Phoenix Kiula wrote:

On Thu, Jan 29, 2009 at 2:25 AM, Alban Hertroys
<dalroi@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:

Ah I see, that's the original query and its plan again, not the one after implementing those triggers! You had me scratching my head for a bit there,
wondering why the count() subquery was still there.

Yes, that was just for info. Here is the new query without the count()
in it:


explain analyze SELECT
         testimonials.url
         ,testimonials.alias
         ,testimonials.aliasEntered
         ,testimonials.title
         ,testimonials.modify_date
         ,testimonials.id
         ,visitcount.visit_count
         ,visitcount.unique_count
         ,visitcount.modify_date
         ,coalesce(  extract(epoch from now()) -  extract(epoch from
visitcount.modify_date), 0)
 FROM testimonials
 LEFT OUTER JOIN visitcount USING (id)
 WHERE
              testimonials.user_id = 'superman'
      and testimonials.user_known = 1
      and testimonials.status = 'Y'
 ORDER BY testimonials.modify_date desc
 OFFSET 0 LIMIT 10
;


QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=61.42..61.44 rows=10 width=162) (actual
time=105.400..105.499 rows=10 loops=1)
  ->  Sort  (cost=61.42..61.46 rows=16 width=162) (actual
time=105.392..105.425 rows=10 loops=1)
        Sort Key: testimonials.modify_date
        ->  Nested Loop Left Join  (cost=0.00..61.10 rows=16
width=162) (actual time=0.092..94.516 rows=2027 loops=1)
              ->  Index Scan using new_idx_userknown on testimonials
(cost=0.00..24.29 rows=16 width=146) (actual time=0.058..10.983
rows=2027 loops=1)
                    Index Cond: ((user_id)::text = 'superman'::text)
                    Filter: (status = 'Y'::bpchar)
              ->  Index Scan using visitcount_pkey1 on visitcount
(cost=0.00..2.28 rows=1 width=24) (actual time=0.024..0.026 rows=1
loops=2027)
                    Index Cond: (testimonials.id = visitcount.id)
Total runtime: 105.652 ms
(10 rows)

Note that I have an index on user_id, but because this is a website,
there are several user_ids where we only have the IP. The above query
is only ever needed for registered users, so for just the registered
users we created another partial index called

    "new_idx_userknown" btree (user_id) WHERE user_known = 1

Of course for unregistered users we use user_known = 0, so they are
excluded from this index. Is this not a useful partial index? In this
SQL the user_id is always "superman" and user_known is always 1, which
may be why the planner's estimate is off?
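
One way to check whether that estimate (rows=16 vs. the actual 2027) comes from stale or too-coarse column statistics is to look at what the planner knows about user_id, and raise the per-column statistics target if 'superman' isn't among the sampled common values. A sketch, using your table and column names as given above:

```sql
-- Inspect the statistics the planner has for user_id
-- (n_distinct and most_common_vals drive the rows=16 estimate).
SELECT attname, n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'testimonials' AND attname = 'user_id';

-- If 'superman' is missing from most_common_vals, a larger sample
-- plus a fresh ANALYZE may fix the estimate:
ALTER TABLE testimonials ALTER COLUMN user_id SET STATISTICS 1000;
ANALYZE testimonials;
```

The value 1000 is illustrative; the default statistics target is much lower, and the right setting depends on how skewed the user_id distribution is.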

Love to hear thoughts.


Well, that seems to have gotten rid of the somewhat expensive index scans on new_idx_userknown as well (the duplicate entry for that scan was of course due to the subquery).

What's remaining is the left join.
If I understand correctly you have a PK on visitcount.id and that table only contains records for people who have a visitcount > 0? That table gets updated a lot I'd think? The query plan still shows a bad estimate on that join; it has improved, but not enough.

Does the plan look better right after you ANALYSE visitcount? I suspect you either need to autovacuum visitcount more frequently or you need to increase the statistics size on visitcount.id. You're updating that table a lot, I think, and every update creates a new dead row. Letting vacuum mark the dead rows as reusable more frequently should also help keep that table and its indexes cleaner, although the records the indexes point to will be scattered all over the place.
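
As a sketch of what that could look like (the numbers are illustrative, not tuned for your workload, and per-table autovacuum settings via ALTER TABLE reloptions are only available in newer releases; on older versions the pg_autovacuum catalog serves the same purpose):

```sql
-- Refresh the statistics by hand and re-run EXPLAIN ANALYZE
-- to see whether the join estimate improves:
ANALYZE visitcount;

-- If the estimate is still off, sample the join column harder:
ALTER TABLE visitcount ALTER COLUMN id SET STATISTICS 500;
ANALYZE visitcount;

-- Make autovacuum visit this heavily-updated table more often
-- (thresholds here are examples only):
ALTER TABLE visitcount SET (
    autovacuum_vacuum_scale_factor = 0.05,
    autovacuum_analyze_scale_factor = 0.02
);
```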



I'm wondering... In heavily updated tables it's probably more efficient to leave the dead rows alone (just marking them dead) and only append the updated ones at the end of the table. The dead rows would then accumulate at the start of the table while the new ones go to the end, and after a while a large section at the start of the table could simply be removed, as it would contain only dead rows... This may already be in place of course; I don't have time now to look into the design specifics, and it seems like an obvious thing to do!
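
Whatever the storage internals do, the dead-row buildup itself is easy to watch. A sketch using the standard statistics views (the table name is from the thread; plain VACUUM marks dead rows reusable without taking an exclusive lock):

```sql
-- See how many dead rows have accumulated and when the table
-- was last vacuumed, manually or by autovacuum:
SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'visitcount';

-- Reclaim the dead rows and refresh statistics in one pass:
VACUUM ANALYZE visitcount;
```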

Regards,
Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.





--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
