Performance problem with table containing a lot of text (blog)

Hello!

Some background info: we have a blog table containing about eight million blog entries, with an average entry length of 1200 characters. Because each 8 kB page can hold only a few entries, every query that touches several entries causes several random disk seeks. We are having problems with queries like:

1) give me a list of months in which I have written something
2) give me the ids of entries I wrote in month X of year Y
3) give me the number of blog entries my friends have written since the
   last time I checked
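For concreteness, query #1 might look roughly like the sketch below. The column names are assumptions (only blog_entry, uid and blog_entry_id appear in the plan further down); the point is that the query needs only a couple of narrow columns, yet each matching row pulls a whole heap page, mostly entry text, into cache.

```
-- Hypothetical schema: blog_entry(blog_entry_id, uid, created_at, content, ...)
-- Query #1: months in which user 256979 has written something.
SELECT DISTINCT date_trunc('month', created_at) AS month
FROM blog_entry
WHERE uid = 256979
ORDER BY month;
```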

Clustering would probably reduce the random seeks, but it is not an option: it locks the table, and the operation would take "some" time. It would also have to be repeated periodically to maintain the clustering.
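For reference, the clustering being ruled out here would be something like the following (pre-8.3 syntax; blog_entry_uid_beid is the index from the plan below):

```
-- Rewrites the whole table in index order. Takes an exclusive lock on
-- the table for the duration, which is why it is not an option here,
-- and new rows are not kept in order, so it must be re-run periodically.
CLUSTER blog_entry_uid_beid ON blog_entry;
```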

I guess the file system cache is filling up with the text contents of blog entries, even though that text is useless for queries like these. The contents of individual entries are cached in memcached at the application level anyway, so there is rarely any need to fetch them from the database.

It would be nice if I could flag a column to always be TOASTed, regardless of its length.
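The closest existing knob I know of is per-column storage mode, sketched below (the column name content is an assumption). It only pushes the value out of line once the whole tuple exceeds the TOAST threshold of roughly 2 kB, so 1200-character entries may well stay in the main heap anyway:

```
-- EXTERNAL = store out of line, uncompressed; EXTENDED (the default
-- for text) tries compression first. Neither forces out-of-line
-- storage for tuples that fit under the ~2 kB TOAST threshold.
ALTER TABLE blog_entry ALTER COLUMN content SET STORAGE EXTERNAL;
```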

Because there isn't such an option, maybe I should create a separate table for the blog text content. Does anybody have a better idea? :)
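A minimal sketch of that split, with assumed names (only blog_entry and blog_entry_id come from the plan below):

```
-- Move the wide text out of the main table so that metadata queries
-- like #1-#3 scan a narrow blog_entry and fit far more rows per 8 kB
-- page; the body is joined in only when an entry is displayed.
CREATE TABLE blog_entry_body (
    blog_entry_id integer PRIMARY KEY
        REFERENCES blog_entry (blog_entry_id),
    content       text NOT NULL
);
```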

Thanks!


P.S. Here's the plan for query #3. Users can follow several bookmark groups, and a user can limit the visibility of an entry to some of his/her bookmark groups. Those are not bottlenecks of any kind, anyway...

 Sort  (cost=34112.60..34117.94 rows=2138 width=14)
   Sort Key: count(*), upper((u.nick)::text)
   ->  HashAggregate  (cost=33962.28..33994.35 rows=2138 width=14)
         ->  Nested Loop  (cost=8399.95..33946.24 rows=2138 width=14)
               ->  Nested Loop  (cost=8399.95..9133.16 rows=90 width=22)
                     ->  HashAggregate  (cost=8399.95..8402.32 rows=237 width=8)
                           ->  Nested Loop  (cost=0.00..8395.99 rows=792 width=8)
                                 ->  Index Scan using user_bookmark_uid on user_bookmark ub  (cost=0.00..541.39 rows=2368 width=12)
                                       Index Cond: (uid = 256979)
                                 ->  Index Scan using user_bookmark_group_pkey on user_bookmark_group bg  (cost=0.00..3.30 rows=1 width=4)
                                       Index Cond: ("outer".bookmark_group_id = bg.bookmark_group_id)
                                       Filter: (("type" >= 0) AND ("type" <= 1) AND (trace_blog = 'y'::bpchar))
                     ->  Index Scan using users_uid_accepted_only on users u  (cost=0.00..3.06 rows=1 width=14)
                           Index Cond: (u.uid = "outer".marked_uid)
               ->  Index Scan using blog_entry_uid_beid on blog_entry be  (cost=0.00..275.34 rows=24 width=8)
                     Index Cond: ((be.uid = "outer".marked_uid) AND (COALESCE("outer".last_seen_blog_entry_id, 0) < be.blog_entry_id))
                     Filter: ((visibility = 'p'::bpchar) AND ((status = 'p'::bpchar) OR (status = 'l'::bpchar)) AND ((bookmark_group_id IS NULL) OR (subplan)))
                     SubPlan
                       ->  Index Scan using user_bookmark_pkey on user_bookmark fub  (cost=0.00..3.42 rows=1 width=0)
                             Index Cond: ((bookmark_group_id = $0) AND (marked_uid = 256979))

P.P.S. That particular user has quite a few unread entries, though...

    |\__/|
    ( oo )    Kari Lavikka - tuner@xxxxxx - (050) 380 3808
__ooO(  )Ooo_______ _____ ___ _ _  _   _    _      _                  _
      ""
