Would it help to create a trigger that maintains a per-category count in a
statistics table? You could probably gather most of the information you need
that way. If an estimate is good enough, you are running the statistics
collector, and you have an index on that column, perhaps you could even query
the system statistics tables. Do you have a large number of categories, or
only a few? If it is only 10 or so, and they do not change, perhaps you can
reformulate the query as a collection of equi-joins.

What does the query plan look like? What sort of hardware are you using?
What version of PostgreSQL are you running? Your query speed does seem
shockingly slow for only 30,000 rows in the largest table.

> -----Original Message-----
> From: pgsql-general-owner@xxxxxxxxxxxxxx [mailto:pgsql-general-
> owner@xxxxxxxxxxxxxx] On Behalf Of Net Virtual Mailing Lists
> Sent: Monday, July 04, 2005 9:47 PM
> To: pgsql-general@xxxxxxxxxxxxxx
> Subject: How can this be optimized, if possible?
>
> Hello,
>
> My database has grown far faster than expected, and a query which used to
> run acceptably now does not. I'm trying to figure out a way to make
> this operate faster and scale better. I'm very open to the idea that
> this does not need to be done with a SQL query at all - right now I'm
> really just in need of some conceptual/architectural help on this one.
>
> So I have two tables:
>
>          Table "category"
>     Column    |       Type        | Modifiers
> --------------+-------------------+-----------
>  head_title   | character varying |
>  cat_title    | character varying |
>  subcat_title | character varying |
>  category     | ltree             |
>
>
>                            Table "test"
>   Column  |  Type   |                      Modifiers
> ----------+---------+------------------------------------------------------
>  id       | integer | not null default nextval('master.test_id_seq'::text)
>  category | ltree[] |
>
> ...
> there are other fields in the test table, but these are really the
> only two relevant to this.
>
> The query I want to run against these two tables is something like this:
>
> SELECT
>   count(*) AS count,
>   category.category,
>   nlevel(category.category) AS level,
>   subpath(category.category, 0, nlevel(category.category) - 1) AS parent,
>   category.head_title,
>   category.cat_title,
>   category.subcat_title
> FROM
>   test,
>   category
> WHERE
>   test.category <@ category.category
> GROUP BY
>   category.category, category.head_title, category.cat_title,
>   category.subcat_title;
>
> Many times the WHERE clause will contain additional search criteria on
> the 'test' table. What I am trying to get is a count of how many rows
> from the test table fall into each category, limited by the search
> criteria.
>
> This query is starting to take an enormous amount of time (30+ seconds)
> and I really need the results in a couple of seconds, tops. I can do a
> "SELECT category FROM test" and it completes in about .5 seconds. The
> category table currently has only 225 rows, the test table approximately
> 30,000. "SELECT count(category), category FROM test GROUP BY category"
> is also quite slow, and I thought of making a materialized view of this,
> but then of course I don't see any way to make that result limited by my
> search criteria.
>
> I am completely open to re-architecting this entirely; performance of
> this query is critical to my application - I really just am not sure
> where to start. It seems like everything I do is worse than what I
> started with.
>
> ... It *almost* seems as if I need to build some sort of "search engine
> like" tool which performs all the queries against the database, has tons
> of memory, and caches the category attributes for each record in memory.
> This sure seems like a lot of work though - I sincerely hope there is an
> easier way.
>
> Thanks for your help, as always!
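To illustrate the trigger idea from the top of my reply, here is a rough,
untested sketch. All of the names (category_stats, category_stats_trig, and
so on) are mine, it assumes PostgreSQL 8.0 or later for dollar quoting, and
it counts exact labels with = rather than subtrees with <@ - rolling counts
up to ancestor categories would take an extra step:

    CREATE TABLE category_stats (
        category ltree PRIMARY KEY,
        cnt      integer NOT NULL
    );

    CREATE FUNCTION category_stats_trig() RETURNS trigger AS $$
    DECLARE
        i integer;
    BEGIN
        -- Add one count for each category label on the new row.
        IF TG_OP = 'INSERT' OR TG_OP = 'UPDATE' THEN
            FOR i IN 1 .. coalesce(array_upper(NEW.category, 1), 0) LOOP
                UPDATE category_stats SET cnt = cnt + 1
                 WHERE category = NEW.category[i];
                IF NOT FOUND THEN
                    INSERT INTO category_stats VALUES (NEW.category[i], 1);
                END IF;
            END LOOP;
        END IF;
        -- Remove the counts for the old row's labels.
        IF TG_OP = 'DELETE' OR TG_OP = 'UPDATE' THEN
            FOR i IN 1 .. coalesce(array_upper(OLD.category, 1), 0) LOOP
                UPDATE category_stats SET cnt = cnt - 1
                 WHERE category = OLD.category[i];
            END LOOP;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER test_category_stats
        AFTER INSERT OR UPDATE OR DELETE ON test
        FOR EACH ROW EXECUTE PROCEDURE category_stats_trig();

Note this only gives you exact totals per label, not totals restricted by
arbitrary extra search criteria - for the filtered case you would still need
the join, but you could use the precomputed totals for the common unfiltered
case.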
>
> - Greg
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
    (send "unregister YourEmailAddressHere" to majordomo@xxxxxxxxxxxxxx)
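P.S. When you post the plan, please use EXPLAIN ANALYZE rather than plain
EXPLAIN, run against your real query - for example (a trimmed-down version
of the query above, just to show the form):

    EXPLAIN ANALYZE
    SELECT count(*), category.category
      FROM test, category
     WHERE test.category <@ category.category
     GROUP BY category.category;

That shows the actual row counts and timings for each plan node, which
usually points straight at the slow step.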