Re: Query performance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi,

On 25.1.2015 07:38, Joe Van Dyk wrote:
> 
> Here's one that's not quite as well: http://explain.depesz.com/s/SgT

As Pavel already pointed out, the first problem is this part of the plan:

Seq Scan on events e2 (cost=0.00..120,179.60 rows=4,450,241 width=51)
(actual time=0.014..33,773.370 rows=4,450,865 loops=1)
    Filter: (product_id <> '81716'::citext)

Consuming ~33 seconds of the runtime. If you can make this faster
somehow (e.g. by getting rid of the citext cast), that'd be nice.

Another issue is that the hashjoin is batched:

   Buckets: 65536 Batches: 8 Memory Usage: 46085kB

The hash preparation takes ~40 seconds, so maybe try to give it a bit
more memory - I assume you have work_mem=64MB, so try doubling that
(ISTM 512MB should work with a single batch). Maybe this won't really
improve the performance, though. It still has to process ~4.5M rows.

Increasing the work mem could also result in switching to hash
aggregate, making the sort (~30 seconds) unnecessary.

Anyway, ISTM this works as expected, i.e.

(a) with rare product_id values the queries are fast
(b) with common product_id values the queries are slow

That's expected, because (b) needs to process much more data. I don't
think you can magically make it run as fast as (a). The best solution
might be to keep a pre-aggregated results - I don't think you really
need exact answers when recommending "similar" products.

I also wonder if you really need to join the tables? I mean, what if you
do something like this:

CREATE TABLE events_aggregated AS SELECT
   site_id,
   array_agg(product_id) AS product_ids,
   count(nullif(e2.type='viewed', false)) view_count,
   count(nullif(e2.type='purchased', false)) purchase_count
FROM events
GROUP BY 1;

and then using intarray with GIN indexes to query this table?
Something like this:

  CREATE products_agg_idx ON aggregated
                       USING GIN (product_ids gin__int_ops);

  SELECT * FROM events_aggregated WHERE product_ids @> ARRAY['82503'];

regards

-- 
Tomas Vondra                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance




[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Yosemite]

  Powered by Linux