Search Postgresql Archives

Efficient "filter query" with positive and/or negative conditions

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We have a database with contacts within lists, which can have any number of tags associated with them. I would like to segment the contacts based on any number of tag-based conditions, all of which must be fulfilled (i.e. an ALL rule/filter). For instance, I want to extract all contacts which have the tags "customer" AND "newsletter" but NOT the "invited" tag. However, segments need not contain "positive" predicates, so a segment defined as only NOT "invited" tag should include ALL contacts except those with the "invited" tag. I have found a working query using CTE, but it's way too slow when tens of thousands of contacts are involved. Suggestions for an alternative query and/or optimizations to this one?

Server:
-------
PostgreSQL 8.4.2 on a shared server running Apache/PHP/MySQL/PostgreSQL
shared_buffers: 128MB
work_mem: 8MB
effective_cache_size: 256MB

Current query:
--------------
WITH segments_contacts_tags AS (
    SELECT segmentid, tagname, tagtype, email
        FROM segments_tags st
            LEFT JOIN contacts_tags ct USING (tagname)
    WHERE segmentid = 93
), positive_segment_matches AS (
    SELECT segmentid, email, COUNT(email) AS tags
        FROM segments_contacts_tags
        WHERE tagtype = 1
        GROUP BY 1, 2
), positive_predicates_count AS (
    SELECT segmentid, COUNT(*) AS count
        FROM segments_tags
        WHERE tagtype = 1
        GROUP BY 1
)
SELECT s.segmentid, cl.email
    FROM segments s
        INNER JOIN contacts_lists cl USING (lid)
        LEFT JOIN positive_segment_matches psm USING (segmentid, email)
        LEFT JOIN positive_predicates_count ppc USING (segmentid)
    WHERE segmentid = 93
AND ((psm.email IS NULL AND ppc.count IS NULL) OR psm.tags = ppc.count)
EXCEPT
SELECT segmentid, email
    FROM segments_contacts_tags st
    WHERE st.tagtype = 0;

Explain analyze:
----------------
HashSetOp Except (cost=1817.12..5822.48 rows=200 width=36) (actual time=5827.885..6051.771 rows=64672 loops=1)
   CTE segments_contacts_tags
-> Nested Loop Left Join (cost=0.00..1323.86 rows=21819 width=61) (actual time=0.143..696.589 rows=64672 loops=1) -> Seq Scan on segments_tags st (cost=0.00..1.05 rows=1 width=29) (actual time=0.014..0.025 rows=1 loops=1)
                 Filter: (segmentid = 93)
-> Index Scan using contacts_tags_tagname on contacts_tags ct (cost=0.00..1050.07 rows=21819 width=49) (actual time=0.110..265.453 rows=64672 loops=1)
                 Index Cond: (st.tagname = ct.tagname)
   CTE positive_segment_matches
-> HashAggregate (cost=491.75..491.88 rows=11 width=36) (actual time=1733.143..1995.489 rows=64672 loops=1) -> CTE Scan on segments_contacts_tags (cost=0.00..490.93 rows=109 width=36) (actual time=0.161..1245.836 rows=64672 loops=1)
                 Filter: (tagtype = 1)
   CTE positive_predicates_count
-> HashAggregate (cost=1.07..1.12 rows=4 width=4) (actual time=0.069..0.088 rows=4 loops=1) -> Seq Scan on segments_tags (cost=0.00..1.05 rows=4 width=4) (actual time=0.015..0.032 rows=4 loops=1)
                 Filter: (tagtype = 1)
-> Append (cost=0.26..3681.56 rows=64811 width=36) (actual time=2734.790..5363.556 rows=64672 loops=1) -> Subquery Scan "*SELECT* 1" (cost=0.26..3189.55 rows=64702 width=36) (actual time=2734.782..4923.849 rows=64672 loops=1) -> Hash Left Join (cost=0.26..2542.53 rows=64702 width=36) (actual time=2734.773..4483.547 rows=64672 loops=1) Hash Cond: ((s.segmentid = psm.segmentid) AND ((cl.email)::text = (psm.email)::text)) Filter: (((psm.email IS NULL) AND (ppc.count IS NULL)) OR (psm.tags = ppc.count)) -> Nested Loop (cost=0.00..2052.95 rows=64702 width=44) (actual time=0.169..687.204 rows=64702 loops=1)
                           Join Filter: (s.lid = cl.lid)
-> Nested Loop Left Join (cost=0.00..1.15 rows=1 width=16) (actual time=0.130..0.189 rows=1 loops=1) Join Filter: (s.segmentid = ppc.segmentid) -> Seq Scan on segments s (cost=0.00..1.05 rows=1 width=8) (actual time=0.025..0.032 rows=1 loops=1)
                                       Filter: (segmentid = 93)
-> CTE Scan on positive_predicates_count ppc (cost=0.00..0.09 rows=1 width=12) (actual time=0.086..0.129 rows=1 loops=1)
                                       Filter: (ppc.segmentid = 93)
-> Seq Scan on contacts_lists cl (cost=0.00..1243.02 rows=64702 width=36) (actual time=0.014..231.331 rows=64702 loops=1) -> Hash (cost=0.25..0.25 rows=1 width=44) (actual time=2734.535..2734.535 rows=64672 loops=1) -> CTE Scan on positive_segment_matches psm (cost=0.00..0.25 rows=1 width=44) (actual time=1733.169..2481.359 rows=64672 loops=1)
                                 Filter: (segmentid = 93)
-> Subquery Scan "*SELECT* 2" (cost=0.00..492.02 rows=109 width=36) (actual time=23.767..23.767 rows=0 loops=1) -> CTE Scan on segments_contacts_tags st (cost=0.00..490.93 rows=109 width=36) (actual time=23.758..23.758 rows=0 loops=1)
                     Filter: (tagtype = 0)
 Total runtime: 6278.849 ms

Involved tables:
----------------

Table "public.contacts_lists"
     Column     |            Type             |          Modifiers
----------------+----------------------------- +------------------------------
 email          | email                       | not null
 lid            | integer                     | not null
lstatus | character(1) | not null default 'a'::bpchar
 ladded         | timestamp without time zone | not null default now()
 lstatuschanged | timestamp without time zone | not null default now()
Indexes:
    "contacts_lists_pkey" PRIMARY KEY, btree (email, lid)
    "contacts_lists_statchanged" btree (lstatuschanged)
    "contacts_lists_status" btree (lstatus)
Check constraints:
"status_check" CHECK (lstatus = 'a'::bpchar OR lstatus = 'b'::bpchar OR lstatus = 'u'::bpchar)
Foreign-key constraints:
"contacts_lists_email_fkey" FOREIGN KEY (email) REFERENCES contacts(email) ON UPDATE CASCADE ON DELETE CASCADE "contacts_lists_lid_fkey" FOREIGN KEY (lid) REFERENCES lists(lid) ON UPDATE RESTRICT ON DELETE CASCADE

Table "public.contacts_tags"
 Column  | Type  | Modifiers
---------+-------+-----------
 email   | email | not null
 tagname | text  | not null
Indexes:
    "contacts_tags_pkey" PRIMARY KEY, btree (email, tagname)
    "contacts_tags_tagname" btree (tagname)
Foreign-key constraints:
"contacts_tags_email_fkey" FOREIGN KEY (email) REFERENCES contacts(email) ON UPDATE CASCADE ON DELETE CASCADE "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE

Table "public.tags"
  Column   |            Type             |       Modifiers
-----------+-----------------------------+------------------------
 tagname   | text                        | not null
 createdat | timestamp without time zone | not null default now()
Indexes:
    "tags_pkey" PRIMARY KEY, btree (tagname)
Referenced by:
TABLE "contacts_tags" CONSTRAINT "contacts_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE TABLE "segments_tags" CONSTRAINT "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE

Table "public.segments"
Column | Type | Modifiers -------------+----------------------------- +-------------------------------------------------------------- segmentid | integer | not null default nextval('segments_segmentid_seq'::regclass)
 lid         | integer                     | not null
 segmentname | text                        | not null
 createdat   | timestamp without time zone | not null default now()
Indexes:
    "segments_pkey" PRIMARY KEY, btree (segmentid)
Foreign-key constraints:
"segments_lid_fkey" FOREIGN KEY (lid) REFERENCES lists(lid) ON UPDATE RESTRICT ON DELETE CASCADE
Referenced by:
TABLE "segments_tags" CONSTRAINT "segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE

Table "public.segments_tags"
  Column   |  Type   | Modifiers
-----------+---------+-----------
 segmentid | integer | not null
 tagname   | text    | not null
 tagtype   | integer | not null
Indexes:
    "segments_tags_pkey" PRIMARY KEY, btree (segmentid, tagname)
Foreign-key constraints:
"segments_tags_segmentid_fkey" FOREIGN KEY (segmentid) REFERENCES segments(segmentid) ON UPDATE RESTRICT ON DELETE CASCADE "segments_tags_tagname_fkey" FOREIGN KEY (tagname) REFERENCES tags(tagname) ON UPDATE CASCADE ON DELETE CASCADE

-- anders

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux