Best approach for query with optional constraints

Jon Smark <jon.smark@xxxxxxxxx> · Sun, 27 Jan 2013 08:18:17 -0800 (PST)

Hi,

I have a problem with multiple solutions, and my question concerns which
solution is preferred given whatever optimizations happen inside the query
planner.

In a bug tracking system, I have the tables "users", "bugs" and "tags".
One bug may have one user (the reporter) but multiple tags.  Moreover, one
tag may be associated with multiple bugs.  There is therefore a many-to-many
relation between bugs and tags.  I use an extra table "bug_tags" to keep
track of this relation, as shown below:

CREATE TABLE bugs (
  bid int4 UNIQUE NOT NULL,
  uid int4 REFERENCES users(uid) NOT NULL,
  title text NOT NULL,
  PRIMARY KEY (bid));

CREATE TABLE bug_tags (
  bid int4 REFERENCES bugs(bid) NOT NULL,
  tid int4 REFERENCES tags(tid) NOT NULL,
  PRIMARY KEY (bid, tid));

Here's the problem: I want to retrieve a list of bugs (possibly) matching
certain constraints.  One possible constraint is a user ID: if given, only
those bugs reported by the user will be returned.  Another constraint is
a set of tags: only those bugs that contain *all* the tags in the set must
be returned (if the set is empty, then all bugs match).  Moreover, the user
and tags constraints are conjunctive: if both set, that means I want only
the bugs by that user having also the given set of tags.

Below is one possible solution using a CTE to fetch all bugs that match
the tags constraint.  (Note that $IDENTIFIER will be replaced by the actual
contents of the variable using prepared statements).

with tagged as
  (select bid
  from bug_tags inner join tags on bug_tags.tid = tags.tid
  group by bid
  having array_agg (tags.tag) @> $TAGS)
select *
from bugs
where (($UID :: int4) is null or uid = $UID) and
($TAGS = (array[] :: text[]) or bid in (select bid from tagged))

The first alternative is shown below.  The basic principle is the same,
except that instead of a CTE there is a subquery.  The idea is that the
subquery will only be performed if the set of tags is non-empty.  Or is
the query planner smart enough to never execute the CTE if this is the
case?  If so, then both these solutions should be identical.

select *
from bugs
where (($UID :: int4) is null or uid = $UID) and
($TAGS = (array[] :: text[]) or
bid in (
  select bid
  from bug_tags inner join tags on bug_tags.tid = tags.tid
  group by bid
  having array_agg (tags.tag) @> $TAGS))

The second alternative is shown below.  This one uses a different approach:
instead of constructing a set of bug IDs matching the tag constraints, it
goes one by one through the bugs, verifying if each one matches the tag
constraints.

select *
from bugs
where (($UID :: int4) is null or uid = $UID) and
($TAGS = (array[] :: text[]) or
$TAGSs <@ array (select tags.tag from bug_tags inner join tags on bug_tags.tid = tags.tid where bug_tags.bid = bugs.bid))

To conclude: is it worth worrying about such details, or can I assume the
query optimizer will automagically choose the best approach.  And if not,
from which solution should I expect better results when a) there are no
constraints, b) there are only user constraints, c) there are only tag
constraints, and d) there are both user and tag constraints?

Thank you in advance for any light you might be able to shed!
Best,
Jon

P.S. Other solutions to a similar problem can be found on this thread:
http://stackoverflow.com/questions/14527080/selecting-matching-subset-in-many-to-many-relation

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general