Search Postgresql Archives

Re: Help Optimizing a Summary Query

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Thanks Arthur. I don't think there is as big a different between BIGINT and INTEGER as you think there is. In fact with an extended filesystem you might not see any difference at all. 

As I put in the first emal I am using a GIST index on user.name.

I was really more interested in the LEFT OUTER JOINs vs EXISTS queries and if there was a better alternative I had not considered. 

On Tue, Dec 9, 2014 at 11:44 AM, Arthur Silva <arthurprs@xxxxxxxxx> wrote:
On Tue, Dec 9, 2014 at 4:18 PM, Robert DiFalco <robert.difalco@xxxxxxxxx> wrote:
I'm sorry, I missed a JOIN on the second variation. It is:

SELECT u.id, u.name, u.imageURL, u.bio, 
   CASE 
      WHEN f.friend_id IS NOT NULL THEN 'isFriend'
      WHEN s.to_id IS NOT NULL THEN 'hasSentRequest' 
      WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
      ELSE 'none'
   END AS 'friendStatus',
   (SELECT COUNT(1) AS d 
      FROM friends f1 
         JOIN friends f2 ON f1.fiend_id = f2.friend_id 
      WHERE f1.user_id = 33 AND f2.user_id = u.id)  
FROM users u 
LEFT OUTER JOIN friends f ON f.user_id = 33 AND f.friend_id = u.id
LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
WHERE u.id != 33 AND u.name LIKE '%John%' ORDER BY u.name;


On Tue, Dec 9, 2014 at 10:15 AM, Robert DiFalco <robert.difalco@xxxxxxxxx> wrote:
I have users, friends, and friend_requests. I need a query that essentially returns a summary containing:
   
    * user (name, imageURL, bio, ...)
    * Friend status (relative to an active user)
       * Is the user a friend of the active user?
       * Has the user sent a friend request to the active user?
       * Has the user received a friend request from the active user?
    * # of mutualFriends
    * Exclude the active user from the result set.

So I have mocked this up two ways but both have complicated query plans that will be problematic with large data sets. I'm thinking that my lack of deep SQL knowledge is making me miss the obvious choice.

Here's my two query examples:

SELECT u.id, u.name, u.imageURL, u.bio, 
   CASE 
      WHEN EXISTS(SELECT 1 FROM friends f WHERE f.user_id = 33 AND f.friend_id = u.id)       THEN 'isFriend'
      WHEN EXISTS(SELECT 1 FROM friend_requests s WHERE s.to_id = 33   AND s.from_id = u.id) THEN 'hasSentRequest'
      WHEN EXISTS(SELECT 1 FROM friend_requests r WHERE r.to_id = u.id AND r.from_id = 33)   THEN 'hasReceivedRequest'
      ELSE 'none'
   END AS "friendStatus",
   (SELECT COUNT(1)  
      FROM friends f1 
         JOIN friends f2 ON f1.friend_id = f2.friend_id 
      WHERE f1.user_id = 33 AND f2.user_id = u.id) AS mutualFriends 
FROM users u 
WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

SELECT u.id, u.name, u.imageURL, u.bio, 
   CASE 
      WHEN f.friend_id IS NOT NULL THEN 'isFriend'
      WHEN s.to_id IS NOT NULL THEN 'hasSentRequest' 
      WHEN r.to_id IS NOT NULL THEN 'hasReceivedRequest'
      ELSE 'none'
   END AS 'friendStatus',
   (SELECT COUNT(1) AS d 
      FROM friends f1 
         JOIN friends f2 ON f1.fiend_id = f2.friend_id 
      WHERE f1.user_id = 33 AND f2.user_id = u.id)  
FROM users u 
LEFT OUTER JOIN friend_requests s ON s.to_id = 33 AND s.from_id = u.id
LEFT OUTER JOIN friend_requests r ON r.to_id = u.id AND r.from_id = 33
WHERE u.id != 33 AND u.name LIKE 'John%' ORDER BY u.name;

33 is just the id of the active user I am using for testing. The WHERE clause could be anything. I'm just using "u.name" here but I'm more concerned about the construction of the result set than the WHERE clause. These have more or less similar query plans, nothing that would change things factorially. Is this the best I can do or am I missing the obvious?

Here are the tables:


CREATE TABLE users (
  id            BIGINT,
  name          VARCHAR,
  imageURL      VARCHAR
  created       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  phone_natl    BIGINT,       /* National Phone Number */
  country_e164  SMALLINT,     /* E164 country code */
  email         VARCHAR(255),
  PRIMARY KEY (id),
  UNIQUE (email),
  UNIQUE (phone_natl, country_e164)
);


CREATE TABLE friends (
  user_id  BIGINT,
  friend_id   BIGINT,
  PRIMARY KEY (user_id, user_id),
  FOREIGN KEY (user_id)    REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friends_friend ON friends(friend_id);

CREATE TABLE friend_requests (
  from_id  BIGINT,
  to_id    BIGINT,
  created  TIMESTAMPTZ DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (from_id, user_id),
  FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN KEY (to_id)    REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

Let me know if you guys need anything else.




Hello Robert, none of your schemas worked for me, here's a clean version

CREATE TABLE users (
  id            BIGINT,
  name          VARCHAR,
  imageURL      VARCHAR,
  created       TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  phone_natl    BIGINT,     
  country_e164  SMALLINT,   
  email         VARCHAR(255),
  PRIMARY KEY (id),
  UNIQUE (email),
  UNIQUE (phone_natl, country_e164)
);


CREATE TABLE friends (
  user_id  BIGINT,
  friend_id   BIGINT,
  PRIMARY KEY (user_id, friend_id),
  FOREIGN KEY (user_id)    REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN KEY (friend_id)  REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friends_friend ON friends(friend_id);

CREATE TABLE friend_requests (
  from_id  BIGINT,
  to_id    BIGINT,
  created  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (from_id, to_id),
  FOREIGN KEY (from_id)  REFERENCES users(id) ON DELETE CASCADE,
  FOREIGN KEY (to_id)    REFERENCES users(id) ON DELETE CASCADE
);
CREATE INDEX idx_friend_requests_to ON friend_requests(to_id);

I may look into the query later but here's some thoughts

* you want an index to speed up name searches on user.name, I suggest pg_trgm + ILIKE as a starting point.
* you really expect more than a billion users? If not (or not in the near future) use INT instead. This will save you a significant amount of memory and eventually buy you time later.
* as long as you don't hit the disk for the queries you'll be fine, so make sure you have enough memory or use read-slaves with smaller working sets.



[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux