On Tue, 11 Nov 2014 14:38:16 -0800 Robert DiFalco <robert.difalco@xxxxxxxxx> wrote: > I have a question about modeling a mutual relationship. It seems basic but > I can't decide, maybe it is 6 of one a half dozen of the other. > > In my system any user might be friends with another user, that means they > have a reciprocal friend relationship. > > It seems I have two choices for modeling it. > > 1. I have a table with two columns userOne and userTwo. If John is friends > with Jane there will be one row for both of them. > 2. I have a table with two columns owner and friend. If John is friends > with Jane there will be two rows, one that is {John, Jane} and another > {Jane, John}. > > The first option has the advantage of saving table size. But queries are > more complex because to get John's friends I have to JOIN friends f ON > f.userA = "John" OR f.userB = "John" (not the real query, these would be > id's but you get the idea). > > In the second option the table rows would be 2x but the queries would be > simpler -- JOIN friends f ON f.owner = "John". > > There could be >1M users. Each user would have <200 friends. > > Thoughts? Do I just choose one or is there a clear winner? TIA! I recommend a single row per relationship, because your estimates suggest that the size might be big enough to be worth optimizing on the basis of size. As far as optimizing queries and what not, I did this recently, and here's what worked well for me. Take this example table definition: CREATE TABLE friendship ( person1 INT NOT NULL, person2 INT NOT NULL, PRIMARY KEY (person1, person2), CHECK (person1 < person2) ); CREATE INDEX friendship_person2 ON friendship(person2); The check constraint guarantees the data will always be stored in a certain order, which allows you to optimize many queries (i.e., when figuring out wheter person 57 and person 86 are friends, the where clause is simplified becuase you know that person1 can't be 86). If you'll need to do queries of the "list all person 57's friends" variety, then the queries are still pretty simple, but you could create a stored procedure to make them even easier on the part of application developers. It's basically "WHERE person1 = 57 or person2 = 57" which will be able to use the indexes to provide quick results. Or something more like: SELECT person1 AS friend FROM friendship WHERE person2 = 57 UNION SELECT person2 AS friend FROM friendship WHERE person1 = 57; A view should work very well: CREATE VIEW friendship_view AS SELECT person1 AS person, person2 AS friend FROM friendship UNION SELECT person2 AS person, person1 AS friend FORM friendship; That should be a very performant view when a WHERE clause on person is specified. Those types of queries weren't a requirement in the implementation I did, as the code only ever asked "is person x a friend of person y" and never wanted the entire list. -- Bill Moran I need your help to succeed: http://gamesbybill.com -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general