2010/4/30 Mike Christensen <mike@xxxxxxxxxxxxx>: > Ok I've been blatantly lying, err, purposely simplifying the problem for the > sake of the original email :) > > I've read over the responses, and am actually now considering just not using > any index at all. Here's why: > > First, this actually isn't the only thing on the WHERE clause. It will only > query for users who are "friends" with you so it can notify them of your > activities. That's done via a weird JOIN on a table that holds all the > friend relationships. So in reality, it will only load maybe a hundred > rows, or maybe a thousand every once in a while if you're way popular. If > I'm not mistaken, it should use the index to narrow it down to the list of > friends, and then use a sequential scan to weed out the ones who subscribe > to that type of notification. > > Second, the only thing /ever/ that will do this query is the queue service > whose job it is to process notifications (which are files dropped on the > file system) and email people all day long. This service handles one job at > a time, and could potentially run on its own machine with its own read-only > copy of the database. Thus, even if it was a fairly slow query, it's not > gonna bring down the rest of the site. > > Regarding the idea of putting an index on each bit, I thought about this > earlier as well as just kinda cringed. The users table gets updated quite a > bit (last logon, session id, any time they change their profile info, > etc).. Too many indexes is bad. I could just put the data in another table > of course, which lead me to another idea. Have a table called Subscriptions > and have each row hold a user id and a notification type. I could index > both, and join on (Subscriptions.UserId = Users.UserId AND > Subscriptions.Type = 8). This would be pretty dang fast, however updates > are kinda a royal pain. When the user changes which types of subscriptions > they want (via a list of checkboxes), I'd have to figure out which rows to > delete and which new ones to insert. However, I think I have an idea in > mind for a PgSQL function you pass in the bitmask to and then it > "translates" it to conditional deletes and inserts. > > A third idea I'm tossing around is just not worry about it. Put the bitmask > in the DB, but not filter on it. Every "friend" would be loaded into the > dataset, but the queue processor would just "skip" rows if they didn't > subscribe to that event. In other words, move the problem down to the > business layer. The drawback is potentially large number of rows are > loaded, serialized, etc into memory that will just be ignored. But of > course the DB is probably a read-only copy and it's not even close to the > bottle neck of the email queue under heavy load, so it's probably a > non-issue. If mailing is slow, I just add more queue services.. > > I'm exploring all these ideas. I predict using the bitwise AND on the where > clause isn't gonna be the worst design ever, and it's sure easier to > implement than a table of subscriptions. What do you guys think? I would say "normalize". Which means that I like your "separate table" idea best. It's clear, obvious, and 3NF - conforming solution. Changing the set of subscriptions with delete-update-insert combo is not so bad as you would think. Encapsulating it in some kind of functional API looks nice too. Filip -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general