Search Postgresql Archives

Best way to store a threaded message list/tree in SQL

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi guys -

I'm looking for the best way to store a set of "posts" as well as comments on those posts in SQL. Imagine a design similar to a "Wall" on Facebook where users can write posts on their wall and other users can comment on those posts. I need to be able to display all wall posts as well as the comments.

When I first started out, I came up with a table such as:

CREATE Table wallposts
(
 id uuid NOT NULL,
 posted timestamp NOT NULL,
 userid uuid NOT NULL,
 posterid uuid NOT NULL,
 parentid uuid NOT NULL,
 comment text NOT NULL
)

id is unique, parentid will be null on original posts and point to an id if the row is a comment on an existing post. Easy enough and super fast to insert new data. However, doing a select which would return me:

POST 1
COMMENT 1
COMMENT 2
POST 2
COMMENT 1
COMMENT 2

Regardless of which order the rows existed in the database proved to be extremely difficult. I obviously can't just order by date, as someone might comment on post 1 after post 2 has been posted. If I do a LEFT JOIN to get the parent post on all rows, and then sort by that date first, all the original posts group together as they'd have a value of null.

Then I got this idea:

CREATE TABLE wallposts
(
 id uuid NOT NULL,
 threadposted timestamp,
 posted timestamp,
 ...
 comment text
)

On an original post, threadposted and posted would be the same. On a comment, timestamp would be the time the original post was posted and "posted" would be the time the comment on that thread was posted. Now I can just do:

select * from wallposts order by threadposted, posted;

This works great, however one thing irks me. If two people create a post at the same time, comments on the two posts would get munged together as they'd have the same timestamp. I could use "ticks" instead of a datetime, but still the accuracy is only 1/1000 of a second. I could also setup a unique constraint on threadposted and posted which makes inserts a bit more expensive, but if I had multiple database servers in a farm, the chance of a collision is still there. I almost went ahead with this anyway since the chances of this happening are extremely small, but I wanted to see if I could eat my cake and still have it too. Mostly for my own educational curiosity.

Third solution would be to store this data in the form of a graph. Each node would have a v-left and v-right pointer. I could order by "left" which would traverse the tree in the order I need. However, every time someone inserts a comment I'd have to re balance the whole tree. This would create a ton of row locking, and all sorts of problems if the site was very busy. Plus, it's kinda extreme and also causes replication problems. So I tossed this idea quickly.

I also thought about just storing the original posts and then serializing the comments in a binary form, since who cares about individual comments. This would be very fast, however if a user wants to delete their comment or append a new comment to the end, I have to deserialize this data, modify the structure, then serialize it back and update the row. If a bunch of people are commenting on the same post at the same time, I might have random issues with that.

So here's what I eventually did. I query for all the posts ordered by date entered. In the middle ware layer, I loop through the recordset and create a "stack" of original posts, each node on the stack points to a linked list of comments. When I come across an original post, I push a new node on the stack and when I come across a comment I add a node to the linked list. I organize this in memory so I can traverse the recordset once and have O(n). After I create the in-memory representation of the wall, I traverse through this data structure again and write out HTML. This works great and has super fast inserts and super fast selects, and no weird row locking issues; however it's a bit heavier on my presentation layer and requires me to build an in memory representation of the user's wall to move stuff around so it's in the right order. Still, I believe this is the best approach I've found so far.

I thought I'd check with some other SQL experts and see if 1) Postgres has some super nifty tree functions that I've never heard of and 2) see if there's actually a way to do this using joins or unions or something which would still be performant with millions of users. Thoughts?

Mike

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]
  Powered by Linux