Best way to store a threaded message list/tree in SQL

Mike Christensen <imaudi@xxxxxxxxxxx> · Thu, 26 Mar 2009 16:57:10 -0700

Hi guys -

I'm looking for the best way to store a set of "posts" as well as 
comments on those posts in SQL.  Imagine a design similar to a "Wall" on 
Facebook where users can write posts on their wall and other users can 
comment on those posts.  I need to be able to display all wall posts as 
well as the comments.

When I first started out, I came up with a table such as:

CREATE Table wallposts
(
 id uuid NOT NULL,
 posted timestamp NOT NULL,
 userid uuid NOT NULL,
 posterid uuid NOT NULL,
 parentid uuid NOT NULL,
 comment text NOT NULL
)

id is unique, parentid will be null on original posts and point to an id 
if the row is a comment on an existing post.  Easy enough and super fast 
to insert new data.  However, doing a select which would return me:

POST 1
COMMENT 1
COMMENT 2
POST 2
COMMENT 1
COMMENT 2

Regardless of which order the rows existed in the database proved to be 
extremely difficult.  I obviously can't just order by date, as someone 
might comment on post 1 after post 2 has been posted.  If I do a LEFT 
JOIN to get the parent post on all rows, and then sort by that date 
first, all the original posts group together as they'd have a value of null.

Then I got this idea:

CREATE TABLE wallposts
(
 id uuid NOT NULL,
 threadposted timestamp,
 posted timestamp,
 ...
 comment text
)

On an original post, threadposted and posted would be the same.  On a 
comment, timestamp would be the time the original post was posted and 
"posted" would be the time the comment on that thread was posted.  Now I 
can just do:

select * from wallposts order by threadposted, posted;

This works great, however one thing irks me.  If two people create a 
post at the same time, comments on the two posts would get munged 
together as they'd have the same timestamp.  I could use "ticks" instead 
of a datetime, but still the accuracy is only 1/1000 of a second.  I 
could also setup a unique constraint on threadposted and posted which 
makes inserts a bit more expensive, but if I had multiple database 
servers in a farm, the chance of a collision is still there.  I almost 
went ahead with this anyway since the chances of this happening are 
extremely small, but I wanted to see if I could eat my cake and still 
have it too.  Mostly for my own educational curiosity.

Third solution would be to store this data in the form of a graph.  Each 
node would have a v-left and v-right pointer.  I could order by "left" 
which would traverse the tree in the order I need.  However, every time 
someone inserts a comment I'd have to re balance the whole tree.  This 
would create a ton of row locking, and all sorts of problems if the site 
was very busy.  Plus, it's kinda extreme and also causes replication 
problems.  So I tossed this idea quickly.

I also thought about just storing the original posts and then 
serializing the comments in a binary form, since who cares about 
individual comments.  This would be very fast, however if a user wants 
to delete their comment or append a new comment to the end, I have to 
deserialize this data, modify the structure, then serialize it back and 
update the row.  If a bunch of people are commenting on the same post at 
the same time, I might have random issues with that.

So here's what I eventually did.  I query for all the posts ordered by 
date entered.  In the middle ware layer, I loop through the recordset 
and create a "stack" of original posts, each node on the stack points to 
a linked list of comments.  When I come across an original post, I push 
a new node on the stack and when I come across a comment I add a node to 
the linked list.  I organize this in memory so I can traverse the 
recordset once and have O(n).  After I create the in-memory 
representation of the wall, I traverse through this data structure again 
and write out HTML.  This works great and has super fast inserts and 
super fast selects, and no weird row locking issues; however it's a bit 
heavier on my presentation layer and requires me to build an in memory 
representation of the user's wall to move stuff around so it's in the 
right order.  Still, I believe this is the best approach I've found so far.

I thought I'd check with some other SQL experts and see if 1) Postgres 
has some super nifty tree functions that I've never heard of and 2) see 
if there's actually a way to do this using joins or unions or something 
which would still be performant with millions of users.  Thoughts?

Mike

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general