Re: newsfeed type query

Jonathan Vanasco <postgres@xxxxxxxx> · Wed, 29 Apr 2015 12:54:25 -0400

On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:

> Could you please explain to me the error(s) in my reasoning?

Let me just flip your list in reverse... and add in some elements (marked with a *):

posting  ts  context
p60      60  friend
p55      55 friend*
p54      54 friend*
p50      50  group
p50      49  group*
p50      49  group*
p40      40  friend
p30      30  group
p20      20  friend
p10      10  group
p0        0  friend

With the 2 limited subqueries, the results would be:
	60F, 55F, 50G, 49G

But the "most recent" data is
	50F, 55F, 54F, 50G

So we end up showing 49 which is less relevant than 54.

In some situations this isn't much of an issue, but in others it is detrimental. 
For example, one of my "feeds" contains a distribution of events according-to-type that is very uneven.  While "friend" and "group" might be relatively close in time to one another, "system" or other events may be months old -- and that older content gets pulled in with this style of query.  

If you need to paginate the data and select the next 10 overall items, it gets even more complicated.

IIRC, the best mix of performance and "product" that I've found is do something like this:

	SELECT * FROM (
		SELECT a,b,c FROM table_a ORDER BY TIMESTAMP DESC LIMIT 10000;
		UNION
		SELECT a,b,c FROM table_b ORDER BY TIMESTAMP DESC LIMIT 10000;
	) as unioned
 	order by unioned TIMESTAMP DESC LIMIT 100 OFFSET 0; 

by creating an artificial limit on the inner queries, you can save postgres from doing a lot of intensive i/o work and memory usage (like a lot)
then, joining a few lists and sorting 20k (or even 100k) items is really cheap.
the downside is that you effectively limit the 'relevancy' of the query to whatever the inner limit is (ie, 10000 -- not the combined total of 20000), but that number can be arbitrarily high enough that it is irrelevant while still showing the right amount of content for people.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general