On 29.4.2015 18:54, Jonathan Vanasco wrote:
> 
> On Apr 29, 2015, at 12:25 PM, Ladislav Lenart wrote:
> 
>> Could you please explain to me the error(s) in my reasoning?
> 
> Let me just flip your list in reverse... and add in some elements (marked with a *):
> 
> posting  ts  context
> p60      60  friend
> p55      55  friend*
> p54      54  friend*
> p50      50  group
> p49      49  group*
> p40      40  friend
> p30      30  group
> p20      20  friend
> p10      10  group
> p0        0  friend
> 
> With the 2 limited subqueries, the results would be:
>     60F, 55F, 50G, 49G
> 
> But the "most recent" data is:
>     60F, 55F, 54F, 50G
> 
> So we end up showing 49, which is less relevant than 54.

I would expect the overall query to return only 60F and 55F as the most recent data, no? You expect it to return 4 items when the LIMIT is only 2. Remember that the overall query should also be ordered by ts and limited to 2. I thought you wanted the most recent items across all contexts, not the 2 most recent items from friends plus the 2 most recent items from groups...

Ladislav Lenart

> In some situations this isn't much of an issue, but in others it is detrimental.
> For example, one of my "feeds" contains a distribution of events according-to-type that is very uneven. While "friend" and "group" might be relatively close in time to one another, "system" or other events may be months old -- and that older content gets pulled in with this style of query.
> 
> If you need to paginate the data and select the next 10 overall items, it gets even more complicated.
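For anyone following along, the disagreement above can be reproduced with a few lines of Python. This is a minimal in-memory sketch (the `top_n` helper and the tuple layout are illustrative, not anything from the thread): it shows that two per-context LIMIT 2 subqueries merged together can surface a less relevant item (49G) than a single global sort would (54F).

```python
# Data from the example above: (ts, context), 'F' = friend, 'G' = group.
items = [(60, 'F'), (55, 'F'), (54, 'F'), (50, 'G'), (49, 'G'),
         (40, 'F'), (30, 'G'), (20, 'F'), (10, 'G'), (0, 'F')]

def top_n(rows, n):
    """Most recent n rows by timestamp (descending)."""
    return sorted(rows, key=lambda r: r[0], reverse=True)[:n]

# Two LIMIT 2 subqueries, one per context, merged and re-sorted:
per_context = (top_n([r for r in items if r[1] == 'F'], 2) +
               top_n([r for r in items if r[1] == 'G'], 2))
merged = top_n(per_context, 4)

# The true 4 most recent items across all contexts:
overall = top_n(items, 4)

print(merged)   # 60F, 55F, 50G, 49G -- 49G sneaks in
print(overall)  # 60F, 55F, 54F, 50G -- 54F is what we wanted
```

Both camps are right about their own query: the per-context version answers "top 2 from each context, then merge", while the global version answers "top 4 overall". The divergence only appears when one context contributes more than its per-context limit to the true overall top-N.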
> 
> IIRC, the best mix of performance and "product" that I've found is to do something like this:
> 
>     SELECT * FROM (
>         (SELECT a, b, c FROM table_a ORDER BY timestamp DESC LIMIT 10000)
>         UNION
>         (SELECT a, b, c FROM table_b ORDER BY timestamp DESC LIMIT 10000)
>     ) AS unioned
>     ORDER BY unioned.timestamp DESC LIMIT 100 OFFSET 0;
> 
> By creating an artificial limit on the inner queries, you can save Postgres from doing a lot of intensive I/O work and memory usage (like, a lot).
> Then, joining a few lists and sorting 20k (or even 100k) items is really cheap.
> The downside is that you effectively limit the "relevancy" of the query to whatever the inner limit is (i.e. 10000 -- not the combined total of 20000), but that number can be arbitrarily high enough that it is irrelevant while still showing the right amount of content for people.

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
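The bounded-union pattern quoted above can be sanity-checked with SQLite as a stand-in for Postgres (the table names, toy data, and tiny inner limit of 3 are all illustrative; SQLite also requires wrapping each limited branch in a derived table rather than parenthesizing it, and `UNION ALL` is used since the toy rows are distinct anyway):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE table_a (id INTEGER, ts INTEGER)")
cur.execute("CREATE TABLE table_b (id INTEGER, ts INTEGER)")
# table_a: ts = 0, 10, ..., 90; table_b: ts = 5, 15, ..., 95
cur.executemany("INSERT INTO table_a VALUES (?, ?)", [(i, i) for i in range(0, 100, 10)])
cur.executemany("INSERT INTO table_b VALUES (?, ?)", [(i, i) for i in range(5, 100, 10)])

# Each branch is pre-limited (LIMIT 3 here, playing the role of LIMIT 10000),
# so the outer sort only ever sees a small, bounded number of rows.
rows = cur.execute("""
    SELECT id, ts FROM (
        SELECT id, ts FROM (SELECT id, ts FROM table_a ORDER BY ts DESC LIMIT 3)
        UNION ALL
        SELECT id, ts FROM (SELECT id, ts FROM table_b ORDER BY ts DESC LIMIT 3)
    ) AS unioned
    ORDER BY ts DESC LIMIT 4
""").fetchall()
print(rows)  # the 4 most recent of the 6 pre-limited rows: ts 95, 90, 85, 80
```

The trade-off the poster describes is visible here too: if one table held 4 of the true overall top 4, the inner `LIMIT 3` would clip it, which is why the inner limit is set far above the page size in practice.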