Re: Optimize query for listing un-read messages

Andreas Joseph Krogh <andreas@xxxxxxxxxx> · Sat, 3 May 2014 23:29:21 +0200 (CEST)

On Saturday 3 May 2014 at 23:21:21, Alban Hertroys <haramrae@xxxxxxxxx> wrote:

On 03 May 2014, at 12:45, Andreas Joseph Krogh <andreas@xxxxxxxxxx> wrote:

> Do you really need to query message_property twice? I would think this would give the same results:
>
> SELECT
>     m.id                          AS message_id,
>     prop.person_id,
>     coalesce(prop.is_read, FALSE) AS is_read,
>     m.subject
> FROM message m
>     LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND prop.person_id = 1 AND prop.is_read = FALSE
> ;

Ah yes, of course that would match a bit too much. This however does give the same results:

SELECT
   m.id                          AS message_id,
   prop.person_id,
   coalesce(prop.is_read, FALSE) AS is_read,
   m.subject
FROM message m
   LEFT OUTER JOIN message_property prop ON prop.message_id = m.id AND prop.person_id = 1
WHERE prop.is_read IS NULL OR prop.is_read = FALSE
;

That shaves off half of the query time here by dropping one index scan.
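To see why the first variant (with is_read = FALSE inside the ON clause) matches too much while the WHERE variant does not, here is a small scratch example. The table definitions are an assumption reconstructed from the column names used in the queries; the real schema in this thread may differ.

```sql
-- Assumed minimal schema (types and constraints are guesses, not from the thread).
-- Temp tables, so this can be tried in a scratch session.
CREATE TEMP TABLE message (
    id      integer PRIMARY KEY,
    subject text NOT NULL
);
CREATE TEMP TABLE message_property (
    message_id integer NOT NULL REFERENCES message (id),
    person_id  integer NOT NULL,
    is_read    boolean NOT NULL DEFAULT FALSE,
    UNIQUE (message_id, person_id)
);

INSERT INTO message VALUES
    (1, 'already read by person 1'),
    (2, 'unread by person 1'),
    (3, 'no property row for person 1');
INSERT INTO message_property VALUES
    (1, 1, TRUE),
    (2, 1, FALSE);

-- Filter in the ON clause: message 1 fails the join condition, so it is
-- NULL-extended and coalesce() reports it as unread.
-- Returns messages 1, 2 and 3.
SELECT m.id AS message_id, coalesce(prop.is_read, FALSE) AS is_read
FROM message m
    LEFT OUTER JOIN message_property prop
        ON prop.message_id = m.id AND prop.person_id = 1 AND prop.is_read = FALSE
ORDER BY m.id;

-- Filter in the WHERE clause: message 1 joins to its is_read = TRUE row
-- and is then filtered out.
-- Returns messages 2 and 3.
SELECT m.id AS message_id, coalesce(prop.is_read, FALSE) AS is_read
FROM message m
    LEFT OUTER JOIN message_property prop
        ON prop.message_id = m.id AND prop.person_id = 1
WHERE prop.is_read IS NULL OR prop.is_read = FALSE
ORDER BY m.id;
```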

The remaining time appears to be spent finding the rows in "message" that do not have a corresponding "message_property" row for the given (message_id, person_id) tuple. It's basically trying to find no needle in a haystack: you won't know that there is no needle until you've searched the entire haystack.
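The "no needle" part can be looked at in isolation by writing it as an anti-join. This is only a sketch for inspecting the plan, using the table and column names from the queries above; it finds the messages with no message_property row for person 1 and makes no claim about being faster.

```sql
-- Messages that have no message_property row at all for person 1.
-- EXPLAIN (ANALYZE, BUFFERS) runs the query and shows where the time goes.
EXPLAIN (ANALYZE, BUFFERS)
SELECT m.id AS message_id, m.subject
FROM message m
WHERE NOT EXISTS (
    SELECT 1
    FROM message_property prop
    WHERE prop.message_id = m.id
      AND prop.person_id = 1
);
```

Comparing this plan with the plan of the LEFT OUTER JOIN query should show how much of the total time is spent on the missing rows alone.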

It does seem to help a bit to create separate indexes on message_property.message_id and message_property.person_id; that reduces the size of the indexes that the database needs to match and merge with each other in order to find the missing message_ids.
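The separate indexes mentioned above could be created roughly like this. The index names are made up for illustration, and whether the planner actually prefers them over an existing composite index depends on the data and should be checked with EXPLAIN.

```sql
-- Hypothetical index names; one single-column index per join/filter column.
CREATE INDEX message_property_message_id_idx
    ON message_property (message_id);
CREATE INDEX message_property_person_id_idx
    ON message_property (person_id);

-- Refresh planner statistics so the new indexes are costed sensibly.
ANALYZE message_property;
```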

I think the consensus here is to create a caching table; there's no way around it, as PG is unable to index the difference between two sets.
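One possible shape for such a caching table, as a rough sketch only: the table name unread_message and its layout are assumptions, not something agreed on in this thread, and keeping it in sync (triggers or application code on new messages and on marking as read) is left out. The message id 42 is a placeholder.

```sql
-- Hypothetical cache: one row per (person, message) that is still unread.
CREATE TABLE unread_message (
    person_id  integer NOT NULL,
    message_id integer NOT NULL REFERENCES message (id),
    PRIMARY KEY (person_id, message_id)
);

-- Initial fill for person 1, reusing the query from earlier in the thread.
INSERT INTO unread_message (person_id, message_id)
SELECT 1, m.id
FROM message m
    LEFT OUTER JOIN message_property prop
        ON prop.message_id = m.id AND prop.person_id = 1
WHERE prop.is_read IS NULL OR prop.is_read = FALSE;

-- Listing unread messages becomes a plain indexed lookup.
SELECT m.id AS message_id, m.subject
FROM unread_message u
    JOIN message m ON m.id = u.message_id
WHERE u.person_id = 1;

-- Marking a message as read removes it from the cache.
DELETE FROM unread_message
WHERE person_id = 1 AND message_id = 42;
```

The trade-off is extra writes whenever a message arrives or is read, in exchange for never having to compute the set difference at query time.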

--
Andreas Joseph Krogh

CTO / Partner - Visena AS

Mobile: +47 909 56 963

andreas@xxxxxxxxxx

www.visena.com