Re: Watching for view changes

Mitar <mmitar@xxxxxxxxx> · Sun, 6 Jan 2019 21:40:57 -0800

Hi!

I have a followup to this thread.

George, thanks for all the help. I spend some time now investigating
various approaches here and I am reporting here some findings, so that
they might help others as well.

First, I have tried the approach with MATERIALIZED VIEWs to hold the
cached contents of the query. I attached triggers to all source tables
which then notified (using LISTEN/NOTIFY) the client about changes.
Client throttled those notifications and eventually triggered a
REFRESH MATERIALIZED VIEW. Because I also attached triggers on the
materialized view, I got notifications (using LISTEN/NOTIFY) of what
rows have changed. Client might decided to fetch also rows themselves.
For this to work well I made two patches. A patch to allow creation of
TEMPORARY MATERIALIZED VIEWs [1] and a patch to allow attaching
triggers on materialized views [2]. In the second patch I also changed
the REFRESH MATERIALIZED VIEW CONCURRENTLY logic to issue not just
REMOVEs and INSERTs for changed rows, but to compute which rows have
changed and issue UPDATEs for them as well. This makes it easier for
the client to know what changed.

This worked well but it was slower than some other packages I have
found on the web which were trying to provide a similar functionality.
I made a benchmark to compare them [3] and found out that there was
room for improvement.

First, instead of sending updates of a MATERIALIZED VIEW using
LISTEN/NOTIFY and then fetching rows, I instead used a trigger to copy
changes to another temporary table, and then just used DELETE FROM
temp_table RETURNING * to get all results from a table and returning
it to the client, all inside same transaction, so data in that
temporary table was never committed.

This made things a bit better, mostly latency between a change and
getting it to the client became more predictable. Before there were
sometimes quite large spikes. Once I did that I realized that in fact
MATERIALIZED VIEW is not really necessary. All I need is a place to
cache previous results of the query, but I do not really care about
updates to the MATERIALIZED VIEW. So I decided to do the following. If
I already have a temporary table with previous results, when I want to
refresh my query, I create a new temporary table using CREATE TABLE AS
using the PREPAREd query, I compute diff between those two tables in
the same way as REFRESH MATERIALIZED VIEW CONCURRENTLY does, in one
query, and I return those results to the client. Then I just DROP
TABLE old cache table, and rename new cache table to old cache name.
So instead of computing a diff, updating materialized view, running
triggers, and copying to the table, I just compute a diff and this is
it.

This works now very well. I made and published a package doings this [4].

Of course, the main problem is still that for every change in source
table I have to eventually refresh the query. And without more logic
this can really become problematic. Ideally, we could reuse some
ideas/code from Incremental View Maintenance [5] to compute what
changes to the query results should happen based on changes to source
tables. Then we could just push those to the client (and update
temporary table). We would then do full refresh only when things could
not be inferred.

Thanks again for everything.

[1] https://commitfest.postgresql.org/21/1951/
[2] https://commitfest.postgresql.org/21/1948/
[3] https://github.com/mitar/node-pg-reactivity-benchmark
[4] https://github.com/tozd/node-reactive-postgres
[5] https://www.postgresql.org/message-id/flat/20181227215726.4d166b4874f8983a641123f5%40sraoss.co.jp

Mitar

-- 
http://mitar.tnode.com/
https://twitter.com/mitar_m