Dear List,
I am interested in replicating views of my data in real time to a
frontend visualizer. I've looked around, and it seems that most
applications in this direction write some ad-hoc json-formatter that
spits out exactly the columns it is interested in. I want something
more like Cubes[1], where a user (or at least, some javascript) can
say "I am interested in this slice of the world", and then get updates
to that slice, but unlike Cubes it must be near-real-time: I want to
hook events, not just redownload tables.
In principle, I am looking for some way to say
```
CREATE VIEW view13131 AS select (id, name, bank_account) from actors
where age > 22;
WATCH view13131;
```
and get output to stdout like
```
....
INSERT view13131 VALUES (241, "Mortimer", 131.09);
...
INSERT view13131 VALUES (427, "Schezwan", 95.89);
UPDATE view13131 SET bank_account = 1017.12 WHERE id = 427;
DELETE FROM view13131 WHERE id = 92;
...
```
(and then I would stick a SQL-to-JSON proxy in the middle to make it
more web-happy, and handle disconnects with a corresponding UNWATCH
operation)
I am stumped about the best way to go about this. Is there any
extension that does this specific task for postgres?
CouchDB seems to have this implemented[2](!) but there are a lot of
reasons I am wary of going that route. dat[3] is specifically for
"real-time replication and versioning for data sets"; it's super new
and shakey, but promising.
I spent awhile trying to implement the replication protocol[4] before
I found a tip[5] which pointed out that the confusing, opaque, binary
data I was getting (like
b'x00\x98\x08\x00\x00\x00\x00\n\x00\x1f\x00\x10@bid\x00\x98\x08\x00\x00\x00\x00\n\x00\x1e\x00\x18@dicks\x00\x00\x00\x98\x08\x00\x00\x00\x00\x00\x00\x00\x00\n\x00\x10\x00\x18@event_2_2\x00\x00\x00\x15@\x00\x00\x00\x00\n\x00\x08\x00\x18@event_2_1\x00\x00\x00\x15@\x00\x00\x00\x00\n\x00\x01\x00\x18@event_2_0\x00\x00\x00\x15@\x00\x00\x00\x00\t\x00\x14\x00 @event_2_2_txid_idx\x00\x00\x15@\x00\x00\x00\x00\t\x00\x0f\x00\x18@event_2_2\x00\x00\x00\x15@\x00\x00\x00\x00\t\x00\x0e\x00 @event_2_1_txid_idx\x00\x00\x15@') is--I believe, please correct me if I'm wrong--a verbatim copy of postgres's internal data structures. Will it pay off to reverse and reimplement these data structures in javascript? The tipster seemed to think not, but that was 6 years ago. Also, this solution doesn't give me the ability to slice data, though I could hack it with some kind of ridiculous proxy database
setup.
I discovered Skytools[6]'s Londiste, which will replicate only
specific tables and seems very close to what I want, but it seems like
it has a lot of administrative overhead and is targetted at
postgres-to-postgres log shipping. Does anyone know if I can hook it
somewhere in the middle in order to extract the CREATE, UPDATE and
DELETE events?
My last option that I am considering is writing code myself which sets
and unsets Postgres triggers corresponding to each WATCH statement. I
could implement this as PL/pgSQL or on Python+SQLAlchemy. This seems
like it might end up fragile, so if I do end up going this route, I
would appreciate any tidbits and gotchas you might have to share. If I
ALTER VIEW will the triggers all fire appropriately? Can I even set
triggers on views?
Thanks in advance
-Nick Guenther
4B Stats/CS
University of Waterloo
[1] Cubes <http://cubes.databrewery.org/>
[2] CouchDB. "Filtered Replication".
http://couchdb.readthedocs.org/en/latest/replication/protocol.html#filter-replication
[3] https://github.com/maxogden/dat
[4] Postgres Streaming Replication Protocol
<http://www.postgresql.org/docs/current/static/protocol-replication.html> /
Guide <http://guide.couchdb.org/draft/notifications.html#continuous>
[5] Erik Jones, "Re: reading WAL files in python"
<http://www.postgresql.org/message-id/CE398E79-FFEF-4219-9606-F5D28AD2E665@xxxxxxxxxx>
[6] SkyTools <http://wiki.postgresql.org/wiki/Skytools>