I'm looking at building a data warehouse to address our rapidly spiralling report query times against our OLTP database. My first question is what it would take to make this a real-time data warehouse, as opposed to a batch-driven one.
One approach I've seen used to achieve real-time data warehousing is middleware that intercepts all database writes and echoes them to a program that rolls up the warehouse values and updates the facts, dimensions and so on on the fly. Another is to put triggers on the tables of interest that journal the changes to separate tables, which a batch job then processes to the same end.
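To make that concrete, here is a rough sketch of the trigger-journal approach as I understand it, using a made-up orders table and showing only the INSERT/UPDATE path (all table and column names are just for illustration):

    -- Journal table capturing each change and when it happened.
    CREATE TABLE orders_journal (
        journal_id  bigserial PRIMARY KEY,
        changed_at  timestamptz NOT NULL DEFAULT now(),
        operation   text NOT NULL,   -- 'INSERT', 'UPDATE' or 'DELETE'
        order_id    integer NOT NULL,
        amount      numeric
    );

    CREATE OR REPLACE FUNCTION journal_orders() RETURNS trigger AS $$
    BEGIN
        INSERT INTO orders_journal (operation, order_id, amount)
        VALUES (TG_OP, NEW.order_id, NEW.amount);
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER orders_journal_trg
        AFTER INSERT OR UPDATE ON orders
        FOR EACH ROW EXECUTE PROCEDURE journal_orders();

The batch job would then read orders_journal in journal_id order and apply the changes to the warehouse.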
One problem with putting triggers on the transactional database is that if the trigger ever fails, the main transaction fails with it. I'm not sure whether that can be avoided with proper exception handling in the trigger code. It also means a lot of trigger code to maintain, and another schema to maintain (the journalled changes), so there are several drawbacks.
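As far as I can tell, the failure could be contained with a BEGIN ... EXCEPTION block inside the trigger function, something like the sketch below, though each such block opens a subtransaction, which has its own cost:

    CREATE OR REPLACE FUNCTION journal_orders() RETURNS trigger AS $$
    BEGIN
        BEGIN
            INSERT INTO orders_journal (operation, order_id, amount)
            VALUES (TG_OP, NEW.order_id, NEW.amount);
        EXCEPTION WHEN OTHERS THEN
            -- Journalling failed: log a warning and carry on, so the
            -- main transaction is not rolled back with it.
            RAISE WARNING 'journalling failed for order %: %',
                          NEW.order_id, SQLERRM;
        END;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

The obvious cost is that a swallowed error means a silently lost journal entry, so the warehouse can drift out of sync, which may be worse than failing the transaction.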
I've had a quick search and haven't seen this approach used yet, but I was thinking that the asynchronous replication stream of a Postgres database could serve as a streaming journal of changes for a data warehouse to process. WAL file shipping suggests itself as another variant of the same idea. I've not dug into the asynchronous replication protocol yet; before I do, I just wanted some brief feedback on whether I'm on the wrong track, and whether there's a better approach I should be looking at first.
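From a quick skim of the docs, it looks as though logical decoding (available from PostgreSQL 9.4 on) exposes exactly this kind of change stream through a replication slot, with no triggers at all. I haven't tried it, but something like this appears to be the minimal version, using the bundled test_decoding output plugin (the slot name is my own invention):

    -- Requires wal_level = logical in postgresql.conf.
    SELECT pg_create_logical_replication_slot('dw_feed', 'test_decoding');

    -- Each call returns the row changes decoded from the WAL since the
    -- last call and advances the slot past what it hands back, so a
    -- warehouse loader could poll this incrementally.
    SELECT * FROM pg_logical_slot_get_changes('dw_feed', NULL, NULL);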
Any feedback much appreciated.
Regards,
Nigel