On Wed, 2006-03-22 at 16:35, Tom Lane wrote: > "Jim C. Nasby" <jnasby@xxxxxxxxxxxxx> writes: > > Oh, so in other words, SELECT * INTO temp FROM table is inherently > > non-deterministic at the physical level, so the only way to be able to > > allow PITR to work is to duplicate all the physical changes. Darn. > > Well, lemme put it this way: I'm not prepared to require that PG be > deterministic at the physical level. One obvious source of > non-determinancy is the FSM, which is likely to hand out different free > space to different transactions depending on what else is going on at > the same time. There are others, such as deliberately random > tie-breaking during btree index insertion. While you're at talking about WAL and PITR... I see from the aboce discussion that PITR is already demanding special handling in the code (I hope I got this one right, as the following are based on this). What if the PITR logging would be disconnected from the WAL logging completely ? What I mean is to introduce a WAL subscription mechanism, which basically means some incoming connections where we stream the log records. We don't need to write them to disk at all in the normal case, I guess usually PITR will store the records on some other machine so it means network, not disk. And it doesn't need to be done synchronously, it can lag behind the running transactions, and we can do it in batches of WAL records. It also would mean that the local WAL does not need to log the things which are only needed for the PITR... that would likely mean some spared WAL disk activity. Of course it also would mean that the local WAL and PITR WAL are not the same, but that is not an issue I guess. It would also permit immediate recycling of the WAL files if the current archiving style is not used. The drawbacks I can see (please add yours): 1) the need for the subscription management code with the added complexity it implies; 2) problems if the WAL stream lags too much behind; 3) problems if the subscribed client's connection is interrupted; Nr. 2 could be solved by saving the PITR WAL separately if the lag grows over a threshold, and issue a warning. This could still be acceptable, as the writing doesn't have to be synchronous and can be made in relatively large blocks. There could be a second bigger lag threshold which completely cancels the subscription. All these thresholds should be configurable, as it depends on the application what's more important, to have the standby available all the time or have the primary faster if loaded... Nr. 3. can be solved by either canceling the subscription on connection drop, or by allowing a certain amount of time after which the subscription is canceled. The client can reconnect before this timeout expires. In the meantime the primary can store the PITR WAL on disk as mentioned above... Cheers, Csaba.