On 4/13/15 7:32 PM, David G. Johnston wrote:
The missing feature is an option to leaved restored the last
checkpoint. Instead, not knowing whether there were changes
since the last checkpoint, the system truncated the relation.
What use case is there for a behavior that the last checkpoint
data is left on the relation upon restarting - not knowing
whether it was possible the other data could have been written
subsequent?
I would like a way to have unlogged tables be available on a replica
provided that no changes were made to them between the pg_basebackup
and the recovery point.
My use case is that I mark certain read-only-after-bulk-loading
tables as unlogged solely to avoid blowing out the log archive
during the loading phase and refresh phase. This is stuff like
vendor catalogs, NCBI datasets, ChEMBL datasets, etc, which can
simply be re-derived from the reference. It would be nice if these
were still available (without having to repeat the ETL) after
crashes provided they were not written to since a checkpoint, and
available on cloned test servers without having to repeat the ETL on
those as well.
My gut reaction is that those should be in their own clusters and
accessed via postgres_fdw...
Likely to produce really crappy plans if the tables are of any real size...
That particular use-case would probably best be served with a separate
replication channel which pushes data files from the primary to the
slaves and allows for the slave to basically "rewrite" its existing
table by pointing to the newly supplied version. Some kind of "CREATE
STATIC TABLE" and "PUSH STATIC TABLE TO {all | replica name}" command
combo...though ideally with less manual intervention...
You still have the same problem of knowing if someone has scribbled on
the data since the last checkpoint.
There's been recent discussion of adding support for read-only tables.
If we had those, we might be able to support something like...
INSERT INTO unlogged;
ALTER TABLE unlogged SET READ ONLY;
CHECKPOINT;
/* take backup */
This should be safe as long as we WAL log changes to read-only status
(which presumably we would).
How much work that would entail though, I don't know.
Ultimately you still have to get the data over to the other machine
anyway. ISTM it'd be a LOT more useful to look at ways to make the WAL
logging of bulk inserts (and especially COPY into a known empty table) a
lot more efficient.
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
--
Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance