On Tue, Sep 25, 2012 at 9:09 AM, hartrc <rhart2@xxxxxx> wrote: > My version: PostgreSQL v9.1.5 > Version string: "PostgreSQL 9.1.5 on x86_64-unknown-linux-gnu, compiled by > gcc (SUSE Linux) 4.3.4 [gcc-4_3-branch revision 152973], 64-bit" > > Basically my question is: > Is there currently any way to avoid wal generation during data load for > given tables and then have point in time recovery after that? > > Background and blurb > The situation i'm referring to here is for a development environment. I > require point in time recovery because if there is crash etc I don't want to > lose up to a days work for 12 developers. What is it that is being developed? If you are developing applications that use postgres for storage, then with a crash of that database you should not lose the developers' work, unless you are using the same cluster as both the source-code repository and the database to which the test applications being developed connect (which seems like a bad idea) Or are you not documenting your database schema and code anywhere other than in the database itself? That too seems like a bad idea, at least for a busy multi-person development shop. > A developer did a data load yesterday of approximately 5GB of data into a > new schema. This generated approximately 7GB of wal. The situation arises > where if something is incorrect in the data load the data load may need to > be repeated 2 or 3 times (thus generating 20GB +of WAL). Is the problem the IO generated by this, or the short-term storage, or the long term storage of it? If it is the storage, you could do a specially-scheduled backup as soon as the load is done, and then delete the WAL soon. Are you compressing your WAL? If not, then doing that might be enough to solve the problem. It sounds like the main thing your group "produces" is code development, and from what you describe the database is a "production" database since losing it loses your product. If that is the case, you really need a "development" database as well. This could run in archive_mode=off, and then the 5GB would not go to "production" until they are fairly confident it will work and not have to be repeated. So that would cut out 2/3 of the WAL. Or, they could create the table as "unlogged" until they know the bulk load works, then recreate it as logged and repeat the load. (I think that currently there is no way to make an unlogged table then promote it to logged, that would probably be ideal for you, but someday....) > For a data load i > don't want wal to be generated. I accept the fact before there was nothing > and from the point of the next pg_basebackup there was everything. It is > from the point i say ok that is everything (the next backup) that i want > point in time recovery to apply to that table. > It is doesn't seem practical, and appears very risky to turn off wal_archive > during the data load. How long does the bulk load plus one backup plus two database restarts take? That would probably be far less than one day, so the risk of turning off archive_mode temporarily seems like it should be minimal. If your hard drives were to die horribly, how long would it take you replace them and restore the database from the last backup and roll it forward? That could calibrate how much past work you are willing to lose in the event it comes to that. Cheers, Jeff -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general