Hi Daniel:

On Fri, May 15, 2015 at 5:35 PM, Daniel Begin <jfd553@xxxxxxxxxxx> wrote:
> Bonjour Francisco.

Buenos dias.

> Skimming the documentation sequentially is clever advice, especially since
> the doc is most of the time well done and exhaustive. Unfortunately, even
> if I actually did it about 1 year ago, it seems this specific item slipped
> out of my mind :-(

Bad luck, you managed to miss one of the most important commands, but it's solved now.

> About dump/restore operation, restoring the database cluster is running
> for 24hrs now (psql -f pgdumpallOutputfile postgres). Since it took 13hrs
> to dump the cluster, I begin to wonder how long it is going to take to
> restore it...

Much longer than that, especially if, as you say below, you have a lot of indexes. This is one of the reasons many of us do not use pg_dumpall for anything but global objects, and instead use something like the crontab script John R. Pierce posted. (I use a similar thing, but with an intermediate script which dumps critical databases more frequently, skips "recreatable" databases (maybe someone can confirm that word is right? I mean test things which can be created from scratch, as they come from a script), and keeps several numbered copies.) Doing it this way ensures we can restore in order of criticality if it needs to be done. (I even move unmodified partitions to a 'historic' schema, which gets dumped only after a change; this cuts my backup times to a tenth.)

One thing: I do not know how you are restoring the database, but when doing these things we use a specially tuned postgresql.conf (fsync off, minimal logging, lots of work_mem, and similar things), as we do not care about durability (you can just rerun initdb and redo the restore, and there is only one session connected, the restoring one). This easily cuts the restore time to a tenth; after it ends we restart the server with the normal config. It is a must when doing this sort of thing.
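For reference, a minimal sketch of the kind of restore-only postgresql.conf I mean. The exact values are assumptions (size the memory settings to your machine), and remember that with fsync off a crash mid-restore means rerunning initdb and the whole restore:

```
# Restore-only settings -- NOT for production use.
fsync = off                   # do not flush WAL to disk
full_page_writes = off        # no torn-page protection needed here
synchronous_commit = off      # do not wait for WAL on commit
maintenance_work_mem = 2GB    # speeds up CREATE INDEX; size to your RAM
checkpoint_timeout = 30min    # fewer checkpoints during the bulk load
```

After the restore finishes, stop the server and put the normal postgresql.conf back before letting clients in.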
> My main concern is about how the indexes are managed in dump/restore
> operations. I understand that pg_dumpall actually uses pg_dump, where the
> doc says "Post-data items include definitions of indexes, triggers..."
> I would not worry if the doc said that indexes are simply copied, but it
> says "includes definition of indexes".
> Since some of the indexes took days to build... could someone confirm
> indexes are rebuilt instead of copied?
> If indexes are actually rebuilt, why should it be done that way? - There
> must be a good reason!

You are out of luck, and there is a reason. First, pg_dump does not copy, it dumps. Its simpler behaviour (text output) just writes a SQL script which recreates everything and inserts all the data (normally using COPY for speed, but it is the same as inserting). It takes care of generating a fast script, meaning it creates the tables, then inserts the data, then creates the indexes and reactivates the constraints, which is faster than defining everything up front and inserting with indexes and constraints active. The reason to do it in text mode is that you can dump between different versions and/or architectures, and also that the dump is much smaller than the db, especially if you compress it (I always do; with a bit of testing you can always find a compressor which leads to faster backups, as the saved disk writing easily offsets the compression time, especially on modern multi-CPU, memory-rich machines). Bear in mind that in many scenarios you back up a lot (we dump some critical things hourly, even though we use replication) and restore nearly never, and would rather spend a couple of extra days on a restore than a couple of extra hours of degraded performance on every backup.
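As an illustration of the compressed text-dump route described above (the database and file names are made up, and gzip is just one choice of compressor):

```shell
# Dump to a compressed SQL script; on a multi-CPU, memory-rich box the
# saved disk writing usually more than offsets the compression time.
pg_dump mydb | gzip > mydb.sql.gz

# Restore later by streaming the script back into psql:
gunzip -c mydb.sql.gz | psql mydb
```

Any stream compressor (pigz, lz4, zstd...) can be swapped into the pipe; benchmark a couple on your own data.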
This being said: if you have an 820G db (I still do not know what size that is; I suppose it's the $PGDATA footprint) of important data (it does not seem critical in availability, as you are taking days and are still in business), and you are having these kinds of problems dumping, restoring and moving directories around in your OS, and you do not know how much time your backups take, then you have a problem. You should practice backup AND restore more, because your question indicates you MAY be backing up your data, but you have never restored a backup.

Also, the text output format is really good for the global objects in pg_dumpall, but not so much for the normal databases. For those you should use the custom format, unless it is a really small db. Its drawbacks are that it can only do one database per file and that it needs pg_restore to be read (I know those are minor). The advantage is that instead of generating a plain-text dump it builds a kind of tar file with the definitions and data for every object clearly separated, so you can do partial restores or whatever you want (in fact, without options and without connecting to a database, pg_restore spits out the same text file that a text dump would generate). If you had used this technique you could have restored your tables in order, or restored only the data and then rebuilt the indexes concurrently with some other (performance-degraded) work. You can do the same thing by editing the text dump, but that gets impractical and really complex beyond a few megabytes.

Regards,
Francisco Olarte.

--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
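To make the custom-format workflow above concrete, a sketch (the database and file names are hypothetical, and --section needs pg_restore 9.2 or later):

```shell
# Dump one database in custom format (compressed by default):
pg_dump -Fc -f mydb.dump mydb

# Inspect the archive's table of contents; no server connection needed:
pg_restore -l mydb.dump

# With no -d option, pg_restore emits the same SQL a plain dump would:
pg_restore mydb.dump > mydb.sql

# Partial restore: load schema and data first, then build indexes and
# constraints later, when you can afford the degraded performance:
pg_restore -d mydb --section=pre-data --section=data mydb.dump
pg_restore -d mydb --section=post-data mydb.dump
```

The -l listing can also be edited and fed back with -L to restore an arbitrary subset of objects in an arbitrary order.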