Re: Backup using GiT?

"Ciprian Dorin Craciun" <ciprian.craciun@xxxxxxxxx> · Sat, 14 Jun 2008 13:50:50 +0300



On Fri, Jun 13, 2008 at 11:11 PM, Alvaro Herrera
<alvherre@xxxxxxxxxxxxxxxxx> wrote:
> Tom Lane wrote:
>> "James B. Byrne" <byrnejb@xxxxxxxxxxxxx> writes:
>
>> > GiT works by compressing deltas of the contents of successive versions of file
>> > systems under repository control.  It treats binary objects as just another
>> > object under control.  The question is, are successive (compressed) dumps of
>> > an altered database sufficiently similar to make the deltas small enough to
>> > warrant this approach?
>>
>> No.  If you compress it, you can be pretty certain that the output will
>> be different from the first point of difference to the end of the file.
>> You'd have to work on uncompressed output, which might cost more than
>> you'd end up saving ...
>
> The other problem is that since the tables are not dumped in any
> consistent order, it's pretty unlikely that you'd get any similarity
> between two dumps of the same table.  To get any benefit, you'd need to
> get pg_dump to dump sorted tuples.
>
> --
> Alvaro Herrera                                http://www.CommandPrompt.com/
> The PostgreSQL Company - Command Prompt, Inc.

    The idea of using Git for backing up databases is not that bad.

    I would propose the following:
    -- dump the creation script into a separate file (or maybe one file
per object: table, view, function, etc.);
    -- dump the content of each table into its own file;
    -- dump the tuples sorted, but in plain text (as COPY data or
INSERTs, maybe), as Alvaro suggested;
    -- don't compress the dumps (as Tom and Chander suggested), because
Git already compresses its packed files;
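
    A minimal sketch of the steps above, in Python. It only builds and
runs the commands; the function names, the "dump" directory, the ".copy"
file suffix and the idea of passing a sort column per table are my own
assumptions, not anything pg_dump provides. It relies on pg_dump's
--schema-only option and on COPY (SELECT ...) TO STDOUT, and lets the
caller redirect psql's standard output into one file per table.

```python
import subprocess
from pathlib import Path


def build_dump_commands(dbname, tables, out_dir="dump"):
    """Return (argv, output_path) pairs, one per file to be written.

    `tables` maps a table name to the column(s) to sort by, e.g. its key.
    Names are interpolated as-is, so they must come from a trusted source.
    """
    out = Path(out_dir)
    # The creation script goes into its own file; pg_dump writes it itself,
    # so there is no stdout to redirect (output_path is None).
    commands = [
        (["pg_dump", "--schema-only", "--file", str(out / "schema.sql"), dbname], None)
    ]
    # One uncompressed, sorted, plain-text file per table.
    for table, order_by in sorted(tables.items()):
        query = f"COPY (SELECT * FROM {table} ORDER BY {order_by}) TO STDOUT"
        argv = ["psql", "--no-psqlrc", "--dbname", dbname, "--command", query]
        commands.append((argv, out / f"{table}.copy"))
    return commands


def run_dump(dbname, tables, out_dir="dump"):
    """Run the commands, sending each table's COPY output to its file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for argv, target in build_dump_commands(dbname, tables, out_dir):
        if target is None:
            subprocess.run(argv, check=True)
        else:
            with open(target, "wb") as fh:
                subprocess.run(argv, stdout=fh, check=True)
```

    Something like run_dump("shop", {"orders": "id"}) would then leave
dump/schema.sql and dump/orders.copy behind, ready for git add and
git commit. The database and table names here are made up.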

    One advantage of using Git in the manner described above is change
tracking: with just a simple git diff you could see the modifications
(inserts, updates, deletes, schema alterations, etc.). Going a step
further, you could also do merges between multiple databases with the
same structure (each database would have its own branch).
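
    To illustrate why sorted plain-text dumps make change tracking easy,
here is a small sketch that compares two dumps of the same table. It
uses Python's difflib as a stand-in for git diff (it is not what Git
does internally), and the function name and sample rows are invented
for the example.

```python
import difflib


def changed_rows(old_dump, new_dump):
    """Return (removed, added) rows between two sorted plain-text dumps."""
    removed, added = [], []
    # ndiff prefixes each line with "- ", "+ ", "  " or "? " (a hint line).
    for line in difflib.ndiff(old_dump.splitlines(), new_dump.splitlines()):
        if line.startswith("- "):
            removed.append(line[2:])
        elif line.startswith("+ "):
            added.append(line[2:])
    return removed, added


old = "1\talice\n2\tbob\n3\tcarol\n"
new = "1\talice\n2\trobert\n3\tcarol\n4\tdave\n"
removed, added = changed_rows(old, new)
print(removed)  # ['2\tbob']
print(added)    # ['2\trobert', '4\tdave']
```

    An UPDATE shows up as one removed and one added row, an INSERT as
one added row, and untouched rows do not appear at all. That only holds
because the rows come out in the same order in both dumps.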

    Just imagine how simple a database schema upgrade would be in most
situations, when both the development and the deployed schemas have
been modified and we want to bring them back into sync.

    In conclusion, I would subscribe to such an idea.

    Ciprian Craciun.