
Re: Backup advice

On Mon, 15 Apr 2013 19:54:15 -0700
Jeff Janes <jeff.janes@xxxxxxxxx> wrote:

> On Tue, Apr 9, 2013 at 3:05 AM, Eduardo Morras <emorrasg@xxxxxxxx> wrote:
> 
> > On Mon, 8 Apr 2013 10:40:16 -0500
> > Shaun Thomas <sthomas@xxxxxxxxxxxxxxxx> wrote:
> >
> > >
> > > Anyone else?
> > >
> >
> > If his db has low inserts/updates/deletes he can use diff between pg_dumps
> > (with default -Fp) before compressing.
> >
> 
> Most "diff" implementations will read the entirety of both files into
> memory, so may not work well with 200GB of data, unless it is broken into a
> large number of much smaller files.
> 
> open-vcdiff only reads one of the files into memory, but I couldn't really
> figure out what happens memory-wise when you try to undo the resulting
> patch, the documentation is a bit mysterious.
> 
> xdelta3 will "work" on streamed files of unlimited size, but it doesn't
> work very well unless the files fit in memory, or have the analogous data
> in the same order between the two files.

For my 12-13 GB dump files I use:

git diff -p 1.sql 2.sql > diff.patch

It uses 4MB in the first phase and up to 140MB in the last one, and produces a patch file from which 2.sql can be recovered with:

patch -o 2.sql 1.sql < diff.patch

or using git apply.
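
A minimal sketch of the whole round trip on two toy files, so the result can be checked before trusting it with real dumps. The file contents, the temporary directory and the name 2.restored.sql are made up for the demo; --no-index is added on the assumption that the command may be run inside a git working tree, where plain "git diff" would otherwise treat the arguments as tracked paths.

```shell
#!/bin/sh
# Round trip: build a patch from two dumps, rebuild the second one, compare.
# The tiny "dumps" below are placeholders that exist only for this demo.
set -u
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'COPY t (id, v) FROM stdin;\n1\ta\n2\tb\n\\.\n' > 1.sql
printf 'COPY t (id, v) FROM stdin;\n1\ta\n2\tc\n3\td\n\\.\n' > 2.sql

# --no-index compares two plain files even inside a git repository.
# git diff exits with status 1 when the files differ, so 1 is expected here.
git diff --no-index -p 1.sql 2.sql > diff.patch
status=$?
[ "$status" -le 1 ] || exit "$status"

# -o writes the patched result to a new file and leaves 1.sql untouched.
patch -s -o 2.restored.sql 1.sql < diff.patch

# cmp -s prints nothing and returns 0 only if both files are byte-identical.
if cmp -s 2.sql 2.restored.sql; then
    roundtrip=ok
else
    roundtrip=mismatch
fi
echo "round trip: $roundtrip"
```

Keeping 1.sql unmodified matters here, because every later patch in a chain of dumps depends on the earlier file being intact.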

> A while ago I did some attempts to "co-compress" dump files, based on the
> notion that the pg_dump text format does not have \n within records so it
> is sortable as ordinary text, and that usually tables have their "stable"
> columns, like a pk, near the beginning of the table and volatile columns
> near the end, so that sorting the lines of several dump files together will
> gather replicate or near-replicate lines together where ordinary
> compression algorithms can work their magic.  So if you tag each line with
> its line number and which file it originally came from, then sort the lines
> (skipping the tag), you get much better compression.  But not nearly as
> good as open-vcdiff, assuming you have the RAM to spare.
>
> Using two dumps taken months apart on a slowly-changing database, it worked
> fairly well:
> 
> cat 1.sql | pigz |wc -c
> 329833147
> 
> cat 2.sql | pigz |wc -c
> 353716759
> 
> cat 1.sql 2.sql | pigz |wc -c
> 683548147
> 
> sort -k2 <(perl -lne 'print "${.}a\t$_"' 1.sql) \
>         <(perl -lne 'print "${.}b\t$_"' 2.sql) | pigz |wc -c
> 436350774
> 
> A certain file could be recovered by, for example:
> 
> zcat group_compressed.gz |sort -n|perl -lne 's/^(\d+b\t)// and print' > 2.sql2

Be careful: some z* utilities (zdiff, for example) decompress the whole file into /tmp.
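
One way to stay clear of temporary files is to keep everything in pipes. The sketch below only answers "do these two compressed files hold the same data?" by comparing checksums of the decompressed streams; it is not a replacement for a real diff, and a matching CRC is strong evidence rather than proof. The function names and the demo files are hypothetical.

```shell
#!/bin/sh
# Compare two gzip-compressed files without writing decompressed copies to disk.
set -u
demo=$(mktemp -d)
printf 'line 1\nline 2\n' | gzip -c > "$demo/a.sql.gz"
printf 'line 1\nline 2\n' | gzip -c > "$demo/b.sql.gz"
printf 'line 1\nline 3\n' | gzip -c > "$demo/c.sql.gz"

# Print checksum and byte count of the decompressed stream; only a pipe is used.
stream_sum() {
    gzip -dc "$1" | cksum
}

# Succeeds (exit 0) when both files decompress to the same checksum and size.
same_content() {
    [ "$(stream_sum "$1")" = "$(stream_sum "$2")" ]
}

if same_content "$demo/a.sql.gz" "$demo/b.sql.gz"; then
    echo "a and b match"
fi
if ! same_content "$demo/a.sql.gz" "$demo/c.sql.gz"; then
    echo "a and c differ"
fi
```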

> There are all kinds of shortcomings here, of course; it was just a quick and
> dirty proof of concept.

A nice one!
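
The co-compression idea quoted above can be tried end to end on toy files. This is a sketch, not Jeff's exact commands: it uses awk instead of perl for tagging, gzip instead of pigz, an explicit tab separator for sort, and LC_ALL=C so the ordering does not depend on the locale. All file names and the helper tag_lines are made up for the demo.

```shell
#!/bin/sh
# Co-compress two small text "dumps" and recover one of them afterwards.
set -u
LC_ALL=C; export LC_ALL            # byte-wise, locale-independent sorting
cc_dir=$(mktemp -d)
tab=$(printf '\t')

printf '1\talice\t10\n2\tbob\t20\n3\tcarol\t30\n' > "$cc_dir/1.sql"
printf '1\talice\t11\n2\tbob\t20\n4\tdave\t40\n'  > "$cc_dir/2.sql"

# Prefix each line with "<line number><file tag><TAB>".
tag_lines() {
    awk -v t="$2" '{ printf "%d%s\t%s\n", NR, t, $0 }' "$1"
}

# Sort on everything after the tag so similar rows from both files sit together,
# then compress the merged stream.
{ tag_lines "$cc_dir/1.sql" a; tag_lines "$cc_dir/2.sql" b; } |
    sort -t "$tab" -k2 | gzip -c > "$cc_dir/group.gz"

# Recover 2.sql: restore line order, keep only lines tagged "b", drop the tag.
gzip -dc "$cc_dir/group.gz" | sort -n |
    awk '{ i = index($0, "\t")
           if (substr($0, 1, i - 1) ~ /^[0-9]+b$/) print substr($0, i + 1) }' \
    > "$cc_dir/2.recovered.sql"

cmp -s "$cc_dir/2.sql" "$cc_dir/2.recovered.sql" && echo "2.sql recovered intact"
```

The tag is stripped at the first tab only, so rows that themselves contain tabs (as COPY data does) come back unchanged.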

> For now I think storage is cheap enough for what I need to do to make this
> not worth fleshing it out any more.
> 
> Cheers,
> 
> Jeff


---   ---
Eduardo Morras <emorrasg@xxxxxxxx>


-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general



