James B. Byrne wrote:
I have recently had to teach myself how to use git, and the thought came to me
that this tool might provide a fairly low-setup-cost way of passing pg_dumps
over the network to our off-site data store. Think rsync, but on a file-content
basis: just the content diff gets transmitted.
Git works by delta-compressing the contents of successive versions of the
files under repository control, and it treats binary files as just another
kind of object. The question is: are successive (compressed) dumps of an
altered database sufficiently similar to make the deltas small enough to
warrant this approach?
Comments? (not on my sanity, please)
It probably depends on the number of changes in the database. For example, a
vacuum followed by an insert could result in records that were previously at
the start of the dump ending up somewhere else, like the middle of the dump
(i.e., a dead tuple is marked as available, then that space is used for the
insert). In such a case you would end up with a row that is unchanged but sits
at a different location in the file. Would git then ship that as part of the
delta? I would think so. So in essence you'd be getting "at least a diff, but
likely more".

Of course, I'm assuming you are just dumping the data in a table using
pg_dump... once you start talking about pg_dumpall, you might find that small
changes (e.g., granting a user a new privilege) cause things to be offset even
more. Add compression into the mix and I think you could find that there are
few or no similarities.
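A quick way to see the compression problem, independent of Postgres: take two
texts that differ by a single inserted row and measure how much of the second
can be reused from the first, before and after gzipping (row count and block
size below are arbitrary):

import gzip

# Two "dumps" that differ by one row inserted near the top.
rows = [f"INSERT INTO t VALUES ({i}, 'row {i}');" for i in range(2000)]
dump_a = "\n".join(rows).encode()
dump_b = "\n".join(rows[:10] + ["INSERT INTO t VALUES (-1, 'new');"] + rows[10:]).encode()

def reuse_fraction(old, new, block=32):
    """Fraction of fixed-size blocks of `new` that occur anywhere in `old`
    (roughly what an rsync/git style delta encoder could reuse)."""
    window = {old[i:i + block] for i in range(len(old) - block + 1)}
    blocks = [new[i:i + block] for i in range(0, len(new) - block + 1, block)]
    return sum(b in window for b in blocks) / len(blocks)

print("plain dumps:   %.0f%% reusable" % (100 * reuse_fraction(dump_a, dump_b)))
print("gzipped dumps: %.0f%% reusable" %
      (100 * reuse_fraction(gzip.compress(dump_a), gzip.compress(dump_b))))

The plain dumps should come out almost entirely reusable, while the gzipped
ones share next to nothing after the point of change, so if you go the
git/rsync route you'd want to store the dump uncompressed and let the
transport do the compressing.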
On the other hand, if you were only doing inserts into an optimized table (one
with no dead tuples), I would think you'd get a much better result. Perhaps
you would be better off using PITR in such cases?
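If you do go the PITR route, the moving parts are archive_mode = on and an
archive_command in postgresql.conf. Below is a hypothetical helper you could
point archive_command at, e.g.
archive_command = 'python3 /usr/local/bin/archive_wal.py %p %f'
(script path and archive directory are assumptions); it copies each completed
WAL segment to the off-site mount and refuses to overwrite an existing
segment, as the PITR docs recommend:

#!/usr/bin/env python3
"""Hypothetical archive_command helper: copy a completed WAL segment to an
archive directory, never overwriting an existing file."""
import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/offsite/wal_archive"   # assumption: mounted off-site store

def main(wal_path, wal_name):
    dest = os.path.join(ARCHIVE_DIR, wal_name)
    if os.path.exists(dest):
        # Non-zero exit tells PostgreSQL the segment was NOT archived.
        sys.exit(1)
    shutil.copy2(wal_path, dest + ".tmp")
    os.rename(dest + ".tmp", dest)          # make the copy appear atomically

if __name__ == "__main__":
    main(sys.argv[1], sys.argv[2])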
--
Chander Ganesan
Open Technology Group, Inc.
One Copley Parkway, Suite 210
Morrisville, NC 27560
919-463-0999/877-258-8987
http://www.otg-nc.com