Hi Steve,

> In reading this I wondered how large an svn dump of one of the
> repositories I monitor would be. If I were to check out the svn
> root of that repository, I would use well over 3TB of disk space
> to have it checked out; I filled my 750GB drive with about a
> third of it checked out. About 256MB of code with thousands
> of tags and hundreds of branches.

I encountered this issue with my first attempt to validate the output
of my dump conversion tool. My case wasn't as dire: 350GB would have
sufficed, but I was working in a 160GB partition. Checking out tags
side by side is a sure way to fill your disk.

> It looks like svnadmin dump defaults to dumping all data.
> Fortunately it has a delta option, which looks like it would be
> needed to dump this repository I am speaking of without filling
> up many hard drives.

The svn dump format is not quite that silly: even without
deltification it doesn't output blobs that are just an unaltered copy
from a previous revision. Handling deltified dumps will greatly
increase the complexity of the import process, though, because blob
content would have to be computed from existing blobs rather than
simply passed through.

> This might also be helped if the dumps are chunked into ranges
> of many thousands of commits as well; this would keep the files
> more manageable.

Being able to handle a dump stream reassembled from such piecewise
dumps is an important feature which I haven't finished implementing
yet.

> Just food for thought.

Thanks for the feed.
-- 
David Barr
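
To illustrate the copy case above: a tag created with "svn copy" shows
up in a plain (non-deltified) dump as a node record roughly like the
following -- no blob payload at all, just a copyfrom reference (path
and revision number here are invented for the example):

  Node-path: tags/1.0
  Node-kind: dir
  Node-action: add
  Node-copyfrom-rev: 1234
  Node-copyfrom-path: trunk

So unmodified content is carried by reference, and only genuinely new
or changed file contents appear as full text in the stream.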
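As for the piecewise dumps Steve suggests, something along these lines
ought to produce them (revision ranges picked arbitrarily, "repo" being
the repository path):

  svnadmin dump repo -r 1:10000 --incremental > r00001-10000.dump
  svnadmin dump repo -r 10001:20000 --incremental > r10001-20000.dump

The --incremental flag keeps the first revision of each chunk from
being dumped as a full snapshot of the tree, which is what you want if
the chunks are going to be concatenated back into one stream; --deltas
is the separate option that deltifies file contents.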