Re: Problem with large files on different OSes

On Thu, 28 May 2009, Alan Manuel Gloria wrote:
> 
> If you'd prefer someone else to hack it, can you at least give me some
> pointers on which code files to start looking?  I'd really like to
> have proper large-file-packing support, where large file is anything
> much bigger than a megabyte or so.
> 
> Admittedly I'm not a filesystems guy and I can just barely grok git's
> blobs (they're the actual files, right? except they're named with
> their hash), but not packs (err, a bunch of files?) and trees (brown
> and green stuff you plant?).  Still, I can try to learn it.

The packs are a big part of the complexity.

If you were to keep the big files as unpacked blobs, that would be 
fairly simple - but the pack-file format is needed for fetching and 
pushing things, so it's not really an option.

For your particular case, the simplest approach is probably to just 
limit the delta search. Something like just saying "if the object is 
larger than X, don't even bother to try to delta it, and just pack it 
without delta compression". 

The code would still load that whole object in one go, but it sounds like 
you can handle _one_ object at a time. So for your case, I don't think you 
need a fundamental git change - you'd be ok with just an inefficient pack 
format for large files that are very expensive to pack otherwise.

You can already do that by using .gitattributes to not delta entries 
by name, but maybe it's worth doing explicitly by size too.

I realize that the "delta" attribute is apparently almost totally 
undocumented. But if your big blobs have a particular name pattern, what 
you should try is to do something like

 - in your '.gitattributes' file (or .git/info/attributes if you don't 
   want to check it in), add a line like

	*.img -delta

   which sets the 'delta' attribute to false for all paths that match 
   the '*.img' pattern (a quick way to verify that it took effect is 
   shown right after these steps).

 - see if pack creation is now acceptable (i.e. do a "git gc" or try to 
   push somewhere)
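
To double-check that the attribute actually took effect before you go 
repack anything, "git check-attr" can query it directly (assuming your 
git is new enough to have check-attr; the file name here is just a 
made-up example):

	$ git check-attr delta -- disk.img
	disk.img: delta: unset

"unset" is what you want to see - it means delta compression won't even 
be attempted for blobs at that path.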

Something like the following may also work, as a more generic "just don't 
even bother trying to delta huge files".

Totally untested. Maybe it works. Maybe it doesn't.

		Linus

---
 Documentation/config.txt |    7 +++++++
 builtin-pack-objects.c   |    9 +++++++++
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2c03162..8c21027 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1238,6 +1238,13 @@ older version of git. If the `{asterisk}.pack` file is smaller than 2 GB, howeve
 you can use linkgit:git-index-pack[1] on the *.pack file to regenerate
 the `{asterisk}.idx` file.
 
+pack.packDeltaLimit::
+	The default maximum size of objects that we try to delta.
++
+Big files can be very expensive to delta, and if they are large binary
+blobs, there is likely little upside to it anyway. So just pack them
+as-is, and don't waste time on them.
+
 pack.packSizeLimit::
 	The default maximum size of a pack.  This setting only affects
 	packing to a file, i.e. the git:// protocol is unaffected.  It
diff --git a/builtin-pack-objects.c b/builtin-pack-objects.c
index 9742b45..9a0072b 100644
--- a/builtin-pack-objects.c
+++ b/builtin-pack-objects.c
@@ -85,6 +85,7 @@ static struct progress *progress_state;
 static int pack_compression_level = Z_DEFAULT_COMPRESSION;
 static int pack_compression_seen;
 
+static unsigned long pack_delta_limit = 64*1024*1024;
 static unsigned long delta_cache_size = 0;
 static unsigned long max_delta_cache_size = 0;
 static unsigned long cache_max_small_delta_size = 1000;
@@ -1270,6 +1271,10 @@ static int try_delta(struct unpacked *trg, struct unpacked *src,
 	if (trg_entry->type != src_entry->type)
 		return -1;
 
+	/* If we limit delta generation, don't even bother for larger blobs */
+	if (pack_delta_limit && trg_entry->size >= pack_delta_limit)
+		return -1;
+
 	/*
 	 * We do not bother to try a delta that we discarded
 	 * on an earlier try, but only when reusing delta data.
@@ -1865,6 +1870,10 @@ static int git_pack_config(const char *k, const char *v, void *cb)
 		pack_size_limit_cfg = git_config_ulong(k, v);
 		return 0;
 	}
+	if (!strcmp(k, "pack.packdeltalimit")) {
+		pack_delta_limit = git_config_ulong(k, v);
+		return 0;
+	}
 	return git_default_config(k, v, cb);
 }
 
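
Assuming something like the above, adjusting the 64MB default would 
then just be a matter of setting the new config variable before 
repacking, along the lines of

	$ git config pack.packDeltaLimit 16m
	$ git gc

(the usual 'k'/'m'/'g' size suffixes should work there, since the value 
goes through git_config_ulong). And because the code above only applies 
the limit when it is non-zero, setting it to 0 would turn the limit off 
entirely.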