Leaving large binaries out of the packfile

Hi.

I've been dealing with a Subversion repository that contains a lot of large binaries. Git generally handles them reasonably well, although it chokes under the pressure of a 'git gc' with this git-svn repository. The repository's packs total 2.7 gigabytes. As it turns out, the 250 individual blob revisions of large binaries account for about 2.4 gigabytes of that.

Sometimes, 'git gc' runs out of memory. When that happens, I have to find out which file is causing the problem so that I can add it to .gitattributes with a '-delta' flag. Mostly, though, the repacking takes forever, and I dread running the operation.

As an experiment, I added a '-pack' flag to .gitattributes. Objects whose paths match a .gitattributes entry carrying this flag are left loose in the repository instead of being packed. During a 'git gc', instead of recopying gigabytes of data each time, the existing loose objects are used as they are. The 'git gc' process runs very quickly with this change.

The only issue I've found is in too_many_loose_objects(). gitk is always telling me the repository needs to be packed, obviously because of all the loose objects.

I haven't yet come up with a good idea for handling this. I thought about putting the forced loose objects in a separate directory. (This idea goes along with another that I want to build on top of this functionality, the ability to commit and have -pack binaries go to an alternates location.) I have also thought about writing out a file with the count of forced loose objects and using that to drive the guesstimate made by too_many_loose_objects() down.
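
To make the second idea concrete, here is a rough, self-contained sketch of how a stored count of forced loose objects could lower the estimate. It is not git code. It assumes the check samples a single one of the 256 fan-out directories and compares what it finds against roughly 1/256th of the gc.auto threshold, which is my understanding of too_many_loose_objects(). The function name and the forced_loose_total parameter (the count that would be read from the proposed file) are hypothetical.

```c
/*
 * Simplified model of the loose-object estimate, adjusted for objects
 * that are deliberately left loose by a '-pack' attribute.
 *
 * sampled            - loose objects counted in the one sampled fan-out
 *                      directory
 * forced_loose_total - hypothetical repository-wide count of forced loose
 *                      objects, e.g. read from a small bookkeeping file
 * gc_auto_threshold  - value of gc.auto; zero or less disables the check
 *
 * Returns 1 if the repository still looks like it needs packing.
 */
static int too_many_loose_adjusted(int sampled, int forced_loose_total,
                                   int gc_auto_threshold)
{
    int per_dir_threshold;
    int expected_forced;
    int effective;

    if (gc_auto_threshold <= 0)
        return 0;

    /* Share of the threshold that one of the 256 directories represents. */
    per_dir_threshold = (gc_auto_threshold + 255) / 256;

    /*
     * Object names are hashes, so forced loose objects should be spread
     * evenly; expect about 1/256th of them (rounded to nearest) in the
     * sampled directory.
     */
    expected_forced = (forced_loose_total + 128) / 256;

    effective = sampled - expected_forced;
    if (effective < 0)
        effective = 0;

    return effective > per_dir_threshold;
}
```

With a gc.auto value of 6700, the per-directory threshold in this model is 27. A sample of 30 objects would trigger a repack on its own, but not if 2560 forced loose objects are known to exist, because about 10 of the 30 are then expected to be forced. The weakness is that the count has to be kept accurate whenever forced loose objects are added or pruned.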

Does anyone have any thoughts?

Thanks!

Josh

---
 builtin/pack-objects.c |   25 +++++++++++++++++++++++++
 1 files changed, 25 insertions(+), 0 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 214d7ef..f33a7fb 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -644,6 +644,28 @@ static int no_try_delta(const char *path)
     return 0;
 }

+static void setup_pack_attr_check(struct git_attr_check *check)
+{
+    static struct git_attr *attr_pack;
+
+    if (!attr_pack)
+        attr_pack = git_attr("pack");
+
+    check[0].attr = attr_pack;
+}
+
+static int must_pack(const char *path)
+{
+    struct git_attr_check check[1];
+
+    setup_pack_attr_check(check);
+    if (git_checkattr(path, ARRAY_SIZE(check), check))
+        return 1;
+    if (ATTR_FALSE(check->value))
+        return 0;
+    return 1;
+}
+
 static int add_object_entry(const unsigned char *sha1, enum object_type type,
                 const char *name, int exclude)
 {
@@ -667,6 +689,9 @@ static int add_object_entry(const unsigned char *sha1, enum object_type type,
     if (!exclude && local && has_loose_object_nonlocal(sha1))
         return 0;

+    if (name && !must_pack(name))
+        return 0;
+
     for (p = packed_git; p; p = p->next) {
         off_t offset = find_pack_entry_one(sha1, p);
         if (offset) {
--
1.7.1.msysgit.3.1.g108b5.dirty
