Re: [RFC Patch] Preventing corrupt objects from entering the repository

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Feb 11, 2008 at 03:41:06PM -0500, Nicolas Pitre wrote:
> On Mon, 11 Feb 2008, Martin Koegler wrote:
> > Please remember, that --strict is used for pushing data.
> 
> And what exactly does it provide in terms of safetiness that index-pack 
> doesn't already do?  I'm not sure I fully understood the goal here.

The patch is about verifing the content of objects. My intented use is
for running a (restricted) server, which allows pushing via git-daemon
(or git-shell) from (untrusted) users. It can be also used to catch
corrupt objects, before they leave your repository, but this is for me
only a side effect.

receive-pack accepts any crap (anything, which you can pipe into
git-hash-object with -t parameter of your choice), if the pack file
format is correct.

But lots of code in git assums that the object content is welformd.

Having such objects somewhere reachable in your repository will
disturb cloning (In the best case you get a error messages, in the
worst a child process of upload-pack segfaults), gitweb, ... . To fix
it, you will need a person with native access to the repository in the
worst case. 

> > Maybe I have missed something, but all repack problems reported on the
> > git mailing list happen durring the deltifing phase. The problematic
> > files are mostly bigger blobs. I'm aware of these problems, so my
> > patch does not keep any blobs in memory.
> 
> This is not only repack, which is something that few people should run 
> anyway.
> 
> It is also index-pack which takes an awful amount of time on the OOO 
> repo (much less than the repack but still), and that's something that is 
> visible to anyone cloning it.
> 
> So the comparison with pack-objects is useless.  In absolute terms, 
> index-pack has to inprove, not regress.

The patch does not change the cloning/fetching behaviour:
@@ -18,6 +19,7 @@ struct object_entry
        unsigned int hdr_size;
        enum object_type type;
        enum object_type real_type;
+       struct object *obj;
 };

 union delta_base {
@@ -44,6 +49,7 @@ static int nr_deltas;
 static int nr_resolved_deltas;

 static int from_stdin;
+static int strict;
 static int verbose;

 static struct progress *progress;
@@ -325,8 +366,8 @@ static int find_delta_children(const union delta_base *base,
        return 0;
 }

-static void sha1_object(const void *data, unsigned long size,
-                       enum object_type type, unsigned char *sha1)
+static struct object* sha1_object(const void *data, unsigned long size,
+                                 enum object_type type, unsigned char *sha1)
 {
        hash_sha1_file(data, size, typename(type), sha1);
        if (has_sha1_file(sha1)) {
@@ -341,6 +382,29 @@ static void sha1_object(const void *data, unsigned long size,
                        die("SHA1 COLLISION FOUND WITH %s !", sha1_to_hex(sha1));
                free(has_data);
        }
+       if (strict) {
[..]
+       return NULL;
 }

 static void resolve_delta(struct object_entry *delta_obj, void *base_data,
@@ -361,7 +425,7 @@ static void resolve_delta(struct object_entry *delta_obj, void *base_data,
        free(delta_data);
        if (!result)
                bad_object(delta_obj->idx.offset, "failed to apply delta");
-       sha1_object(result, result_size, type, delta_obj->idx.sha1);
+       delta_obj->obj = sha1_object(result, result_size, type, delta_obj->idx.sha1);
        nr_resolved_deltas++;

        hashcpy(delta_base.sha1, delta_obj->idx.sha1);
@@ -420,7 +484,7 @@ static void parse_pack_objects(unsigned char *sha1)
                        delta->obj_no = i;
                        delta++;
                } else
-                       sha1_object(data, obj->size, obj->type, obj->idx.sha1);
+                       obj->obj = sha1_object(data, obj->size, obj->type, obj->idx.sha1);
                free(data);
                display_progress(progress, i+1);
        }
@@ -714,6 +778,8 @@ int main(int argc, char **argv)
                                from_stdin = 1;
                        } else if (!strcmp(arg, "--fix-thin")) {
                                fix_thin_pack = 1;
+                       } else if (!strcmp(arg, "--strict")) {
+                               strict = 1;
                        } else if (!strcmp(arg, "--keep")) {
                                keep_msg = "";
                        } else if (!prefixcmp(arg, "--keep=")) {
@@ -812,6 +878,8 @@ int main(int argc, char **argv)
                            nr_deltas - nr_resolved_deltas);
        }
        free(deltas);
+       if (strict)
+               check_objects();

        idx_objects = xmalloc((nr_objects) * sizeof(struct pack_idx_entry *));
        for (i = 0; i < nr_objects; i++)

Do you see any problem, if int strict is zero?

For pushing, you must distinguish two case:

* pushing some changes/new branches
  You will have a maximum of a few thousands of objects.

* pushing the whole content into an empty repository

  This is not an every day operation. 

  I implemented a config option to disable the checking. It can be
  used safe time/memory in such cases.
 
  To make this working for OOO repository, my patch will probable need
  more tuning.

> > So my conclusion is, that the memory usage of index-pack with --strict
> > should not be too worse compared to pack-objects.
> 
> Well, the comparison is flawed anyway.  Agressively repacking the OOO 
> repo is reported to take around 14 GB of RAM if not bounded, whereas 
> index-pack remained around 300MB.  But the buffer cache managed by the 
> kernel is also a big performance factor and anything that requires more 
> memory in user space will affect that cache.

The 14 GB is caused by the memory used in the deltifing phase. I
compared it to the counting object phase.

> Then remember that this repo has 2411764 objects.

How many blobs, tags, trees and commits? What is the uncompressed size
of all tags? What is the uncompressed size of all trees? What is the
uncompressed size of all commits?

This information would give me a clue, what bad effects my approach
would have.

mfg Martin Kögler
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux