On Sun, Feb 10, 2008 at 04:00:44PM -0800, Junio C Hamano wrote: > mkoegler@xxxxxxxxxxxxxxxxx (Martin Koegler) writes: > > * add --strict option to unpack-objects (patch 1,8,9) > > * add --strict option to index-pack (patch 8,10) > > > > Same as for unpack-objects, but without writting objects. > > > > * add config option for receive pack to enable checking (patch 11) > > If this patch is any good, I strongly suspect it should not be > just the default but should always be on. IOW no config is > necessary. That would make the series a bit simpler, I guess. Not very much. In patch 11, we would not need to parse a config variable and could pass --strict option unconditional to index-pack/unpack-objects. For the moment, I would like to keep it as an option (enabled by default). If somebody has a repository with a totally broken history, he would not be able to push any more, if there is no way to disable it. > > From 76e86fe55345e633c910d6b8fe166e27c23c5aaf Mon Sep 17 00:00:00 2001 > > From: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> > > Date: Fri, 8 Feb 2008 08:51:38 +0100 > > Subject: [PATCH 01/12] unpack-object: cache for non written objects > > > > Preventing objects with broken links entering the repository > > means, that write of some objects must be delayed. > > > > This patch adds a cache to keep the object data in memory. The delta > > resolving code must also search in the cache. > > I have to wonder what the memory pressure in real-life usage > will be like. > > When an object is proven to be good, we should be able to free > its buffer after writing it out, but would that be a good enough > optimization we can make later on this code to keep its memory > consumption manageable? This code is only used by unpack-objects, which is used for small packs. It only caches metadata (tree,commit,tag), no blobs. So the memory usage should not be a problem. > > diff --git a/fsck.c b/fsck.c > > new file mode 100644 > > index 0000000..089f775 > > --- /dev/null > > +++ b/fsck.c > > @@ -0,0 +1,84 @@ > > +#include "cache.h" > > +#include "object.h" > > +#include "blob.h" > > +#include "tree.h" > > +#include "tree-walk.h" > > +#include "commit.h" > > +#include "tag.h" > > +#include "fsck.h" > > + > > +static int fsck_walk_tree(struct tree* tree, fsck_walk_func walk, void* data) > > +{ > > + struct tree_desc desc; > > + struct name_entry entry; > > + > > + if(parse_tree(tree)) > > + return -1; > > It's a bit hard to see how these new set of functions relate to > the original code in this patch series, because you add the new > things that are initially not used anywhere independently, start > referring to them in a separate patch and then remove the old > related functions that are now unused. This style makes > reviewing easier and harder at the same time... Will try to restructure the patches. > > From 80b22c3f2c3e13c207790a49646020c55b34bba7 Mon Sep 17 00:00:00 2001 > > From: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> > > Date: Fri, 8 Feb 2008 09:01:50 +0100 > > Subject: [PATCH 03/12] fsck: move mark-reachable to fsck_walk > > > > Signed-off-by: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> > > --- > > builtin-fsck.c | 34 ++++++++++++++++++++++++---------- > > 1 files changed, 24 insertions(+), 10 deletions(-) > > ... > > +static int mark_object(struct object* obj, int type, void* data) > > +{ > > + if (!obj) > > + return 0; > > + if (obj->flags & REACHABLE) > > + return 0; > > + obj->flags |= REACHABLE; > > + if (!obj->parsed) > > + return 0; > > + return fsck_walk(obj, mark_object, data); > > +} > > Hmm. The return value 0 means Ok and negative is error? The > reason we can say success if obj is NULL or it is not parsed yet > is because...? In mark_objects, I don't care for the results. Its should only mark all reachable objects (without crashing). > > @@ -326,8 +344,6 @@ static int fsck_tree(struct tree *item) > > o_name = name; > > o_sha1 = sha1; > > } > > - free(item->buffer); > > - item->buffer = NULL; > > Hmm. The reason you still need the buffer after you checked the > contents of the tree in the loop is because you haven't actually > checked the referents are Ok. But I do not see a corresponding > free that releases this memory after you are actually done with > the verification with fsck_walk() yet, so we leak this in the > meantime? The tree is traversed multiple times: * to mark reachable objects * to mark it's childs used * to check for broken links The last use of the tree content is, after all objects are already in memory. > > @@ -375,8 +391,6 @@ static int fsck_commit(struct commit *commit) > > } > > if (memcmp(buffer, "author ", 7)) > > return objerror(&commit->object, "invalid format - expected 'author' line"); > > - free(commit->buffer); > > - commit->buffer = NULL; > > if (!commit->tree) > > return objerror(&commit->object, "could not load commit's tree %s", tree_sha1); > > if (!commit->parents && show_root) > > Likewise. Will change. > > From ce43251ef71962ff64fe138f1295c405ef6aaf65 Mon Sep 17 00:00:00 2001 > > From: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> > > Date: Fri, 8 Feb 2008 09:04:08 +0100 > > Subject: [PATCH 04/12] fsck: move reachable object check to fsck_walk > > > > It handles NULL pointers in object references without crashing. > > > > Signed-off-by: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> > > --- > > builtin-fsck.c | 49 +++++++++++++++++++++++++++++-------------------- > > 1 files changed, 29 insertions(+), 20 deletions(-) > > > > diff --git a/builtin-fsck.c b/builtin-fsck.c > > index 49e96ff..2c1e10f 100644 > > --- a/builtin-fsck.c > > +++ b/builtin-fsck.c > > @@ -81,13 +81,39 @@ static int objwarning(struct object *obj, const char *err, ...) > > return -1; > > } > > > > +static int check_reachable_object_childs(struct object *obj, int type, void *data) > > +{ > > + struct object *parent = data; > > + if (!obj) { > > + printf("broken link from %7s %s\n", > > + typename(parent->type), sha1_to_hex(parent->sha1)); > > + printf("broken link from %7s %s\n", > > + (type==OBJ_ANY?"unknown":typename(type)), "unknown"); > > Hmm? I am not sure what this part is reporting... If eg. the sha1 of a commit is stored as tree in a commit, you will find a null pointer in struct commit->tree. Such things will hit this check. > > From ee11f771be1ef1c29725cb56ab3eb8dfe61ca25a Mon Sep 17 00:00:00 2001 > > From: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> > > Date: Fri, 8 Feb 2008 09:07:33 +0100 > > Subject: [PATCH 06/12] create a common object checker code out of fsck > > > > The function provides a callback for reporting errors. > > The same "add unused new stuff independently, later use it and > then finally remove now unused old stuff" pattern is here. I am > neutral to that patch style but it is a bit harder to see what > is going on. > > Most of the changes seem to be straight and sane copy-and-paste though. I'll merge this patch with the next patch. > > From a8db4e754e717bac0b2462333d4145eac3452099 Mon Sep 17 00:00:00 2001 > > From: Martin Koegler <mkoegler@xxxxxxxxxxxxxxxxx> > > Date: Fri, 8 Feb 2008 09:14:14 +0100 > > Subject: [PATCH 09/12] unpack-objects: prevent writing of inconsistent objects > > > > This patch introduces a strict mode, which ensures that: > > - no malformed object will be written > > - no object with broken links will be written > > > > The patch ensures this by delaying the write of all non blob object. > > These object are written, after all objects they link to are written. > > > > An error can only result in unreferenced objects. > > > diff --git a/builtin-unpack-objects.c b/builtin-unpack-objects.c > > index f18c7e8..3e906e4 100644 > > --- a/builtin-unpack-objects.c > > +++ b/builtin-unpack-objects.c > > @@ -173,7 +250,6 @@ static void resolve_delta(unsigned nr, enum object_type type, > > die("failed to apply delta"); > > free(delta); > > write_object(nr, type, result, result_size); > > - free(result); > > } > > And this is freed later elsewhere? The free is the task of write_object. If (!strict), then it is freeed immediatly. Similar for blobs. Other objects are put in the cache (see patch 1). > > @@ -203,7 +279,8 @@ static void unpack_non_delta_entry(enum object_type type, unsigned long size, > > > > if (!dry_run && buf) > > write_object(nr, type, buf, size); > > - free(buf); > > + else if (buf) > > + free(buf); > > } > > You can always free NULL without checking. Will fix. > > @@ -356,6 +434,7 @@ static void unpack_all(void) > > if (!quiet) > > progress = start_progress("Unpacking objects", nr_objects); > > obj_list = xmalloc(nr_objects * sizeof(*obj_list)); > > + memset(obj_list, 0, nr_objects * sizeof(*obj_list)); > > Hmm, is this a fix to the 'master' independent from all the rest > of your patches, or a new requirement? This is, because the new field in obj_list must be initialized with zero. mfg martin Kögler - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html