Re: [RFC Patch] Preventing corrupt objects from entering the repository

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Feb 13, 2008 at 01:20:15AM -0500, Shawn O. Pearce wrote:
>   tree.c:
> 
>     This is inside track_tree_refs and its a file mode we do
>     not recognize.  We spit out a warning, but try to lookup
>     the unknown object anyway, even though we're looking at what
>     should be a bogus tree.  This occurs only in parse_tree_buffer,
>     only if track_objects_refs is on, and only if we see a tree
>     that is actually not understood by this version of Git.  Hmm,
>     shouldn't happen unless the tree itself is corrupt.
> 
>     This code looked fishy enough to me to dig up the blame history
>     for it:
> 
>       commit e2ac7cb5fbcf1407003aa07cdcd14141527ea2e3
>       Author: Sam Vilain <sam.vilain@xxxxxxxxxxxxxxx>
>       Date: Wed Jun  6 06:25:17 2007
> 
>       Don't assume tree entries that are not dirs are blobs
> 
>       When scanning the trees in track_tree_refs() there is a "lazy" test
>       that assumes that entries are either directories or files.  Don't do
>       that.
> 
>     Sounds like it was around the time of S_ISGITLINK being
>     introduced.  Looking at this code again I have to wonder, why
>     the hell are we looking up and tracking an object inside of a
>     tree when we don't understand the mode?
> 
>     Lets say a new form of S_ISGITLINK gets introduced in the future.
>     By this I mean the mode says "hey, this SHA-1 isn't really in my
>     object pool".  We're going to cram a dummy object into the refs
>     table for this tree.  Why?  We don't do this for S_ISGITLINK.
> 
>     Lets say its a new object type (not tree/tag/commit/blob) but
>     it is in our storage pool.  If this Git doesn't know the mode,
>     it sure as heck won't know what the hell the loose object header
>     (or pack header!) says about that object.
> 
>     One of the key places where you might expect tracking an unknown
>     (but referenced by a tree) SHA-1 type would be in reachability,
>     rev-list --objects, packfile generation.  But the process_tree()
>     function in reachable.c doesn't have Sam's change above, so it
>     will assume anything new looking is a blob.
>     
>     Oh, and what a rabbit hole I just fell down.  The only caller
>     that seems to set "track_object_refs = 1" (and thus get into this
>     odd lookup_unknown_object() call) is fsck.  Everyone else sets
>     it to 0, including its default declaration.
> 
>     So we've got this nice baggage, and differing implementation,
>     so fsck can be happy how?  We've also got a whole lot of apps
>     setting "track_object_refs = 0", which is what it defaults to,
>     unless you managed to run fsck first.  Hmmph.
> 
>     Is it just me or is track_object_refs perhaps outlived its
>     usefulness?

See the patch at the start of this topic. I'm working on replacing
track-object-refs.

It is missing some frees and I copied the lookup_unknown_object call
(This should be replaced with an error message in my patch, as the the
verification of the tree content would reject such an invalid mode).

mfg Martin Kögler
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux