Re: [RFC Patch] Preventing corrupt objects from entering the repository

On Tue, 12 Feb 2008, Martin Koegler wrote:

> On Tue, Feb 12, 2008 at 11:02:06AM -0500, Nicolas Pitre wrote:
> > I think this is a good idea to always have some sanity checks on any 
> > incoming objects so to make sure they're well formed and valid before 
> > giving them a SHA1 value, and bail out as soon as any error is found.  
> > From my understanding that's what your patch is doing, right? (sorry I 
> > can't find them in my mailbox anymore). 
> 
> Yes. (=>http://marc.info/?l=git&m=120266631524947&w=2)
> 
> >  This can be done as objects are 
> > coming in just fine and requires no extra memory, and I would say this 
> > should be done unconditionally all the time.  After all, the Git 
> > coherency model is based on the SHA1 checksumming, and therefore it is a 
> > good idea to never validate any malformed objects with a SHA1.  So I'm 
> > all in favor of such validation always performed in index-pack and 
> > unpack-objects.
> 
> We will need some additional memory for struct blob/tree/tag/commit
> even for this check.

Why so?

Each received object is held in memory as it is received, so why can't you 
simply validate it in place?  No need to keep track of them afterward.
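
To illustrate what validating in place could look like, here is a minimal
sketch in Python (not Git's actual C code). It scans the header of a commit
object directly in the received buffer and keeps no per-object state once it
returns. The function name check_commit_in_place and the exact set of checks
are assumptions made for illustration; a real check would also cover trees,
tags and more of the commit format.

```python
import re

# Hypothetical helper, for illustration only: matches a 40-digit hex SHA1.
HEX40 = re.compile(rb"[0-9a-f]{40}")


def check_commit_in_place(buf: bytes) -> bool:
    """Return True if buf starts with a plausible commit header.

    Only the received buffer is scanned; nothing is recorded about the
    object after the function returns.
    """
    pos = 0

    def take_line():
        # Return the next header line (without newline), or None if no
        # complete line is left in the buffer.
        nonlocal pos
        end = buf.find(b"\n", pos)
        if end < 0:
            return None
        line = buf[pos:end]
        pos = end + 1
        return line

    # Exactly one "tree <sha1>" line must come first.
    line = take_line()
    if line is None or not line.startswith(b"tree "):
        return False
    if not HEX40.fullmatch(line[5:]):
        return False

    # Zero or more "parent <sha1>" lines follow.
    line = take_line()
    while line is not None and line.startswith(b"parent "):
        if not HEX40.fullmatch(line[7:]):
            return False
        line = take_line()

    # Then "author ..." and "committer ...", in that order.
    if line is None or not line.startswith(b"author "):
        return False
    line = take_line()
    return line is not None and line.startswith(b"committer ")
```

The only state is a cursor into the buffer, which is the point: a check of
this kind does not need a struct commit (or blob/tree/tag) to be allocated
and kept around.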

> > As to making sure those objects are well connected... well this is a 
> > technically different issue entirely, and I wonder if a special mode to 
> > fsck might not be a better solution.  For example, fsck could be made to 
> > validate object connectivity, starting from the new ref(s), and stopping 
> > object walking as soon as a reference to an object not included in the 
> > newly received pack is encountered.  This could be run from some hook to 
> > decide whether or not to update the new refs, and to delete the pack 
> > otherwise.
> 
> Do you really think that this will need less memory? fsck first loads
> all objects and then verifies their connections.

Not all objects, otherwise I wouldn't even be able to run it.

My point is that you can have fsck load only the objects contained in the 
received pack (you can use the pack index to load them) and assume 
connectivity is good whenever an object in the pack references an 
existing object outside of the pack.  At least this doesn't need to 
happen in parallel with pack indexing.
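
The walk described above can be sketched as follows, again in Python and
only as an illustration under stated assumptions. The pack is modeled as a
dict mapping each object id to the ids it references, and the objects
already in the repository as a set; both stand in for lookups through the
pack index and the object database. The name check_new_refs is hypothetical
and is not an existing fsck mode.

```python
def check_new_refs(tips, pack, existing):
    """Return the set of object ids that are referenced but missing.

    tips:     ids the new refs point at
    pack:     dict mapping id -> list of ids that object references
              (assumed stand-in for objects in the received pack)
    existing: set of ids already present outside the pack
    """
    missing = set()
    seen = set()
    stack = list(tips)
    while stack:
        oid = stack.pop()
        if oid in seen:
            continue
        seen.add(oid)
        if oid in pack:
            # Object came in with the pack: keep walking its references.
            stack.extend(pack[oid])
        elif oid not in existing:
            # Neither in the pack nor in the repository: broken link.
            missing.add(oid)
        # Otherwise the object already exists outside the pack, so its
        # connectivity is assumed good and the walk stops here.
    return missing
```

A hook could then refuse to update the refs, and delete the pack, whenever
the returned set is not empty. Memory use is bounded by the number of
objects in the received pack rather than by the size of the repository.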


Nicolas