Re: [PATCH] sha1_file: don't malloc the whole compressed result when writing out objects

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sun, 21 Feb 2010, Junio C Hamano wrote:

> Nicolas Pitre <nico@xxxxxxxxxxx> writes:
> 
> >  	/* Then the data itself.. */
> >  	stream.next_in = buf;
> >  	stream.avail_in = len;
> >  	do {
> > +		unsigned char *in0 = stream.next_in;
> >  		ret = deflate(&stream, Z_FINISH);
> > +		git_SHA1_Update(&c, in0, stream.next_in - in0);
> 
> Actually, I have to take my earlier comment back.  This is not "paranoia".
> 
> I do not see anything that protects the memory area between in0 and
> stream.next_in from getting modified while deflate() nor SHA1_Update() run
> from the outside.

So what?

> Unless you copy the data away to somewhere stable at
> the beginning of each iteration of this loop and run deflate() and
> SHA1_Update(), you cannot have "paranoia".

No.

The whole point is to detect data incoherencyes.

So current sequence of events is as follows:

T0	write_sha1_file_prepare() is called
T1	start initial SHA1 computation on data buffer
T2	in the middle of initial SHA1 computation
T3	end of initial SHA1 computation -> object name is determined
T4	write_loose_object() is called
...	enter the write loop
T5+n	deflate() called on buffer n
T6+n	git_SHA1_Update(() called on the same buffer n
T7+n	deflated data written out
...
Tend	abort if result of T6+n doesn't match object name from T3

So... what can happen:

1) Data is externally modified before T5+n: deflated data and its CRC32 
   will be coherent with the SHA1 computed in T6+n, but incoherent with 
   the SHA1 used for the object name. Wrong data is written to the 
   object even if it will inflate OK. We really want to prevent that 
   from happening. The test in Tend will fail.

2) Data is externally modified between T5+n and T6+n: the deflated data 
   and CRC32 will be coherent with the object name but incoherent with 
   the parano_sha1.  Although written data will be OK, this is way too 
   close from being wrong, and the test in Tend will fail.  If there is 
   more than one round into the loop and the external modifications are 
   large enough then this becomes the same as case 1 above.

3) Data is externally modified in T2: again the test in Tend will fail.

So in all possible cases I can think of, the write will abort.  No copy 
buffer needed, no filesystem mtime required, etc.  If the whole data is 
not stable between T1 and Tend then the object is not added to the 
repository.  Of course it is possible that the data be modified at the 
beginning of the file while the loop in T[5-7] is passed that point.  
But still, there is no data inconsistency at that point.

> My comment about "trickier" is about determining the size of that buffer
> used as "somewhere stable".

We don't care about such buffer.


Nicolas
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]