Re: [PATCH] Teach "git add" and friends to be paranoid

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Dmitry Potapov <dpotapov@xxxxxxxxx> writes:
>
>> ... I think it is possible
>> to avoid the overhead of being on the safe side in a few common cases.
>> Here is a patch. I have not had time to test it, but changes appear to
>> trivial.
>
> Yeah, these are obvious "paranoia not needed" cases.
>

Actually the "if it is smaller than 256k" part is not quite obvious.

>> @@ -2490,7 +2491,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st, int write_object,
>>  	size_t size = xsize_t(st->st_size);
>>  
>>  	flag = write_object ? INDEX_MEM_WRITE_OBJECT : 0;
>> -	if (!S_ISREG(st->st_mode)) {
>> +	if (!S_ISREG(st->st_mode) || size < 262144) {
>>  		struct strbuf sbuf = STRBUF_INIT;
>>  		if (strbuf_read(&sbuf, fd, 4096) >= 0)
>>  			ret = index_mem(sha1, sbuf.buf, sbuf.len,

INDEX_MEM_PARANOID is never given to index_mem() in this codepath, so
trade-offs look like this:

 * In non-paranoia mode, your conjecture is that between

   - malloc, read, SHA-1, deflate, and then free; and
   - mmap, SHA-1, deflate and then munmap

   the former is faster for small files that can fit in core without
   thrashing.

 * In paranoia mode, your conjecture is that between

   - malloc, read, SHA-1, deflate, and then free; and
   - mmap, SHA-1, SHA-1 and deflate in chunks, and then munmap

   the former is faster for small files that can fit in core without
   thrashing.

The "mmap" strategy has larger cost in paranoia mode compared to its cost
in non-paranoia mode.  The "read" strategy on the other hand has the same
cost in both modes.  If this "read small files" is good for non-paranoia
mode, it is obvious that it is also good (better) for paranoia mode.

Which means that this hunk addresses an unrelated issue.  "paranoid
avoidance" falls naturally as a side effect of doing this, but that is not
the primary effect of this change.

There needs some benchmarking to justify it, I think.

So I'd split this hunk out when queuing.

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]