Re: Index format v5

Junio C Hamano <gitster@xxxxxxxxx> writes:

> Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:
>
>> I have been drafting the Version 5 of the index format over the past
>> few days with the help of Thomas Rast, Michael Haggerty, cmn and
>> barrbrain on IRC.
>
> Hrm, so if there is anything glaringly wrong below, should I reduce the
> "trustable reviewer karma point" from these people?  Or did you forget to
> say "but remaining errors are mine" ;-)?

Heh.

Partly it's my fault; I told Thomas to get this out tonight for a single
reason: Michael's CRC-over-stat idea is so radical that I wanted to know
whether there's anything wrong with it.  So the misunderstanding is
really that this is not anywhere near the final result.

But yeah:

> >   32-bit crc32 checksum over ctime seconds, ctime nanoseconds,
> >     ino, file size, dev, uid, gid (All stat(2) data except mtime) [7]
> 
> Giving occasional false positive to "did this change?" is acceptable, but
> any false negative is absolutely unacceptable.  How does this work with
> something like "racy git" situation (i.e. coming from "mtime happens to be
> the same as before") but due to crc32 collisions?
> 
> If there is no good answer to the above question, I would have to say that
> anybody who suggested or passed this through review loses all the
> accumulated reviewer karma points (if s/he has accumulated any, that is).

If this is a problem, then the fault is with me (and I do hope I have
some karma to lose...).

Note that the scenario you outlined is not an issue.  The entries other
than mtime and ctime are really only compared for equality, see
e.g. ce_match_stat_basic().  Comparisons for equality never have false
negatives with any hash function.  Collisions can only cause false
positives of the equality test.

Upon closer reading I noticed that the ie_match_stat and ce_match_stat_*
family actually distinguishes between basically all the fields that can
change.  But this knowledge is never put to use, except that
there's an optimized code path where

* mode/type difference implies changed entry
* size difference implies changed entry

So that does constitute an argument to not put the size in the
stat-hash.  Mode and type aren't part of it in the proposed format
anyway.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch