Re: Index format v5

On 05/03/2012 08:16 PM, Thomas Rast wrote:
> Thomas Gummerer <t.gummerer@xxxxxxxxx> writes:
>
>>   32-bit crc32 checksum over ctime seconds, ctime nanoseconds,
>>     ino, file size, dev, uid, gid (All stat(2) data except mtime) [7]
>> [...]
>> [7] Since all stat data (except mtime and ctime) is just used for
>>     checking if a file has changed a checksum of the data is enough.
>>     In addition to that Thomas Rast suggested ctime could be ditched
>>     completely (core.trustctime=false) and thus included in the
>>     checksum. This would save 24 bytes per index entry, which would
>>     be about 4 MB on the Webkit index.
>>     (Thanks for the suggestion to Michael Haggerty)
>
> This is the part I'm most curious about.  Are we missing anything?
> Michael brought it up on IRC: the stat() results are only used to test
> whether they are still the same, with the exception of the mtime (which
> also undergoes raciness checks).
>
> As far as I can see, none of st_{ino,dev,uid,gid} are useful for
> anything.  st_size might conceivably be used as a hint for a buffer
> size, but nobody actually does that.  The ctime undergoes stricter
> checks, but AFAICS it's also all about whether it has changed, and
> besides that can be turned off.  We think all of those fields can be
> replaced by an arbitrary hash/CRC and only tested for equality.  32 bits
> should be plenty, probably even if we just xor the values together.

XOR is definitely *not* adequate; for example, changing uid=gid="you" to uid=gid="me" would not affect the XOR of the values (assuming, as is often the case, that each user has his own uid/gid with the same numerical values).
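
To make the uid/gid case concrete, here is a minimal Python sketch. The field order, the 64-bit packing, the sample values, and the helper names xor_digest and crc_digest are all made up for illustration; none of this is the proposed index layout.

```python
import struct
import zlib
from functools import reduce
from operator import xor

# Hypothetical field order; the real v5 entry layout may differ.
FIELDS = ("ctime_sec", "ctime_nsec", "ino", "size", "dev", "uid", "gid")

def xor_digest(st):
    """XOR all stat fields together and keep the low 32 bits."""
    return reduce(xor, (st[f] for f in FIELDS)) & 0xFFFFFFFF

def crc_digest(st):
    """CRC-32 over the fields packed as big-endian 64-bit integers."""
    packed = struct.pack(">7Q", *(st[f] for f in FIELDS))
    return zlib.crc32(packed) & 0xFFFFFFFF

# Same file metadata, owned first by one user and then by another,
# each with uid == gid (illustrative values only).
mine = dict(ctime_sec=1336068960, ctime_nsec=0, ino=42,
            size=1024, dev=2049, uid=1000, gid=1000)
yours = dict(mine, uid=1001, gid=1001)

print(xor_digest(mine) == xor_digest(yours))  # True: uid ^ gid is 0 both times
print(crc_digest(mine) == crc_digest(yours))  # False: the CRC sees the change
```

With XOR, equal uid and gid always cancel out, so the ownership change is invisible; the CRC depends on the position of each field as well as its value.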

Which hash to use depends on some estimate of the likelihood that the hashes collide and simultaneously that the other metadata coincide. It seems to me that CRC-32 would be adequate. But if not, a longer hash could be used (albeit with less space savings).
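
For a rough feel of the numbers, here is a back-of-the-envelope sketch. It assumes the checksum behaves like an ideal uniform hash, which is an idealization and not a property I have verified for CRC-32 on stat data, and the function name p_undetected is hypothetical.

```python
import math

def p_undetected(n_changes, bits=32):
    """Chance that at least one of n_changes independent metadata
    changes produces the same digest as before, for an ideal
    uniformly distributed hash of the given width."""
    p_single = 2.0 ** -bits
    # 1 - (1 - p)^n, computed with log1p/expm1 to avoid rounding loss.
    return -math.expm1(n_changes * math.log1p(-p_single))

for n in (1, 10**3, 10**6):
    print(n, p_undetected(n))
```

Even this overstates the risk, because a collision only matters when the mtime and the other checks also fail to notice the change. Passing a larger bits value shows what a longer hash would buy in exchange for the smaller space savings.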

Michael

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/