Re: Index format v5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/16/2012 11:54 PM, Thomas Gummerer wrote:
On 05/16, Michael Haggerty wrote:
I just reviewed version 1369bd855b86 of your script, and it is MUCH
better.  It's easy to read and review.  The functions that it
defines are now self-contained and could therefore be reused for
other purposes.  There are fewer magic numbers (though there are
still a few; I wonder if there is a way to get rid of those?)
You've done a nice job polishing up the code.

Thanks for the feedback! I could get rid of the magic numbers for
the crc code, but I'm not sure if it makes sense to replace the
others with constants, since they only occur once in the file. I
added comments instead explaining where those numbers come from
instead.

I think it is possible to remove the last magic number and also to make the CRC handling easier. I have pushed some suggested changes to github [1]:

1. With the current code, trying to read a file that is less than 24 bytes long would result in a struct.error (because it would try to unpack a string that is shorter than the struct) whereas the underlying error in this case should almost always be reported as a signature error. So it is correct to read the signature separately from the rest of the header, but it is even more correct to check the signature (including its length) before reading on.

2. I introduce a class CRC to hold checksums, so that (a) the code for handling checksums can be encapsulated, and (b) an instance of this class can be passed into functions and mutated in-place, which is less cumbersome than requiring functions to return (data, crc) tuples. This, in turn, makes possible...

3. ...a new function read_struct(f, s, crc), which reads the data for a struct.Struct from f, checksums it, and returns the unpacked data. This function is more convenient to use than the old read_calc_crc().

4. The checksum instance can also be made responsible for verifying that the next four bytes in the file agree with the expected checksum. This removes some more code duplication. (See CRC.matches().)

5. You read the extension offsets using CRC_STRUCT.size, which is technically correct but misleading. In fact, the extension offsets should be documented using their own EXTENSION_INDEX_STRUCT. Also, it makes more sense to store the unpacked integer offsets to extoffsets rather than the raw 4-byte strings.

6. With a couple of more minor changes it is possible to replace the magic number "24" in read_index_entries(). With this change the computation is documented very explicitly and is also (somewhat) robust against future changes in the format.

Look over my changes and take whatever you want.

Thanks, I have changed those. I added the docstrings for all read
functions, they however don't seem to make sense for the print
functions, since you're probably faster just reading the code for
them.

That's fine. If you document nontrivial functions you are already doing much better than the git project average.

One since I changed in addition is to those changes, is that I
gave the exceptions names to make them more meaningful.

Good.

Michael

[1] https://github.com/mhagger/git/tree/pythonprototype

--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]