On 05/16/2012 11:54 PM, Thomas Gummerer wrote:
On 05/16, Michael Haggerty wrote:
I just reviewed version 1369bd855b86 of your script, and it is MUCH
better. It's easy to read and review. The functions that it
defines are now self-contained and could therefore be reused for
other purposes. There are fewer magic numbers (though there are
still a few; I wonder if there is a way to get rid of those?)
You've done a nice job polishing up the code.
Thanks for the feedback! I could get rid of the magic numbers for
the crc code, but I'm not sure if it makes sense to replace the
others with constants, since they only occur once in the file. I
added comments instead explaining where those numbers come from
instead.
I think it is possible to remove the last magic number and also to make
the CRC handling easier. I have pushed some suggested changes to github
[1]:
1. With the current code, trying to read a file that is less than 24
bytes long would result in a struct.error (because it would try to
unpack a string that is shorter than the struct) whereas the underlying
error in this case should almost always be reported as a signature
error. So it is correct to read the signature separately from the rest
of the header, but it is even more correct to check the signature
(including its length) before reading on.
2. I introduce a class CRC to hold checksums, so that (a) the code for
handling checksums can be encapsulated, and (b) an instance of this
class can be passed into functions and mutated in-place, which is less
cumbersome than requiring functions to return (data, crc) tuples. This,
in turn, makes possible...
3. ...a new function read_struct(f, s, crc), which reads the data for a
struct.Struct from f, checksums it, and returns the unpacked data. This
function is more convenient to use than the old read_calc_crc().
4. The checksum instance can also be made responsible for verifying that
the next four bytes in the file agree with the expected checksum. This
removes some more code duplication. (See CRC.matches().)
5. You read the extension offsets using CRC_STRUCT.size, which is
technically correct but misleading. In fact, the extension offsets
should be documented using their own EXTENSION_INDEX_STRUCT. Also, it
makes more sense to store the unpacked integer offsets to extoffsets
rather than the raw 4-byte strings.
6. With a couple of more minor changes it is possible to replace the
magic number "24" in read_index_entries(). With this change the
computation is documented very explicitly and is also (somewhat) robust
against future changes in the format.
Look over my changes and take whatever you want.
Thanks, I have changed those. I added the docstrings for all read
functions, they however don't seem to make sense for the print
functions, since you're probably faster just reading the code for
them.
That's fine. If you document nontrivial functions you are already doing
much better than the git project average.
One since I changed in addition is to those changes, is that I
gave the exceptions names to make them more meaningful.
Good.
Michael
[1] https://github.com/mhagger/git/tree/pythonprototype
--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html