On 05/11/2012 07:12 PM, Thomas Gummerer wrote:
Thanks for your feedback! To get clearer code I've now written a
working reader for the v5 index format in Python. The full reader
would probably be to long for the mailing list, but here is the
interesting part:
[...]
The full reader can be found here:
https://github.com/tgummerer/git/blob/pythonprototype/git-read-index-v5.py
Good.
I tried to review your code 3fe08f9b:git-read-index-v5.py and compare it
to your file spec f858cf6a9:Index-format-v5.textile. I have the
following comments (some of them already discussed in IRC):
1. Your script seems to be reading a different version of the file than
described in the spec. [When I mentioned this on IRC you pushed a new
version a4ee558ea of the spec.]
2. Your script seems to assume that the index file has no extensions.
It would be better (for documentation purposes and to ensure that there
are no surprises) to make sure that the code knows how to handle extensions.
3. Please document briefly how the scripts should be used.
4. Please limit line length to 80 columns (like the main git project).
5. Python has a nicer way to initialize dictionaries whose keys are all
valid identifiers; for example:
- return dict({"signature": signature, "vnr": header[0], "ndir":
header[1], "nfile": header[2], "next": header[3]})
+ return dict(signature=signature, vnr=header[0], ndir=header[1],
+ nfile=header[2], next=header[3])
6. Some of your print statements are just begging to be written using
string interpolation; e.g.,
- print d["pathname"] + " " + str(d["flags"]) + " " +
str(d["foffset"]) + " " + str(d["cr"]) + " " + str(d["ncr"]) + " " +
str(d["nsubtrees"]) + " " + str(d["nfiles"]) + " " + str(d["nentries"])
+ " " + str(binascii.hexlify(d["objname"]))
+ print ("%(pathname)s %(flags)s %(foffset)s %(cr)s %(ncr)s "
+ "%(nsubtrees)s %(nfiles)s %(nentries)s " % d
+ + str(binascii.hexlify(d["objname"])))
printheader() can be rewritten similarly.
7. You have a couple of while loops that would be easier to read if
written as for loops.
8. There is no need to use global variables. Global variables have lots
of disadvantages, one of which is that it is hard to tell what functions
have side effects via the global variables. It is better to pass the
needed variables explicitly to functions that need them.
9. ...after you eliminate the global variables, you will see that the
checksums are mostly needed over limited areas of code then can be
discarded. Rewriting the checksum handling in this way would make it
easier to see exactly what range of bytes is included in a particular
checksum.
10. There is no need to keep track of all of the data that will go into
a checksum. The CRC32 checksum can be computed incrementally via the
second argument of binascii.crc32(data, crc). Therefore, you only need
to retain a 32-bit running checksum instead of the filedata array of
data strings.
11. It is bad style to generate output from within the
readindexentries() function. Given that it reads the whole array of
file entries anyway, it would be cleaner to return the array to the
caller and let the caller print out what it wants.
12. Your handling of checksum errors is inconsistent. In some places
you generate exceptions; in another you simply print an error to stdout
(not stderr!) and proceed to use the corrupt data.
13. It is probably clearer to unpack the tuples returned by
struct.unpack() directly into local variables with meaningful names
instead of carrying them around as a tuple; e.g.,
- header = struct.unpack('!IIII', checksum.add(f.read(16)))
+ (vnr, ndir, nfile, next) = struct.unpack('!IIII', fread(16))
14. It is more correct to check the file signature and version
explicitly before plowing into the rest of the file (that's what they're
there for!)
That's as far as I've got.
Michael
--
Michael Haggerty
mhagger@xxxxxxxxxxxx
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html