Hello,

I'm considering working on Git for GSoC 2012, specifically on improving big file support. Before I start, I have a few questions: some about the low-level details of how Git handles diffs between files, and one or two about implementation.

My first question is about how Git diffs files at a low level. During the diff process, does Git simply parse both versions of a file and look for differences, or does it first use something like a hash to tell whether the file has been modified at all, and only run the diff if the hashes differ?

As an extension to that question: does Git's internal database set any kind of flag to mark a file as binary?

My idea for the implementation is to check the hash first. If the hash is the same, skip the file. If the hash is different, check the MIME type, possibly using libmagic, and if it matches a known binary format, just commit the new version rather than running a full diff and loading the whole file in the process.

What I'm worried about is whether any of this would break existing Git functionality or backward compatibility. I'd also greatly appreciate any feedback on my ideas.

Thanks,
Peter
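
P.S. To make the proposed flow concrete, here is a rough Python sketch of the decision only (hash check first, then a binary check, then either store or diff). It is an illustration under stated assumptions, not how Git implements anything: the names blob_id, looks_binary and plan_action are hypothetical, and the NUL-byte sniff is a simple stand-in for the libmagic MIME lookup mentioned above. The hash uses the layout of Git blob object names (SHA-1 over a "blob <size>\0" header followed by the content).

```python
import hashlib

# Assumption: sniffing a fixed-size prefix is enough to guess "binary".
FIRST_FEW_BYTES = 8000


def blob_id(data: bytes) -> str:
    """Return the SHA-1 of a 'blob <size>\\0' header plus the content."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def looks_binary(data: bytes) -> bool:
    """Stand-in for a libmagic MIME check: a NUL byte in the prefix."""
    return b"\x00" in data[:FIRST_FEW_BYTES]


def plan_action(old: bytes, new: bytes) -> str:
    """Decide what to do with a file given its old and new content."""
    if blob_id(old) == blob_id(new):
        return "skip"          # unchanged: nothing to diff or store
    if looks_binary(new):
        return "store-whole"   # changed binary: store it, skip the diff
    return "diff"              # changed text: run the normal diff


if __name__ == "__main__":
    print(plan_action(b"a\n", b"a\n"))
    print(plan_action(b"a\n", b"b\n"))
    print(plan_action(b"a\n", b"\x89PNG\r\n\x1a\n\x00\x00"))
```

A real version would hash and sniff the file in chunks instead of holding it in memory, since avoiding a full load is the whole point for big files.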