On Mon, Feb 22, 2010 at 8:31 PM, Zygo Blaxell <zblaxell@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> If you're read()ing a chunk at a time into a fixed size buffer, and
> doing sha1 and deflate in chunks, the data should be copied once into CPU
> cache, processed with both algorithms, and replaced with new data from
> the next chunk.

Currently, we calculate the SHA-1, then look up whether an object with
this SHA-1 already exists, and only if it does not do we deflate the data
and write it to the object storage. So we avoid the deflate and write
costs when the object already exists.

Moreover, when we deflate the data, we create the temporary file in the
same directory where the target object will be stored, thus avoiding a
cross-directory rename (which is important for some reason, but I don't
remember why). So, creating the temporary file requires knowing the first
two digits of the SHA-1, which you cannot know without calculating the
SHA-1 first.

So, the idea of processing the file in chunks is very attractive, but it
has two drawbacks:

1. extra cost (deflating+writing) when the object is already stored
2. some issues with cross-directory renaming

Dmitry
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
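
[Editor's illustration] The hash-first ordering described in the email can be sketched in Python. This is a simplified model, not git's actual C implementation: the function name `write_loose_object` and the non-streaming (whole-buffer) interface are made up for illustration, while the `"<type> <size>\0"` header and the `objects/xx/yyyy...` layout match git's loose-object format.

```python
import hashlib
import os
import tempfile
import zlib

def write_loose_object(objects_dir, data, obj_type="blob"):
    # Hash first: git hashes "<type> <size>\0" followed by the payload.
    header = f"{obj_type} {len(data)}\0".encode()
    sha1 = hashlib.sha1(header + data).hexdigest()

    # Loose objects live at objects/<first 2 hex digits>/<remaining 38>.
    subdir = os.path.join(objects_dir, sha1[:2])
    path = os.path.join(subdir, sha1[2:])

    # If the object already exists, skip the deflate and write entirely.
    if os.path.exists(path):
        return sha1

    os.makedirs(subdir, exist_ok=True)

    # Deflate into a temporary file in the *target* directory, so the
    # final rename never crosses a directory boundary.
    fd, tmp = tempfile.mkstemp(dir=subdir)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(zlib.compress(header + data))
        os.rename(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
    return sha1
```

The point of the sketch is the ordering constraint: `subdir` depends on `sha1`, so the temporary file cannot be created until the full SHA-1 is known, which is exactly why deflating in chunks alongside hashing does not fit this layout.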