On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:

> A real question is how much performance gain, relative to ".cloning"
> thing, this approach gives us. If it gives us 80% or more of the
> gain compared to doing no checking, I'd say we have a clear winner.

I've tested Jeff's loose-object-cache patch and the performance is
within error bounds of my .cloning patch. A git clone of the same repo
as in my initial tests:

  .cloning           -> 10m04
  loose-object-cache -> 9m59

Jeff's patch does a little more work (256 readdir() calls, which in the
case of an empty repository translate into 256 LOOKUP calls that return
NFS4ERR_NOENT), but that appears to be insignificant.

I agree that the loose-object-cache approach is preferable, as it
applies to more git commands and also benefits performance when there
are already loose objects in the repository. As pointed out in the
thread, maybe the default cache size should be some integer multiple of
gc.auto.

I believe the loose-object-cache approach would have a performance
regression when you're receiving a small pack file and there are many
loose objects in the repository. Basically you're trading off

  MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency

against

  num_loose_objects * stat_latency

On Amazon EFS (and I expect on other NFS server implementations too) it
is more efficient to readdir() a large directory than to stat() each of
the individual files in that directory. I don't have exact numbers, but
based on a very rough calculation the difference is roughly 10x for
large directories under normal circumstances.

As an example, this means that when you're receiving a pack file with
1K objects in a repository with 10K loose objects, the
loose-object-cache patch has roughly the same performance as current
git. I'm not sure whether this is something to worry about, as I'm not
sure people run repositories with this many loose objects. If it is a
concern, there could be a flag to turn the loose object cache on/off.
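
In case anyone wants to sanity-check that readdir-vs-stat ratio on
their own setup, a quick-and-dirty way to measure it is something like
the script below. The path is only a placeholder for a large directory
on the mount you care about, and NFS attribute caching will skew the
stat numbers unless you run it against a cold mount or with attribute
caching disabled (noac / actimeo=0):

    import os, sys, time

    # Compare enumerating a large directory via readdir() with stat()ing
    # each entry individually. The default path is only a placeholder;
    # pass the directory you want to test as the first argument.
    path = sys.argv[1] if len(sys.argv) > 1 else "/mnt/efs/test-dir"

    t0 = time.time()
    names = os.listdir(path)                    # readdir()-based enumeration
    t1 = time.time()
    for name in names:
        os.lstat(os.path.join(path, name))      # one stat() per entry
    t2 = time.time()

    print("readdir: %6.3fs for %d entries" % (t1 - t0, len(names)))
    print("stat:    %6.3fs for %d entries" % (t2 - t1, len(names)))

That only gives a rough per-entry comparison; I'd expect the exact
ratio to depend on directory size and on the NFS mount options.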