On Fri, Oct 14, 2016 at 10:39:52AM -0700, Junio C Hamano wrote:

> Jeff King <peff@xxxxxxxx> writes:
>
> > So it's certainly better. But 7500 packs is just silly, and squeezing
> > out ~400ms there is hardly worth it. If you repack this same case into a
> > single pack, the command drops to 5ms. So yes, there's close to an order
> > of magnitude speedup here, but you get that _and_ another order of
> > magnitude just by repacking.
>
> "7500 is silly" equally applies to the "quick" (and sloppy, if I am
> reading your "Failing in this direction doesn't make me feel great."
> correctly) approach, I think, which argues for not taking either
> change X-<.

I wouldn't quite agree that they're the same. 7500 packs is silly
because a bunch of _other_ things are going to get equally or more slow
before the prepare_packed_git() slowdown is noticeable. And it's in your
power to clean up, and we should encourage users to do so. Whereas
having a bunch of unfetched tags is a state that may linger
indefinitely, and be outside the user's control.

I _am_ open to the argument that calling reprepare_packed_git() over and
over doesn't really matter much if you have a sane number of packs. If
you tweak my perf test like so:

diff --git a/t/perf/p5550-fetch-tags.sh b/t/perf/p5550-fetch-tags.sh
index a5dc39f..7e7ae24 100755
--- a/t/perf/p5550-fetch-tags.sh
+++ b/t/perf/p5550-fetch-tags.sh
@@ -86,7 +86,7 @@ test_expect_success 'create child packs' '
 	(
 		cd child &&
 		git config gc.auto 0 &&
 		git config gc.autopacklimit 0 &&
-		create_packs 500
+		create_packs 10
 	)
 '

you get:

Test           origin            quick
--------------------------------------------------------
5550.4: fetch  0.06(0.02+0.02)   0.02(0.01+0.00) -66.7%

Still an impressive speedup as a percentage, but negligible in absolute
terms. But that's on a local filesystem on a Linux machine. I'd worry
much more about a system with a slow readdir(), e.g., due to NFS.
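(As an aside on the "in your power to clean up" point above: consolidating a
many-pack repository is a one-liner with stock git commands. A throwaway-repo
sketch, just to illustrate; the "demo" repo and config values are made up:)

```shell
# Build a throwaway repo that accumulates two packs, then consolidate.
git init -q demo && cd demo
git config user.email you@example.com && git config user.name you
git commit -q --allow-empty -m one && git repack -d -q   # pack the loose objects
git commit -q --allow-empty -m two && git repack -d -q   # second pack
git repack -a -d -q          # rewrite everything into a single pack,
                             # dropping the now-redundant old packs
ls .git/objects/pack/*.pack | wc -l    # -> 1
```

`git gc` would do the same consolidation (plus pruning) with reasonable
defaults.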
Somebody's real-world NFS case[1] was what prompted us to do 0eeb077
(index-pack: avoid excessive re-reading of pack directory, 2015-06-09).

It looks like I _did_ look into optimizing this into a single stat()
call in the thread at [1]. I completely forgot about that. I did find
there that naively using stat_validity() on a directory is racy, though
I wonder if we could do something clever with gettimeofday() instead.
IOW, the patches there only bothered to re-read when they saw the mtime
on the directory jump, which suffers from 1-second precision problems.
But if we instead compared the mtime to the current time, we could err
in favor of re-reading the packs, and get false positives for at most 1
second.

[1] http://public-inbox.org/git/7FAE15F0A93C0144AD8B5FBD584E1C5519758FC3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxx/

> I agree that the fallout from the inaccuracy of "quick" approach is
> probably acceptable and the next "fetch" will correct it anyway, so
> let's do the "quick but inaccurate" for now and perhaps cook it in
> 'next' for a bit longer than other topics?

I doubt that cooking in 'next' for longer will turn up anything useful.
The case we care about is the race between a repack and a fetch. We
lived with the "quick" version of has_sha1_file() everywhere for 8
years. I only noticed the race on a hosting cluster which puts through
millions of operations per day (I could in theory test the patch on that
same cluster, but we actually don't do very many fetches).

-Peff