On Wed, Jun 06 2018, Ævar Arnfjörð Bjarmason wrote: > On Mon, Jan 08 2018, Derrick Stolee wrote: > >> On 1/7/2018 5:42 PM, Ævar Arnfjörð Bjarmason wrote: >>> >>> On Sun, Jan 07 2018, Derrick Stolee jotted: >>> >>>> git log --oneline --raw --parents >>>> >>>> Num Packs | Before MIDX | After MIDX | Rel % | 1 pack % >>>> ----------+-------------+------------+--------+---------- >>>> 1 | 35.64 s | 35.28 s | -1.0% | -1.0% >>>> 24 | 90.81 s | 40.06 s | -55.9% | +12.4% >>>> 127 | 257.97 s | 42.25 s | -83.6% | +18.6% >>>> >>>> The last column is the relative difference between the MIDX-enabled repo >>>> and the single-pack repo. The goal of the MIDX feature is to present the >>>> ODB as if it was fully repacked, so there is still room for improvement. >>>> >>>> Changing the command to >>>> >>>> git log --oneline --raw --parents --abbrev=40 >>>> >>>> has no observable difference (sub 1% change in all cases). This is likely >>>> due to the repack I used putting commits and trees in a small number of >>>> packfiles so the MRU cache workes very well. On more naturally-created >>>> lists of packfiles, there can be up to 20% improvement on this command. >>>> >>>> We are using a version of this patch with an upcoming release of GVFS. >>>> This feature is particularly important in that space since GVFS performs >>>> a "prefetch" step that downloads a pack of commits and trees on a daily >>>> basis. These packfiles are placed in an alternate that is shared by all >>>> enlistments. Some users have 150+ packfiles and the MRU misses and >>>> abbreviation computations are significant. Now, GVFS manages the MIDX file >>>> after adding new prefetch packfiles using the following command: >>>> >>>> git midx --write --update-head --delete-expired --pack-dir=<alt> >>> >>> (Not a critique of this, just a (stupid) question) >>> >>> What's the practical use-case for this feature? Since it doesn't help >>> with --abbrev=40 the speedup is all in the part that ensures we don't >>> show an ambiguous SHA-1. >> >> The point of including the --abbrev=40 is to point out that object >> lookups do not get slower with the MIDX feature. Using these "git log" >> options is a good way to balance object lookups and abbreviations with >> object parsing and diff machinery.[...] > > [snip] > >> [...]And while the public data shape I shared did not show a >> difference, our private testing of the Windows repository did show a >> valuable improvement when isolating to object lookups and ignoring >> abbreviation calculations. > > Replying to this old thread since I see you're prepearing the MIDX for > submission again and this seemed like the best venue. > > Your WIP branch (github.com/git/derrickstolee/midx/upstream) still only > references the speedups in abbreviation calculations, but here you > allude to other improvements. It would be very nice to have some summary > of that in docs / commit messages when you submit this. > > I've been meaning to get around to submitting something like I mentioned > in https://public-inbox.org/git/87efn0bkls.fsf@xxxxxxxxxxxxxxxxxxx/ > i.e. a way to expand the abbrev mode to not check disambiguations, which > would look something like: > > core.abbrev = 20 > core.validateAbbrev = false > > Or: > > core.abbrev = +2 > core.validateAbbrev = false > > So, using the example from the above referenced E-Mail +2 would make > linux.git emit hashes of 14 characters, without any abbreviation > checking (just trusting in statistics to work in your favor). > > As noted by JS in this thread that wouldn't be acceptable for your > use-case, but there's plenty of people (including me) who'd appreciate > the speedup without being a 100% sure we're emitting unambiguous hashes, > since that trade-off is better than time spent generating another index > on-disk. So I see it as a complimentary & orthogonal feature. > > But with that implemented I wouldn't get any benefit from things that > use the MIDX that aren't abbreviations, so what are those? I won't have time to finish this today, but it's already in a shape that I think is useful for discussion to see what others think. I still need to make this be supported by --abbrev=* and have e.g. --abbrev=+2 work. I got as far as this with that: diff --git a/parse-options-cb.c b/parse-options-cb.c index 0f9f311a7a..7cc9d3dfe6 100644 --- a/parse-options-cb.c +++ b/parse-options-cb.c @@ -16,13 +16,23 @@ int parse_opt_abbrev_cb(const struct option *opt, const char *arg, int unset) if (!arg) { v = unset ? 0 : DEFAULT_ABBREV; } else { + const char *origarg = arg; v = strtol(arg, (char **)&arg, 10); if (*arg) return opterror(opt, "expects a numerical value", 0); - if (v && v < MINIMUM_ABBREV) + if (*origarg == '+' || *origarg == '-') { + if (v == 0) { + return opterror(opt, "relative abbrev must be non-zero", 0); + } else { + default_abbrev_relative = v; + v = -1; + } + } else if (v && v < MINIMUM_ABBREV) { v = MINIMUM_ABBREV; - else if (v > 40) + } else if (v > 40) { v = 40; + } } *(int *)(opt->value) = v; return 0; But e.g. blame would print 40 character SHA-1s on +2, I didn't have time to dig into why. This'll also need tests, I haven't added any yet, and finally it's probably a good idea to split off the core.abbrev=[+-]NUM feature into a seperate patch from core.validateAbbrev, although with my 2/2 the two can be used in isolation, or together. Ævar Arnfjörð Bjarmason (2): config.c: use braces on multiple conditional arms sha1-name: add core.validateAbbrev & relative core.abbrev Documentation/config.txt | 46 ++++++++++++++++++++++++++++++++++++++++ cache.h | 2 ++ config.c | 18 ++++++++++++++-- environment.c | 2 ++ sha1-name.c | 15 +++++++++++++ 5 files changed, 81 insertions(+), 2 deletions(-) -- 2.17.0.290.gded63e768a