I believe I've addressed all issues raised except the question about how to exhaustively validate the changes. I was under the impression that the combination of the existing 'git cat-file' test cases and my new cases were enough, but I have thus far been unable to get code coverage working locally to validate that myself, due to spurious failures in other tests. I've kicked off another run with most unrelated tests removed to see if that helps. Please note that this is my first contribution to git. I've tried to follow the instructions about how to correctly submit a patch (I'm using GitGitGadget as getting Outlook to do plain text e-mail correctly seems impossible), but please let me know if I've missed something. My motivation for making this change is purely performance. We have a large repository that we enable the sparse index for, and I am developing a pre-commit hook that (among other things) uses git cat-file to get the staged contents of certain files. Without this change, getting the contents of a single small file from the index can take upwards of 10 seconds due to the index expansion. After this change, it only takes ~0.3 seconds unless the file is outside of the sparse index. Kevin Lyles (2): t1092: allow run_on_* functions to use standard input builtin/cat-file: mark 'git cat-file' sparse-index compatible builtin/cat-file.c | 3 ++ t/t1092-sparse-checkout-compatibility.sh | 50 +++++++++++++++++++++--- 2 files changed, 48 insertions(+), 5 deletions(-) base-commit: 4590f2e9412378c61eac95966709c78766d326ba Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-git-1770%2Fklylesatepic%2Fkl%2Fmark-cat-file-sparse-index-compatible-v4 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-git-1770/klylesatepic/kl/mark-cat-file-sparse-index-compatible-v4 Pull-Request: https://github.com/git/git/pull/1770 Range-diff vs v3: 1: b310593aec2 ! 1: 73fe71abcd5 Allow using stdin in run_on_* functions @@ Metadata Author: Kevin Lyles <klyles@xxxxxxxx> ## Commit message ## - Allow using stdin in run_on_* functions + t1092: allow run_on_* functions to use standard input - The 'run_on_sparse' and 'run_on_all' functions previously did not work - correctly for commands accepting standard input. This also indirectly - affected 'test_all_match' and 'test_sparse_match'. + The 'run_on_sparse' and 'run_on_all' functions do not work correctly for + commands accepting standard input, because they run the same command + multiple times and the first instance consumes it. This also indirectly + affects 'test_all_match' and 'test_sparse_match'. - Now, we accept standard input and will send it to each command that we - run. This does not impact using the functions for commands that don't - need standard input. + To allow these functions to work with commands accepting standard input, + first slurp standard input to a temporary file, and then run the command + with its standard input redirected from the temporary file. This ensures + that each command sees the same contents from its standard input. + + Note that this does not impact commands that do not read from standard + input; they continue to ignore it. Additionally, existing uses of the + run_on_* functions do not need to do anything differently, as the + standard input of the test environment is already connected to + /dev/null. + + We do not explicitly clean up the input files because they are cleaned + up with the rest of the test repositories and their contents may be + useful for figuring out which command failed when a test case fails. Signed-off-by: Kevin Lyles <klyles@xxxxxxxx> @@ t/t1092-sparse-checkout-compatibility.sh: init_repos_as_submodules () { } run_on_sparse () { -+ cat >run_on_sparse-input && ++ cat >run-on-sparse-input && + ( cd sparse-checkout && GIT_PROGRESS_DELAY=100000 "$@" >../sparse-checkout-out 2>../sparse-checkout-err - ) && -+ ) <run_on_sparse-input && ++ ) <run-on-sparse-input && ( cd sparse-index && GIT_PROGRESS_DELAY=100000 "$@" >../sparse-index-out 2>../sparse-index-err - ) -+ ) <run_on_sparse-input ++ ) <run-on-sparse-input } run_on_all () { -+ cat >run_on_all-input && ++ cat >run-on-all-input && + ( cd full-checkout && GIT_PROGRESS_DELAY=100000 "$@" >../full-checkout-out 2>../full-checkout-err - ) && - run_on_sparse "$@" -+ ) <run_on_all-input && -+ run_on_sparse "$@" <run_on_all-input ++ ) <run-on-all-input && ++ run_on_sparse "$@" <run-on-all-input } test_all_match () { 2: f4d1461b993 ! 2: ac913257309 Mark 'git cat-file' sparse-index compatible @@ Metadata Author: Kevin Lyles <klyles+github@xxxxxxxx> ## Commit message ## - Mark 'git cat-file' sparse-index compatible + builtin/cat-file: mark 'git cat-file' sparse-index compatible This change affects how 'git cat-file' works with the index when specifying an object with the ":<path>" syntax (which will give file contents from the index). - 'git cat-file' will expand a sparse index to a full index when needed, - but is currently marked as needing a full index (or rather, not marked - as not needing a full index). This results in much slower 'git cat-file' - operations when working within the sparse index, since we expand the - index whether it's needed or not. + 'git cat-file' expands a sparse index to a full index any time contents + are requested from the index by specifying an object with the ":<path>" + syntax. This is true even when the requested file is part of the sparse + index, and results in much slower 'git cat-file' operations when working + within the sparse index. Mark 'git cat-file' as not needing a full index, so that you only pay the cost of expanding the sparse index to a full index when you request -- gitgitgadget