"Victoria Dye via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes: > From: Victoria Dye <vdye@xxxxxxxxxx> > > Move 'read_index_info()' into a new header 'index-info.h' and generalize the > function to call a provided callback for each parsed line. Update > 'update-index.c' to use this generalized 'read_index_info()', adding the > callback 'apply_index_info()' to verify the parsed line and update the index > according to its contents. > > The input parsing done by 'read_index_info()' is similar to, but more > flexible than, the parsing done in 'mktree' by 'mktree_line()' (handling not > only 'git ls-tree' output but also the outputs of 'git apply --index-info' > and 'git ls-files --stage' outputs). To make 'mktree' more flexible, a later > patch will replace mktree's custom parsing with 'read_index_info()'. "git apply --index-info"? That is a blast from the past. It no longer exists since 7a988699 (apply: get rid of --index-info in favor of --build-fake-ancestor, 2007-09-17). As to the scriptability, supporting "ls-files -s" and "ls-tree -r" output as our input do help, but the third one is not natively emitted and it is very unlikely that there are third-party tools that give output in that format. After all these years, I suspect that it is sufficient to say "update-index --index-info" and "mktree" both read information necessary to eventually build trees, but having two separate parsers is a maintenance burden, so we are massaging the code from the former to be reusable. without mentioning where the old third format comes from. 
> diff --git a/builtin/update-index.c b/builtin/update-index.c
> index d343416ae26..77df380cb54 100644
> --- a/builtin/update-index.c
> +++ b/builtin/update-index.c
> @@ -11,6 +11,7 @@
>  #include "gettext.h"
>  #include "hash.h"
>  #include "hex.h"
> +#include "index-info.h"
>  #include "lockfile.h"
>  #include "quote.h"
>  #include "cache-tree.h"
> @@ -509,100 +510,29 @@ static void update_one(const char *path)
>  	report("add '%s'", path);
>  }
>
> +static int apply_index_info(unsigned int mode, struct object_id *oid, int stage,
> +			    const char *path_name, void *cbdata UNUSED)
>  {
> +	if (!verify_path(path_name, mode)) {
> +		fprintf(stderr, "Ignoring path %s\n", path_name);
> +		return 0;
> +	}
>
> +	if (!mode) {
> +		/* mode == 0 means there is no such path -- remove */
> +		if (remove_file_from_index(the_repository->index, path_name))
> +			die("git update-index: unable to remove %s", path_name);

This changes the error message.  We used to feed "ptr" (no longer
visible to this function, as the caller unquotes before calling us)
that pointed at the original the user gave to the program; now we
report the path_name which is the result of the unquoting.

> +	}
> +	else {
> +		/* mode ' ' sha1 '\t' name
> +		 * ptr[-1] points at tab,
> +		 * ptr[-41] is at the beginning of sha1
>  		 */
> +		if (add_cacheinfo(mode, oid, path_name, stage))
> +			die("git update-index: unable to update %s", path_name);

But this side used to report the path_name as the result of
unquoting in the original.  So the above change would probably be OK
in the name of consistency?

973d6a20 (update-index --index-info: adjust for funny-path quoting.,
2005-10-16) was the origin of the unquoting, and looking at that
commit, I have a feeling that the "ptr" thing above (i.e., the one I
pointed out as changing the behaviour) was simply forgotten (as
opposed to deliberately made to report the original) while updating
the code to deal with quoted original into unquoted paths.

So I think the change is more than OK.
It is a very welcome (belated) bugfix for 973d6a20 ;-).

>  	}
> +
> +	return 0;
>  }

It looks a bit disappointing that we die in the callback like above,
when the main parser loop that moved to the other file to be more
reusable is now capable of returning to the caller with an error,
but at this step, it is a good place to stop.  A refactor that does
not change the behaviour.  Nicely done.

> diff --git a/t/t2107-update-index-basic.sh b/t/t2107-update-index-basic.sh
> index cc72ead79f3..29696ade0d0 100755
> --- a/t/t2107-update-index-basic.sh
> +++ b/t/t2107-update-index-basic.sh
> @@ -142,4 +142,31 @@ test_expect_success '--index-version' '
>  	test_must_be_empty actual
>  '
>
> +test_expect_success '--index-info fails on malformed input' '
> +	# empty line
> +	echo "" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&

Using "test_grep" would make it easier to diagnose when the test
breaks.  A failing "grep" will be silent.  A failing "test_grep" will
tell us "I was told to find THIS, but didn't find any in THAT".

> +	# bad whitespace
> +	printf "100644 $EMPTY_BLOB A" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# invalid stage value
> +	printf "100644 $EMPTY_BLOB 5\tA" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# invalid OID length
> +	printf "100755 abc123\tA" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "malformed input line" err &&
> +
> +	# bad quoting
> +	printf "100644 $EMPTY_BLOB\t\"A" |
> +	test_must_fail git update-index --index-info 2>err &&
> +	grep "bad quoting of path name" err
> +'
> +
>  test_done
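
To illustrate the point about a silent "grep" versus a helper that
explains its failure, here is a rough stand-alone sketch.  The
function name "verbose_grep" is made up for this illustration; it is
not Git's "test_grep" and only approximates the kind of diagnostics
such a helper might print.

```shell
#!/bin/sh
# Hypothetical helper (NOT Git's test_grep): behaves like grep on
# success, but on failure reports the pattern it was asked to find
# and the contents of the file it searched.
verbose_grep () {
	# usage: verbose_grep <pattern> <file>
	if grep -e "$1" "$2" >/dev/null
	then
		return 0
	fi
	echo "error: did not find '$1' in '$2', which contains:" >&2
	cat "$2" >&2
	return 1
}

# Small demonstration with a throw-away file standing in for "err".
demo=$(mktemp)
echo "fatal: bad quoting of path name" >"$demo"

# A match succeeds quietly, just as plain grep would.
verbose_grep "bad quoting of path name" "$demo" && echo "matched"

# A miss fails, but leaves the pattern and the file contents on stderr.
verbose_grep "malformed input line" "$demo" || echo "no match"

rm -f "$demo"
```

With a bare "grep", the second check would fail without leaving any
hint in the test log; with a wrapper along these lines, the log shows
both what was expected and what was actually there.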