On Thu, May 26 2022, Jiang Xin wrote: > On Thu, May 26, 2022 at 2:23 PM Junio C Hamano <gitster@xxxxxxxxx> wrote: >> >> Jiang Xin <worldhello.net@xxxxxxxxx> writes: >> >> >> ) >> >> -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD)) >> >> +FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD))) >> >> >> >> FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES)) >> >> FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES)) >> >> >> > >> > If I disabled the git-ls-files command like below, >> > >> > @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H) >> >... >> > This is because the three generated header files (defined in >> > $(GENERATED_H)) are also included in the result of "SOURCES_CMD". We >> > can fix this by sorting LOCALIZED_C: >> > >> > -LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \ >> > - $(GENERATED_H) >> > +LOCALIZED_C = $(sort $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) >> > $(SCALAR_SOURCES) \ >> > + $(GENERATED_H)) >> >> If you make FOUND_SOURCE_FILES unique upfront, the at least there >> wouldn't be any duplicates there. Do you mean that some of what is >> in FOUND_SOURCE_FILES appear in either SCALAR_SOURCES or GENERATED_H? > > Yes. If find source files using "git-ls-files", the generated headers > are not included in FOUND_SOURCE_FILES, but this is not the case for > the find command. > >> If not, I think deduplicating near the source of the issue, i.e. >> >> FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD))) > > If we pass the "--deduplicate" option to git-ls-files, we can make > sure files are unique in FOUND_SOURCE_FILES. So sorting > FOUND_SOURCE_FILES becomes unnecessary. But FOUND_SOURCE_FILES may > contain generated files if using find instead of git-ls-files in > SOURCES_CMD, that means sort FOUND_SOURCE_FILES is not enough. Instead > of sorting it, we may want to filter-out the generated files from it, > like: > > FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD))) > > Exclude the several generated headers by add extra fixed options to > find command is not good, but we may need to exclude the ".build" > directory from the find command in SOURCES_CMD, in case we may > duplicate C files in it in future version. > > > --- a/Makefile > +++ b/Makefile > @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H) > ## Exhaustive lists of our source files, either dynamically generated, > ## or hardcoded. > SOURCES_CMD = ( \ > - git ls-files \ > + git ls-files --deduplicate \ > '*.[hcS]' \ > '*.sh' \ > ':!*[tp][0-9][0-9][0-9][0-9]*' \ > @@ -857,12 +857,13 @@ SOURCES_CMD = ( \ > -o \( -name '[tp][0-9][0-9][0-9][0-9]*' -prune \) \ > -o \( -name contrib -type d -prune \) \ > -o \( -name build -type d -prune \) \ > + -o \( -name .build -type d -prune \) \ This change is good and something we should do in any case, I think (and this is my fault) that due to this we'd have broken other things, e.g. included these in the TAGS file if we don't have "git ls-files". > -o \( -name 'trash*' -type d -prune \) \ > -o \( -name '*.[hcS]' -type f -print \) \ > -o \( -name '*.sh' -type f -print \) \ > | sed -e 's|^\./||' \ > ) > -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD)) > +FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD))) > > FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES)) > FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES)) > >> may be sufficient. Deduplicating near the consumer, like >> LOCALIZED_C, may force us to dedup all the consumers of it (e.g. >> LOCALIZED_C is not the sole consumer of FOUND_C_SOURCES; you'd need >> to sort the input to COCCI_SOURCES, for example). > > If we apply the above patch, sorting LOCALIZED_C is not necessary. Per earlier feedback in https://lore.kernel.org/git/220519.86tu9l6fw4.gmgdl@xxxxxxxxxxxxxxxxxxx/ this all seesm a bit too complex, especially now. I pointed out then that with --sort-by-file added we: * Don't group the translations by C/SH/Perl anymore * Change the sort order within files, to be line/sorted instead of line/order (i.e. first occurring translations first) I suggested then to just use $(sort) on the respective lists. So why not just: 1. Switch to the $(FOUND_C_SOURCES) (good) 2. Filter that by C/Perl/SH as before (just a simple $(filter) 3. $(sort) that (which as noted, also de-dupes it) Then we don't have any of the behavior change of --sort-by-file, and we don't have to carefully curate the ls-files/find commands to not include duplicates (although as seen here that seems to have been a useful canary in the "find" case).