Re: [PATCH] Makefile: dedup git-ls-files output to prevent duplicate targets

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, May 26 2022, Jiang Xin wrote:

> On Thu, May 26, 2022 at 2:23 PM Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>
>> Jiang Xin <worldhello.net@xxxxxxxxx> writes:
>>
>> >>         )
>> >> -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
>> >> +FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
>> >>
>> >>  FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
>> >>  FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
>> >>
>> >
>> > If I disabled the git-ls-files command like below,
>> >
>> >     @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
>> >...
>> > This is because the three generated header files (defined in
>> > $(GENERATED_H)) are also included in the result of "SOURCES_CMD". We
>> > can fix this by sorting LOCALIZED_C:
>> >
>> >     -LOCALIZED_C = $(FOUND_C_SOURCES) $(FOUND_H_SOURCES) $(SCALAR_SOURCES) \
>> >     -             $(GENERATED_H)
>> >     +LOCALIZED_C = $(sort $(FOUND_C_SOURCES) $(FOUND_H_SOURCES)
>> > $(SCALAR_SOURCES) \
>> >     +               $(GENERATED_H))
>>
>> If you make FOUND_SOURCE_FILES unique upfront, the at least there
>> wouldn't be any duplicates there.  Do you mean that some of what is
>> in FOUND_SOURCE_FILES appear in either SCALAR_SOURCES or GENERATED_H?
>
> Yes. If find source files using "git-ls-files", the generated headers
> are not included in FOUND_SOURCE_FILES, but this is not the case for
> the find command.
>
>> If not, I think deduplicating near the source of the issue, i.e.
>>
>>   FOUND_SOURCE_FILES := $(sort $(shell $(SOURCES_CMD)))
>
> If we pass the "--deduplicate" option to git-ls-files, we can make
> sure files are unique in FOUND_SOURCE_FILES. So sorting
> FOUND_SOURCE_FILES becomes unnecessary. But FOUND_SOURCE_FILES may
> contain generated files if using find instead of git-ls-files in
> SOURCES_CMD, that means sort FOUND_SOURCE_FILES is not enough. Instead
> of sorting it, we may want to filter-out the generated files from it,
> like:
>
>     FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD)))
>
> Exclude the several generated headers by add extra fixed options to
> find command is not good, but we may need to exclude the ".build"
> directory from the find command in SOURCES_CMD, in case we may
> duplicate C files in it in future version.
>
>
>     --- a/Makefile
>     +++ b/Makefile
>     @@ -846,7 +846,7 @@ generated-hdrs: $(GENERATED_H)
>     ## Exhaustive lists of our source files, either dynamically generated,
>     ## or hardcoded.
>     SOURCES_CMD = ( \
>     -       git ls-files \
>     +       git ls-files --deduplicate \
>                    '*.[hcS]' \
>                    '*.sh' \
>                    ':!*[tp][0-9][0-9][0-9][0-9]*' \
>     @@ -857,12 +857,13 @@ SOURCES_CMD = ( \
>                    -o \( -name '[tp][0-9][0-9][0-9][0-9]*' -prune \) \
>                    -o \( -name contrib -type d -prune \) \
>                    -o \( -name build -type d -prune \) \
>     +               -o \( -name .build -type d -prune \) \

This change is good and something we should do in any case, I think (and
this is my fault) that due to this we'd have broken other things,
e.g. included these in the TAGS file if we don't have "git ls-files".

>                    -o \( -name 'trash*' -type d -prune \) \
>                    -o \( -name '*.[hcS]' -type f -print \) \
>                    -o \( -name '*.sh' -type f -print \) \
>                    | sed -e 's|^\./||' \
>            )
>     -FOUND_SOURCE_FILES := $(shell $(SOURCES_CMD))
>     +FOUND_SOURCE_FILES := $(filter-out $(GENERATED_H),$(shell $(SOURCES_CMD)))
>
>     FOUND_C_SOURCES = $(filter %.c,$(FOUND_SOURCE_FILES))
>     FOUND_H_SOURCES = $(filter %.h,$(FOUND_SOURCE_FILES))
>
>> may be sufficient.  Deduplicating near the consumer, like
>> LOCALIZED_C, may force us to dedup all the consumers of it (e.g.
>> LOCALIZED_C is not the sole consumer of FOUND_C_SOURCES; you'd need
>> to sort the input to COCCI_SOURCES, for example).
>
> If we apply the above patch, sorting LOCALIZED_C is not necessary.

Per earlier feedback in
https://lore.kernel.org/git/220519.86tu9l6fw4.gmgdl@xxxxxxxxxxxxxxxxxxx/
this all seesm a bit too complex, especially now.

I pointed out then that with --sort-by-file added we:

 * Don't group the translations by C/SH/Perl anymore
 * Change the sort order within files, to be line/sorted instead of
   line/order (i.e. first occurring translations first)

I suggested then to just use $(sort) on the respective lists.

So why not just:

 1. Switch to the $(FOUND_C_SOURCES) (good)
 2. Filter that by C/Perl/SH as before (just a simple $(filter)
 3. $(sort) that (which as noted, also de-dupes it)

Then we don't have any of the behavior change of --sort-by-file, and we
don't have to carefully curate the ls-files/find commands to not include
duplicates (although as seen here that seems to have been a useful
canary in the "find" case).



[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux