Re: [PATCH 32/36] pathspec: allow querying for attributes

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 09 Nov 2016 14:25:13 -0800

Stefan Beller <sbeller@xxxxxxxxxx> writes:

> On Wed, Nov 9, 2016 at 1:45 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote:
>> On Fri, Oct 28, 2016 at 1:29 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>>> ...
>>> The strategy this round takes to make it unnecessary to punt
>>> preloading (i.e. dropping "pathspec: disable preload-index when
>>> attribute pathspec magic is in use" patch the old series had) is to
>>> make the attribute subsystem thread-safe.  But that thread-safety in
>>> the initial round is based on a single Big Attribute Lock, so it may
>>> turn out that the end result performs better for this codepath if we
>>> did not to make any call into the attribute subsystem.
>>
>> It does sound good and I want to say "yes please do this", but is it
>> making pathspec api a bit more complex (to express "assume all
>> attr-related criteria match")? I guess we can have an api to simply
>> filter out attr-related magic (basically set attr_match_nr back to
>> zero) then pass a safe (but more relaxing) pathspec to the threaded
>> code. That would not add big maintenance burden.
>
> So with the current implementation, we already have the shortcut as:
>
>      if (item->attr_match_nr && !match_attrs(name, namelen, item))
>
> i.e. if attr_match_nr is zero, we do not even look at the mutexes and such,
> so I am not sure what you intend to say in this email?

I am not sure what relevance the "we call into attribute subsystem
only when there is any need to check attributes" obvious short-cut
has to what is being discussed.

The issue is specific to what preloading is about.  It is merely an
attempt to run cheap checks that could be easily multi-threaded with
multiple threads early in the program that we _know_ we would need
to eventually refresh the index before doing some interesting work.
A full refresh_index() will be done eventually, and because it needs
to trigger thread-unsafe part of the API, it needs to be done in the
main thread.  Doing the preload allows us to mark index entries that
do not have to be scanned again in the upcoming refresh_index()
call.  It is OK for preload-index.c::preload_thread() to skip and
not mark some index entries (iow, its sole purpose is to leave a
note in each index entry "this is fresh, you do not need to look at
it again", and it can choose to skip an entry, which essentially
means "this I didn't check, so you, refresh_index(), need to check
yourself").

preload_thread() for example skips index entries that needs to
trigger "racy Git avoidance" logic that is heavyweight (it has to go
to the filesystem and the object store), and it is a sensible thing
to skip because they are rather rare.

The message by Duy you are responding to was his response to me who
wondered if the attribute based pathspec match also falls into the
same category.  Just like racy Git code was deemed too heavyweight
to be called from preloading codepath and CE_MATCH_RACY_IS_DIRTY bit
was added as a way to ask ie_match_stat() API to avoid it (and hence
we are skipping, the caller is also telling "if you suspect a racy,
without checking for real, just answer 'I cannot say it is
clean/fresh'"), if we invent a new flag and pass it through
match_pathspec() down to match_pathspec_item() and have that if()
statement you quoted also skip match_attrs() for a pathspec element
with attribute based narrowing (as we are skipping, the flag may
also have say "instead of checking the attributes, just pretend that
the path did not satisfy the attribute narrowing"), would it benefit
the overall performance?

To answer Duy's "would it make sense to force the caller to create a
new pathspec from an existing one by filtering out pathspec elements
with attr-based narrowing?" question, I do not think it does.