Stefan Beller <sbeller@xxxxxxxxxx> writes: > On Wed, Nov 9, 2016 at 1:45 AM, Duy Nguyen <pclouds@xxxxxxxxx> wrote: >> On Fri, Oct 28, 2016 at 1:29 AM, Junio C Hamano <gitster@xxxxxxxxx> wrote: >>> ... >>> The strategy this round takes to make it unnecessary to punt >>> preloading (i.e. dropping "pathspec: disable preload-index when >>> attribute pathspec magic is in use" patch the old series had) is to >>> make the attribute subsystem thread-safe. But that thread-safety in >>> the initial round is based on a single Big Attribute Lock, so it may >>> turn out that the end result performs better for this codepath if we >>> did not to make any call into the attribute subsystem. >> >> It does sound good and I want to say "yes please do this", but is it >> making pathspec api a bit more complex (to express "assume all >> attr-related criteria match")? I guess we can have an api to simply >> filter out attr-related magic (basically set attr_match_nr back to >> zero) then pass a safe (but more relaxing) pathspec to the threaded >> code. That would not add big maintenance burden. > > So with the current implementation, we already have the shortcut as: > > if (item->attr_match_nr && !match_attrs(name, namelen, item)) > > i.e. if attr_match_nr is zero, we do not even look at the mutexes and such, > so I am not sure what you intend to say in this email? I am not sure what relevance the "we call into attribute subsystem only when there is any need to check attributes" obvious short-cut has to what is being discussed. The issue is specific to what preloading is about. It is merely an attempt to run cheap checks that could be easily multi-threaded with multiple threads early in the program that we _know_ we would need to eventually refresh the index before doing some interesting work. A full refresh_index() will be done eventually, and because it needs to trigger thread-unsafe part of the API, it needs to be done in the main thread. Doing the preload allows us to mark index entries that do not have to be scanned again in the upcoming refresh_index() call. It is OK for preload-index.c::preload_thread() to skip and not mark some index entries (iow, its sole purpose is to leave a note in each index entry "this is fresh, you do not need to look at it again", and it can choose to skip an entry, which essentially means "this I didn't check, so you, refresh_index(), need to check yourself"). preload_thread() for example skips index entries that needs to trigger "racy Git avoidance" logic that is heavyweight (it has to go to the filesystem and the object store), and it is a sensible thing to skip because they are rather rare. The message by Duy you are responding to was his response to me who wondered if the attribute based pathspec match also falls into the same category. Just like racy Git code was deemed too heavyweight to be called from preloading codepath and CE_MATCH_RACY_IS_DIRTY bit was added as a way to ask ie_match_stat() API to avoid it (and hence we are skipping, the caller is also telling "if you suspect a racy, without checking for real, just answer 'I cannot say it is clean/fresh'"), if we invent a new flag and pass it through match_pathspec() down to match_pathspec_item() and have that if() statement you quoted also skip match_attrs() for a pathspec element with attribute based narrowing (as we are skipping, the flag may also have say "instead of checking the attributes, just pretend that the path did not satisfy the attribute narrowing"), would it benefit the overall performance? To answer Duy's "would it make sense to force the caller to create a new pathspec from an existing one by filtering out pathspec elements with attr-based narrowing?" question, I do not think it does.