Junio C Hamano <gitster@xxxxxxxxx> writes: > Kjetil Barvik <barvik@xxxxxxxxxxxx> writes: > >> +static inline void >> +update_path_cache(unsigned int ret_flags, unsigned int track_flags, >> + int last_slash) >> +{ >> + /* Max 3 different path types can be cached for the moment! */ >> + unsigned int save_flags = >> + ret_flags & track_flags & (LSTAT_DIR| >> + LSTAT_NOTDIR| >> + LSTAT_SYMLINK); >> + if (save_flags && last_slash > 0 && last_slash < PATH_MAX) { >> + cache_flags = save_flags; >> + cache_len = last_slash; >> + } else { >> + cache_flags = 0; >> + cache_len = 0; >> + } >> +} > > I personally found this inline function with a single call site > distracting in following the logic. It does not make the indentation > level shallower, either. Also, the else part should probably call > clear_lstat_cache() to protect it from possible future enhancements to add > more state variables. Ok, I will remove the function and update the else-part. >> + >> +/* >> + * Check if name 'name' of length 'len' has a symlink leading >> + * component, or if the directory exists and is real, or not. >> + * >> + * To speed up the check, some information is allowed to be cached. >> + * This is indicated by the 'track_flags' argument. >> + */ >> +unsigned int >> +check_lstat_cache(int len, const char *name, unsigned int track_flags) >> +{ >> + int match_len, last_slash, max_len; >> + unsigned int match_flags, ret_flags; >> + struct stat st; >> + >> + /* Check if match from the cache for 2 "excluding" path types. >> + */ >> + match_len = last_slash = >> + greatest_common_path_cache_prefix(len, name); >> + match_flags = >> + cache_flags & track_flags & (LSTAT_NOTDIR| >> + LSTAT_SYMLINK); >> + if (match_flags && match_len == cache_len) >> + return match_flags; > > Let me see if I understand the logic behind this caching. When you have > checked A/B/C earlier and you already know B is a symlink, you remember > that A/B was a symlink.. You can fill a request to check A/B/$whatever > (assuming A/B does not change --- otherwise the caller should clear the > cache) from the cached data, because no matter what $whatever is, it will > result in the same "has-leading-symlink". > > Similarly, if you know A/B is not a directory from an earlier test, you > know that a request to check A/B/$whatever will result in the same ENOTDIR > no matter what $whatever is, so you can return early. > > The above "return match_flags" will not trigger if the cached path does > not have any leading symlink. So we know the matched part are all good > directories when we start lstat() loop. > > Am I following you so far? Yes you do! >> + /* Okay, no match from the cache so far, so now we have to >> + * check the rest of the path components and update the cache. >> + */ >> + ret_flags = LSTAT_DIR; >> + max_len = len < PATH_MAX ? len : PATH_MAX; >> + while (match_len < max_len) { >> + do { >> + cache_path[match_len] = name[match_len]; >> + match_len++; >> + } while (match_len < max_len && name[match_len] != '/'); > > You take one component from the input, and append it to the part that is > already known to be true directory (i.e. cached part and the part earlier > iteration of the loop checked so far), to be tested by lstat()... > >> + if (match_len >= max_len) >> + break; > > ... but you are not interested in the full input. If the lengt of name is larger than PATH_MAX all lstat() calls would fail with an ENAMETOOLONG error, at least on my Linux box, so I thought that it was not nessarry to test further. But maybe we should emdiatly return an error if name is too long? > We are only checking the leading path (e.g. check for "A/B/C" may > lstat() "A", "A/B" but not "A/B/C"). That is correct, and it is the same logic as in the has_symlink_leading_path() function. >> + last_slash = match_len; >> + cache_path[last_slash] = '\0'; >> + >> + if (lstat(cache_path, &st)) { >> + ret_flags = LSTAT_LSTATERR; >> + if (errno == ENOENT || errno == ENOTDIR) >> + ret_flags |= LSTAT_NOTDIR; > > If you tested "A/B" here and got ENOENT back, you know "A/B" does not > exist; you cache this knowledge as "A/B is not a directory" Correct. > (I also think you could use it as a cached knowledge that "A exists > and is a directory". I am not sure if you are taking advantage of > that). It does take adavantage of this fact. It will also do simmilar things with a symlink cached path. > What I do not understand about this code is the ENOTDIR case. If you > tested "A/B" and got ENOTDIR back, what you know is that "A" is not a > directory (if the path tested this round were deeper like "X/Y/A/B", you > know "X/Y/A" is not a directory, and you know "X" and "X/Y" are true > directories; otherwise the loop would have exited before this round when > you tested "X" or "X/Y" in the earlier rounds). Since the cache is supoosed to start from a known existing directory, and is testing each path component when the lstat("A/B") calls returns ENOTDIR, we should know the fact that the directory "A" exists, and that "A/B/" does not exists. > So as far as I can think of, ENOENT case and ENOTDIR case would give you > different information (ENOENT would say "A is a dir, A/B is not"; ENOTDIR > would say "A is not a dir"). I am confused how you can cache the same > path and same flag between these two cases here. I admit that I used copy-and-paste from other similar test's from the sourcecode: kjetil@localhost ~/git/git $ grep -Hn ENOENT.*ENOTDIR *.c builtin-apply.c:2364: else if ((errno != ENOENT) && (errno != ENOTDIR)) builtin-update-index.c:74: * - missing file (ENOENT or ENOTDIR). That's ok if we're builtin-update-index.c:81: if (err == ENOENT || err == ENOTDIR) diff-lib.c:30: if (errno != ENOENT && errno != ENOTDIR) lstat_cache.c:81: if (errno == ENOENT || errno == ENOTDIR) setup.c:191: if (errno != ENOENT && errno != ENOTDIR) From the 'man lstat' page I read the following for this 2 error codes: ENOENT A component of the path _path_ does not exist, or the path is an empty string. ENOTDIR A component of the path is not a directory. I would have guessed that for what we is looking for, it would be correct to threat both these error codes as the same. Is this wrong? -- kjetil -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html