Re: [PATCH v3 2/4] path: optimize common dir checking

On 08/12/2015 11:57 PM, David Turner wrote:
> Instead of a linear search over common_list to check whether
> a path is common, use a trie.  The trie search operates on
> path prefixes, and handles excludes.
> 
> Signed-off-by: David Turner <dturner@xxxxxxxxxxxxxxxx>
> ---
> 
> Probably overkill, but maybe we could later use it for making exclude
> or sparse-checkout matching faster (or maybe we have to go all the way
> to McNaughton-Yamada for that to be truly worthwhile).

Let's take a step back.

We have always had a ton of code that uses `git_path()` and friends to
convert abstract things into filesystem paths. Let's take the
reference-handling code as an example:

`git_path("refs/heads/master")` returns something like
".git/refs/heads/master", which happens to be the place where we would
store a loose reference with that name. But in reality,
"refs/heads/master" is a reference name, not a fragment of a path. It's
just that the reference code knows that the transformation done by
`git_path()` *accidentally* happens to convert a reference name into the
name of the path of the corresponding loose reference file.

In fact, the reference code is even smarter than that. It knows that
within submodules, `git_path()` does *not* do the right mapping. In
those cases it calls `git_path_submodule()` instead.

But now we have workspaces, and things have become more complicated.
Some references are stored in `$GIT_DIR`, while others are stored in
`$GIT_COMMON_DIR`. Who should know all of these details?

The current answer is that the reference-handling code remains (mostly)
ignorant of workspaces. It just stupidly calls `git_path()` (or
`git_path_submodule()`) regardless of the reference name. It is
`git_path()` that has grown the global insight to know which files are
now stored in `$GIT_COMMON_DIR` vs `$GIT_DIR`. Now it helpfully
transforms "refs/heads/master" into "$GIT_COMMON_DIR/refs/heads/master"
but transforms "refs/worktree/foo" into "$GIT_DIR/refs/worktree/foo". It
has developed similar insight into lots of other file types. IT KNOWS
TOO MUCH. And because of that, it has become a lot more complicated and
might even be a performance problem.
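
To make that concrete, here is a rough sketch of the kind of central
dispatch I mean. It is not the real path.c code: `sketch_git_path()`,
`sketch_is_common()` and the `rules` table are hypothetical names, the
directories are passed in as plain strings, and only the two refs
examples above are taken from this mail.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical rule: a path prefix and where matching paths live. */
struct path_rule {
    const char *prefix;
    int is_common;      /* 1: $GIT_COMMON_DIR, 0: $GIT_DIR */
};

/* More specific prefixes come first so they act as excludes. */
static const struct path_rule rules[] = {
    { "refs/worktree/", 0 },
    { "refs/", 1 },
};

/* Linear scan over the table; the first matching prefix wins. */
static int sketch_is_common(const char *rel)
{
    size_t i;

    for (i = 0; i < sizeof(rules) / sizeof(rules[0]); i++)
        if (!strncmp(rel, rules[i].prefix, strlen(rules[i].prefix)))
            return rules[i].is_common;
    return 0; /* anything unlisted is treated as per-workspace */
}

/* The "knows everything" variant: one entry point for every caller. */
static const char *sketch_git_path(const char *git_dir,
                                   const char *common_dir,
                                   const char *rel,
                                   char *buf, size_t len)
{
    assert(buf && len > 0);
    snprintf(buf, len, "%s/%s",
             sketch_is_common(rel) ? common_dir : git_dir, rel);
    return buf;
}
```

Every new kind of file means another row in that table, and every
caller pays for the scan, whether or not it already knows where its
file lives.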

This seems crazy to me. It is the *reference* code that should know
whether a particular reference should be stored under `$GIT_DIR` or
`$GIT_COMMON_DIR`, or indeed whether it should be stored in a database.

We should have two *stupid* functions, `git_workspace_path()` and
`git_common_path()`, and have the *callers* decide which one to call.
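
Roughly like this, as a sketch only. The `sketch_*` names, the explicit
directory arguments and the buffer-based signatures are assumptions
made for illustration; I am not proposing this exact interface, and
`sketch_ref_path()` merely stands in for whatever the reference code
would do.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stupid function #1: always resolves relative to the workspace dir. */
static const char *sketch_workspace_path(const char *git_dir,
                                         const char *rel,
                                         char *buf, size_t len)
{
    assert(buf && len > 0);
    snprintf(buf, len, "%s/%s", git_dir, rel);
    return buf;
}

/* Stupid function #2: always resolves relative to the common dir. */
static const char *sketch_common_path(const char *common_dir,
                                      const char *rel,
                                      char *buf, size_t len)
{
    assert(buf && len > 0);
    snprintf(buf, len, "%s/%s", common_dir, rel);
    return buf;
}

/*
 * The caller decides. Here the (hypothetical) reference code knows
 * that refs/worktree/ is per-workspace and all other refs are shared.
 */
static const char *sketch_ref_path(const char *git_dir,
                                   const char *common_dir,
                                   const char *refname,
                                   char *buf, size_t len)
{
    static const char per_ws[] = "refs/worktree/";

    if (!strncmp(refname, per_ws, strlen(per_ws)))
        return sketch_workspace_path(git_dir, refname, buf, len);
    return sketch_common_path(common_dir, refname, buf, len);
}
```

The path helpers need no table and no search; the knowledge about
which refs are shared sits next to the rest of the reference logic.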

The only reason to retain a knows-everything `git_path()` function is as
a crutch for 3rd-party applications that think they are clever enough to
grub around in `$GIT_DIR` at the filesystem level. But that should be
highly discouraged, and we should make it our mission to provide
commands that make it unnecessary.

Michael

-- 
Michael Haggerty
mhagger@xxxxxxxxxxxx
