Re: [PATCH v2 5/5] sparse-index: improve lstat caching of sparse paths

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 6/27/24 5:14 PM, Junio C Hamano wrote:
"Derrick Stolee via GitGitGadget" <gitgitgadget@xxxxxxxxx> writes:
+/**
+ * Return the length of the largest common substring that ends in a

"largest" here is understandable (it means longest).

+ * slash ('/') to indicate the largest common parent directory. Returns

here I find it a bit confusing.  It is the deepest common parent
directory between the two (or "the common parent directory with the
longest path"), but my initial reaction was "largest common parent
directory?  wouldn't the root level by definition the 'largest'
(having the largest number of paths underneath) directory that is
common?)".

I'm not sure why my brain got stuck on "largest" here when "longest"
would be a better choice.

+ * zero if no common directory exists.
+ */
+static size_t max_common_dir_prefix(const char *path1, const char *path2)
+{
+	size_t common_prefix = 0;
+	for (size_t i = 0; path1[i] && path2[i]; i++) {
+		if (path1[i] != path2[i])
+			break;
+
+		/*
+		 * If they agree at a directory separator, then add one
+		 * to make sure it is included in the common prefix string.
+		 */
+		if (path1[i] == '/')
+			common_prefix = i + 1;
+	}
+
+	return common_prefix;
+}

Looking good.  I assume that these two paths are relative to the
top-level of the working tree (in other words, they do not begin
with a slash)?

Yes, they do not begin with a slash. In this specific application, they
are paths from cache entries in the index. If this were generalized for
use elsewhere then paths beginning with a slash should be considered.

  static int path_found(const char *path, struct path_found_data *data)
  {
...
+	 * At this point, we know that 'path' doesn't exist, and we know that
+	 * the parent directory of 'data->dir' does exist. Let's set 'data->dir'
+	 * to be the top-most non-existing directory of 'path'. If the first
+	 * parent of 'path' exists, then we will act as though 'path'
+	 * corresponds to a directory (by adding a slash).
  	 */
-	newdir = strrchr(path, '/');
-	if (!newdir)
-		return 0; /* Didn't find a parent dir; just return 0 now. */
+	common_prefix = max_common_dir_prefix(path, data->dir.buf);
...
+	strbuf_setlen(&data->dir, common_prefix);
+	while (1) {

Oooh, nice.  So you learned /a/b/c/d did not exist, so check /a/b/c,
and then /a/b/ and stop, because you know /a does exist already.
With luck, our next query is for /a/b/c/e or for /a/b/f, and knowing
that /a/b/ does not exist would allow us to say "no, they do not
exist" without having to lstat().  OK.

I probably should have led with an example like this, as it makes the
point more clearly than my abstract description.

Thanks,
-Stolee




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux