Re: [PATCH] Define a version of lstat(2) specially for copy operation

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Johannes Sixt <j.sixt@xxxxxxxxxxxxx> writes:

> The "underlying system" in this case is Cygwin, and it *does* have an
> executable bit.
>
> But the FS gymnastics that implement it are slow and affect all lstat()
> calls, so we have replaced lstat() with a simpler and faster
> implementation. Only that the replacement doesn't know about the X bit
> anymore; it always returns mode 0666.
>
> Therefore, if a file is created whose mode is influenced by the fast
> lstat(), then it will always be non-X. The access(, X_OK) call on the hook
> script would do the right thing if only the script were created with the
> correct mode. access(, X_OK) fails because the file was created with non-X
> permissions.

OK.  To make sure that I understand the situation better, allow me to
paraphrase what you said.

I've always thought core.filemode was about "On FAT the filesystem does
not have x-bit and Cygwin fakes it by looking at .exe extension and using
other heuristics", but this lstat() "trick" is not about that.  It is "On
Windows you need to issue multiple FS API calls in order to get full
information to fill struct stat, which is expensive.  We omit the one to
learn about the x-bit; it won't make a difference because most people run
with core.filemode."

If that is what is going on, I have a few quick questions:

 (1) How much of "struct stat" information can you get with the easiest
     and quickest FS API call on Windows?  Does it give you big enough
     subset of what we store in the cache entry to detect modification
     reliably?

 (2) When you do an equivalent of "chmod +x path" on Windows, does it
     change one of the fields in your answer to (1)?  In other words, can
     you detect such a change with the fastest and reliable-enough FS API
     call?

If answers to both questions are "yes", then I suspect we might be able to
improve the situation without sacrificing performance too much.

In the generic part, we use lstat(2) for various purposes:

 * To learn existence and type of a FS entity, in order to decide if we
   need to descend into a directory or treat it as a blob;

 * To learn the current "status" that can be compared with what is stored
   in the cache entry, in order to decide if the FS entity hasn't been
   modified;

 * To learn the last-modified timestamp, in order to implement the
   racy-git avoidance logic;

 * To learn x-bit, so that we can update it in the cache entry, and give
   the appropriate x-bit to a file that we create (Alex's
   lstat_for_copy() is about this usage).

If the first three can be learned with a fast-and-reliable-enough FS API
call, and the x-bit and perhaps something else that do not fall into the
first three need a lot more work to figure out, we could split lstat()
into two.  The result from a "fast" variant of lstat() is used for the
first three and allow platform implementation of it to be incomplete
(i.e. the Cygwin "trick" to omit x-bit is OK for that purpose), and add
another "slow" variant to complement it:

	int lstat_fast(const char *path, struct stat *st);
        void lstat_remainder(const char *path, struct stat *st);

On POSIX systems, we implement it as

        #define lstat_fast(path,st) lstat((path),(st))
        #define lstat_remainder(path,st) do { ; /* nothing */ } while (0)
 
but on Windows and Cygwin, we might implement the format to get
good-enough information for comparison purposes and implement latter to
fill the x-bit and some other parts that are expensive to learn.

Most of the callers that are only interested in seeing if a path has
changed can use the lstat_fast(), while the codepath that actually does
use the information for modified path can augment the information a
previous call to "fast" variant obtained with an additional call to the
"slow" lstat_remainder().

The attached patch to convert a part of read-cache.c is only for
illustration purposes.

The refresh_cache_ent() function first calls lstat_fast() to get cheap
information that is good enough, gives it to ie_match_stat(), to see if
anything has changed, and let fill_stat_cache_info() to update the cached
information for later comparison.

We do not have to touch ce_match_stat_basic() that does care about x-bit
but assuming that the answer to question (2) is "yes", we would not miss
MODE_CHANGED report from it; other information such as timestamps would
have changed in such a case as well, and ie_match_stat() that called
ce_match_stat_basic() will still say "changed" for such an entry.

The add_file_to_index() function checks if the path exists with the "fast"
one, gives the result to add_to_index() that wants the full mode bits
hence invokes the "slow" one to fill in the remainder.

If you have to pass information between "fast" and "slow" variant other
than what you can put in "struct stat" in order for "slow" one to take
advantage of what "fast" one already has done, we would need to introduce
something like

	struct gitstat {
        	struct stat st;
		some other info for platform;
	};

and pass that around instead, which would become a larger change, though.
I am hoping it won't be the case.

And if we do not need such an extra "struct gitstat", then there is no
reason for us to introduce lstat_fast().  We can just use lstat(), keep
the "fast and incomplete" Cygwin one as lstat(), but insert calls to
lstat_remainder() at strategic places.  Immediately before copy_file() is
called in builtin-init-db.c::copy_templates_1() would be one of those
strategic places.



diff --git a/read-cache.c b/read-cache.c
index 3f58711..d12d11b 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -562,7 +562,7 @@ static void record_intent_to_add(struct cache_entry *ce)
 int add_to_index(struct index_state *istate, const char *path, struct stat *st, int flags)
 {
 	int size, namelen, was_same;
-	mode_t st_mode = st->st_mode;
+	mode_t st_mode;
 	struct cache_entry *ce, *alias;
 	unsigned ce_option = CE_MATCH_IGNORE_VALID|CE_MATCH_RACY_IS_DIRTY;
 	int verbose = flags & (ADD_CACHE_VERBOSE | ADD_CACHE_PRETEND);
@@ -571,6 +571,8 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 	int add_option = (ADD_CACHE_OK_TO_ADD|ADD_CACHE_OK_TO_REPLACE|
 			  (intent_only ? ADD_CACHE_NEW_ONLY : 0));
 
+	lstat_remainder(path, st);
+	st_mode = st->st_mode;
 	if (!S_ISREG(st_mode) && !S_ISLNK(st_mode) && !S_ISDIR(st_mode))
 		return error("%s: can only add regular files, symbolic links or git-directories", path);
 
@@ -637,7 +639,7 @@ int add_to_index(struct index_state *istate, const char *path, struct stat *st,
 int add_file_to_index(struct index_state *istate, const char *path, int flags)
 {
 	struct stat st;
-	if (lstat(path, &st))
+	if (lstat_fast(path, &st))
 		die("%s: unable to stat (%s)", path, strerror(errno));
 	return add_to_index(istate, path, &st, flags);
 }
@@ -1013,7 +1015,7 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate,
 		return ce;
 	}
 
-	if (lstat(ce->name, &st) < 0) {
+	if (lstat_fast(ce->name, &st) < 0) {
 		if (err)
 			*err = errno;
 		return NULL;


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux