Clemens Buchacher <drizzd@xxxxxx> writes: > On Mon, Feb 01, 2016 at 10:17:24AM -0800, Junio C Hamano wrote: >> >> Your proposal is to redefine "is the working tree dirty?"; it would >> check if "git checkout -f" would change what is in the working tree. > > I like this definition. Sounds obviously right. So this is an illustration. The change to ce_compare_data() has a room for improvement in that it assumes that it can always slurp the whole blob in-core; it should try to use the streaming interface when it makes sense. Otherwise we would not be able to handle a blob that we used to be able to (as index_fd() streams), which would be a regression. The change to t0023 is merely an example that shows that existing tests assume the convert_to_git() way of defining the dirtyness of the working tree. It used to be OK to have core.autocrlf set to true, have LF terminated file on the working tree and add it to the index, and the resulting state was "We just added it to the index, and nobody touched the index nor the working tree file--by definition the working tree IS CLEAN". With your updated semantics, that no longer is true. "We just added it, but if we check it out, we would normalize the line ending to be CRLF on the working tree, so the working tree is dirty" is what happens. There are tons of tests that would break the same way all of which needs to be looked at and fixed if we were to go in this direction. read-cache.c | 53 +++++++++++++++++++++++++++++++++++++++++++++++++---- t/t0023-crlf-am.sh | 2 +- 2 files changed, 50 insertions(+), 5 deletions(-) diff --git a/read-cache.c b/read-cache.c index 84616c8..c284f78 100644 --- a/read-cache.c +++ b/read-cache.c @@ -156,16 +156,61 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st) ce_mark_uptodate(ce); } +/* + * Compare the data in buf with the data in the file pointed by fd and + * return 0 if they are identical, and non-zero if they differ. + */ +static int compare_with_fd(const char *input, ssize_t len, int fd) +{ + for (;;) { + char buf[1024 * 16]; + ssize_t chunk_len, read_len; + + chunk_len = sizeof(buf) < len ? sizeof(buf) : len; + read_len = xread(fd, buf, chunk_len ? chunk_len : 1); + + if (!read_len) + /* EOF on the working tree file */ + return !len ? 0 : -1; + + if (!len) + /* we expected there is nothing left */ + return -1; + + if (memcmp(buf, input, read_len)) + return -1; + input += read_len; + len -= read_len; + } +} + +/* + * Does the file in the working tree match what is in the index? + * That is, do we lose any data from the working tree copy if we + * did a new "git checkout" of that path out of the index? + */ static int ce_compare_data(const struct cache_entry *ce, struct stat *st) { int match = -1; int fd = open(ce->name, O_RDONLY); if (fd >= 0) { - unsigned char sha1[20]; - if (!index_fd(sha1, fd, st, OBJ_BLOB, ce->name, 0)) - match = hashcmp(sha1, ce->sha1); - /* index_fd() closed the file descriptor already */ + enum object_type type; + unsigned long size; + void *data = read_sha1_file(ce->sha1, &type, &size); + + if (type == OBJ_BLOB) { + struct strbuf worktree = STRBUF_INIT; + if (convert_to_working_tree(ce->name, data, size, + &worktree)) { + free(data); + data = strbuf_detach(&worktree, &size); + } + if (!compare_with_fd(data, size, fd)) + match = 0; + } + free(data); + close(fd); } return match; } diff --git a/t/t0023-crlf-am.sh b/t/t0023-crlf-am.sh index f9bbb91..5c086b4 100755 --- a/t/t0023-crlf-am.sh +++ b/t/t0023-crlf-am.sh @@ -27,7 +27,7 @@ EOF test_expect_success 'setup' ' git config core.autocrlf true && - echo foo >bar && + printf "%s\r\n" foo >bar && git add bar && test_tick && git commit -m initial -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html