David Kastrup <dak@xxxxxxx> writes:

> Junio C Hamano <gitster@xxxxxxxxx> writes:
>
>> No objections as long as a patch is cleanly made without
>> regression. It's just nobody agreed that it is "quite serious"
>> yet so far, and no fundamental reason against it.
>
> Thanks. It certainly is not serious for the Linux kernel source, but
> seems awkward for quite a few situations. Anyway, what is your take
> on the situation I described?

Didn't I already say I have no objection to somebody who wants to
track empty directories? I probably would not do that myself, but I
do not see a reason to forbid it, either.

The right approach would probably be to allow entries of mode 040000
in the index. Traditionally, we allowed only 100644 (blobs as regular
files; 100755 for executables) and 120000 (blobs as symlinks). We
recently added 160000 (commit from outer space, aka subproject). And
we would do that for all directories, not just empty ones.

So if you have fileA, empty/, and sub/fileB tracked, your index would
probably have these four entries immediately after read-tree of an
existing tree object:

    100644 15db6f1f27ef7a... 0 fileA
    040000 4b825dc642cb6e... 0 empty
    040000 e125e11d3b63e3... 0 sub
    100644 52054201c2a872... 0 sub/fileB

Making sure that the empty/ directory exists in the working tree would
probably be done in entry.c; we have been touching that area in an
unrelated thread over the past few days.

If you add sub/fileC with "update-index" (or "add"), you invalidate
the SHA-1 object name stored for "sub" (there is no point recomputing
the tree object until you know you need a subtree for the "sub" part,
which does not happen until the next "write-tree"), and end up with
something like:

    100644 15db6f1f27ef7a... 0 fileA
    040000 4b825dc642cb6e... 0 empty
    040000 00000000000000... 0 sub
    100644 52054201c2a872... 0 sub/fileB
    100644 705bf16c546f32... 0 sub/fileC

These "missing" SHA-1 names would need to be recomputed on demand.
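To make the lazy invalidation concrete, here is a toy C sketch. It is
not git's code: struct toy_entry, invalidate_ancestors() and the
abbreviated object names are hypothetical. Adding a path wipes the
stored tree object name of every directory entry above it and leaves
the recomputation to a later write-tree. It assumes that all ancestor
directories, not only the immediate parent, get invalidated.

```c
#include <assert.h>
#include <string.h>

/* Index entry modes from the message (octal, as git prints them). */
#define MODE_REGULAR 0100644 /* blob as regular file */
#define MODE_SYMLINK 0120000 /* blob as symlink */
#define MODE_GITLINK 0160000 /* commit from outer space (subproject) */
#define MODE_TREE    0040000 /* proposed: directory entry */

/* Hypothetical, heavily simplified index entry (git uses struct cache_entry). */
struct toy_entry {
	unsigned mode;
	char sha1_hex[41]; /* hex object name; all zeros = "not computed yet" */
	char path[256];
};

static const char null_sha1_hex[] = "0000000000" "0000000000"
				    "0000000000" "0000000000";

/* Is `dir` a directory entry that contains `path` at any depth? */
static int is_ancestor_dir(const struct toy_entry *dir, const char *path)
{
	size_t len = strlen(dir->path);
	return dir->mode == MODE_TREE &&
	       strncmp(dir->path, path, len) == 0 &&
	       path[len] == '/';
}

/*
 * Lazy invalidation: when `path` is added or updated, forget the tree
 * object names of all directory entries containing it instead of
 * recomputing them now. Returns the number of entries invalidated.
 */
static int invalidate_ancestors(struct toy_entry *ents, int nr, const char *path)
{
	int i, count = 0;

	assert(path && *path);
	for (i = 0; i < nr; i++) {
		if (is_ancestor_dir(&ents[i], path)) {
			strcpy(ents[i].sha1_hex, null_sha1_hex);
			count++;
		}
	}
	return count;
}
```

Applied to the first listing above, invalidating for "sub/fileC" zeroes
only the "sub" entry; "empty" and the file entries keep their names.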
We have had the necessary infrastructure for this "keep untouched tree
object names in the index" trick for quite some time, but it is not
part of the index proper: it is stored in an extension section of the
index file, to keep the index compatible with older versions of git.

Having made it sound so easy, here are the issues I would expect to be
nontrivial (but probably not rocket surgery either).

 * unpack-trees, which is the workhorse for twoway merge (aka
   "switching branches") and threeway merge, has convoluted logic to
   avoid D/F (directory/file) conflicts; it can probably be cleaned up
   once we do the above conversion, so that the index starts saying
   "Hey, I have a directory here" more explicitly. The end result
   would probably be code that is easier to follow.

 * status, update-index --refresh, and diff-files care about the
   information cached in the index from the last time lstat(2) was
   run on each entry. What we should store there for "tree" entries
   is very unclear to me, but we should probably teach them to skip
   the stat-matching logic for these entries.

 * diff-index walks the index and a tree in parallel but does not
   currently expect to see a tree object in the index. It needs to be
   taught to ignore these "tree" entries.

 * merge-recursive and merge-index walk the index, coming up with the
   merge results one path at a time. They also need to be taught to
   ignore these "tree" entries.

 * diff-index and "read-tree -m" should be taught to take advantage of
   the "tree" entries in the index. For example, if the "tree" entry
   diff-index finds in the index exactly matches the corresponding
   subtree in the tree object, it does not even have to descend into
   the tree, which would be a huge performance win: you do not have to
   open the subtree and its subtrees on the tree side. (On the index
   side you have already read everything, and you still have to skip
   over the entries in that directory.) "read-tree -m" should likewise
   be able to optimize away identical subtrees among the 2 or 3 trees
   involved.
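A rough C sketch of two of the changes listed above, under heavily
simplified assumptions: hypothetical struct toy_entry and struct
toy_subtree records stand in for git's struct cache_entry and tree
walkers, and abbreviated hex strings stand in for real object names.
Stat-based refresh simply skips "tree" entries, and a diff-index style
walk skips every path below a "tree" entry whose stored name equals the
tree side's subtree name. An invalidated (all-zero) name disables the
shortcut for that directory.

```c
#include <assert.h>
#include <string.h>

#define MODE_TREE 0040000 /* proposed directory entry mode */

/* Hypothetical simplified records, not git's data structures. */
struct toy_entry {
	unsigned mode;
	const char *sha1_hex; /* all zeros = invalidated, not recomputed yet */
	const char *path;
};

struct toy_subtree { /* a subtree name as found on the tree-object side */
	const char *path;
	const char *sha1_hex;
};

static int is_tree_entry(const struct toy_entry *ce)
{
	return ce->mode == MODE_TREE;
}

static int is_null_name(const char *hex)
{
	return strspn(hex, "0") == strlen(hex);
}

/* status / update-index --refresh / diff-files: no lstat(2) match for trees. */
static int needs_stat_check(const struct toy_entry *ce)
{
	return !is_tree_entry(ce);
}

/* Does `path` live somewhere below directory `dir`? */
static int is_below(const char *dir, const char *path)
{
	size_t len = strlen(dir);
	return strncmp(dir, path, len) == 0 && path[len] == '/';
}

/*
 * diff-index shortcut: a valid "tree" entry equal to the tree side's
 * subtree means nothing below it can differ. Returns how many non-tree
 * entries still need comparing. Assumes nr <= 64 (fixed scratch array).
 */
static int entries_to_compare(const struct toy_entry *ents, int nr,
			      const struct toy_subtree *tree, int nt)
{
	int unchanged[64] = {0};
	int i, j, count = 0;

	assert(nr <= 64);
	/* Pass 1: tree entries whose (valid) name matches the tree side. */
	for (i = 0; i < nr; i++) {
		if (!is_tree_entry(&ents[i]) || is_null_name(ents[i].sha1_hex))
			continue;
		for (j = 0; j < nt; j++)
			if (!strcmp(tree[j].path, ents[i].path) &&
			    !strcmp(tree[j].sha1_hex, ents[i].sha1_hex))
				unchanged[i] = 1;
	}
	/* Pass 2: ignore tree entries; skip paths below an unchanged one. */
	for (i = 0; i < nr; i++) {
		int skip = is_tree_entry(&ents[i]);
		for (j = 0; j < nr && !skip; j++)
			if (unchanged[j] && is_below(ents[j].path, ents[i].path))
				skip = 1;
		if (!skip)
			count++;
	}
	return count;
}
```

With the first listing earlier in this message ("sub" still valid and
matching), only fileA would need comparing; after adding sub/fileC,
"sub" is invalidated, so fileA, sub/fileB and sub/fileC all do.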
Even if we follow the "lazy invalidate" strategy to maintain the "tree"
entries in the normal codepath, we could have a special operation that
says "now update all the tree entries by recomputing the tree object
names as needed". Perhaps we might want to initiate such an operation
automatically before "read-tree -m".
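The "update all the tree entries" operation could look roughly like the
following C sketch. It is hypothetical throughout: a 64-bit FNV-1a hash
stands in for SHA-1, and it hashes mode, base name and child name rather
than a real serialized tree object. Only directory entries whose names
were invalidated are recomputed; invalidated subdirectories are handled
first, so a parent is hashed from up-to-date child names.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MODE_TREE 0040000

/* Hypothetical simplified index entry. */
struct toy_entry {
	unsigned mode;
	char name[41]; /* hex object name; all zeros = invalidated */
	char path[64];
};

/* Stand-in for SHA-1: 64-bit FNV-1a. Git does NOT hash trees this way. */
static uint64_t fnv1a(uint64_t h, const char *s)
{
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 1099511628211ULL;
	}
	return h;
}

static int is_null_name(const char *hex)
{
	return strspn(hex, "0") == strlen(hex);
}

/* Is `path` an immediate child of directory `dir` ("" = top level)? */
static int is_child(const char *dir, const char *path)
{
	size_t len = strlen(dir);
	if (len) {
		if (strncmp(dir, path, len) || path[len] != '/')
			return 0;
		path += len + 1;
	}
	return *path && !strchr(path, '/');
}

/*
 * Compute a name for directory `dir` from its immediate children,
 * first recursing into child tree entries that were lazily invalidated
 * and storing their new names. Valid child names are reused as-is.
 */
static uint64_t refresh_tree(struct toy_entry *ents, int nr, const char *dir)
{
	uint64_t h = 14695981039346656037ULL; /* FNV-1a offset basis */
	char mode[16];
	int i;

	assert(dir != NULL);
	for (i = 0; i < nr; i++) {
		struct toy_entry *ce = &ents[i];
		const char *base;

		if (!is_child(dir, ce->path))
			continue;
		if (ce->mode == MODE_TREE && is_null_name(ce->name))
			snprintf(ce->name, sizeof(ce->name), "%016llx",
				 (unsigned long long)refresh_tree(ents, nr, ce->path));
		base = strrchr(ce->path, '/');
		base = base ? base + 1 : ce->path;
		snprintf(mode, sizeof(mode), "%o ", ce->mode);
		h = fnv1a(fnv1a(fnv1a(h, mode), base), ce->name);
	}
	return h;
}
```

Calling refresh_tree(ents, nr, "") refreshes the whole index. Because
only base names go into the hash, two empty directories end up with the
same name, much as every empty tree in git shares one object name.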