Nguyen Thai Ngoc Duy <pclouds@xxxxxxxxx> writes: > add_to_index and remove_index_entry_at seem good places for the cut. > But do we need to redefine the location? That is one of the things we need to think about carefully. Of course, if add_to_index() just takes a pathname out of the ce the caller wants to add, you can define remove_from_index() that takes a pathname (and possibly a stage), finds the ce with that pathname in the index and removes it. But that would unnecessrily penalize the callers that follow "see if there is such an entry (i.e. "locate"), optionally inspect the entry and then decide to remove", especially if the "locate" is expensive. If a nonsignificant number of callers follow such a pattern, then having a separate "locate" API that can return a "location" that is expressed either as the number of entries to skip from the beginning (in a flat in-core array) or a pair of in-core "directory" structure and the index in the directory to let the caller find the entry quickly, and then later pass it to "remove", would be more appropriate. add_index_entry_at() may also not a bad thing to have if many callers turn out to follow a similar access pattern (i.e. locate, decide to or not to replace when there already is one, and then add it). > - for 3-5 years since v5 is released, we support v2 and v5 in > parallel. Other code can take advantage of v5, but it must neither > sacrifice v2 performance, compatibility nor maintainability > - after that, we deprecate v2. v2 is automatically converted to v5 in > memory. v2 perf may suffer but at that point we don't care any more as > the majority of users should have been migrated to v5 (*) As long as the performance of Git on a working tree that used to get certain performance back when it was using v2 does not degrade when it is converted to v5 or later, I think the above is a good way forward. > If the long term plan is actually that, we will need to stick to flat > array implementation for forseeable future as moving from it most > likely impacts v2 performance. I do not see why we need to "stick to"; I do not see why it is necessarily a bad thing if we end up choosing to "stick to" if the reason we choose it is because the flat in-core performs better. If the workload we _care_ about is served better by using an API that works over an in-core tree-shaped index data structure, I do not think it is unreasonable to read the v2 on-disk format and represent it as a tree-shaped index while we read it. Of course, there are things that are not as effective when reading from the flat v2 on-disk format (e.g. path limited reading will have to at least _scan_ the whole thing, even though it may process only the entries that match the pathspec) compared to reading from a tree-shaped on-disk format, but I doubt that the difference between the cost of reading into a flat array and the cost of reading and forming whatever non-flat data structure you seem to think is better is so big that it would negate the benefit of using a better in-core structure. > This might not be the best way forward as v2 incompatible features > (like keeping empty directories in index, what else?) may never come > until v2 is deprecated. I do not think "empty directories" matter to begin with, but even if it did, I do not think v2 is inherently incapable of being enhanced to record one if you really wanted to. Either you come up with a new "mode" bits and add it as a regular cache entry, or record the fact that there is a directory in a new index extension. The real issue to solve is to decide what semantics you want (e.g. What to do when you earlier have added an empty directory, added a file in it and then removed the file, making it empty again? What if that happened during a merge?), to verify the semantics you define are sane, to add "keep_empty_directory()" function to read-cache.c, and to sprinkle callers to the API function as needed. These have to be done regardless of the actual on-disk format. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html