On Thu, Apr 5, 2012 at 14:49, Thomas Rast <trast@xxxxxxxxxxx> wrote: > Thomas Gummerer <italyhockeyfeed@xxxxxxxxx> writes: > >> -- Proposed solution -- >> The proposed solution is to redesign the index to a B-tree based format. This >> allows changes to the index in O(log(n)) time, with n being the number of >> entries in the index. > >> -- Solutions that were also considered -- >> - Append-only data structure >> - Database format >> - Padded structure > > This is quite complete already, which I think is great, but it's still > missing one "obvious" approach: a directory-tree based layout that uses > "flat" storage. That is, the entries grouped by directory and thus > arranged into the "natural" tree, so as to allow parsing only part of > it. But not pulling any tricks to make it easy to change; a nontrivial > change would mean a rewrite. How good do you think that could be? I have been wondering this myself. Aren't most updates to the index just updating the stat information of an existing entry? If so we could structure the index as flat lists for each directory similar to a canonical tree, but with a wider field to hold not just the SHA-1 but also the stat information of each file. If the entry is just the component name ("foo.c" and not "src/foo.c") and the SHA-1 and stat data, you can probably protect the entire entry with a CRC-32 for each entry. Updates can be performed in place by taking the write lock with index.lock as an empty file, then overwriting the SHA-1 and stat field followed, by updating the CRC-32. Readers that see the wrong CRC-32 for an entry can sleep for a short period, retry the read, and fail after some number of attempts if they cannot get a valid read of that entry. Adding a new file or deleting an existing file from the index would require a full rewrite. Within a single tree/directory entry, it probably doesn't have to be binary searchable. Canonical trees aren't. Linear scans through a directory is OK, so long as the scans are broken up by the directory tree structure just like they are in canonical trees. Dealing with the conflict stages during merges (1, 2, 3) could be handled by appending the conflict data at the end of the index file, when conflicts are resolved this tail region could be truncated back to the real end of the file. A bit could be set on the normal entry in the trees to denote there is a conflict, and additional stage data is expected in the tail region of the file. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html