Re: [PATCH/RFC v2 0/16] Introduce index file format version 5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <gitster@xxxxxxxxx> writes:

> Thomas Rast <trast@xxxxxxxxxxxxxxx> writes:
>
>> I like the general idea, too, but I think there is a long way ahead, and
>> we shouldn't hold up v5 on this.
>
> We shouldn't rush, only to keep some deadline, and regret it later
> that we butchered the index format without thinking things through.
> When this was added to the GSoC idea page, I already said upfront
> that this was way too big a topic to be a GSoC project, didn't I?

Let me spell out my concern.  There are two v5s here:

* The extent of the GSoC task.

* The eventual implementation of index-v5 that goes into Git mainline.

IMHO this thread is mixing up the two.  There indeed must not be any
rush in the final implementation of index-v5.  However, the GSoC ends in
less than two weeks, and I have to evaluate Thomas on whatever is
finished until then.

AFAIK Thomas is now cleaning up the existing code to be in readable
shape, using your feedback, which is great.  However, the above
suggestion is such a fuzzily-specified task that there is no way to even
find out what needs to be done within the next two weeks.  Perhaps it
makes sense, at this point, to wrap anything that ended up having _v[25]
suffixes in an index_ops like Duy did.  That's a long way from actually
following through on the idea, though.

> [...] "The new on-disk format is different from
> the current one, and as it is different from the current one, we can
> easily enhance it even more by hooking anything interesting to it!"
> does not sound like a valid argument.  
>
>> For example, for v5 it
>> would be far better if conflicted and resolve-undo entries were a
>> property of the normal index entry, instead of something that so happens
>> to be consecutive entries and in a completely different place,
>> respectively.
>
> I am not sure I am convinced.  Conflicts are already expressed by an
> attribute on a normal index entry (it is called "stage"), and
> because we check for "is the index fully merged" fairly often, it
> makes sense to have it in each entry.  Actually having an unmerged
> entry is a rare event (happens only during a mergy operation that
> gave control back to you), so we do not lose much by expressing them
> as consecutive entries.  Resolve-undo is far less often used, and is
> not an essential feature, so it makes perfect sense to have it as an
> optional index extension to allow versions of Git that are unaware
> of it to still use an index file that has it.

I picked this example because in the big picture, the current code goes
to silly contortions to shuffle data around.  Conflicts and resolve-undo
entries are two faces of the same coin, but the code does not express
this at all.  Whenever the user resolves a conflict, it removes the
existing index entries (consecutive in a flat table) and inserts them in
the resolve-undo tree (tree-shaped where every entry has all stages
embedded).  When using 'checkout -m' to recover the conflict, it goes
the other way.

v5 would simplify this: the difference between a conflict and a
resolve-undo entry is only one bit.  But because it needs to maintain v2
compatiblity, it first untangles the mixed conflict/resolve-undo data
and puts them in the right format, then later reassembles them.

So "v5 could do it faster if all the code were written for it" is only
half of it.  v5's data layout would also result in simpler data flow,
but as long as it is not allowed to exploit this, it's actually *more*
layers of complexity.

I think the part you snipped

>> the loops that iterate over the index [...] either
>> skip unmerged entries or specifically look for them.  There are subtle
>> differences between the loops on many points: what do they do when they
>> hit an unmerged entry?  Or a CE_REMOVED or CE_VALID one?

is a symptom of the same general problem: the data structures are sound,
but they are leaking all over the code and now we have lots of
complexity to do even simple operations like "for each unmerged entry".

-- 
Thomas Rast
trast@{inf,student}.ethz.ch
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]