Re: [PATCH v2 12/19] read-cache: read index-v5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, Jul 17, 2013 at 3:11 PM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
>
> [..]
>
>>> +static int read_entries(struct index_state *istate, struct directory_entry **de,
>>> +                       unsigned int *entry_offset, void **mmap,
>>> +                       unsigned long mmap_size, unsigned int *nr,
>>> +                       unsigned int *foffsetblock)
>>> +{
>>> +       struct cache_entry *head = NULL, *tail = NULL;
>>> +       struct conflict_entry *conflict_queue;
>>> +       struct cache_entry *ce;
>>> +       int i;
>>> +
>>> +       conflict_queue = NULL;
>>> +       if (read_conflicts(&conflict_queue, *de, mmap, mmap_size) < 0)
>>> +               return -1;
>>> +       for (i = 0; i < (*de)->de_nfiles; i++) {
>>> +               if (read_entry(&ce,
>>> +                              *de,
>>> +                              entry_offset,
>>> +                              mmap,
>>> +                              mmap_size,
>>> +                              foffsetblock) < 0)
>>> +                       return -1;
>>> +               ce_queue_push(&head, &tail, ce);
>>> +               *foffsetblock += 4;
>>> +
>>> +               /*
>>> +                * Add the conflicted entries at the end of the index file
>>> +                * to the in memory format
>>> +                */
>>> +               if (conflict_queue &&
>>> +                   (conflict_queue->entries->flags & CONFLICT_CONFLICTED) != 0 &&
>>> +                   !cache_name_compare(conflict_queue->name, conflict_queue->namelen,
>>> +                                       ce->name, ce_namelen(ce))) {
>>> +                       struct conflict_part *cp;
>>> +                       cp = conflict_queue->entries;
>>> +                       cp = cp->next;
>>> +                       while (cp) {
>>> +                               ce = convert_conflict_part(cp,
>>> +                                               conflict_queue->name,
>>> +                                               conflict_queue->namelen);
>>> +                               ce_queue_push(&head, &tail, ce);
>>> +                               conflict_part_head_remove(&cp);
>>> +                       }
>>> +                       conflict_entry_head_remove(&conflict_queue);
>>> +               }
>>
>> I start to wonder if separating staged entries is a good idea. It
>> seems to make the code more complicated. The good point about conflict
>> section at the end of the file is you can just truncate() it out.
>> Another way is putting staged entries in fileentries, sorted
>> alphabetically then by stage number, and a flag indicating if the
>> entry is valid. When you remove resolve an entry, just set the flag to
>> invalid (partial write), so that read code will skip it.
>>
>> I think this approach is reasonably cheap (unless there are a lot of
>> conflicts) and it simplifies this piece of code. truncate() may be
>> overrated anyway. In my experience, I "git add <path>" as soon as I
>> resolve <path> (so that "git diff" shrinks). One entry at a time, one
>> index write at a time. I don't think I ever resolve everything then
>> "git add -u .", which is where truncate() shines because staged
>> entries are removed all at once. We should optimize for one file
>> resolution at a time, imo.
>
> Thanks for your comments.  I'll address the other ones once we decided
> to do with the conflicts.
>
> It does make the code quite a bit more complicated, but also has one
> advantage that you overlooked.

I did overlook, although my goal is to keep the code simpler, not more
comlicated. The thinking is if we can find everything in fileentries
table, the code here is simplified, so..

> We wouldn't truncate() when resolving
> the conflicts.  The resolve undo data is stored with the conflicts and
> therefore we could just flip a bit and set the stage of the cache-entry
> in the main index to 0 (always once we got partial writing).  This can
> be fast both in case we resolve one entry at a time and when we resolve
> a lot of entries.  The advantage is even bigger when we resolve one
> entry at a time, when we otherwise would have to re-write the index for
> each conflict resolution.

If I understand it correctly, filentries can only contain stage-0 or
stage-1 entries, "stage > 0" entries are stored in conflict data. Once
a conflict is solved, you update the stage-1 entry in fileentries,
turning it to stage-0 and recalculate the entry checksum. Conflict
data remains there to function as the old REUC extension. Correct?

First of all, if that's true, we only need 1 bit for stage in fileentries table.

Secondly, you may get away with looking up to conflict data in this
function by storing all stages in fileentries (now we need 2-bit
stage), replicated in conflict data for reuc function. When you
resolve conflict, you flip stage-1 to stage-0, and flip (a new bit) to
mark stage-2 entry invalid so the code knows to skip it. Next time the
index is rewritten, invalid entries are removed, but we still have old
stage entries in conflict data. The flipping business is pretty much
what you plan anyway, but the reading code does not need to look at
both fileentries and conflict data at the same time.

What do you think?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]