Re: [PATCH 13/22] documentation: add documentation of the index-v5 file format

Duy Nguyen <pclouds@xxxxxxxxx> writes:

> On Thu, Jul 11, 2013 at 6:39 PM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
>>> Question about the possibility of updating index file directly. If git
>>> updates a few fields of an entry (but not entrycrc yet) and crashes,
>>> the entry would become corrupt because its entrycrc does not match the
>>> content. What do we do? Do we need to save a copy of the entry
>>> somewhere in the index file (maybe in the conflict data section), so
>>> that the reader can recover the index? Losing the index because of
>>> bugs is a big deal in my opinion. Pre-v5 never faces this because we
>>> keep the original copy until the end.
>>>
>>> Maybe entrycrc should not cover stat fields and statcrc. It would make
>>> refreshing safer. If the above happens during refresh, only statcrc is
>>> corrupt and we can just refresh the entry. entrycrc still says the
>>> other fields are good (and they are).
>>
>> The original idea was to change the lock-file for partial writing to
>> make it work for this case.  The exact structure of the file still has
>> to be defined, but generally it would be done in the following steps:
>>
>>   1. Write the changed entry to the lock-file
>>   2. Change the entry in the index
>>   3. If we succeed delete the lock-file (commit the transaction)
>>
>> If git crashes, and leaves the index corrupted, we can recover the
>> information from the lock-file and write the new information to the
>> index file and then delete the lock-file.
>
> Ah makes sense. Still concerned about refreshing though. Updated files
> are usually few while refreshed files could be a lot more, increasing
> the cost at #1.
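To make the three steps and the recovery path concrete, here is a rough
Python sketch.  It is not git code: the record layout (an 8-byte offset,
the new entry bytes, then a CRC-32 over both), the single-record
lock-file, and all function names are assumptions made up for
illustration, since the exact structure of the file still has to be
defined.

```python
import os
import struct
import zlib

# Hypothetical lock-file record: 8-byte offset, new entry bytes, CRC-32.
OFFSET = struct.Struct(">Q")


def write_journal(journal_path, offset, new_bytes):
    """Step 1: record the intended change before touching the index."""
    body = OFFSET.pack(offset) + new_bytes
    with open(journal_path, "wb") as f:
        f.write(body + struct.pack(">I", zlib.crc32(body)))
        f.flush()
        os.fsync(f.fileno())


def apply_in_place(index_path, offset, new_bytes):
    """Step 2: overwrite the entry inside the index file."""
    with open(index_path, "r+b") as f:
        f.seek(offset)
        f.write(new_bytes)
        f.flush()
        os.fsync(f.fileno())


def update_entry(index_path, journal_path, offset, new_bytes):
    write_journal(journal_path, offset, new_bytes)  # step 1
    apply_in_place(index_path, offset, new_bytes)   # step 2
    os.remove(journal_path)                         # step 3: commit


def recover(index_path, journal_path):
    """Replay a leftover lock-file; return True if a change was replayed."""
    if not os.path.exists(journal_path):
        return False
    with open(journal_path, "rb") as f:
        data = f.read()
    body, stored = data[:-4], data[-4:]
    intact = (len(data) > OFFSET.size + 4
              and struct.pack(">I", zlib.crc32(body)) == stored)
    if intact:
        (offset,) = OFFSET.unpack_from(body)
        apply_in_place(index_path, offset, body[OFFSET.size:])
    # A torn record means the crash happened during step 1, so the
    # index was never modified and the lock-file can simply be dropped.
    os.remove(journal_path)
    return intact
```

In this sketch the cost at #1 is one record (plus an fsync) per changed
entry, so a refresh that touches many entries would have to write
roughly that many entries to the lock-file before any in-place write.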

Any idea how common it is to refresh a big part of the cache?  If it's
not too common, I'd prefer to keep the stat data and statcrc covered by
the entrycrc, as we can then inform the user if something is wrong with
the index, be it from git failing or from disk corruption.

On the other hand, if refresh_cache is relatively common and usually
changes a big part of the index, we should leave them out of the
entrycrc.  Git can still run correctly with incorrect stat data; it just
takes a little longer, because it may have to check the file contents.
That is the trade-off to make here.
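
As a rough illustration of the second option, here is a Python sketch
in which the entrycrc skips the stat data and the statcrc covers only
the stat data.  The entry layout, the choice of stat fields, and the
function names are assumptions for the example and do not match the
real index-v5 format.

```python
import struct
import zlib

# Hypothetical subset of stat fields: ctime, mtime, size, mode.
STAT = struct.Struct(">IIII")


def make_entry(name, sha1, stat_fields):
    """Build an entry whose entrycrc does not cover the stat data."""
    stat_bytes = STAT.pack(*stat_fields)
    return {
        "name": name,
        "sha1": sha1,
        "stat": stat_bytes,
        "statcrc": zlib.crc32(stat_bytes),
        "entrycrc": zlib.crc32(name + b"\0" + sha1),
    }


def check_entry(entry):
    """Return 'ok', 'refresh' (only stat data is bad) or 'corrupt'."""
    if zlib.crc32(entry["name"] + b"\0" + entry["sha1"]) != entry["entrycrc"]:
        return "corrupt"
    if zlib.crc32(entry["stat"]) != entry["statcrc"]:
        return "refresh"
    return "ok"
```

With this split, a crash in the middle of a refresh would at worst give
"refresh" for the affected entries, at the price of no longer being able
to tell the user that the stat data itself was damaged by something
other than an interrupted refresh.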