Re: [PATCH 13/22] documentation: add documentation of the index-v5 file format

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jul 11, 2013 at 7:26 PM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
>
>> On Thu, Jul 11, 2013 at 6:39 PM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
>>>> Question about the possibility of updating index file directly. If git
>>>> updates a few fields of an entry (but not entrycrc yet) and crashes,
>>>> the entry would become corrupt because its entrycrc does not match the
>>>> content. What do we do? Do we need to save a copy of the entry
>>>> somewhere in the index file (maybe in the conflict data section), so
>>>> that the reader can recover the index? Losing the index because of
>>>> bugs is big deal in my opinion. pre-v5 never faces this because we
>>>> keep the original copy til the end.
>>>>
>>>> Maybe entrycrc should not cover stat fields and statcrc. It would make
>>>> refreshing safer. If the above happens during refresh, only statcrc is
>>>> corrupt and we can just refresh the entry. entrycrc still says the
>>>> other fields are good (and they are).
>>>
>>> The original idea was to change the lock-file for partial writing to
>>> make it work for this case.  The exact structure of the file still has
>>> to be defined, but generally it would be done in the following steps:
>>>
>>>   1. Write the changed entry to the lock-file
>>>   2. Change the entry in the index
>>>   3. If we succeed delete the lock-file (commit the transaction)
>>>
>>> If git crashes, and leaves the index corrupted, we can recover the
>>> information from the lock-file and write the new information to the
>>> index file and then delete the lock-file.
>>
>> Ah makes sense. Still concerned about refreshing though. Updated files
>> are usually few while refreshed files could be a lot more, increasing
>> the cost at #1.
>
> Any idea how common refreshing a big part of the cache is?

No, probably not common. Anyone who does "find|xargs touch" deserves
to be punished. Files can be edited, then reverted by an editor, but
there should not be many of those. The only sensible case is "git
checkout <path>" with lots of modified files. But that can't happen
often.

> If it's not to common, I'd prefer to leave the stat data and stat crc in the
> entrycrc, as we can inform the user if something is wrong with the
> index, be it from git failing, or from disk corruption.
>
> On the other hand if refresh_cache is relatively common and usually
> changes a big part of the index we should leave them out, as git can
> still run correctly with incorrect stat data, but takes a little longer,
> because it may have to check the file contents.  That will be trade-off
> to make here.



--
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]