On Thu, Jul 11, 2013 at 7:26 PM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
> Duy Nguyen <pclouds@xxxxxxxxx> writes:
>
>> On Thu, Jul 11, 2013 at 6:39 PM, Thomas Gummerer <t.gummerer@xxxxxxxxx> wrote:
>>>> Question about the possibility of updating the index file directly. If
>>>> git updates a few fields of an entry (but not entrycrc yet) and
>>>> crashes, the entry would become corrupt because its entrycrc does not
>>>> match the content. What do we do? Do we need to save a copy of the
>>>> entry somewhere in the index file (maybe in the conflict data
>>>> section), so that the reader can recover the index? Losing the index
>>>> because of bugs is a big deal in my opinion. pre-v5 never faces this
>>>> because we keep the original copy until the end.
>>>>
>>>> Maybe entrycrc should not cover stat fields and statcrc. It would make
>>>> refreshing safer. If the above happens during refresh, only statcrc is
>>>> corrupt and we can just refresh the entry. entrycrc still says the
>>>> other fields are good (and they are).
>>>
>>> The original idea was to change the lock-file for partial writing to
>>> make it work for this case. The exact structure of the file still has
>>> to be defined, but generally it would be done in the following steps:
>>>
>>> 1. Write the changed entry to the lock-file
>>> 2. Change the entry in the index
>>> 3. If we succeed, delete the lock-file (commit the transaction)
>>>
>>> If git crashes and leaves the index corrupted, we can recover the
>>> information from the lock-file, write the new information to the
>>> index file and then delete the lock-file.
>>
>> Ah, makes sense. Still concerned about refreshing though. Updated files
>> are usually few while refreshed files could be a lot more, increasing
>> the cost at #1.
>
> Any idea how common refreshing a big part of the cache is?

No, probably not common. Anyone who does "find|xargs touch" deserves
to be punished.
Files can be edited, then reverted by an editor, but there should not
be many of those. The only sensible case is "git checkout <path>" with
lots of modified files. But that can't happen often.

> If it's not too common, I'd prefer to leave the stat data and statcrc
> in the entrycrc, as we can inform the user if something is wrong with
> the index, be it from git failing or from disk corruption.
>
> On the other hand, if refresh_cache is relatively common and usually
> changes a big part of the index, we should leave them out, as git can
> still run correctly with incorrect stat data, but takes a little
> longer, because it may have to check the file contents. That will be a
> trade-off to make here.

-- 
Duy
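A minimal C sketch of the split-checksum idea discussed in the thread above: entrycrc covers the identity fields of an entry but not the stat fields or statcrc, so a crash in the middle of an in-place refresh only invalidates the stat data. The struct layout, field set and every function name here (crc32_update, seal_entry, verify_entry, demo_crash_during_refresh) are hypothetical illustrations, not the index-v5 format or git's code. The checksum is a plain bitwise CRC-32 (the same CRC zlib's crc32() computes) so the example needs no extra library.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 (reflected, polynomial 0xEDB88320). Calls can be
 * chained: start from 0 and pass the previous result back in. */
static uint32_t crc32_update(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    crc = ~crc;
    while (len--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}

/* Hypothetical, simplified entry; NOT the actual index-v5 layout. */
struct index_entry {
    char name[32];
    unsigned char sha1[20];
    uint32_t mode;
    uint32_t mtime_sec; /* stat data */
    uint32_t size;      /* stat data */
    uint32_t statcrc;   /* covers mtime_sec and size only */
    uint32_t entrycrc;  /* covers name, sha1 and mode; NOT stat data or statcrc */
};

static uint32_t compute_statcrc(const struct index_entry *e)
{
    uint32_t c = crc32_update(0, &e->mtime_sec, sizeof(e->mtime_sec));
    return crc32_update(c, &e->size, sizeof(e->size));
}

static uint32_t compute_entrycrc(const struct index_entry *e)
{
    uint32_t c = crc32_update(0, e->name, strlen(e->name));
    c = crc32_update(c, e->sha1, sizeof(e->sha1));
    return crc32_update(c, &e->mode, sizeof(e->mode));
}

/* Recompute both checksums after all fields of the entry are written. */
static void seal_entry(struct index_entry *e)
{
    e->statcrc = compute_statcrc(e);
    e->entrycrc = compute_entrycrc(e);
}

enum entry_state { ENTRY_OK, ENTRY_NEEDS_REFRESH, ENTRY_CORRUPT };

static enum entry_state verify_entry(const struct index_entry *e)
{
    if (compute_entrycrc(e) != e->entrycrc)
        return ENTRY_CORRUPT;       /* name/sha1/mode cannot be trusted */
    if (compute_statcrc(e) != e->statcrc)
        return ENTRY_NEEDS_REFRESH; /* only stat data is bad: re-stat the
                                     * file and compare contents instead */
    return ENTRY_OK;
}

/* Simulates a crash in the middle of an in-place refresh: the new mtime
 * has been written, but statcrc has not been updated yet. */
static enum entry_state demo_crash_during_refresh(void)
{
    struct index_entry e = { .name = "README", .mode = 0100644,
                             .mtime_sec = 1373540000u, .size = 42 };

    memset(e.sha1, 0xab, sizeof(e.sha1));
    seal_entry(&e);
    e.mtime_sec = 1373550000u; /* crash before statcrc is rewritten */
    return verify_entry(&e);
}
```

With this split, a torn refresh costs one extra content comparison instead of a corrupt entry. The price, as the last quoted paragraph points out, is that damage confined to the stat fields is no longer reported as corruption.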