Re: git fsck not identifying corrupted packs

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Mon, 19 Oct 2009, Johannes Sixt wrote:
>
>> Sergio Callegari schrieb:
>> > Is there a means to have fsck to a truly full check on the sanity of a 
>> > repo?
>> 
>> git fsck --full
>> 
>> RTFM, please.
>
> Now, now.
>
> If you were to test a new filesystem, say, wonderfulfs, and wanted to 
> check its integrity, would you not just run "fsck-wonderfulfs" if that 
> exists, rather than reading the fantamagastic manual?  Would you not 
> expect that it Does The Right Thing?  Would you not expect that it 
> follows the Law Of Minimal Surprise?
>
> So FWIW I can see where Sergio is coming from.

Linus and other git developers from the early days trained their fingers
to type the command every once in a while, often without thinking, to check
the consistency of the repository back when the lower core part of git
was still being developed.  Developers who wanted to make sure that git
dealt with packfiles correctly could deliberately trigger their creation
and carefully check them afterwards, but loose objects are the ones that
are written by various commands from random codepaths.  For that reason,
it made some technical sense, from the debugging point of view, to have a
mode that checked only loose objects.
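
To illustrate what checking a single loose object amounts to, here is a
minimal Python sketch.  It assumes a SHA-1 repository, where a loose
object is a zlib-compressed "<type> <size>\0<content>" stream named after
the SHA-1 of the uncompressed bytes.  The function name loose_object_ok
is hypothetical, and the real "fsck" checks far more than this
(object format, connectivity, and so on).

```python
import hashlib
import zlib


def loose_object_ok(object_id: str, compressed: bytes) -> bool:
    """Return True if the decompressed bytes hash to object_id.

    object_id is the 40-hex-digit name of the object; compressed is the
    raw content of the loose object file.
    """
    try:
        raw = zlib.decompress(compressed)
    except zlib.error:
        # A stream that does not even inflate is certainly corrupt.
        return False
    return hashlib.sha1(raw).hexdigest() == object_id
```

On disk the object name is split into a two-character directory and a
38-character file name under .git/objects, so the caller would join the
two to get object_id before reading the file.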

    Side note.  I think the help description of the --full option is wrong
    (or at least stale).  We always look at alternate object stores these
    days, since e15ef66 (fsck: check loose objects from alternate object
    stores by default, 2009-01-30).  It probably should read "check packed
    objects fully" or something.

The above paragraph is merely historical background, and in this case
the "history" refers to early-to-mid 2005.  These days, even git developers
no longer have any reason to type "git fsck" for fear that some newly
created objects might be corrupt due to a recent change to git.

The reason we did not make "--full" the default is probably that we trust
our filesystems a bit too much.  At least, we trusted filesystems more than
we trusted the lower core part of git that was under development ;-)

Once a packfile is created, we always use it read-only, so there didn't
seem to be much point in suspecting that the underlying filesystems or
disks may corrupt it in a way that is not caught by the SHA-1 checksum
over the entire packfile and the per-object checksums.  That trust in
the filesystems might have been a good tradeoff between fsck performance
and reliability on the platforms git was initially developed on and for,
but it might not hold anymore as we run on more platforms these days.
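
As a rough illustration of the whole-pack checksum mentioned above: a
packfile starts with a 12-byte header (the signature "PACK", a version,
and an object count) and, in a SHA-1 repository, ends with a 20-byte
SHA-1 of everything that precedes it.  The sketch below checks only that
trailer; the function name pack_trailer_ok is hypothetical, and this is
much less than what "git fsck --full" or "git verify-pack" actually do.

```python
import hashlib

HEADER_LEN = 12   # "PACK" + 4-byte version + 4-byte object count
TRAILER_LEN = 20  # SHA-1 over all preceding bytes


def pack_trailer_ok(data: bytes) -> bool:
    """Return True if the pack's trailing SHA-1 matches its contents."""
    if len(data) < HEADER_LEN + TRAILER_LEN or data[:4] != b"PACK":
        return False
    body, trailer = data[:-TRAILER_LEN], data[-TRAILER_LEN:]
    return hashlib.sha1(body).digest() == trailer
```

A caller would pass the bytes of a file under .git/objects/pack; for
large packs, hashing in chunks instead of reading the whole file into
memory would be the more sensible choice.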

It probably makes sense to ship 1.7.0 with a version of "fsck" in which
"--full" is the default; it would still accept "--full" but it would be a
no-op.  This would be a backward incompatible change, but the difference
is primarily about performance ("it takes a lot longer than before!"), and
not correctness, so we probably can live with it.  As I already said,
there is not much reason to run "fsck" every five minutes anymore to begin
with (unless your filesystem is so unreliable that it might eat one file
every five minutes, that is).

It probably is also a good idea to add a "--loose" option that does what
"fsck" currently does without "--full".  It is a good name because (1) to
people who do not know the internals of git, it means "check only loosely",
which would discourage them from running "fsck" with that option to begin
with, and (2) to others, it exactly tells what the option makes the
command check.