Re: reducing prune sync()s

On Fri, 30 May 2008, Frank Ch. Eigler wrote:
> 
> On Thu, May 29, 2008 at 05:27:35PM -0700, Linus Torvalds wrote:
> > [...]
> > >	  Or perhaps having the blanket sync be replaced by a
> > > list of fsync()s for only the relevant git repository files?
> > [...]
> > Something like this *may* work. THIS IS TOTALLY UNTESTED. And when I say 
> > "TOTALLY UNTESTED", I mean it. Zero testing. None. Nada. Zilch. Testing is 
> > for people who are actually interested in the feature (hint, hint).
> 
> The patch does add an fsync or two into the mix, but a "git gc" or 
> "git repack -a" still goes through the "git-repack" shell script, which
> still does its "sync".

Yes.

But I actually think there is a simpler and more straightforward approach.

Instead of being careful when removing objects (whether old packs or loose 
objects that are made redundant by a new pack), the simpler approach is to 
just always fsync() the new pack when creating it.

I was always very careful to *not* make git depend on any serialized IO, 
but the reason for that was literally the fact that I wanted to make sure 
that I could batch up things efficiently, and do any serialization (if I 
wanted to) later. So it was literally always about the whole "apply 
several hundred patches in one go" kind of thing.

And the thing is, the repacking phase *is* the "serialize things later (if 
you want)" thing, so doing things synchronously at that point is actually 
perfectly fine.

And every single "let's remove objects" operation is literally always 
about the fact that we have a new better pack-file, making old objects 
redundant, so if we just create those new pack-files stably on disk, then 
any subsequent action pretty much by definition doesn't need any sync. 
Because we know that the only thing we can really care about *is* stable.
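
To make the ordering concrete, here is a rough sketch in C. It is an 
illustration only, not the actual patch: the function names 
write_file_stably() and replace_with_new_pack(), the temporary-file 
naming, and the error handling are all assumptions made up for this 
example. The idea is just "write the new pack, fsync() it, move it into 
place, and only then remove whatever it made redundant".

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: write 'len' bytes to tmp_path, fsync() them,
 * then rename the file to final_path. Returns 0 on success, -1 on error.
 */
int write_file_stably(const char *tmp_path, const char *final_path,
                      const void *buf, size_t len)
{
    const char *p = buf;
    int fd = open(tmp_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted, just retry */
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /* The one serialization point: the data must be on disk now. */
    if (fsync(fd) < 0)
        goto fail;
    if (close(fd) < 0) {
        unlink(tmp_path);
        return -1;
    }
    if (rename(tmp_path, final_path) < 0) {
        unlink(tmp_path);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp_path);
    return -1;
}

/*
 * Hypothetical helper: create the new pack stably, then drop the files
 * it makes redundant. No sync() is needed around the removals, because
 * the only data we care about was already fsync()ed above.
 */
int replace_with_new_pack(const char *tmp_path, const char *pack_path,
                          const void *pack, size_t len,
                          const char *const *redundant, size_t n_redundant)
{
    size_t i;

    if (write_file_stably(tmp_path, pack_path, pack, len) < 0)
        return -1;              /* keep the old objects on failure */

    for (i = 0; i < n_redundant; i++)
        unlink(redundant[i]);   /* best effort, ordering no longer matters */
    return 0;
}
```

Note that the old files are only touched after the fsync() and rename 
have succeeded, so a failure anywhere earlier leaves the repository with 
its old objects intact. The sketch does not fsync() the containing 
directory, and it ignores the pack index, which would need the same 
treatment as the pack itself.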

So this is a conceptually much more direct approach. Creating pack-files 
really is the special occasion, since it's (a) literally the event that 
causes other objects to potentially be stale (b) fairly rare and (c) not 
normally limited by disk-IO anyway (ie a "git fetch" will create a new 
pack-file, but it's normally limited by the network overhead or the cost 
of creating the pack-file, not by adding a fsync() to make sure that the 
end result is stable).

So I'll follow up with a two-patch series (the first to create pack-files 
and their indexes stably on disk, the second to just remove the now 
unnecessary 'sync()' calls). I'll give it *some* basic testing first, 
though.

		Linus