Re: [PATCH 5/7] tmp-objdir: new API for creating and removing primary object dirs

On Thu, Sep 30, 2021 at 03:16:19PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I also wonder how worthwhile, if at all, writing out the one file
> vs. lots of loose objects is on systems where we could write those
> loose objects to a ramdisk, which is commonly available out of the
> box on e.g. Linux distros these days. If you care about performance
> but not about your transitory data, using a ramdisk is generally much
> better than any other potential I/O optimization.

I'd think in general we won't be using a ramdisk, because tmp_objdir is
putting its directory inside $GIT_DIR/objects. It doesn't _have_ to, but
using a ramdisk works against its original purpose (which was to store
potentially quite a lot of data from an incoming push, and to be able to
rename it cheaply into its final resting place).

It would probably not be too hard to provide a flag that indicates the
intended use, though (and then we decide where to create the temporary
directory based on that).
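
As a sketch of the shape I have in mind (everything here is
hypothetical; neither the helper nor the "will_migrate" flag exists in
tmp-objdir.h, though get_object_directory() is the real accessor):

#include "cache.h"

/*
 * Hypothetical: let the caller say whether the objects are expected
 * to migrate into the main store, and pick a base directory to match.
 */
static const char *tmp_objdir_base(int will_migrate)
{
	if (!will_migrate) {
		/*
		 * Throwaway data is free to go to $TMPDIR, which may
		 * well be a ramdisk.
		 */
		const char *tmp = getenv("TMPDIR");
		return tmp ? tmp : "/tmp";
	}
	/*
	 * Data we expect to migrate must stay on the same filesystem
	 * as the object store, so the final rename() stays cheap.
	 */
	return get_object_directory();
}

The push path would keep today's behavior by asking for a migratable
directory; callers that never migrate could opt out.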

> Finally, and I don't mean to throw a monkey wrench into this whole
> discussion, so take this as a random musing: I wonder how much faster
> this thing could be on its second run if instead of avoiding writing to
> the store & cleaning up, it just wrote to the store, and then wrote
> another object keyed on the git version and any revision parameters
> etc., and then pretty much just had to do a "git cat-file -p <that-obj>"
> to present the result to the user :)
> 
> I suppose that would be throwing a lot more work at an eventual "git gc"
> than we ever do now, so maybe it's a bit crazy, but I think it might be
> an interesting direction in general to (ab)use either the primary or
> some secondary store in the .git dir as a semi-permanent cache of
> resolved queries from the likes of "git log".

I don't think it's crazy to just write the objects to the main object
store. We already generate cruft objects for some other operations
(Junio asked elsewhere in the thread about virtual trees for recursive
merges; I don't know the answer offhand, but I'd guess we write those
to the odb, too). They do get cleaned up eventually.

I'm not sure it helps performance much by itself. In a merge (or even
just writing a tree out from the index), by the time you realize you
already have the object, you've done most of the work to generate it.

I think what you're describing is some kind of cache structure on top.
That might be sensible (and indeed, the index already does this with
the cache-tree extension). But it can also easily come later if the
objects are just in the regular odb.
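
Very roughly, the lookup side might look like this (every
"query-cache" name below is invented for illustration; only the strbuf
and object helpers are real):

#include "cache.h"
#include "object-store.h"
#include "version.h"

static int query_cache_get(const char *args, struct object_id *result)
{
	struct strbuf key = STRBUF_INIT, hex = STRBUF_INIT;
	struct object_id key_oid;
	int ret = -1;

	/* anything that could change the output goes into the key */
	strbuf_addf(&key, "%s\n%s\n", git_version_string, args);
	hash_object_file(the_hash_algo, key.buf, key.len, "blob",
			 &key_oid);

	/* a hit means an earlier run stored the result blob's oid here */
	if (strbuf_read_file(&hex, git_path("query-cache/%s",
					    oid_to_hex(&key_oid)), 0) > 0 &&
	    !get_oid_hex(hex.buf, result))
		ret = 0;

	strbuf_release(&key);
	strbuf_release(&hex);
	return ret;
}

The writing side would write_object_file() the query output as a blob
and drop its oid into that same path; the blob then ages out via the
usual gc rules like any other cruft.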

-Peff


