Re: [PATCH v5 5/5] builtin/merge-tree.c: implement support for `--write-pack`

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 13, 2023 at 05:34:42PM -0500, Taylor Blau wrote:

> > It might not be too hard to just let in-process callers access the
> > objects we've written. A quick and dirty patch is below, which works
> > with the test you suggested (the test still fails because it finds a
> > conflict, but it gets past the "woah, I can't find that sha1" part).
> 
> That's a very slick idea, and I think that this series has some legs to
> stand on now as a result.

I'm glad people seem to like it. ;)

I was mostly throwing it out there as an illustration, and I don't plan
on polishing it up further. But in case somebody else wants to work on
it, here are random extra thoughts on the topic:

  - rather than teach packfile.c about bulk checkin, I think it might be
    cleaner to let packed_git structs have a function pointer for
    accessing the index (and if NULL, naturally we'd open the .idx file
    in the usual way). And then bulk checkin could just register the
    "fake" packed_git

  - the flip side, though, is: would it be weird for other parts of the
    code to ever see the fake bulk packed_git in the list? It doesn't
    represent a real packfile the way the other ones do. So maybe it is
    better to keep it separate

  - I suspect this scheme may violate some bounds-checking of the
    packfiles. For example, we haven't written the hashfile trailer yet in
    the still-open packfile. But I think use_pack() assumes we always
    have an extra the_hash_algo->rawsz bytes of padding at the end. I'm
    not sure if that would ever cause us to accidentally read past the
    end of the file.

  - as you mentioned, an oidmap would be an obvious replacement for the
    existing unsorted list

  - the existing already_written() function faces the same problem. I
    don't think anybody cared because we'd usually only bulk checkin a
    few huge objects. But with --write-pack, we might write a lot of
    such objects, and the linear scan in already_written() makes it
    accidentally quadratic.

  - speaking of already_written(): it calls repo_has_object_file() to
    see if we can elide the write. It probably should be using
    freshen_*_object() in the usual way. Again, this is an existing bug
    but I suspect nobody noticed because the bulk checkin code path
    hardly ever kicks in.

-Peff




[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux