Re: RFC on packfile URIs and .gitmodules check

Jonathan Tan <jonathantanmy@xxxxxxxxxx> writes:

> We wouldn't be OK, actually. Suppose we have a separate packfile
> containing only the ".gitmodules" blob - when we call fsck_finish(), we
> would not have downloaded the other packfile yet. Git processes the
> entire fetch response by piping the inline packfile (after demux) into
> index-pack (which is the one that calls fsck_finish()) before it
> downloads any of the other packfile(s).
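
If I read that correctly, the current flow is essentially the
following (a sketch only; demux_inline_pack(), index_pack_fsck()
and http_download_pack() are made-up names, not functions in the
codebase):

    struct packfile_uri { const char *hash; const char *uri; };

    int demux_inline_pack(int server_fd);
    int http_download_pack(const char *uri);
    void index_pack_fsck(int pack_fd);

    static void process_fetch_response(int server_fd,
                                       struct packfile_uri *uris, int nr)
    {
            int i;

            /*
             * The inline pack is demuxed and piped straight into
             * "index-pack --fsck-objects"; fsck_finish() runs
             * here, before any CDN pack exists locally...
             */
            index_pack_fsck(demux_inline_pack(server_fd));

            /*
             * ...and only afterwards are the offloaded pack(s),
             * which may contain the very .gitmodules blob being
             * checked, downloaded and indexed.
             */
            for (i = 0; i < nr; i++)
                    index_pack_fsck(http_download_pack(uris[i].uri));
    }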

Is that order documented as a requirement for implementation?

Naïvely, I would expect that the point of a CDN offload is to
relieve servers of the burden of repacking the ancient part of the
history over and over for every new "clone" client, and that is
what the "here is a URI, go fetch it because I won't give you
objects that already appear there" feature is about.  Because we
expect that the offloaded contents would not be up to date, the
traditional packfile transfer would then be used to complete the
history with the objects needed for the parts of the history newer
than the offloaded contents.

And from that viewpoint, it sounds totally backwards to start
processing the up-to-the-minute-fresh packfile that came via the
traditional packfile transfer before the CDN-offloaded contents
are fetched and stored safely in our repository.

We probably want to finish the interaction with the live server as
quickly as possible---it would run counter to that wish if we
forced the live part of the history to hang in flight, unprocessed,
while the client downloads the offloaded bulk from the CDN and
processes it, leaving the server side stuck waiting for some
write(2) to go through.

But I still wonder if it is an option to locally delay the
processing of the up-to-the-minute-fresh part.

Instead of feeding what comes from the server directly to
"index-pack --fsck-objects", would it make sense to spool it to a
temporary file, so that we can release the server early, but then
make sure to fetch and process the packfile URI material before
coming back to process the spooled pack data?  That would allow the
newer part of the history to have newer trees that still reference
the same old .gitmodules blob that is found in the frozen packfile
that comes from the CDN, no?
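
Roughly, that alternative order would look like this (again only a
sketch, reusing the made-up helpers from above, plus an equally
made-up spool_to_tempfile()):

    int spool_to_tempfile(int server_fd);

    static void fetch_with_spool(int server_fd,
                                 struct packfile_uri *uris, int nr)
    {
            int i, spool_fd;

            /*
             * Drain the inline pack into a temporary file so that
             * the live server is released as soon as it is done
             * writing; the interaction with it stays short.
             */
            spool_fd = spool_to_tempfile(server_fd);

            /*
             * Fetch and index the frozen CDN pack(s) first, so
             * that the objects they carry (e.g. an old .gitmodules
             * blob that newer trees still point at) are available.
             */
            for (i = 0; i < nr; i++)
                    index_pack_fsck(http_download_pack(uris[i].uri));

            /*
             * Only now run "index-pack --fsck-objects" over the
             * spooled up-to-the-minute-fresh pack; fsck_finish()
             * can find the .gitmodules blob among the CDN objects.
             */
            index_pack_fsck(spool_fd);
    }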

Or can there be a situation where some objects in the CDN pack are
referred to by objects in the up-to-the-minute-fresh pack (e.g. a
".gitmodules" blob in the CDN pack is still unchanged and used in
an updated tree in the latest revision) and some other objects in
the CDN pack refer to an object in the live part of the history?
If there were such a cyclic dependency, running "index-pack
--fsck-objects" one pack at a time would not work, but I doubt
such a cycle can arise; objects refer to one another by content
hash, so an object in the frozen pack cannot name an object that
did not exist when that pack was made.

Thanks.



