Re: [PATCH 0/6] [RFC] partial-clone: add ability to refetch with expanded filter

Hi Junio

Thanks for your input. Hopefully some of the other folks interested in
partial clone will chime in too.

On Tue, 1 Feb 2022 at 20:13, Junio C Hamano <gitster@xxxxxxxxx> wrote:
>
> It sounds like a useful thing to have such a "refetch things"
> option.

Any improved suggestions on the argument name? I thought of
--refetch, but `fetch --refetch` seemed more confusing to explain.

> Makes me wonder how well these two features work together (or if
> they are mutually exclusive, that is fine as well as a starting
> point).

I don't see any particular reason they can't work together - as you say,
the filtering is orthogonal to shallow on a conceptual level. I haven't
added a test for that scenario yet but will do for a v1.
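For reference, the two features can already be requested together at
clone time, so a combined test shouldn't need anything exotic. A quick
sketch (the URL is a placeholder; the server must allow filters):

```shell
# Combine a shallow clone with a partial-clone filter.
git clone --depth=1 --filter=blob:none <url> repo

# Later fetches can keep both restrictions:
git -C repo fetch --depth=1 --filter=blob:none origin
```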

> If you update the filter specification to make it narrower (e.g. you
> start from blob:limit=1m down to blob:limit=512k), would we transfer
> nothing (which would be ideal), or would we end up refetching
> everything that are smaller than 512k?

As you spot, the latter. I can't see a straightforward way of telling the
server "I have these trees/blobs already" without generating (one way
or the other) a list of millions of oids, then transferring & negotiating
with it.

> ... it is not smart enough to tell them to exclude what we _ought_
> to have by telling them what the _old_ filter spec was.  That's OK
> for a starting point, I guess.

The client doesn't really know what the local repository *has*: several
filters could have been applied across fetches at different points in
the commit history, plus objects fetched in dynamically on demand. Even
a filter set in the config only applies to subsequent fetches, and only
if --filter isn't used to override it.
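To illustrate that last point (a sketch, assuming an existing partial
clone with 'origin' as its promisor remote):

```shell
# Inspect the filter recorded for the remote at clone time:
git config remote.origin.partialclonefilter
# Subsequent fetches use it, unless overridden on the command line:
git fetch --filter=blob:limit=512k origin
```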

> Hopefully, at the end of this
> operation, we should garbage collect the duplicated objects by
> default (with an option to turn it off)?

I haven't explicitly looked into invoking gc yet, but yes, it'd be a bit of
a waste if it weren't kicked off by default. Maybe reusing gc.auto?
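A minimal sketch of what kicking that off could look like (gc.auto
semantics assumed here; this isn't what the series currently does):

```shell
# Let git decide whether the duplicated objects warrant a repack/prune:
git gc --auto
# Or force a consolidating pass immediately:
git gc --quiet
```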

> In other words, a repository that used to be a partial clone can
> become a full clone by using the option _and_ not giving any filter.

For that specific case I think you can already do it by removing the
promisor flag from the remote config, potentially adding it back if you
want to keep it partial from that point forward.
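Roughly this, I mean (a sketch; assumes the 'origin' remote, and that a
subsequent fetch would then bring in the rest):

```shell
# Turn off the partial-clone machinery for 'origin':
git config --unset remote.origin.promisor
git config --unset remote.origin.partialclonefilter
# ...and re-add the promisor flag later to go partial again:
git config remote.origin.promisor true
```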

> I think that is an intuitive enough behaviour and a natural
> consequence to the extreme of what the feature is.  Compared to
> making a full "git clone", fetching from the old local (and narrow)
> repository into it and then discarding the old one, it would not
> have any performance or storage advantage, but it probably is more
> convenient.

It's certainly cleaner than abusing --deepen, or temporarily moving pack
files out of the way, or starting over with a fresh clone & copying config.

Thanks,

Rob :)



