On Tue, Jan 14, 2020 at 11:05 AM Jonathan Tan <jonathantanmy@xxxxxxxxxx> wrote:
>
> > That is, would it be sufficient if every replaced file were replaced
> > with the exact text "me caga en la leche" instead of a custom
> > hand-crafted replacement?  I guess it's a bit complicated because while
> > that's a reasonable blob, it's not a valid commit.  So maybe this
> > mechanism would be limited to blobs.  I thought about whether we could
> > have a different flavor of replacement for commits, but those generally
> > have to be custom because they each have different parents.
>
> Since the original email just discussed blobs, I'll confine myself to
> discussing blobs. (Commits are trickier, as you said.)
>
> > And if that would be sufficient, could promisors be used for this?  I
> > don't know how those interact with fsck and the other commands that
> > you're worried about.  Basically, the idea would be to use most of the
> > existing promisor code, and then have a mode where instead of visiting
> > the promisor, we just always return "me caga en la leche" (and this
> > does not have its SHA checked, of course).

Maybe; it doesn't necessarily need to be the same object returned, and
these replacements could be user-specified via replace refs...

> Missing promisor objects do not prevent fsck from passing - this is part
> of the original design (any packfiles we download from the specifically
> designated promisor remote are marked as such, and any objects that the
> objects in the packfile refer to are considered OK to be missing).

Is there ever a risk that objects in the downloaded packfile come
across as deltas against other objects that are missing/excluded, or
does the partial clone machinery ensure that doesn't happen?  (I ask
because this was certainly the biggest pain point with my "fake cheap
clone" hacks.)

> Currently, when a missing object is read, it is first fetched (there are
> some more details that I can go over if you have any specific
> questions).
> What you're suggesting here is to return a fake blob with
> wrong hash - I haven't looked at all the callers of read-object
> functions in detail, but I don't think all of them are ready for such a
> behavioral change.

git-replace already took care of that for you and provides that
guarantee, modulo the --no-replace-objects & fsck & prune & fetch &
whatnot cases that ignore replace objects, as Kaushik mentioned.  I
took advantage of this to great effect with my "fake cheap clone"
hacks.

Based in part on your other email where you made a suggestion about
promisors, I'm starting to think a pretty good first-cut solution
might look like the following:
  * the user manually adds a bunch of replace refs to map the unwanted
    big blobs to something else (e.g. a README about how the files
    were stripped, or something similar)
  * a partial clone specification that says "exclude objects that are
    referenced by replace refs"
  * add a fake promisor to the downloaded promisor pack so that if
    anyone runs with --no-replace-objects or similar, they get an
    error saying the specified objects don't exist and can't be
    downloaded.

Anyone see any obvious problems with this?

> Maybe it would be sufficient to just make this work
> in a more limited scope (e.g. checkout only - and if we need different
> replacement blobs for different object IDs, maybe we could have
> something similar to the clean/smudge filters).
>
> > This could work together with some sort of refs/blacklist mechanism to
> > enable the server to choose which objects the client replaces.
>
> In the original email, Kaushik mentioned objects larger than a certain
> size - we already have support for that (--filter=blob:limit=1000000,
> for example). Having said that, Git is already able to tolerate any
> exclusion (of tree or blob) from the server - we already need this in
> order to support changing of filters, for example.
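
For concreteness, here is a rough sketch of the first step of the
proposal above (manually mapping an unwanted blob to a placeholder via
a replace ref), which can be tried with existing commands.  The scratch
repository, the file name big.bin, and the placeholder text are made up
for illustration.  The other two steps (a filter that excludes
replace-ref targets, and a fake promisor) do not exist in Git as far as
this thread establishes, so they are not shown.

```shell
#!/bin/sh
set -e

# Scratch repository; the path and file names are illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "you@example.com"
git config user.name "Example"

# Commit a stand-in for an unwanted large blob.
printf 'pretend this is a huge binary\n' > big.bin
git add big.bin
git commit -q -m "add big file"
big=$(git rev-parse HEAD:big.bin)

# Write the placeholder blob into the object database without committing it.
stub=$(printf 'This file was stripped from the repository.\n' |
       git hash-object -w --stdin)

# Map the big blob to the placeholder; this creates refs/replace/<big>.
git replace "$big" "$stub"

# Reads of the original object ID now return the placeholder...
replaced=$(git cat-file -p "$big")
# ...unless replace refs are explicitly ignored.
original=$(git --no-replace-objects cat-file -p "$big")

echo "with replace refs:    $replaced"
echo "without replace refs: $original"
```

Note that in this sketch the original blob is still present locally,
which is why the --no-replace-objects read succeeds; in the proposed
setup that object would be excluded from the clone, and that is the
case the fake promisor is meant to turn into a clear error.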