Hi all,

Sorry, life got in the way at an unfortunate moment. And it should very much be tagged "RFC": thanks Ævar and Bagas for reading. Here's the additional background you could have used earlier on. I've bundled it together, but I'll happily follow up specific questions individually. I've CCed a couple of other people who might find it interesting too.

Andrew's and my motivation here is to provide some specialised filtering at clone/fetch time. In Kart [1], datasets are organised (simplistically) by primary key, but for spatial data we want to provide an orthogonal spatial-extent filter which isn't part of the tree path, so we can't reuse the work done in the sparse filters. For a fetch, the server side will obviously need to support any indexing, and it ultimately decides whether a particular blob should be part of the tree or not.

In the original filter implementation [2], various "profiles" were alluded to as a case where the server operator might know a lot more than the client does about how a developer would want to use the repository, and a named profile for the server to interpret would be a reasonably clean approach. It's referred to again in [3]. Sparse filters, assuming their performance issues are improved by the cone-mode changes, cater to a lot of those use cases. The existing built-in filters are fairly simple, and there's a relatively simple interface for them to implement, so extending that mechanism seems like a reasonable approach to me, potentially allowing people doing interesting things with partial clones to take it and run with it in a general way without too much effort.

So the key element to clarify/understand for this proposal is that the main change to Git is the ability to use `--filter=extension:<name>[=<param>]`. This passes through to git-upload-pack on the server side, and then to rev-list, which looks up and validates the filter name/parameter and applies it.
So if you want to offer a custom filter, you build and set it up on the server, and any Git client (if this is merged) can make use of it without any additional code.

Wrt IPC: my very first proof of concept used an external process that rev-list launched, passing a series of oids/types via stdin and receiving yes/no responses via stdout. Even after quite a lot of OS-specific effort to optimise the data flow across the pipes, it was slow for non-trivially-sized repositories (where it matters), essentially boiling down to too much context switching between processes. Reorganising the existing filtering approach to do batching with deferred responses, and parallelising the filtering into threads, seemed like an awful lot of effort for potentially little gain in a niche use case. Moving it in-process made it perform well: CPU use moves into the "deciding whether this object is in or out" phase rather than being burnt on IPC and context switching.

I did build up a basic runtime-loadable plugin approach, but there was a reasonable amount of the internal Git API that the filters needed/touched (even things like hash sizes add a pile of complexity) unless it was reduced back to passing oids+types. My benchmark for plugins was basically "could I potentially implement the existing filters?", and without more of the Git API I don't think that would be feasible. Plus Git would have to agree on and support a public ABI going forward, which for a potentially niche use case didn't seem reasonable to propose.

Hence compile time: it's simpler; there are no ABI issues; the internal API doesn't change that much wrt the things filters are likely to do; if someone creates a plugin then it's on them to keep it building across Git upgrades on their server; platform support is simpler; and if others find exciting uses for it then a runtime-loadable plugin API is always possible in future. And only the server ever needs any custom binaries.
Licensing: yes, any filters would need to be GPL-licensed since they're compiled into Git. Only the server operator needs to concern themselves with complying with this (and with the associated licensing of any external libraries a plugin might need), since that's where the plugin code is linked and runs. The usual caveat applies that internal use within an organisation doesn't qualify as "distribution" under the GPL. FWIW, for Kart we'll be GPL-licensing the server-side spatial filter plugin code for anyone who's interested.

Hope this clarifies a bit.

Rob :)

[1] https://kartproject.org (building on Git to version geospatial datasets). Not sure if the videos ever got released (thanks, Covid), but I did a talk at Git Merge 2020 on it when we released the first alpha.
[2] https://public-inbox.org/git/1488999039-37631-1-git-send-email-git@xxxxxxxxxxxxxxxxx/
[3] https://public-inbox.org/git/79b06312-75ca-5a50-c337-dc6715305edb@xxxxxxxxxxxxxxxxx/