Hi,

On Mon, 21 Feb 2022 at 03:11, Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> we would still be leaving repository
> corruption on the table, just making it marginally more difficult to
> achieve.

While reviewing John's patch I initially wondered whether a better approach might be something like `git repack -a -d --exclude-stdin`, passing a list of specific objects to exclude from the new pack (sourced from rev-list via a filter, etc.).

To me this seems like a less dangerous approach, but my concern is that it doesn't use the existing filter capabilities of pack-objects, and we end up generating and passing around a huge list of oids. And of course any mistakes in the list generation aren't visible until it's too late.

I also wonder whether there's a race condition if the repository gets updated. If you're moving large objects out in advance and then filtering the remainder, there's nothing to stop a new large object being pushed between those two steps and getting dropped.

My other idea, which is growing on me, is whether repack could generate two valid packs: one for the objects included by the filter (as John's change does now), and one containing the filtered-out objects:

    git repack -a -d --split-filter=<filter>

A user could then move/extract the second packfile to object storage, but there'd be no way to *accidentally* corrupt the repository by using a bad option. With this approach the race condition above goes away too.

    $ git repack -a -d -q --split-filter=blob:limit=1m
    pack-37b7443e3123549a2ddee31f616ae272c51cae90
    pack-10789d94fcd99ffe1403b63b167c181a9df493cd

The first pack identifier is the objects that match the filter (i.e. commits/trees/blobs <1m), and the second is the objects excluded by the filter (blobs >1m).

An astute --i-know-what-im-doing reader could spot that you could just delete the second packfile and achieve the same outcome as the current proposed patch, subject to being confident the race condition hadn't happened to you.
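For what it's worth, the split can be roughly approximated today with existing plumbing: rev-list's `--filter-print-omitted` prints the objects a filter drops (prefixed with `~`), so each half can be fed to pack-objects separately. A rough sketch, assuming the `blob:limit=1m` filter from the example above (the pack base names `pack-included`/`pack-excluded` are arbitrary):

```shell
#!/bin/sh
# Sketch: emulate the proposed --split-filter with rev-list + pack-objects.

# Pack 1: every reachable object that passes the filter
# (commits, trees, and blobs under 1 MiB).
git rev-list --objects --all --filter=blob:limit=1m |
        git pack-objects pack-included

# Pack 2: the objects the filter omitted. --filter-print-omitted prints
# each omitted oid prefixed with "~"; strip the prefix and feed the list
# to pack-objects.
git rev-list --objects --all --filter=blob:limit=1m --filter-print-omitted |
        sed -n 's/^~//p' |
        git pack-objects pack-excluded
```

pack-objects prints each resulting pack hash on stdout, much like the two identifiers in the example output above. This is only the packing half, of course: it doesn't retire the old packs the way `repack -a -d` would, and as a two-step scheme it's still exposed to the race described earlier, which is why doing it atomically inside repack is attractive.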
Thanks,
Rob :)