On Tue, Jul 25, 2017 at 2:13 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Jul 25, 2017 at 01:52:46PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@xxxxxxxx> writes:
>>
>> > As you can see, core.bigfilethreshold is a pretty blunt instrument. It
>> > might be nice if .gitattributes understood other types of patterns
>> > besides filenames, so you could do something like:
>> >
>> >   echo '[size > 500MB] delta -diff' >.gitattributes
>> >
>> > or something like that. I don't think it's come up enough for anybody to
>> > care too much about it or work on it.
>>
>> But attributes is about paths, at which a blob may or may not exist,
>> so it is a bad fit to add conditionals that are based on sizes and
>> types.
>
> Do attributes _have_ to be about paths? In practice we often use them to
> describe objects, and paths are just the only mechanism we give to refer
> to objects. But it is not actually a correct or rigorous mechanism in
> some cases. For example, imagine I have a .gitattributes with:
>
>   foo -delta
>   bar delta
>
> and then imagine I have a tree with both "foo" and "bar" pointing to the
> same blob. When I run pack-objects, it wants to know whether to delta
> the object. What should it do?
>
> The delta decision is really a property of the object. But the only
> mechanism we give for selecting an object is by path, which we know is
> not a one-to-one mapping with objects. So the results you get will
> depend on which name we happened to see the object under first while
> traversing.
>
> I think the case you are getting at is something like clean filters,
> where we might not have an object at all. In that case I would argue
> that a property of an object could never be satisfied (so neither
> "size > 500" nor "size <= 500" could match). Whether object properties
> are meaningful is in the eye of the code that is looking up the value.
> Or more generally, the set of properties to be matched is in the eye of
> the caller.
> So looking up a clean filter might want to define the size
> property based on the working tree size.
>
> -Peff

I recall a similar discussion on the different "big repo" approaches.

Looking at the interface of LFS, there are things such as:

    git lfs fetch --recent
    git lfs fetch --all
    git lfs fetch [--exclude] <pathspec>

so LFS provides ways to address objects both by time and by path,
maybe even combined: "I want everything from <pathspec 1> but only
'recent' things from <pathspec 2>".

Attributes can already be queried from pathspecs, and I think when
designing from scratch we might put it the other way round:

    delta:
        bar
        everything <500m
    -delta
        foo
        binaries

So in the far future, attributes may learn about more than just the
pathspecs that we currently use to assign labels, and could also:

* include size
* include properties derived from the 'file' utility
* be specific about certain objects (historic paths)
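
To make the foo/bar ambiguity from the quoted mail concrete, here is a rough Python sketch. It is a toy model and not Git's code: the names `PATH_RULES`, `delta_by_path`, `delta_decisions` and `size_rule` are made up for illustration, the object id is invented, and `fnmatch` only approximates real attribute pattern matching. It shows that a per-object decision made from path rules depends on which path is seen first, and that a size predicate can be written so that neither "size > 500" nor "size <= 500" matches when there is no object.

```python
from fnmatch import fnmatch
from typing import Callable, Dict, Iterable, Optional, Tuple

# The two rules from the quoted .gitattributes example:
#   foo -delta
#   bar delta
# False models "-delta" (unset), True models "delta" (set).
PATH_RULES = [("foo", False), ("bar", True)]


def delta_by_path(path: str) -> Optional[bool]:
    """Look up 'delta' for a path; the last matching rule wins.

    Returns None when no rule matches (attribute unspecified).
    """
    result = None
    for pattern, value in PATH_RULES:
        if fnmatch(path, pattern):  # simplification of real attribute matching
            result = value
    return result


def delta_decisions(entries: Iterable[Tuple[str, str]]) -> Dict[str, Optional[bool]]:
    """Decide per object, using the first path the object is seen under."""
    decided: Dict[str, Optional[bool]] = {}
    for path, oid in entries:
        if oid not in decided:
            decided[oid] = delta_by_path(path)
    return decided


def size_rule(op: str, limit: int) -> Callable[[Optional[int]], bool]:
    """Hypothetical object-property rule.

    A missing object (size None) never matches, whatever the operator.
    """
    def matches(size: Optional[int]) -> bool:
        if size is None:
            return False
        return size > limit if op == ">" else size <= limit
    return matches


if __name__ == "__main__":
    blob = "abc123"  # made-up object id shared by both paths
    print(delta_decisions([("foo", blob), ("bar", blob)]))
    print(delta_decisions([("bar", blob), ("foo", blob)]))
    big, small = size_rule(">", 500), size_rule("<=", 500)
    print(big(None), small(None))
```

Swapping the order of the two tree entries flips the answer for the same blob, which is the traversal-order dependence described above. A rule keyed on the object itself, like the hypothetical size predicate, does not have that problem.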