Re: Should I store large text files on Git LFS?

On Tue, Jul 25, 2017 at 2:13 PM, Jeff King <peff@xxxxxxxx> wrote:
> On Tue, Jul 25, 2017 at 01:52:46PM -0700, Junio C Hamano wrote:
>
>> Jeff King <peff@xxxxxxxx> writes:
>>
>> > As you can see, core.bigfilethreshold is a pretty blunt instrument. It
>> > might be nice if .gitattributes understood other types of patterns
>> > besides filenames, so you could do something like:
>> >
>> >   echo '[size > 500MB] delta -diff' >.gitattributes
>> >
>> > or something like that. I don't think it's come up enough for anybody to
>> > care too much about it or work on it.
>>
>> But attributes is about paths, at which a blob may or may not exist,
>> so it is a bad fit to add conditionals that are based on sizes and
>> types.
>
> Do attributes _have_ to be about paths? In practice we often use them to
> describe objects, and paths are just the only mechanism we give to refer
> to objects.  But it is not actually a correct or rigorous mechanism in
> some cases.  For example, imagine I have a .gitattributes with:
>
>   foo -delta
>   bar delta
>
> and then imagine I have a tree with both "foo" and "bar" pointing to the
> same blob. When I run pack-objects, it wants to know whether to delta
> the object. What should it do?
>
> The delta decision is really a property of the object. But the only
> mechanism we give for selecting an object is by path, which we know is
> not a one-to-one mapping with objects. So the results you get will
> depend on which name we happened to see the object under first while
> traversing.
>
> I think the case you are getting at is something like clean filters,
> where we might not have an object at all. In that case I would argue
> that a property of an object could never be satisfied (so neither
> "size > 500" nor "size <= 500" could match). Whether object properties
> are meaningful is in the eye of the code that is looking up the value.
> Or more generally, the set of properties to be matched is in the eye of
> the caller. So looking up a clean filter might want to define the size
> property based on the working tree size.
>
> -Peff
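
To make the foo/bar case above concrete, here is a minimal sketch using
a throwaway repository (the temporary directory and the file contents
are only assumptions for illustration). It stages the same content
under both names and then asks for the "delta" attribute per path:

```shell
#!/bin/sh
# Sketch: two paths, one blob, conflicting "delta" attributes.
set -e
repo=$(mktemp -d)          # scratch repository, illustration only
cd "$repo"
git init -q .

printf 'same content\n' >foo
cp foo bar
printf 'foo -delta\nbar delta\n' >.gitattributes
git add foo bar .gitattributes

# Both index entries resolve to the same blob id.
foo_id=$(git rev-parse :foo)
bar_id=$(git rev-parse :bar)
echo "foo blob: $foo_id"
echo "bar blob: $bar_id"

# Attribute lookup is per path, so the two answers disagree.
git check-attr delta -- foo bar
```

The two blob ids printed are identical, while check-attr reports
"unset" for foo and "set" for bar, so anything that decides per object
has to pick one of the two answers.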

I recall a similar discussion on the different "big repo" approaches.
Looking at the interface of LFS, there are things such as:

  git lfs fetch --recent
  git lfs fetch --all
  git lfs fetch [--exclude] <pathspec>

So LFS lets you address objects either by time or by path, and maybe
even by a combination of the two: "I want everything from <pathspec 1>
but only 'recent' things from <pathspec 2>".

Attributes can already be queried via pathspecs, and I think if we were
designing this from scratch we might put it the other way round:

    delta:
        bar
        everything <500m
    -delta
        foo
        binaries

So in the far future, attributes may learn about more than just the
pathspecs that we currently use to assign labels. They could also
* take size into account
* use properties derived from the 'file' utility
* be specific about certain objects (historic paths)
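
Until something like that exists, a size condition can only be
approximated by generating per-path lines. A rough sketch, assuming a
scratch repository, a made-up 1024-byte threshold and simple file names
(none of this is an existing Git feature, just a script around existing
commands):

```shell
#!/bin/sh
# Sketch: emulate "everything above a size gets -delta" with per-path lines.
set -e
limit=1024                 # bytes; hypothetical threshold for this demo
repo=$(mktemp -d)          # scratch repository, illustration only
cd "$repo"
git init -q .

printf 'tiny\n' >small.txt
head -c 4096 /dev/zero >big.bin
git add small.txt big.bin

# Look up the size of each staged blob and emit "-delta" for large ones.
# Assumes plain file names: paths with whitespace or unusual characters
# would need extra care, both here and in .gitattributes.
generated=$(
  git ls-files | while IFS= read -r path; do
    size=$(git cat-file -s ":$path")
    if [ "$size" -gt "$limit" ]; then
      printf '%s -delta\n' "$path"
    fi
  done
)
echo "$generated"
```

The output could be appended to .gitattributes, but it is still keyed
by path, so it goes stale as soon as a file grows or shrinks. That is
the gap a real size condition would close.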


