Re: [PATCH 0/2] Controversial blob munging series

Junio C Hamano <junkio@xxxxxxx> · Mon, 23 Apr 2007 10:13:06 -0700

Johannes Schindelin <Johannes.Schindelin@xxxxxx> writes:

> On Sat, 21 Apr 2007, Junio C Hamano wrote:
>
>> This is on top of 'next' I'll push out after I am done with
>> v1.5.1.2 I am preparing today.
>> 
>> [1/2] Add 'filter' attribute and external filter driver definition.
>> [2/2] Add 'ident' conversion.
>
> I think this is great work! And it is useful, too. Let me describe a usage 
> scenario I have in mind.
>
> Being stuck with Pine, which still does not do Maildir, and wanting 
> to be able to read my mails as distributed as I am working on documents 
> and software projects, I always dreamt of having all my mail in Git.
>
> With filters, it should be relatively easy to do that. Before checking in, 
> the individual mailbox files are split, the contents are put into the 
> object database, and the mailbox file is replaced by a text file 
> consisting of the SHA1s of the mails.
>
> Ideally, I would eventually not only teach Pine to understand Maildir 
> format, but read and store the mails in a Git backend. Alas, I am way too 
> lazy for that.
>
> So, with filters I'd do the cheap and easy thing.
>
> You may not be able to appreciate the advantages of my scenario, but this 
> kind of flexibility is what makes Git so useful.

An earlier message $gmane/44896 from Linus comes to my mind.  An excerpt:

   The thing is, it's easy enough (although potentially _very_ expensive) to 
   run some per-file script at each commit and at each checkout. But there 
   are some fundamental operations that are even more common:

    - checking for "file changed", aka the "git status" kind of thing

      Anything we do would have to follow the same "stat" rules, at a 
      minimum. You can *not* afford to have to check the file manually.

      So especially if you combine several pieces into one, or split one file 
      into several pieces, your index would have to contain the entry 
      that matches the _filesystem_ (because that's what the index is all 
      about), but then the *tree* would contain the pieces (or the single 
      entry that matches several filesystem entries).

and I am inclined to think that this is quite fundamental.  I
think you just fell into the category of people who want the
"extended semantics" Linus talked about in $gmane/45214:

  I suspect that this gets some complaining off our back, but I *also* 
  suspect that people will actually end up really screwing themselves with 
  something like this and then blaming us and causing a huge pain down the 
  line when we've supported this and people want "extended semantics" that 
  are no longer clean.

which is kind of disappointing.
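
To make the "stat" rule from the first excerpt concrete, here is a
rough Python sketch.  It is not Git's actual code: the function
names and the choice of (mtime, size) as the cached signature are
assumptions made only for illustration.  The point is that the
cache is keyed by the path that exists on the filesystem, so a
"maybe changed" check never has to open the file.

```python
import os
import tempfile

def stat_signature(path):
    """Cheap metadata used to decide 'maybe changed' (hypothetical choice)."""
    st = os.stat(path)
    return (st.st_mtime_ns, st.st_size)

def build_index(paths):
    """Toy index: one entry per filesystem path, holding its stat signature."""
    return {p: stat_signature(p) for p in paths}

def changed_paths(index):
    """Compare cached signatures with the filesystem; file contents are never read."""
    return sorted(p for p, sig in index.items() if stat_signature(p) != sig)

def demo():
    with tempfile.TemporaryDirectory() as d:
        mbox = os.path.join(d, "inbox.mbox")
        notes = os.path.join(d, "notes.txt")
        for path, body in ((mbox, "mail one\n"), (notes, "todo\n")):
            with open(path, "w") as f:
                f.write(body)
        index = build_index([mbox, notes])
        # Appending a mail changes the size, so the cached signature no longer matches.
        with open(mbox, "a") as f:
            f.write("mail two\n")
        return [os.path.basename(p) for p in changed_paths(index)]
```

If a filter split inbox.mbox into many pieces at check-in time,
the entry that can be compared this way would still have to be the
one for inbox.mbox, because that is the only thing stat() can see.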

Even if you somehow solved the issue of the "stat" rule, I do not
know what your plans are to manage the blobs that you drop in
the object store.  The list of object names in the mail-index
file you are generating does not count as connectivity for the
purpose of fetch/push/fsck/prune.
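
The connectivity problem can be sketched with a toy object store
in Python.  This is only a model under stated assumptions, not how
Git is implemented: the store layout, the object names and the
helper functions are all hypothetical.  It shows that a walk which
follows commit-to-tree and tree-to-entry links never looks inside
a blob, so names that merely appear as text in a blob keep nothing
alive.

```python
def reachable(store, tips):
    """Walk typed links from the tips; blob payloads are never parsed."""
    seen = set()
    stack = list(tips)
    while stack:
        name = stack.pop()
        if name in seen or name not in store:
            continue
        seen.add(name)
        kind, links, _payload = store[name]
        if kind != "blob":
            stack.extend(links)
    return seen

def prune_candidates(store, tips):
    """Objects that a reachability-based prune would consider garbage."""
    return sorted(set(store) - reachable(store, tips))

def demo():
    # Hypothetical store: name -> (type, outgoing links, payload).
    store = {
        "mail1": ("blob", [], "From: a\n\nhello\n"),
        "mail2": ("blob", [], "From: b\n\nworld\n"),
        # The mail-index file only mentions the mails as text.
        "mail-index": ("blob", [], "mail1\nmail2\n"),
        "tree1": ("tree", ["mail-index"], None),
        "commit1": ("commit", ["tree1"], None),
    }
    return prune_candidates(store, ["commit1"])
```

In this model demo() reports both mail blobs as unreachable even
though the mail-index blob names them, which is the situation the
paragraph above describes.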
