On Thu, Dec 06 2018, Jeff King wrote:

> On Thu, Dec 06, 2018 at 10:08:57AM +0900, Junio C Hamano wrote:
>
>> Jeff King <peff@xxxxxxxx> writes:
>>
>> > In my opinion this feature is so contrary to Git's general assumptions
>> > that it's likely to create a ton of information leaks of the supposedly
>> > protected data.
>> > ...
>>
>> Yup, with s/implemented/designed/, I agree all you said here
>> (snipped).
>
> Heh, yeah, I actually scratched my head over what word to use. I think
> Git _could_ be written in a way that is both compatible with existing
> repositories (i.e., is still recognizably Git) and is careful about
> object access control. But either way, what we have now is not close to
> that.
>
>> > Sorry I don't have a more positive response. What you want to do is
>> > perfectly reasonable, but I just think it's a mismatch with how Git
>> > works (and because of the security impact, one missed corner case
>> > renders the whole thing useless).
>>
>> Yup, again.
>>
>> Storing source files encrypted and decrypting with smudge filter
>> upon checkout (and those without the access won't get keys and will
>> likely to use sparse checkout to exclude these priviledged sources)
>> is probably the only workaround that does not involve submodules.
>> Viewing "diff" and "log -p" would still be a challenge, which
>> probably could use the same filter as smudge for textconv.
>
> I suspect there are going to be some funny corner cases there. I use:
>
>   [diff "gpg"]
>           textconv = gpg -qd --no-tty
>
> which works pretty well, but it's for files which are _never_ decrypted
> by Git. So they're encrypted in the working tree too, and I don't use
> clean/smudge filters.
>
> If the files are already decrypted in the working tree, then running
> them through gpg again would be the wrong thing. I guess for a diff
> against the working tree, we would always do a "clean" operation to
> produce the encrypted text, and then decrypt the result using textconv.
> Which would work, but is rather slow.
>
>> I wonder (and this is the primary reason why I am responding to you)
>> if it is common enough wish to use the same filter for smudge and
>> textconv? So far, our stance (which can be judged from the way the
>> clean/smudge filters are named) has been that the in-repo
>> representation is the canonical, and the representation used in the
>> checkout is ephemeral, and that is why we run "diff", "grep",
>> etc. over the in-repo representation, but the "encrypted in repo,
>> decrypted in checkout" abuse would be helped by an option to do the
>> reverse---find changes and look substrings in the representation
>> used in the checkout. I am not sure if there are other use cases
>> that is helped by such an option.
>
> Hmm. Yeah, I agree with your line of reasoning here. I'm not sure how
> common it is. This is the first I can recall it. And personally, I have
> never really used clean/smudge filters myself, beyond some toy
> experiments.
>
> The other major user of that feature I can think of is LFS. There Git
> ends up diffing the LFS pointers, not the big files. Which arguably is
> the wrong thing (you'd prefer to see the actual file contents diffed),
> but I think nobody cares in practice because large files generally don't
> have readable diffs anyway.

I don't use this either, but I can imagine people who use binary files
via clean/smudge would be well served by dumping out textual metadata of
the file for diffing instead of showing nothing. E.g. for a video file I
might imagine having lines like:

    duration-seconds: 123
    camera-model: Shiny Thingamabob

Then when you check in a new file your "git diff" will show (using
normal diff view) that:

    - duration-seconds: 123
    + duration-seconds: 321
      camera-model: Shiny Thingamabob

etc.
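
To make the metadata idea concrete, here is a minimal sketch of a
textconv helper. It relies only on the documented textconv contract (the
program is given the name of a file to convert as its single argument
and writes text to stdout). Everything else is an assumption for
illustration: the script name "git-binmeta", the driver name "binmeta",
and the fields themselves. Size and SHA-256 stand in for real metadata
such as duration or camera model, which would need a format-aware tool.

```python
#!/usr/bin/env python3
"""Hypothetical textconv helper: print textual metadata for a binary file."""
import hashlib
import os
import sys


def describe(path):
    """Return 'key: value' lines describing the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    lines = [
        # Placeholder fields (assumption): swap in duration, camera
        # model, etc. once a format-aware extractor is available.
        "size-bytes: %d" % os.path.getsize(path),
        "sha256: %s" % digest.hexdigest(),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) == 2:
    # Git invokes the textconv program with one argument: the file to convert.
    sys.stdout.write(describe(sys.argv[1]))
```

Wiring it up would look something like this (again, the names are made
up), with the script executable and on $PATH:

    [diff "binmeta"]
            textconv = git-binmeta

plus a line such as "*.mp4 diff=binmeta" in .gitattributes. I have not
tested how this interacts with clean/smudge filters on the same path.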