On Mon, Nov 12, 2018 at 10:08:10AM -0800, Elijah Newren wrote:

> > I would do:
> >
> >   git log --raw $(
> >     git cat-file --batch-check='%(objectsize:disk) %(objectname)' --batch-all-objects |
> >     sort -rn | head -3 |
> >     awk '{print "--find-object=" $2 }'
> >   )
> >
> > I'm not sure how renames enter into it at all.
>
> How did I miss objectsize:disk?? Especially since it is right next to
> objectsize in the manpage to boot? That's awesome, thanks for that
> pointer.
>
> I do have a separate cat-file --batch-check --batch-all-objects
> process already, since I can't get sizes out of either log or
> fast-export. However, I wouldn't use your 'head -3' since I'm not
> looking for the N biggest, but reporting on _all_ objects (in reverse
> size order) and letting the user look over the report and deciding
> where to stop reading. So, this is a big and expensive log command.
> Granted, we will need a big and expensive log command, but let's keep
> in mind that we have this one.

It is an expensive log command, but it's the same expense as running
fast-export, no? And I think maybe that is the disconnect.

I am looking at this problem as "how do you answer question X in a
repository". And I think you are looking at it as "I am receiving a
fast-export stream, and I need to answer question X on the fly".

And that would explain why you want to get extra annotations into the
fast-export stream. Is that right?

> > There I think you'd want to assemble the list with something like "git
> > log --follow --name-only paths-of-interest" except that --follow sucks
> > too much to handle more than one path at a time.
> >
> > But if you wanted to do it manually, then:
> >
> >   git log --diff-filter=R --name-only
> >
> > would be enough to let you track it down, wouldn't it?
>
> Without a -M you'd only catch 100% renames, right? Those aren't the
> only ones I'd want to catch, so I'd need to add -M.
> You are right that we could get basic renames this way, but it doesn't
> cover everything I need. Let's use this as a starting point, though,
> and build up to what I need...

No, renames are on by default these days, and that includes inexact
renames. That said, if you're scripting you probably ought to be doing:

  git rev-list HEAD | git diff-tree --stdin

and there, yes, you'd have to enable "-M" yourself (you touched on
scripting and formatting below; diff-tree can accept the format options
you'd want).

> I also want to know when files were deleted. I've generally found
> that people are more okay with purging parts of history [corresponding
> to large objects] that were deleted longer ago than more recent stuff,
> for a variety of reasons. So we could either run yet another log, or
> modify the command to:
>
>   git log -M --diff-filter=RD --name-status
>
> However, I don't just want to know when files were deleted, I'd like
> to know when directories are deleted. I only knew how to derive that
> from knowing what files existed within those directories, so that
> would take me to:
>
>   git log -M --diff-filter=RAD --name-status
>
> [Edit: I just saw your other email and for the first time learned
> about the -t rev-list option which might simplify this a little,
> although "need to worry about deleted files being reinstated" below
> might require the 'A' anyway.]

Yeah, I think "-t" would help your tree deletion problem.

> At this point, let's remember that we had another full git-log
> invocation for mapping object sizes to filenames. We might as well
> coalesce the two log commands into one, by extending this latest one
> to:
>
>   git log -M --diff-filter=RAMD --no-abbrev --raw

What is there besides RAMD? :)

> I could potentially switch to using this and drop patch 10/10.

So I'm still not _entirely_ clear on what you're trying to do with
10/10. I think maybe the "disconnect" part I wrote above explains it.
If that's correct, then I think framing it in terms of the operations
that you'd be able to perform _without running a separate traverse_
would make it more obvious.

> Anyway, I hope it makes a little more sense why I created this patch.
> Does it, or have I just made things even more confusing?

Some of both, I think.

> ...and if you've read this far, I'm impressed. Thanks for reading.

I'll admit I skimmed near the end. ;)

-Peff
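For illustration only (this script is not from either author), here is a sketch of the scripting route suggested above: "git rev-list HEAD | git diff-tree --stdin" with "-M" and "-t" added by hand, covering renames, file and directory deletions, and a full size-ordered object report (the cat-file pipeline without "head -3") in one traversal. It assumes a throwaway repository whose file names and commit messages are hypothetical, hypothetical scratch files raw.txt, deletions.txt, sizes.txt and report.txt, and awk field positions that follow diff-tree's default raw output format.

```shell
#!/bin/sh
# Illustrative sketch, not from the thread. Builds a throwaway repository
# (all names below are hypothetical) and runs the plumbing form of
#   git log -M --diff-filter=RAMD --no-abbrev --raw
# then joins the result with per-object on-disk sizes.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"
git config commit.gpgsign false

# Commit 1: a small file, plus a larger file in its own directory.
mkdir docs big
echo "hello" >docs/readme.txt
seq 1 10000 >big/blob.dat
git add docs big
git commit -q -m "initial"

# Commit 2: an exact rename.
git mv docs/readme.txt docs/intro.txt
git commit -q -m "rename readme"

# Commit 3: delete a whole directory.
git rm -r -q big
git commit -q -m "drop big/"

# Walk every commit. diff-tree is plumbing, so rename detection is not
# on by default and -M must be given explicitly.
#   --root  also diff the initial commit against the empty tree
#   -t      show tree entries too (implies -r), so a removed directory
#           appears as its own "D" line with old mode 040000
git rev-list HEAD |
git diff-tree --stdin --root -t -M --diff-filter=RAMD --no-abbrev --raw >raw.txt

# Deletions, one per line: "<commit> deleted-dir|deleted-file <path>".
awk -F'\t' '
  /^[0-9a-f]+$/ { commit = $0; next }   # commit id header printed by --stdin
  {
    split($1, m, " ")                   # m[1]=old mode, m[5]=status letter
    if (m[5] == "D")
      print commit, (m[1] == ":040000" ? "deleted-dir" : "deleted-file"), $2
  }
' raw.txt >deletions.txt

# Every object, largest on-disk size first (no "head -3": a full report).
git cat-file --batch-check='%(objectsize:disk) %(objectname)' --batch-all-objects |
sort -rn >sizes.txt

# Attach a path to each object seen in raw.txt (if an object appears under
# several paths, the last one read wins); objects that never show up there,
# such as commits, are left out of the report.
awk -F'\t' '
  NR == FNR {                           # first file: raw.txt
    if ($0 ~ /^[0-9a-f]+$/) next        # skip commit id headers
    split($1, m, " ")                   # m[3]=old id, m[4]=new id, m[5]=status
    if (m[5] == "D") path[m[3]] = $2    # deletion: old id, old path
    else             path[m[4]] = $NF   # A, M, R: new id, new (last) path
    next
  }
  { split($0, s, " "); if (s[2] in path) print s[1], s[2], path[s[2]] }
' raw.txt sizes.txt >report.txt

cat deletions.txt report.txt
```

It prints the deletions (directories included, thanks to "-t") followed by every object mapped to a path, largest on-disk size first, so the reader can stop wherever they like. Using diff-tree rather than log keeps the output format stable for parsing, which is the point of the "if you're scripting" remark above.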