On Fri, Dec 24 2021, Philip Oakley wrote: > On 21/12/2021 23:36, Ævar Arnfjörð Bjarmason wrote: >> On Tue, Dec 21 2021, Philip Oakley wrote: >> >>> Sorry for the late comment.. >>> >>> On 10/12/2021 14:31, Johannes Schindelin wrote: >>>> Hi Ævar, >>>> >>>> On Thu, 9 Dec 2021, Ævar Arnfjörð Bjarmason wrote: >>>> >>>>> The difference between "master" and "git-for-windows/main" is large >>>>> enough that comparing the two will segfault on my system. This is >>>>> because the range-diff code does some expensive calculations and will >>>>> overflow the "int" type. >>>> You are holding this thing wrong. >>>> >>>> The `main` branch of Git for Windows uses merging rebases, therefore you >>>> need to use a commit range like >>>> `git-for-windows/main^{/^Start.the.merging}..git-for-windows/main` and >>>> compare it to `git-for-windows/main..master`. >>> I'm not sure that a Git repo has an established way of indicating to how >>> it's branching/merging/releasing workflow is set up, especially for >>> projects with non-normative use cases, such as Git for Windows. We don't >>> have a git document for covering the different workflows in common use >>> for easy reference and consistent terminology. >>> >>> The merging rebase flow, with 'fake' merge does solve a problem that >>> git.git doesn't have but could easily be a common process for 'friendly >>> forks' that follow an upstream with local patches. The choice of >>> '{/^Start.the.merging}' is currently specific to the Git-for-Windows >>> case making it harder to discover this useful maintainer method. >> Yes, but let's not get lost in the weeds here. As I noted I just picked >> GFW as a handy example of a large history & that command as a handy >> example of something that segfaults on "master". > > Had you already experienced the segfault locally, without using the GFW > example? How many commits were present in that case? Yes, I ran into it "orginally" just range-diffing as part of a local build process. I could dig up what revision range it was exactly, but does it matter? > The GFW example seems like it's taken the discussion in the wrong direction. > > For me: > $ git log git/master..origin/main --pretty=oneline | wc -l > 62105 > > That's a lot of commits to have in a range diff. It's almost as big as > the whole of git/master > > $ git log git/master --pretty=oneline | wc -l > 65400 > > Personally I'd like a way of trimming 'deadheads' that's a bit easier > that needing to remember Dscho's magic string [1], but time will tell. There are some repos that move forward by 500-1k commits/day, and people do cherry-pick patches etc. So wanting to range-diff after a couple of months is something you might do... >> So the point really isn't to say that we should fix range-diff becase >> it'll allow us to run this practically useful command on a git.git fork. >> >>> I fully agree that the range-diff should probably have a patch limit at >>> some sensible value. >> Why would it? If I'm willing to spend the CPU to produce a range-diff of >> an absurdly large range and I've got the memory why shouldn't we support >> it? > > There will always be a limit somewhere, and if it's not commit count or > other easily explained & checked limit it will be hard to rationalise > about why Git suddenly fails with an error (or segfault) in those > humungous case. I think it's fairly easy to explain the "your system wouldn't let us malloc more, we're dying" that we get from xmalloc(), st_*() and the like. >> >> We don't in cases like xdiff where it's not trivial to just raise the >> limits, but here it seems relatively easy. >> >> I think limits to save users from spending CPU time they didn't expect >> are reasonable, but then we can handle them like the diff/merge rename >> detection limits, i.e. print a warning/advice, and allow the user to >> opt-out. >> >> That also doesn't really apply here since "diff/merge" will/might still >> do something useful in those scenarios, whereas range-diff would just >> have truncated output. >> >>> The 'confusion' between the types size_t, long and int, does ripple >>> through a lot of portable code, as shown in the series. Not an easy problem. >> Yes, although here we're not just casting and overflowing types, but >> overflowing on multiplication and addition, whereas usually we'd just >> overflow on "nr" being too big for "int" or similar. > I've been very slowly looking at the `long` limits on GFW which have > very similar arithmetic issues for pointers, often with no clear answers. Right, that's to do with the whole "long" or whatever use in the object.c and related code, but I don't think that's applicable here, is it?