Re: [PATCH 2/2] merge-ort: fix slightly overzealous assertion for rename-to-self

Elijah Newren <newren@xxxxxxxxx> · Wed, 12 Mar 2025 15:36:07 -0700

On Wed, Mar 12, 2025 at 3:00 PM Taylor Blau <me@xxxxxxxxxxxx> wrote:
>
> On Thu, Mar 06, 2025 at 03:30:27PM +0000, Elijah Newren via GitGitGadget wrote:
> > From: Elijah Newren <newren@xxxxxxxxx>
> >
> > merge-ort has a number of sanity checks on the file it is processing in
> > process_renames().  One of these sanity checks was slightly overzealous
> > because it indirectly assumed that a renamed file always ended up at a
> > different path than where it started.  That is normally an entirely fair
> > assumption, but directory rename detection can make things interesting.
> >
> > As a quick refresher, if one side of history renames directory A/ -> B/,
> > and the other side of history adds new files to A/, then directory
> > rename detection notices and suggests moving those new files to B/.  A
> > similar thing is done for paths renamed into A/, causing them to be
> > transitively renamed into B/.  But, if the file originally came from B/,
> > then this can end up causing a file to be renamed back to itself.
> >
> > It turns out the rest of the code following this assertion handled the
> > case fine; the assertion was just an extra sanity check, not a rigid
> > precondition.  Therefore, simply adjust the assertion to pass under this
> > special case as well.
>
> Wow, that is a very interesting edge case from Dmitry. The explanation
> and fix makes sense, and yeah, looks like just an over-zealous
> assertion.
>
> As a meta-comment, I vaguely remember past discussions on the list about
> when to assert() versus when to BUG(). I recall that where we landed was
> that:
>
>     if (foo)
>         BUG("something");
>
> was preferable to:
>
>     assert(foo);

I disagree that we landed on that; at least the last time it was
brought up to me there was no ending statement of such --
https://lore.kernel.org/git/CABPp-BFLyYTboiGkP=sc7S+stRZvozAqnJPnza+M0q2nakvD2Q@xxxxxxxxxxxxxx/,
and then I proceeded to continue submitting and getting merged several
more assert statements.

If there was a project-wide directive to remove assert() statements,
then I'd quickly create something else (e.g. "BUG_ON(foo)") and
convert to that rather than "if (foo) BUG(bar)".  And I'd probably
make it depend on NDEBUG too...

> I know that we don't usually define NDEBUG, but I think there are a
> couple of cases where we do, like for nedmalloc stuff when building with
> USE_NED_ALLOCATOR, and in Windows builds if DEBUG is undefined.

If the statement is important to have in production builds, I have
used "if (foo) BUG(bar)", including in merge-ort.c.  But I think there
are different classes of notes-for-other-developers.  Code comments
for the lightest end, if (foo) BUG (bar) for the heavier end, and
assertions for somewhere in the middle.  I think I've used a higher
density of checks in merge-ort than tend to be used elsewhere, in part
because of the number of optimizations added to the code and the
difficulty of keeping track of all the preconditions and assumptions
with the extra data structures feeding those.  Lots of those checks
might have just been code comments somewhere else; they aren't
necessarily worth a real if-check.  They'd be fine to compile out
generally.  In some cases, they might go to the full "if (foo)
BUG(bar)", but I think there's a reasonable middle ground.  assert()
statements survive and are kept up-to-date better than a code comment,
they tend to be clearer than a code comment, and they catch more bugs
while refactoring (or originally developing the code) than code
comments.  And they are much more brief and ergonomic than the "if
(foo) BUG(bar)" construct (why do I have to come up with a separate
bar, when it's already embedded in foo?)

> So I don't think it makes a huge practical difference, but it might be
> nice to put in CodingGuidelines or similar.

Maybe I should put something in there so I don't have to look up the
old conversations and point out my disagreements with this suggestion
yet again...  ;-)

But it might be worth mentioning that having side-effects in
assertions is a huge no-no, and I understand that when folks have to
debug one of those (I had to in a separate project years ago which was
kind of nasty), that they sometimes jump to the conclusion that
assertions are bad.  I understand being bitten, but disagree with the
conclusion.