Re: [PATCH v3 1/3] clang-format: re-adjust line break penalties

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Kyle Lippincott <spectral@xxxxxxxxxx> writes:

> On Tue, Oct 15, 2024 at 5:37 AM karthik nayak <karthik.188@xxxxxxxxx> wrote:
>>
>> Kyle Lippincott <spectral@xxxxxxxxxx> writes:
>>
>> > On Fri, Oct 11, 2024 at 6:50 PM Karthik Nayak <karthik.188@xxxxxxxxx> wrote:
>> >>
>> >> In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we
>> >> adjusted the line break penalties to really fine tune what we care about
>> >> while doing line breaks. Modify some of those to be more inline with
>> >> what we care about in the Git project now.
>> >>
>> >> We need to understand that the values set to penalties in
>> >> '.clang-format' are relative to each other and do not hold any absolute
>> >> value. The penalty arguments take an 'Unsigned' value, so we have some
>> >> liberty over the values we can set.
>> >>
>> >> First, in that commit, we decided, that under no circumstances do we
>> >> want to exceed 80 characters. This seems a bit too strict. We do
>> >> overshoot this limit from time to time to prioritize readability.
>> >
>> > I think that attempting to get the weights right so as to avoid cases
>> > where there was an intentional affordance for readability is going to
>> > be essentially impossible. Areas where there's an intentional
>> > disregard for the clang-format-generated formatting should disable the
>> > formatter for that line/region, instead of trying to find a way to
>> > adjust the rules to produce something that's going to end up being
>> > context dependent.
>> >
>>
>> To some extent I agree. But the issue is that clang-format is still not
>> enforced within the code base. So expecting users to add:
>>     // clang-format off
>> will not hold, at least for _now_.
>>
>> So the next best thing we can do is to get the format rules as close as
>> we can to the current styling, so the actual errors thrown by the CI job
>> is something we can look at without containing too many false positives.
>>
>> > Example: In ref-filter.c, there's 13 lines when initializing the
>> > `valid_atom` array that are >80 characters, and 20 lines that are >80
>> > columns (when using 8-space tabs). Line breaking in that block of code
>> > may be undesirable, so just disable clang-format there. I don't think
>> > there's a consistent set of penalties you could establish that would
>> > handle that well without mishandling some other section of code.
>>
>> While true, we can quantify if it is better or not:
>>
>>       ❯ ci/run-style-check.sh @~50 | wc -l
>>       4718 (master)
>>
>>       ❯ ci/run-style-check.sh @~53 | wc -l
>>       4475 (with these patches)
>>
>> And looking through the other changes, those look like violations which
>> should have been fixed.
>>
>> > It's also not clear what the reason for the overshoot is in many cases.
>> > - difference between "80 characters" and "80 columns"?
>> >     - (1394 >80char lines in *.{h,c}, 4849 >80col lines in the same files)
>> > - intentional for readability?
>> > - refactorings pushed originally compliant lines out of compliance?
>> > - no one caught it and it was just added without any intentional decision?
>> >
>>
>> I agree with your inference here, but I'm not sure there is a smooth way
>> to have this information. Either we go full in and say we enable the
>> formatting and every patch must conform to it, or we simply keep the
>> clang-format as a warning system. Currently we do neither. I'd say we
>> should be in a state to reach the latter and then we can gradually think
>> of how to move to the former.
>>
>> >> So
>> >> let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we
>> >> that we add a penalty of 10 for each character that exceeds the column
>> >> limit. By itself this is enough to restrict to column limit. Tuning
>> >> other penalties in relation to this is what is important.
>> >>
>> >> The penalty `PenaltyBreakAssignment` talks about the penalty for
>> >> breaking an assignment operator on to the next line. In our project, we
>> >> are okay with this, so giving a value of 5, which is below the value for
>> >> 'PenaltyExcessCharacter' ensures that in the end, even 1 character over
>> >> the column limit is not worth keeping an assignment on the same line.
>> >>
>> >> Similarly set the penalty for breaking before the first call parameter
>> >> 'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking
>> >> comments 'PenaltyBreakComment' and the penalty for breaking string
>> >> literals 'PenaltyBreakString' also to 5.
>> >>
>> >> Finally, we really care about not breaking the return type into its own
>> >> line and we really care about not breaking before an open parenthesis.
>> >> This avoids weird formatting like:
>> >>
>> >>    static const struct strbuf *
>> >>           a_really_really_large_function_name(struct strbuf resolved,
>> >>           const char *path, int flags)
>> >
>> > Is this how it'd be indented without the penalties, or would it do
>> > this, with the function name indented the same amount as the return
>> > type (which is, in C, probably going to be the 0th column most times):
>> >
>> > static const struct strbuf *
>> > a_really_really_large_function_name(struct strbuf resolved,
>> >         const char *path, int flags)
>> >
>>
>> It will be indented, so not the 0th column.
>
> I'm not getting that behavior when I try it. Is this only indented
> with your updated penalties?
>
> Currently on ef8ce8f3d4344fd3af049c17eeba5cd20d98b69f, git status is
> clean, added this line (no line breaks, just in case my email client
> makes a mess of this) to top of path.h (chosen arbitrarily):
>
> static const struct strbuf *a_really_really_large_function_name(struct
> strbuf resolved, const char *path, int flags);
>
> and ran `clang-format path.h | head -n20` and got this (where `int
> flags` is indented to align with the opening `(`, but tabs cause
> problems yet again):
>
> static const struct strbuf *
> a_really_really_large_function_name(struct strbuf resolved, const char *path,
>     int flags);
>
> `clang-format --version` shows it's a google-internal build, but it
> still respects the .clang-format file, so this shouldn't matter? I'm
> assuming it's a relatively recent (within the past 1 month) commit
> that it's based off of.
>

You're absolutely right, this is also what I get. Sorry for the
confusion, but I assumed you were talking about the third line, i.e.
`const char *path, int flags)`.

I also ran it on the CI to make it easier on the eyes (specifically with
the tabs): [1]

>>
>> >>
>> >> or
>> >>
>> >>    static const struct strbuf *a_really_really_large_function_name(
>> >>             struct strbuf resolved, const char *path, int flags)
>> >
>> > Personal opinion: I prefer this over the version that has a single
>> > argument on the first line. My preference for reading functions is:
>> >
>> > return_type func_name(arg1, arg2,
>> >                       arg3, arg4,
>> >                       arg5, arg6, ...);
>> >
>> > Or
>> >
>> > return_type func_name(
>> >         arg1, arg2, arg3, arg4,
>> >         arg5, arg6, ...);
>> >
>>
>> I'm mostly basing my changes on the current state of the 'clang-format'
>> and our code base. I'm happy to change it if everyone agrees on this :)
>
> I'm wondering if tabs have caused some confusion here... [thought
> continued below]
>
>>
>> > or, in some cases, putting every argument on their own line (typically
>> > when the majority of the arguments are already on their own line, not
>> > having one "hiding" somewhere is preferable, but at this point if
>> > that's not what my formatter does, I don't fight it).
>> >
>> > For functions that accept an obvious first parameter, such as
>> > `strbuf_add`, maybe having the first parameter on the first line is
>> > acceptable/desirable, since it's "obvious" what it is/does. But for
>> > many functions that's not the case, and needing to read the end of the
>> > first line, potentially beyond the 80th column, feels weird.
>> >
>> >>
>> >> to instead have something more readable like:
>> >>
>> >>    static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved,
>> >>           const char *path, int flags)
>
> Does this have `const char *path` aligned with the opening `(`? if so,
> then that matches my first example and I'm fine with it.

Yes, it is. Here is a CI job to demonstrate. I hope this makes it
clearer: [2]


> If this is
> truly "first argument immediately after (, other arguments indented
> one indentation level", then my original comment stands: I don't find
> this readable at all, and I don't see evidence of this being an
> acceptable format according to our CodingGuidelines document.
>
> I also don't understand how the penalties would produce t
>

The penalties define when the linebreak should happen. The alignment is
handled by the `AlignAfterOpenBracket: Align` rule we have in
'.clang-format'.

>> >>
>> >> This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine'
>> >> and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a
>> >> few characters above the 80 column limit to make code more readable.
>> >
>> > A few examples, such as by formatting the code using the current rules
>> > (since much of the codebase does not currently comply), and then
>> > changing the penalties and seeing what changes, might be nice?
>> >
>>
>> You mean apart from the example above?
>
> The example above is hypothetical, but I think I was thinking about
> this incorrectly. Our existing codebase isn't formatted with
> clang-format, and formatting it first and then adjusting the penalties
> doesn't really provide much useful information.
>
> Setting the penalties as you have them in this patch, and running it
> on a copy of the line you have there, produces this for me:
>
> static const struct strbuf *
> a_really_really_large_function_name(struct strbuf resolved, const char *path,
>                                     int flags);
>
> The 80 column limit is still _strictly_ adhered to, it won't even go
> over by 1 character:
>
> static const struct strbuf *
> a_really_really_large_function_named_Ed(struct strbuf resolved,
>                                         const char *path, int flags);
>
> (note: switched tabs to spaces because tabs are difficult to use
> during discussions like this)
>
> Specifically:
> - 80 column hard limit applied
> - return type on its own line
> - continuation arguments are aligned on the next line (which is
> expected since we have AlignAfterOpenBracket set).
>

But this is not what I'm seeing though, even the CI [2] confirms that.
I'm seeing

static const struct strbuf *a_really_really_large_function_name(struct
strbuf resolved,
                                                                const
char *path,
                                                                int flags);

where the consecutive lines are aligned on '('.

(note: switched tabs to spaces too)

Thanks for the back and forth though, I guess I have some feedback for
the next version:
- Add some examples in a file like Toon suggested, showing how the
clang-format would work.
- Clarify the commit message to make it clearer about how the penalties
work with other rules.

[snip]

[1]: https://gitlab.com/gitlab-org/git/-/jobs/8105793089
[2]: https://gitlab.com/gitlab-org/git/-/jobs/8105737945

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux