Kyle Lippincott <spectral@xxxxxxxxxx> writes: > On Tue, Oct 15, 2024 at 5:37 AM karthik nayak <karthik.188@xxxxxxxxx> wrote: >> >> Kyle Lippincott <spectral@xxxxxxxxxx> writes: >> >> > On Fri, Oct 11, 2024 at 6:50 PM Karthik Nayak <karthik.188@xxxxxxxxx> wrote: >> >> >> >> In 42efde4c29 (clang-format: adjust line break penalties, 2017-09-29) we >> >> adjusted the line break penalties to really fine tune what we care about >> >> while doing line breaks. Modify some of those to be more inline with >> >> what we care about in the Git project now. >> >> >> >> We need to understand that the values set to penalties in >> >> '.clang-format' are relative to each other and do not hold any absolute >> >> value. The penalty arguments take an 'Unsigned' value, so we have some >> >> liberty over the values we can set. >> >> >> >> First, in that commit, we decided, that under no circumstances do we >> >> want to exceed 80 characters. This seems a bit too strict. We do >> >> overshoot this limit from time to time to prioritize readability. >> > >> > I think that attempting to get the weights right so as to avoid cases >> > where there was an intentional affordance for readability is going to >> > be essentially impossible. Areas where there's an intentional >> > disregard for the clang-format-generated formatting should disable the >> > formatter for that line/region, instead of trying to find a way to >> > adjust the rules to produce something that's going to end up being >> > context dependent. >> > >> >> To some extent I agree. But the issue is that clang-format is still not >> enforced within the code base. So expecting users to add: >> // clang-format off >> will not hold, at least for _now_. >> >> So the next best thing we can do is to get the format rules as close as >> we can to the current styling, so the actual errors thrown by the CI job >> is something we can look at without containing too many false positives. >> >> > Example: In ref-filter.c, there's 13 lines when initializing the >> > `valid_atom` array that are >80 characters, and 20 lines that are >80 >> > columns (when using 8-space tabs). Line breaking in that block of code >> > may be undesirable, so just disable clang-format there. I don't think >> > there's a consistent set of penalties you could establish that would >> > handle that well without mishandling some other section of code. >> >> While true, we can quantify if it is better or not: >> >> ❯ ci/run-style-check.sh @~50 | wc -l >> 4718 (master) >> >> ❯ ci/run-style-check.sh @~53 | wc -l >> 4475 (with these patches) >> >> And looking through the other changes, those look like violations which >> should have been fixed. >> >> > It's also not clear what the reason for the overshoot is in many cases. >> > - difference between "80 characters" and "80 columns"? >> > - (1394 >80char lines in *.{h,c}, 4849 >80col lines in the same files) >> > - intentional for readability? >> > - refactorings pushed originally compliant lines out of compliance? >> > - no one caught it and it was just added without any intentional decision? >> > >> >> I agree with your inference here, but I'm not sure there is a smooth way >> to have this information. Either we go full in and say we enable the >> formatting and every patch must conform to it, or we simply keep the >> clang-format as a warning system. Currently we do neither. I'd say we >> should be in a state to reach the latter and then we can gradually think >> of how to move to the former. >> >> >> So >> >> let's reduce the value for 'PenaltyExcessCharacter' to 10. This means we >> >> that we add a penalty of 10 for each character that exceeds the column >> >> limit. By itself this is enough to restrict to column limit. Tuning >> >> other penalties in relation to this is what is important. >> >> >> >> The penalty `PenaltyBreakAssignment` talks about the penalty for >> >> breaking an assignment operator on to the next line. In our project, we >> >> are okay with this, so giving a value of 5, which is below the value for >> >> 'PenaltyExcessCharacter' ensures that in the end, even 1 character over >> >> the column limit is not worth keeping an assignment on the same line. >> >> >> >> Similarly set the penalty for breaking before the first call parameter >> >> 'PenaltyBreakBeforeFirstCallParameter' and the penalty for breaking >> >> comments 'PenaltyBreakComment' and the penalty for breaking string >> >> literals 'PenaltyBreakString' also to 5. >> >> >> >> Finally, we really care about not breaking the return type into its own >> >> line and we really care about not breaking before an open parenthesis. >> >> This avoids weird formatting like: >> >> >> >> static const struct strbuf * >> >> a_really_really_large_function_name(struct strbuf resolved, >> >> const char *path, int flags) >> > >> > Is this how it'd be indented without the penalties, or would it do >> > this, with the function name indented the same amount as the return >> > type (which is, in C, probably going to be the 0th column most times): >> > >> > static const struct strbuf * >> > a_really_really_large_function_name(struct strbuf resolved, >> > const char *path, int flags) >> > >> >> It will be indented, so not the 0th column. > > I'm not getting that behavior when I try it. Is this only indented > with your updated penalties? > > Currently on ef8ce8f3d4344fd3af049c17eeba5cd20d98b69f, git status is > clean, added this line (no line breaks, just in case my email client > makes a mess of this) to top of path.h (chosen arbitrarily): > > static const struct strbuf *a_really_really_large_function_name(struct > strbuf resolved, const char *path, int flags); > > and ran `clang-format path.h | head -n20` and got this (where `int > flags` is indented to align with the opening `(`, but tabs cause > problems yet again): > > static const struct strbuf * > a_really_really_large_function_name(struct strbuf resolved, const char *path, > int flags); > > `clang-format --version` shows it's a google-internal build, but it > still respects the .clang-format file, so this shouldn't matter? I'm > assuming it's a relatively recent (within the past 1 month) commit > that it's based off of. > You're absolutely right, this is also what I get. Sorry for the confusion, but I assumed you were talking about the third line, i.e. `const char *path, int flags)`. I also ran it on the CI to make it easier on the eyes (specifically with the tabs): [1] >> >> >> >> >> or >> >> >> >> static const struct strbuf *a_really_really_large_function_name( >> >> struct strbuf resolved, const char *path, int flags) >> > >> > Personal opinion: I prefer this over the version that has a single >> > argument on the first line. My preference for reading functions is: >> > >> > return_type func_name(arg1, arg2, >> > arg3, arg4, >> > arg5, arg6, ...); >> > >> > Or >> > >> > return_type func_name( >> > arg1, arg2, arg3, arg4, >> > arg5, arg6, ...); >> > >> >> I'm mostly basing my changes on the current state of the 'clang-format' >> and our code base. I'm happy to change it if everyone agrees on this :) > > I'm wondering if tabs have caused some confusion here... [thought > continued below] > >> >> > or, in some cases, putting every argument on their own line (typically >> > when the majority of the arguments are already on their own line, not >> > having one "hiding" somewhere is preferable, but at this point if >> > that's not what my formatter does, I don't fight it). >> > >> > For functions that accept an obvious first parameter, such as >> > `strbuf_add`, maybe having the first parameter on the first line is >> > acceptable/desirable, since it's "obvious" what it is/does. But for >> > many functions that's not the case, and needing to read the end of the >> > first line, potentially beyond the 80th column, feels weird. >> > >> >> >> >> to instead have something more readable like: >> >> >> >> static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, >> >> const char *path, int flags) > > Does this have `const char *path` aligned with the opening `(`? if so, > then that matches my first example and I'm fine with it. Yes, it is. Here is a CI job to demonstrate. I hope this makes it clearer: [2] > If this is > truly "first argument immediately after (, other arguments indented > one indentation level", then my original comment stands: I don't find > this readable at all, and I don't see evidence of this being an > acceptable format according to our CodingGuidelines document. > > I also don't understand how the penalties would produce t > The penalties define when the linebreak should happen. The alignment is handled by the `AlignAfterOpenBracket: Align` rule we have in '.clang-format'. >> >> >> >> This is done by bumping the values of 'PenaltyReturnTypeOnItsOwnLine' >> >> and 'PenaltyBreakOpenParenthesis' to 300. This is so that we can allow a >> >> few characters above the 80 column limit to make code more readable. >> > >> > A few examples, such as by formatting the code using the current rules >> > (since much of the codebase does not currently comply), and then >> > changing the penalties and seeing what changes, might be nice? >> > >> >> You mean apart from the example above? > > The example above is hypothetical, but I think I was thinking about > this incorrectly. Our existing codebase isn't formatted with > clang-format, and formatting it first and then adjusting the penalties > doesn't really provide much useful information. > > Setting the penalties as you have them in this patch, and running it > on a copy of the line you have there, produces this for me: > > static const struct strbuf * > a_really_really_large_function_name(struct strbuf resolved, const char *path, > int flags); > > The 80 column limit is still _strictly_ adhered to, it won't even go > over by 1 character: > > static const struct strbuf * > a_really_really_large_function_named_Ed(struct strbuf resolved, > const char *path, int flags); > > (note: switched tabs to spaces because tabs are difficult to use > during discussions like this) > > Specifically: > - 80 column hard limit applied > - return type on its own line > - continuation arguments are aligned on the next line (which is > expected since we have AlignAfterOpenBracket set). > But this is not what I'm seeing though, even the CI [2] confirms that. I'm seeing static const struct strbuf *a_really_really_large_function_name(struct strbuf resolved, const char *path, int flags); where the consecutive lines are aligned on '('. (note: switched tabs to spaces too) Thanks for the back and forth though, I guess I have some feedback for the next version: - Add some examples in a file like Toon suggested, showing how the clang-format would work. - Clarify the commit message to make it clearer about how the penalties work with other rules. [snip] [1]: https://gitlab.com/gitlab-org/git/-/jobs/8105793089 [2]: https://gitlab.com/gitlab-org/git/-/jobs/8105737945
Attachment:
signature.asc
Description: PGP signature