Hi Ævar, On Mon, Mar 20, 2023 at 3:37 AM Ævar Arnfjörð Bjarmason <avarab@xxxxxxxxx> wrote: > > > On Sun, Mar 19 2023, Elijah Newren via GitGitGadget wrote: > > > This series picks up where en/header-cleanups leaves off and does more > > header cleanups, trying to reduce the number of files depending on cache.h. > > (There are still more that could be done, but again, this is a good chunk > > for now.) > > > Elijah Newren (16): > > treewide: remove unnecessary cache.h inclusion from a few headers > > treewide: be explicit about dependence on gettext.h > > treewide: remove unnecessary inclusion of gettext.h > > treewide: remove unnecessary cache.h inclusion from several sources > > I couldn't seem to square what you were doing in 2/16 with what 4/16 > does, but I think the "issue" is that the commit message for 4/6 doesn't > match at least one change it's making, i.e.: > > By making those files explicitly include gettext.h, we can > already drop the include of cache.h in these files. > > But in fact compat/linux/procinfo.c at least (I didn't do more than skim > the others) isn't like the rest, in that it doesn't need gettext.h at > all. Yeah, there were apparently pre-existing headers that did not need cache.h, so my script to find the ones that "no longer need it" after some changes identifies not only the ones that can be cleaned out due to the gettext.h changes but the ones that could have been cleaned up beforehand. Maybe I should do a preliminary patch that removes cache.h from some headers, then do the gettext changes, then this patch becomes clearer. Or I can tweak the commit message to just spell out we already had some headers that didn't need cache.h, so this patch just combines those cleanups. > I also find the ordering hard to follow, but it's ultimately correct in > the cases I looked at. E.g. for ref-filter.c your 1/16 does: > > -#include "git-compat-util.h" > +#include "cache.h" > > Then 2/16 does: > > +#include "gettext.h" > > So I wondered, wait a minute, didn't we just end up over-including > cache.h because we needed gettext.h, which the later commit adds? > > But no, that's not the case, because as 1/16 notes ("on things from > cache.h[...]") we need *other* things from cache.h, so this is > ultimately correct. > > Still, given that, I for one would have found this easier to follow if > commits like 2/16 came before the 1/16, i.e. let's first be explicit > about e.g. gettext.h, and then change git-compat-util.h inclusions to > cache.h, to clearly note in the progression that the two are distinct. > > But this is also fine with me if you don't agree, or can't be bothered > to re-roll it. In my mind, 1/16 probably belonged in en/header-cleanup that was already merged to master and was an oversight that made other cleanups I was trying to do harder. The gettext stuff was new stuff I was doing to make other things in this series easier, so I placed it right after 1/16. But, I can see your point of view, and I don't feel strongly about the order of these patches. I'm happy to switch them around. > As a more general note on the direction here: I have some old WIP > patches to do similar split-ups of cache.h, but in doing those I was > trying to first identify cases where we had a function in cache.h that > was used by the low tens of our ~500 *.c source files. > > E.g. advice.h is such a case[1], ditto wildmatch.h. Then we have case > like ws.c[3], base85.c[4] and pager.c[5] where there's no corresponding > header, but we should probably create one (which those WIP commits of > mine do). Here was my approach: * Identify a *single* file that I know that makes no sense to include cache.h. * Try to figure out why and fix it...but also fix *all* other files that have the same nonsensical dependencies throughout the entire tree. * (And maybe also fix a few extra things you find along the way that repeatedly turn up). For my first series (en/header-cleanup which is now in master), I was looking at diffcore-rename.c. 17 patches later it was clean. For this series, I was looking at strbuf.c. It would have been clean at the end, but while waiting for my first series to merge to master, you submitted your series which conflicts. So I dropped some of the changes, but submitted the rest. (And the gettext.h changes originally weren't at the beginning of my series, but then I noticed they repeatedly were additional squashed-in changes that made things harder for others to review, AND harder for me to figure out (see below), so I went back and made the gettext changes into a preliminary big cleanup.) Your approach also makes sense, I've considered it before and still have it in my backlog, but this series focused on making strbuf.c clean and was already long enough, so I'm deferring those other changes to later. > Whereas this approach feels like the opposite of that, i.e. at the end > of this series we're including gettext.h in more than 250 files. The idea to "fix all other files with the same nonsensical dependencies" was heavily impeded by gettext.h. I had scripts to optimistically change "cache.h" to "git-compat-util.h" and try to compile. For those that only gave warnings rather than errors, I needed to wade through pages of gettext-related undeclared warnings, to see the other dependencies in the file. So, making gettext dependencies explicit just made it vastly easier to do all the subsequent cleanups. > Stepping back a bit, I think our use of cache.h falls into a few broad > categories: > > A. Things to do with the index (we should probably create an index.h > for those). > B. Used almost everywhere, i.e. "cache.h" can be though of as "git.h", > or "common-utils.h" or whatever. > C. Things used almost nowhere, or only a handful of places, > e.g. wildmatch.h and others noted above. > > Part of this series is a frontal assault on the "B" part of that. I > think if we're going to include gettext.h explicitly everywhere we're > probably saying by extension that no such thing as a "common header" > should exist. There might be a way to have a common header that doesn't devolve into a dumping ground of all dependencies the way cache.h has become, but I think we'd need to be really careful. I think git-compat-util.h would obviously make sense to include in such a thing. You might be able to convince me that gettext.h should be in such a thing, but I'm not even sure about that: we have more C files that do not use stuff from gettext (321) than those that do (245). I think cache.h is really far from what we want. It has often tripped me up in trying to understand the code, often results in needless compile-everything loops when making simple changes, is an impediment to any potential libifying or code-sharing exercise (I was hoping that e.g. merge-ort could be shared with other projects instead of being reinvented), and makes other interesting exploratory hacking harder (e.g. I've toyed with the idea of speeding up merge-ort or other parts of the code by rust-rewrites for self-educational purposes and wondered why the code depended on so many things). The focus of this series was not B; it was just "remove the cache.h header from strbuf.c, and from any files that depend on cache.h for the same dumb reasons". However, even though it wasn't the focus of my series, I would have to say that you are right that I don't believe in having a dumping ground of "common" includes as I think it just gets littered with things that aren't actually common. I may well submit future series explicitly assaulting part "B". > Which I'd be fine with, I just personally never thought it was much of a > practical problem, i.e. to have the gettext.h's in our tree included > "everywhere", ditto "strbuf.h", and even "enum object_file" and other > "mostly everyone wants these". > > Whereas it is rather annoying that we over-include things in cache.h, or > even have cache.h include other headers, as often minor changes to > related libraries result in a full re-build. The time to rebuild the code is one concern, but wasn't even my primary concern. > But maybe "B" isn't sustainable at all, and I'm just fooling myself > thinking we could have such a thing. > > A nice thing I hadn't considered is that after your topic e.g. removing > the gettext.h inclusion from parse-options.h will compile *most* of git, > but we'll fail e.g. in: > > t/helper/test-parse-pathspec-file.c: In function ‘cmd__parse_pathspec_file’: > ./parse-options.h:209:40: error: implicit declaration of function ‘N_’ [-Werror=implicit-function-declaration] > 209 | N_("file"), (h) } > > > So as we don't want gettext in t/helper/* it's nice to get *an* error > about that. I don't see why that's a "nice thing"; it doesn't even make sense to me as being a useful thing to try. parse-options.h has several macros that explicitly call N_(). Even if the code compiled with that proposed change (because no one used those particular macros), it'd be a bad idea unless those macros were first deleted or modified to not depend upon gettext.h. Since the file clearly depends upon gettext.h, I think it should explicitly include it. (In general, I believe if you depend upon something, you should explicitly include it. I was convinced otherwise for git-compat-util.h and header files[1], but I hold that as the only exception until others have solid reasons for other headers being special. In fact, I'd like to change the coding guidelines to state that you should directly include any headers you use, with the exception of git-compat-util.h in other header files. We're just too far away from that right now for it to be realistic.) [1] https://lore.kernel.org/git/CABPp-BHizCj2e454w3vtHrDNip3Rm-gUMT0oJiAsbkAvr_QvVA@xxxxxxxxxxxxxx/ > So, in the end I think I've convinced myself that even the "B" in that > "A..C" is a bad idea. Was this based only on the mistaken idea of removing gettext.h from parse-options.h, or on other factors?