On Mon, May 8, 2017 at 2:55 PM, Jeff King <peff@xxxxxxxx> wrote: > On Mon, May 08, 2017 at 10:02:58AM +0900, Junio C Hamano wrote: > >> Stefan Beller <sbeller@xxxxxxxxxx> writes: >> >> > I guess it is ok for now and in this series, but we may want >> > to split up diff.[ch] in the future into multiple finer grained files. >> >> For what end? Such a split would require more symbols internal to >> diff.[ch] to become external, which is a big downside, so we need to >> have a large reward to compensate it if we were to go there. > > I think there are two sides to that coin. Let's say you have a file with > five functions (and you can replace function with structs, global > variables, etc; any unit of complexity that might or might not be > externally visible): > > /* called by both a() and b() */ > static void a_and_b_helper(); > > /* called by a() */ > static void a_helper(); > void a(); > > /* called by b() */ > static void b_helper(); > void b(); > > When they are in the same file, b() and b_helper() can see a_helper(), > even though they don't need it. And that means increased complexity in > dealing with a_helper(), because its visibility is larger than > necessary. We have to worry about what a change to it might do to b() > (or more realistically, c(), d(), etc). > > If we split this apart, we end up with three files: > > common.c: > void a_and_b_helper(); > > a.c: > static void a_helper(); > void a(); > > b.c: > static void b_helper(); > void b(); > > The specific helpers have less visibility, which is good. The public > functions a() and b() were already public, so no change. But now the > common helper is public, too, even though nobody except a() and b() care > about it. > > So it's a tradeoff. And the important question is: is the bleed-over > between a() and b() worse than the common bits being made public? That > can't be answered without looking at how many distinct "a" and "b"-like > chunks there are in the file, and what the common bits look like. I'm > not sure of the answer for diff.c. Without digging, both ends of the > spectrum seem equally plausible to me: that it is mostly a set of N > distinct units which could be split apart, or that it really is a few > public functions calling the same common core over and over. > > And a follow-on question is what we can do to mitigate the cost of > making the common code public. We could advertise a_and_b_helper() only > in diff-internal.h, and then makes it only semi-public (anybody can > include that, of course, but including diff-internal.h seems like it > ought to be a red flag). That only helps the programmer, though; we'd > still be losing out on compiler optimizations and static analysis that > only looks at one translation unit at a time. > > Phew. That ended up a little long-winded. All of it is to say that I > don't think it makes sense to reject a split out-of-hand, but nor is it > always a good thing. It depends on what's in the file. I agree on this sentiment. It really depends on the content under discussion whether it makes sense to split. Having had some exposure recently for diff.[ch] (and I just picked up that series again, but did not send it out yet), I have the impression that we do have a lot of code in diff.c, which is quite unrelated to each other, i.e. a lot of [a,b]_helper()s, and few a_and_b_helper() for the already public functions. May first mail was based on perceived unneeded complexity, which slows down in achieving a goal. Thanks, Stefan