Re: [GIT PULL linux-next] Add Compiler Attributes tree

Miguel Ojeda <miguel.ojeda.sandonis@xxxxxxxxx> · Thu, 4 Oct 2018 11:06:46 +0200

Hi Ted,

On Thu, Oct 4, 2018 at 7:02 AM Theodore Y. Ts'o <tytso@xxxxxxx> wrote:
> In this case, yes.  Again, I emphasize that I was using the ext4.h
> cleanup as an *example*.  The point I was trying to make was that your
> change did *not* do a full set of deep ext4 regression tests because
> the your change didn't go through the ext4.git tree.  I have seen

Because it shouldn't matter from which tree they come from to get the
tests run. *That* is the problem: effectively we have independent
projects in the kernel. If you tell me that my change did not run the
ext4 tests (even if I Cc'd you or maybe some ML, etc.), it means there
is something wrong.

> cases where "simple mechanical changes" were anything but, and while
> the kernel *compiled* it resulted in the file system not working.
> (And in the worst case, it resulted actual data loss or file system
> corruption.)

Yes, I have also experienced one-liners breaking entire projects.
Ideally, any change should be assumed to break everything unless
proved otherwise, and I agree with having as much testing as possible
-- see above for the actual problem here.

> My test scripts are public, and in fact I've gone out of my way to
> make them easy for other people to run them, with documentation[1][2],
> slide sets[3], etc.  But the vast majority of people who submit
> patches to ext4.git don't bother --- and as far as I know *no one* who
> sends ext4 changes via other git trees *ever* bothered.  (And, for one
> recent patchset where the ext4 developer spent a lot of time using
> kvm-xfstests before submission, accepting that set of changes was easy
> and was a joy.  I'm a big believe in tests.)

I believe you -- but actually I would *expect* that. Nobody runs them
because nobody has either the time, the expertise, or feels the need
to do so. It is not a matter of "being easy to run". It is simply that
people won't do it unless you force them to (and if you force them,
you will lose some contributions).

The solution is not providing slides, documentation, or making it
fool-proof. The solution, in my view, is avoiding people having to do
the tests. Currently, I thought we do that in the kernel by Cc'ing
whoever is a maintainer of the code and letting that person do
whatever is required for those changes. Of course, you can automate
this in many fancy ways as many projects/companies have done (and as
you describe for Google).

However...

> So if you are willing to completely standardize your testing
> infrastructure and devote utterly lavish amounts of automated testing

...I don't agree. I don't think this follows from the above
discussion. While release management and testing are topics that go
together (pretty much always), they are two different, independent
problems. I am talking here about the need for some "software" to do
the release management (i.e. merging many trees, doing integration,
etc.), not the testing side of it (which can change irrespective of
the release management side; e.g. may have different triggers, may
have different requirements, may use different systems to help, etc.).

Actually, it is fairly easy to prove: Pick Documentation/ and imagine
it was only plain text files without processing required (as in the
old days). For that "project", you don't need any testing whatsoever,
but you still need to do release integration. For ext4, you may want
to use your testing scripts. For drivers/foo/bar.c, maybe the only way
of testing is plugging the hardware and seeing that it works. So,
different projects, different testing mechanisms; but same release
integration at the end of the day.

> which is deeply integrated into a standardized, non-email code review
> system, the challenge you have identified *has* been solved.  But

Of course it has been "solved". All projects with hundreds of
contributors/subprojects/... have to deal with the same issues, and
they have to keep moving forward at the end of the day. The point is
not how Google, or CERN, or anybody else does it. The point is that
there should be a better way to handle this which makes kernel
development better.

I explained the issue we faced at CERN years ago simply because it was
very similar to this (merging given tags of different trees,
basically). I am not sure how/if your Google example is relevant here,
since it is a completely custom system, it is very different to what
the kernel does, etc. I understand you are trying to say "this is no
news, there are other ways of solving it, it is solved at Google"; but
with my "example" I was not trying to say "we solved it at CERN" or
trying to say I am a "guru"; I was simply stating that it seems there
could be a project coming out of this to solve this space *in the
context of the kernel workflow*.

> trust me when I say that it is a very non-trivial thing to do, and it
> requires a lot of very strict code development practices that are
> imposed on all Google C++ developers working in the common tree (which
> is probably 90%+ of the SWE's inside Google).  I'm not at all
> convinced that it could be adopted or imposed on the kernel
> development process.  In fact, I'm quite confident it could not be.

This paragraph seems to follow from the fact that you connect
heavily-automated testing with release integration. I don't
necessarily agree with that as explained above, so I think I could cut
here this part of the discussion, but anyway, here it goes: coding
development practices are not completely tied to release integration.
Same as testing: i.e. you *can* do release integration (and heavy
testing) without strict coding development guidelines. We actually do
that at the kernel nowadays (and we did it at CERN too).

>
> I actually think the Linux Kernel's approach of carefully segregating
> how code and headers to avoid merge conflicts (and worse, semantic
> conflicts that may git merge and build succesfully, but then blow up

Not sure how the kernel is preventing semantic conflicts (in different
ways than other projects, I mean). What do you mean?

> in subtle ways leading to user data loss) is actually a pretty good
> way of doing things.  It works 99.99% of the the commits.  True, it

I would say many commits are local enough that it is not really a
problem. The issue is non-local commits, and I am not that confident
it works for 99.99% of those.

> wasn't optimal for the changes you were trying to make; but your
> experience is the exception, not the rule.

Sure, but regardless of how well it works, the fact is that Stephen is
managing some custom (and AFAIK, ad hoc) scripts. That usually means
there is something that may be factorized out.

>
> The risk here is that it's very easy to engineer changes in processes
> for corner cases, and which make things much worse (either taking more
> time, or more effort, or being less reliable) for the common case.

Agreed, but changes are not necessarily bad, i.e. they are not
guaranteed to be worse than the previous state. Actually, I would
argue we could reach a better state of the art for *all*
commits/cases, not only corner ones.

Cheers,
Miguel