On Thu, Jan 11, 2024 at 8:11 PM Kent Overstreet <kent.overstreet@xxxxxxxxx> wrote:
>
> On Thu, Jan 11, 2024 at 09:47:26PM +0000, Mark Brown wrote:
> > On Thu, Jan 11, 2024 at 12:38:57PM -0500, Kent Overstreet wrote:
> > > On Thu, Jan 11, 2024 at 03:35:40PM +0000, Mark Brown wrote:
> > > > IME the actually running the tests bit isn't usually *so* much the
> > > > issue, someone making a new test runner and/or output format does mean a
> > > > bit of work integrating it into infrastructure but that's more usually
> > > > annoying than a blocker.
> > >
> > > No, the proliferation of test runners, test output formats, CI systems,
> > > etc. really is an issue; it means we can't have one common driver that
> > > anyone can run from the command line, and instead there's a bunch of
> > > disparate systems with patchwork integration and all the feedback is nag
> > > emails - after you've finished what you were working on instead of
> > > moving on to the next thing - with no way to get immediate feedback.
> >
> > It's certainly an issue and it's much better if people do manage to fit
> > their tests into some existing thing but I'm not convinced that's the
> > big reason why you have a bunch of different systems running separately
> > and doing different things. For example the enterprise vendors will
> > naturally tend to have a bunch of server systems in their labs and focus
> > on their testing needs, while I know the Intel audio CI setup has a bunch
> > of laptops, laptop-like dev boards and things in there with loopback
> > audio cables and I think test equipment plugged in, and focuses rather
> > more on audio. My own lab is built around systems I can be in the
> > same room as without getting too annoyed and does things I find useful,
> > plus using spare bandwidth for KernelCI because they can take donated
> > lab time.
>
> No, you're overthinking.
>
> The vast majority of kernel testing requires no special hardware, just a
> virtual machine.
>
> There is _no fucking reason_ we shouldn't be able to run tests on our
> own local machines - _local_ machines, not waiting for the Intel CI
> setup and asking for a git branch to be tested, not waiting for who
> knows how long for the CI farm to get to it - just run the damn tests
> immediately and get immediate feedback.
>
> You guys are overthinking and overengineering and ignoring the basics,
> the way enterprise people always do.

As one of those former enterprise people who actually did do this stuff,
I can say that even when I was "in the enterprise", I tried to avoid
overthinking and overengineering stuff like this. :)

Nobody can maintain anything that's so complicated nobody can run the
tests on their own machine. That is the root of all sadness.

> > > And it's because building something shiny and new is the fun part, no
> > > one wants to do the grungy integration work.
> >
> > I think you may be overestimating people's enthusiasm for writing test
> > stuff there! There is NIH stuff going on for sure, but a lot of the time
> > when you look at something where people have gone off and done their own
> > thing it's either much older than you initially thought and predates
> > anything they might've integrated with, or there's some reason why none
> > of the existing systems fit well. Anecdotally it seems much more common
> > to see people looking for things to reuse in order to save time than it
> > is to see people going off and reinventing the world.
>
> It's a basic lack of leadership. Yes, the younger engineers are always
> going to be doing the new and shiny, and always going to want to build
> something new instead of finishing off the tests or integrating with
> something existing. Which is why we're supposed to have managers saying
> "ok, what do I need to prioritize for my team to be able to develop
> effectively".
> > > > > example tests, example output:
> > > > > https://evilpiepirate.org/git/ktest.git/tree/tests/bcachefs/single_device.ktest
> > > > > https://evilpiepirate.org/~testdashboard/ci?branch=bcachefs-testing
> > > >
> > > > For example looking at the sample test there it looks like it needs,
> > > > among other things, mkfs.btrfs, bcachefs, stress-ng, xfs_io, fio, mdadm,
> > > > rsync
> > >
> > > Getting all that set up by the end user is one command:
> > >     ktest/root_image create
> > > and running a test is one more command:
> > >     build-test-kernel run ~/ktest/tests/bcachefs/single_device.ktest
> >
> > That does assume that you're building and running everything directly on
> > the system under test and are happy to have the test in a VM, which isn't
> > an assumption that holds universally, and also that whoever's doing the
> > testing doesn't want to do something like use their own distro or
> > something - like I say, none of it looks too unreasonable for
> > filesystems.
>
> No, I'm doing it that way because technically that's the simplest way to
> do it.
>
> All you guys building crazy contraptions for running tests on Google
> Cloud or Amazon or whatever - you're building technical workarounds for
> broken procurement.
>
> Just requisition the damn machines.

Running in the cloud does not mean it has to be complicated. It can be a
simple Buildbot or whatever that knows how to spawn spot instances for
tests and destroy them when they're done *if the test passed*. If a test
failed on an instance, it could hold onto the instance for a day or two
for someone to debug if needed.

(I mention Buildbot because in a previous life, I used it to run tests
for the dattobd out-of-tree kernel module; that was the strategy I used
there.)

> > Some will be, some will have more demanding requirements especially when
> > you want to test on actual hardware rather than in a VM.
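For what it's worth, that pass/fail teardown policy fits in a few lines of shell. This is only a sketch of the idea, not anything from ktest or Buildbot: `provision_instance`, `destroy_instance`, and `schedule_teardown` are hypothetical stubs standing in for whatever your cloud CLI or Buildbot latent-worker hooks actually provide.

```shell
#!/bin/sh
# Sketch of the spot-instance lifecycle described above: provision,
# run the test, destroy the instance immediately on pass, but keep a
# failed instance around for a couple of days so someone can debug.
# All three helpers below are stubs for illustration only.

provision_instance() { echo "i-$(date +%s)"; }       # stub: would call the cloud API
destroy_instance()   { echo "destroyed $1"; }        # stub: immediate teardown
schedule_teardown()  { echo "keeping $1 for $2h"; }  # stub: delayed teardown for debugging

run_test_on_instance() {
    instance=$1; shift
    if "$@"; then                        # run the actual test command
        destroy_instance "$instance"     # passed: reclaim the instance now
        return 0
    else
        schedule_teardown "$instance" 48 # failed: hold for ~2 days to debug
        return 1
    fi
}

instance=$(provision_instance)
run_test_on_instance "$instance" true && echo "test passed, instance reclaimed"
```

The only real logic is the branch in `run_test_on_instance`; everything else is plumbing your CI system already has.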
> > For example with my own test setup, which is more focused on hardware,
> > the operating costs aren't such a big deal but I've got boards that are
> > for various reasons irreplaceable, often single instances of boards
> > (which makes scheduling a thing), and for some of the tests I'd like to
> > get around to setting up I need special physical setup. Some of the
> > hardware I'd like to cover is only available in machines which are in
> > various respects annoying to automate; I've got a couple of unused
> > systems waiting for me to have sufficient bandwidth to work out how to
> > automate them. Either way I don't think the costs are trivial enough to
> > be completely handwaved away.
>
> That does complicate things.
>
> I'd also really like to get automated performance testing going too,
> which would have similar requirements in that jobs would need to be
> scheduled on specific dedicated machines. I think what you're doing
> could still build off of some common infrastructure.
>
> > I'd also note that the 9-hour turnaround time for that test set you're
> > pointing at isn't exactly what I'd associate with immediate feedback.
>
> My CI shards at the subtest level, and like I mentioned I run 10 VMs per
> physical machine, so with just 2 of the 80-core Ampere boxes I get full
> test runs done in ~20 minutes.

This design, ironically, is way more cloud-friendly than a lot of testing
system designs I've seen in the past. :)

-- 
真実はいつも一つ!/ Always, there's only one truth!