On Tue, Jun 21, 2022 at 05:07:10PM -0700, Luis Chamberlain wrote:
> On Thu, Jun 16, 2022 at 11:27:41AM -0700, Leah Rumancik wrote:
> > https://gist.github.com/lrumancik/5a9d85d2637f878220224578e173fc23.
>
> The coverage for XFS is using profiles which seem to come inspired
> by ext4's different mkfs configurations.

That's not correct, actually.  It's using the gce-xfstests test
framework which is part of the xfstests-bld[1][2] system that I
maintain, yes.  However, the actual config profiles were obtained via
discussions with Darrick and represent the actual configs which the
XFS maintainer uses to test the upstream XFS tree before deciding to
push to Linus.  We figure if it's good enough for the XFS Maintainer,
it's good enough for us.  :-)

[1] https://thunk.org/gce-xfstests
[2] https://github.com/tytso/xfstests-bld

If you think the XFS Maintainer should be running more configs, I
invite you to have that conversation with Darrick.

> GCE is supported as well, so is Azure and OpenStack, and even custom
> openstack solutions...

The way kdevops works is quite different from how gce-xfstests works,
since gce-xfstests is a VM-native solution.  Which is to say, when we
kick off a test, VM's are launched, one per config, which provides for
better parallelization, and then once everything is completed, the
VM's are automatically shut down and they go away; so it's far more
efficient in terms of using cloud resources.  The Lightweight Test
Manager will then take the junit XML files, plus all of the test
artifacts, and combine them into a single test report.  The
Lightweight Test Manager runs in a small VM, and this is the only VM
which is consuming resources until we ask it to do some work.  For
example:

    gce-xfstests ltm -c xfs --repo stable.git --commit v5.18.6 -c xfs/all -g auto

That single command will result in the LTM launching a large builder
VM which quickly builds the kernel.  (It uses ccache and a persistent
cache disk, but even if we've never built the kernel before, it can
complete the build in a few minutes.)  Then we launch 12 VM's, one for
each config, and since they don't need to be optimized for fast
builds, we can run most of the VM's with a smaller amount of memory,
to better stress test the file system.  (But for the dax config, we'll
launch a VM with more memory, since we need to simulate the PMEM
device using raw memory.)  Once each VM completes its test run, it
uploads its test artifacts and results XML file to Google Cloud
Storage.  When all of the VM's complete, the LTM VM will download all
of the results files from GCS, combine them into a single results
file, and then send e-mail with a summary of the results.

It's optimized for developers, and for our use cases.  I'm sure
kdevops is much more general, since it can work for hardware-based
test machines, as well as many other cloud stacks, and it's also
optimized for the QA department --- not surprising, given where
kdevops has come from.

> Also, I see on the above URL you posted there is a TODO in the gist which
> says, "find a better route for publishing these". If you were to use
> kdevops for this it would have the immediate gain in that kdevops users
> could reproduce your findings and help augment it.

Sure, but with our system, kvm-xfstests and gce-xfstests users can
*easily* reproduce our findings and can help augment it.  :-)
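To give a rough idea of what the "combine them into a single results
file" step involves, here's a quick sketch in Python.  (This is not
the actual LTM code; the results-*.xml naming and the output filename
are made up purely for illustration.)

#!/usr/bin/python3
# Sketch only: merge per-config junit XML result files, as downloaded
# from GCS into results_dir, into one combined report.
import glob
import sys
import xml.etree.ElementTree as ET

def merge_junit(results_dir):
    combined = ET.Element("testsuites")
    for path in sorted(glob.glob(results_dir + "/results-*.xml")):
        root = ET.parse(path).getroot()
        # A per-VM file may be a bare <testsuite> or a <testsuites>
        # wrapper; append each suite to the combined tree either way.
        suites = [root] if root.tag == "testsuite" else list(root)
        for suite in suites:
            combined.append(suite)
    return ET.ElementTree(combined)

if __name__ == "__main__":
    merge_junit(sys.argv[1]).write("combined-results.xml",
                                   xml_declaration=True, encoding="utf-8")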
As far as sharing expunge files, as I've observed before, these files
tend to be very specific to the test configuration --- the number of
CPU's, the amount of memory, the characteristics of the storage
device, etc.  So what works for one developer's test setup will not
necessarily work for others --- and I'm not convinced that trying to
get everyone standardized on the One True Test Setup is actually an
advantage.  Some people may be using large RAID arrays; some might be
using fast flash; some might be using some kind of emulated
log-structured block device; some might be using eMMC flash.  And
that's a *good* thing.

We also have a very different philosophy about how to use expunge
files.  In particular, if there is a test which is only failing 0.5%
of the time, I don't think it makes sense to put that test into an
expunge file.  In general, we only place tests into expunge files when
a test causes the system under test to crash, or it takes *WAAAY* too
long, or it's a clear test bug that is too hard to fix for real, so we
just suppress the test for that config for now.  (Example: tests in
xfstests for quota don't understand clustered allocation.)  So we want
to run the tests, even if we know they will fail, and have a way of
annotating that a test is known to fail for a particular kernel
version, or, if it's a flaky test, what the expected flake percentage
is for that particular test.  For flaky tests, we'd like to be able to
automatically retry running the test, so we can flag when a flaky test
has become a hard failure, or when a flaky test has radically changed
how often it fails.  We haven't implemented all of this yet, but this
is a design space we're exploring at the moment.

More generally, I think competition is a good thing, and for areas
where we are still exploring the best way to automate tests, not just
from a QA department's perspective, but from a file system developer's
perspective, having multiple systems where we can explore these ideas
can be a good thing.

Cheers,

						- Ted
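P.S.  To give a flavor of what the flaky-test handling might look
like, here's a purely hypothetical sketch in Python.  (As noted above,
we haven't implemented this; the "flaky.list" annotation format and
the 2x threshold are just made-up examples, and "./check <test>" is
simply how xfstests runs a single test.)

#!/usr/bin/python3
# Hypothetical sketch: instead of expunging a flaky test, annotate its
# expected failure rate, rerun it several times, and flag it when it
# becomes a hard failure or its flake rate changes radically.
import subprocess

def load_flake_annotations(path="flaky.list"):
    # Made-up format, one test per line: "<test name> <expected rate>",
    # e.g. "generic/475 0.05" for a test expected to fail ~5% of the time.
    annotations = {}
    with open(path) as f:
        for line in f:
            name, rate = line.split()
            annotations[name] = float(rate)
    return annotations

def check_flaky(test, expected_rate, tries=10):
    failures = 0
    for _ in range(tries):
        # Run the single test via xfstests' check script; a nonzero
        # exit status is counted as a failure.
        if subprocess.run(["./check", test]).returncode != 0:
            failures += 1
    observed = failures / tries
    if failures == tries:
        print("%s: now a hard failure (was flaky at %.0f%%)"
              % (test, expected_rate * 100))
    elif observed > 2 * expected_rate:
        print("%s: flake rate changed: expected ~%.0f%%, saw %.0f%%"
              % (test, expected_rate * 100, observed * 100))

if __name__ == "__main__":
    for name, rate in load_flake_annotations().items():
        check_flaky(name, rate)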