On Fri, Jun 24, 2022 at 01:32:23AM -0400, Theodore Ts'o wrote:
> On Thu, Jun 23, 2022 at 02:31:12PM -0700, Luis Chamberlain wrote:
> >
> > To be clear, you seem to suggest gce-xfstests is a VM native solution.
> > I'd also like to clarify that kdevops supports native VMs, cloud and
> > baremetal. With kdevops you pick your bringup method.
>
> Yes, that was my point. Because gce-xfstests is a VM native solution,
> it has some advantages, such as the ability to take advantage of the
> fact that it's trivially easy to start up multiple cloud VM's which
> can run in parallel --- and then the VM's shut themselves down once
> they are done running the test, which saves cost and is more
> efficient.

Perhaps I am not understanding what you are suggesting with a VM native
solution. What do you mean by that? A full KVM VM inside the cloud?

Anyway, kdevops can bring up whatever type of node you want on the
cloud providers: GCE, AWS, Azure, OpenStack and even custom OpenStack
solutions. That could be a VM or a high-end bare metal node. It does
this by using Terraform, with the variability provided through kconfig.

The initial 'make bringup' brings the nodes up, and then running
'make fstests-baseline' runs fstests on each of them in parallel. At
the end you just run 'make destroy'.

> It is *because* that we are a VM-native solution that we can optimize
> in certain ways because we don't have to also support a bare metal
> setup. So yes, the fact that kdevops also supports bare metal is
> certainly granted. That that kind of flexibility is an advantage for
> kdevops, certainly; but being able to fully take advantage of the
> unqiue attributes of cloud VM's can also be a good thing.

Yes, agreed. That is why I focused on technology that would support
all cloud providers, not just one.
For example, I had not touched the AWS code in 2 years; I just went and
tried a bringup and it worked in 10 minutes. Most of that time went
into setting up my .aws/credentials file with information from the
website.

> > kdevops started as an effort for kernel development and filesystems
> > testing. It is why the initial guest configuration was to use 8 GiB
> > of RAM and 4 vcpus, that suffices to do local builds / development.
> > I always did kernel development on guests back in the day still do
> > to this day.
>
> For kvm-xfstests, the default RAM size for the VM is 2GB. One of the
> reasons why I was interested in low-memory configurations is because
> ext4 is often used in smaller devices (such as embedded systesm and
> mobile handsets) --- and running in memory constrained environments
> can turn up bugs that otherwise are much harder to reproduce on a
> system with more memory.

Yes, I agree. We started with 8 GiB. Long ago, while at SUSE, I tried
2 GiB and ran into the xfs/074 issue, which needs more memory because
of xfs_scratch. Later Amir ran into snags with xfs/084 and generic/627
due to OOMs. So for XFS, to avoid OOMs from the tests alone, we need
3 GiB.

> Separating the kernel build system from the test VM's means that the
> build can take place on a really powerful machine (either my desktop
> with 48 cores and gobs and gobs of memory, or a build VM if you are
> using the Lightweight Test Manager's Kernel Compilation Service) so
> builds go much faster. And then, of course, we can then launch a
> dozen VM's, one for each test config. If you force the build to be
> done on the test VM, then you either give up parallelism, or you waste
> time by building the kernel N times on N test VM's.

The build is done only once per guest, but I agree this can be
optimized for kdevops. Right now in kdevops the git clone and the
kernel build do take place on each guest, and that requires at least
3 GiB of RAM.
Shallow git clone support was added as an option to help here, but the
ideal would be to just build locally or, as you suggest, on a dedicated
build VM.

> And in the case of the android-xfstests, which communicates with a
> phone or tablet over a debugging serial cable and Android's fastboot
> protocol, of *course* it would be insane to want to build the kernel
> on the system under test!
>
> So I've ***always*** done the kernel build on a machine or VM separate
> from the System Under Test. At least for my use cases, it just makes
> a heck of a lot more sense.

Support for this will be added to kdevops.

> And that's fine. I'm *not* trying to convince everyone that my test
> infrastructure everyone should standardize on. Which quite frankly, I
> sometimes think you have been evangelizing. I believe very strongly
> that the choice of test infrastructures is a personal choice, which is
> heavily dependent on each developer's workflow, and trying to get
> everyone to standardize on a single test infrastructure is likely
> going to work as well as trying to get everyone to standardize on a
> single text editor.

What I think we *should* standardize on is, at least, the
configurations used for testing. And now the discussion of whether and
how we track and share failures is also important. Which runner you
use is up to you.

> (Although obviously emacs is the one true editor. :-)
>
> > Sure, the TODO item on the URL seemed to indicate there was a desire to
> > find a better place to put failures.
>
> I'm not convinced the "better place" is expunge files. I suspect it
> may need to be some kind of database. Darrick tells me that he stores
> his test results in a postgres database. (Which is way better than
> what I'm doing which is an mbox file and using mail search tools.)
>
> Currently, Leah is using flat text files for the XFS 5.15 stable
> backports effort, plus some tools that parse and analyze those text
> files.
Where does not matter yet; what I'd like to refocus on is *if* sharing
is something folks want. We can discuss *how* and *where* if we do
think it is worth sharing. If folks would like to evaluate this, I'd
encourage doing so starting from a specific distro release moving
forward, rather than backtracking. For stable kernels, though, I'd
imagine it may be easier to see the value in sharing.

> I'll also note that the number of baseline kernel versions is much
> smaller if you are primarily testing an enterprise Linux distribution,
> such as SLES.

Much smaller than what? Android? If so, then perhaps. Just recall that
enterprise distributions support their kernels for at least 10 years.

> And if you are working with stable kernels, you can
> probably get away with having updating the baseline for each LTS
> kernel every so often. But for upstream kernels development the
> number of kernel versions for which a developer might want to track
> flaky percentages and far greater, and will need to be updated at
> least once every kernel development cycle, and possibly more
> frequently than that. Which is why I'm not entirely sure a flat text
> file, such as an expunge file, is really the right answer. I can
> completely understand why Darrick is using a Postgres database.
>
> So there is clearly more thought and design required here, in my
> opinion.

Sure, let's talk about it, *if* we do find sharing valuable. kdevops
already keeps this in a consistent format, which can be changed or
ported. We first just need to decide whether we want to share as a
community. The flakiness annotations are important too, and we have a
thread about that, which I have to get back to at some point.

> > That is not a goal, the goal is allow variability! And share results
> > in the most efficient way.
>
> Sure, but are expunge files the most efficient way to "share results"?
There are three things we want to do if we are going to talk about
sharing results:

a) Consuming expunges, so that fstests' check on the Node Under Test
   (NUT) can expand on the expunges given some criteria (flakiness,
   crash requirements).

b) Sharing updates to expunges per kernel / distro / runner /
   node-config, and making patches for this easy.

c) Making updates for failures easy to read for a developer / the
   community. These would be in the form of an email or a results file
   for a test run through some sort of kernel-ci.

Let's start with a):

We can adapt runners to use anything. My gut tells me Postgres is a bit
large unless we need socket communication. I can think of two ways to
go here. Perhaps others have other ideas?

1) We go lightweight on the db, maybe sqlite3, and embrace the same db
   schema Darrick uses with Postgres, if he sees value in sharing it.
   If we do this, I think it doesn't make sense to *require* sqlite3 on
   the NUTs, for many reasons, so parsing the db on the host into a
   flat file to be used by the node seems ideal.

2) Keep Postgres and provide a REST API for queries from the host to
   this server, so the host can then construct a flat file / directory
   interpretation of the expunges for the NUTs.

Given the minimal requirements desirable on the NUTs, I think in the
end a flat file hierarchy is nice, so as not to add a new dependency on
them.

Determinism is important for tests, though, so snapshotting the
expunges as interpreted at a specific point in time is also important.
The database would therefore need to be versioned on each update, so
that a test run is checkpointed against a specific version of the
expunge db.

If we come to some sort of consensus, then this code for parsing an
expunge set can be used directly by fstests' check script, so the
interpretation and use are done in one place for all test runners. We
also have additional criteria which we may want for the expunges.
For instance, if we had a flakiness percentage annotated somehow, then
fstests' check could be passed an argument to only include expunges at
or above a certain flakiness level, or, for example, to only include
expunges for tests which are known to crash.

Generating the files from a db is nice. But what do we gain by using a
db then?

Now let's move on to b), sharing the expunges and sending patches for
updates. I think a patch against a flat file reads a lot more easily,
except for the comments / flakiness levels / crash considerations /
artifacts. For kdevops' purposes this reads well today, as we don't
upload artifacts anywhere and just refer to them in GitHub gists, as
best effort / optional. There is no convention yet for expressing
flakiness, but some tests do mention a "failure rate" in one way or
another.

So we want to evaluate whether we want to share not only the expunges
but also the metadata about why a test is expunged or removed:

* flakiness percentage
* does it cause a kernel crash?
* is it a bogus test?
* expunged for any of a ton of other reasons, some of which may be
  categorized and shared, some not

And do we want to share artifacts? If so, how? Perhaps an optional URL,
with another component describing what it is: a gist, a tarball, etc.

Then, for the last part, c), making failures easy to read for a
developer, let's review what could be done. I gather gce-xfstests
emails a summary of the xunit results. Right now kdevops' kernel-ci
just sends an email with the same, plus a diff against the expunge
file hierarchy for the target kernel directory being tested. The
developer then just edits the line to add metadata as a comment, but
that is only because we lack a structure for it. If we strive to share
an expunge list, I think it would be wise to consider a structure for
this metadata.
Perhaps:

<test> # <crashes>|<flakiness-percent>|<fs-skip-reason>|<artifact-type>|<artifact-dir-url>|<comments>

Where:

test: xfs/123 or btrfs/234
crashes: either Y or N
flakiness-percent: a percentage, e.g. 80%
fs-skip-reason: can be an enum representing a series of fs-specific
   reasons why a test may not be applicable or should be skipped
artifact-type: optional; if present, the type of artifact, which can be
   an enum representing a gist test description, or a tarball
artifact-dir-url: optional, path to the artifact
comments: additional comments

All of the above considered, a), b) and c), yes, I think a flat file
model works well as an option. I'd love to hear others' feedback.

> If we have a huge amount of variability, such that we have a large
> number of directories with different test configs and different
> hardware configs, each with different expunge files, I'm not sure how
> useful that actually is.

*If* you want to share, I think it would be useful. kdevops, at least,
uses a flat file model with no artifacts, just the expunges and
comments, and over time it has been very useful, even for reviewing
historic issues on older kernels: simply running something like
'git grep xfs/123' gives me a quick sense of the history of issues with
a test.

> Are we expecting users to do a "git clone",
> and then start browsing all of these different expunge files by hand?

If we want to extend the fstests check script to look for these, it
could be an optional directory, and an argument could be passed to
check to enable the search for it; if passed, check would look for the
runner / kernel / host-type.
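To make the proposed annotation format a little more concrete, below is
a minimal sketch of a hypothetical helper, select_expunges (nothing like
it exists in fstests today), that reduces annotated expunge files to the
list of tests to skip. It applies the kind of criteria discussed above:
all expunges, only tests marked as crashing, or only tests at or above
a flakiness threshold. It assumes the field order proposed above and a
flakiness field written as a plain percentage such as 80%; lines
without metadata are only selected by the "all" mode. It sticks to
POSIX sh and awk, so no new dependency is needed on the NUTs.

```shell
# Hypothetical helper (not part of fstests): select_expunges MODE [THRESHOLD] FILE...
#
# Reads expunge lines in the proposed annotated form
#   <test> # <crashes>|<flakiness-percent>|<fs-skip-reason>|<artifact-type>|<artifact-dir-url>|<comments>
# and prints the tests to skip, one per line, sorted and de-duplicated:
#   all              every listed test
#   crashes          only tests whose crashes field is Y
#   flaky THRESHOLD  only tests whose flakiness is >= THRESHOLD percent
select_expunges()
{
	mode=$1
	shift
	threshold=0
	if [ "$mode" = flaky ]; then
		threshold=$1
		shift
	fi

	awk -v mode="$mode" -v threshold="$threshold" '
	# Skip blank lines and comment-only lines.
	/^[ \t]*$/ { next }
	/^[ \t]*#/ { next }
	{
		# Everything before the first "#" is the test name,
		# everything after it is the metadata.
		name = $0
		meta = ""
		hash = index($0, "#")
		if (hash > 0) {
			name = substr($0, 1, hash - 1)
			meta = substr($0, hash + 1)
		}
		gsub(/^[ \t]+|[ \t]+$/, "", name)

		# Split the metadata into its |-separated fields.
		n = split(meta, f, "|")
		crashes = ""
		flaky = ""
		if (n >= 1) { crashes = f[1]; gsub(/[ \t]/, "", crashes) }
		if (n >= 2) { flaky = f[2]; gsub(/[ \t%]/, "", flaky) }

		if (mode == "all")
			print name
		else if (mode == "crashes" && crashes == "Y")
			print name
		else if (mode == "flaky" && flaky ~ /^[0-9]+(\.[0-9]+)?$/ && flaky + 0 >= threshold + 0)
			print name
	}' "$@" | LC_ALL=C sort -u
}
```

For example, with an expunge file containing the lines
'xfs/123 # Y|80%||||' and 'btrfs/234', 'select_expunges all FILE'
prints both tests, while 'select_expunges crashes FILE' and
'select_expunges flaky 50 FILE' print only xfs/123. The output could
then be turned into whatever exclude list the runner or check consumes.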
For instance, today the check script already has a function, run on
initialization, which looks for the fstests config file as follows:

known_hosts()
{
	[ "$HOST_CONFIG_DIR" ] || HOST_CONFIG_DIR=`pwd`/configs

	[ -f /etc/xfsqa.config ] && export HOST_OPTIONS=/etc/xfsqa.config
	[ -f $HOST_CONFIG_DIR/$HOST ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST
	[ -f $HOST_CONFIG_DIR/$HOST.config ] && export HOST_OPTIONS=$HOST_CONFIG_DIR/$HOST.config
}

We could have something similar look for an expunge directory when
enabled with, say, --expunge-auto-look, and that could be something
like:

process_expunge_dir()
{
	[ "$HOST_EXPUNGE_DIR" ] || HOST_EXPUNGE_DIR=`pwd`/expunges

	[ -d /etc/fstests/expunges/$HOST ] && export HOST_EXPUNGES=/etc/fstests/expunges/$HOST
	[ -d $HOST_EXPUNGE_DIR/$HOST ] && export HOST_EXPUNGES=$HOST_EXPUNGE_DIR/$HOST
}

The runner and the host type could also be specified:

./check --runner <gce-xfstests|kdevops|whatever> --host-type <kvm-8vcpus-2gb>

And so we can have it look for these directories and process any of
them that are present (cumulatively):

* $HOST_EXPUNGES/any/$fstype/ - regardless of kernel, host type and runner
* $HOST_EXPUNGES/$kernel/$fstype/any - common between runners for any host type
* $HOST_EXPUNGES/$kernel/$fstype/$hosttype - common between runners for a host type
* $HOST_EXPUNGES/$kernel/$fstype/$hosttype/$runner - only for that runner

The aggregate set of expunges is used. Additional criteria could be
passed to check to ensure that only expunges meeting those criteria are
used to skip tests for the run, provided we can agree on some metadata
for that.

> It might perhaps be useful to get a bit more clarity about how we
> expect the shared results would be used, because that might drive some
> of the design decisions about the best way to store these "results".

Sure.

  Luis