On Fri, Dec 06, 2024 at 10:38:11AM -0600, Eric Sandeen wrote:
> On 12/5/24 4:52 PM, Luis Chamberlain wrote:
> > On Wed, Dec 04, 2024 at 10:35:45PM -0600, Eric Sandeen wrote:
> >> but that probably has more to do with the test not realizing
> >> /before it starts/ that the module cannot be removed and it
> >> should not even try.
> >
> > Right.
> >
> >> Darrick fixed that with:
> >>
> >> [PATCH 2/2] xfs/43[4-6]: implement impatient module reloading
> >
> > Looks good to me.
> >
> >> but it's starting to feel like a bit of a complex house of cards
> >> by now. We might need a more robust framework for determining whether
> >> a module is removable /at all/ before we decide to wait patiently
> >> for a thing that cannot ever happen?
> >
> > I think the above is a good example of knowing userspace and knowing
> > that userspace may be doing something else and we're ok to fail.
> > Essentially, module removal is non-deterministic due to how finicky
> > and easy it is to bump the refcnt for arbitrary reasons which are
> > subsystem specific. The URLs in the commit log I added provide good
> > examples of this. It is up to each subsystem to ensure a proper
> > quiesce makes sense to ensure userspace won't do something stupid
> > later.
> >
> > If one can control the test environment to quiesce first, then it
> > makes sense to patiently remove the module. Otherwise the optional
> > impatient removal makes sense.
>
> Not to belabor the point too much, but my gut feeling is there are
> cases where "quiescing" is not the issue at all - if the module is
> in use on the system somewhere outside of xfstests, no amount of
> quiescing or waiting will make it removable.

Yes indeed, that is a good point. Only if the test suite has full
control to ensure the lifetime of the module could it rely on removal.
But there are holes in the assumptions which can be made even on
this front too. I'll explain below.

> Essentially, xfstests
> needs to figure out if it is the sole owner/user of a module before
> it tries to do any sort of waiting for removal, IMHO.

Even if it could do that, it can't prevent the user from poking around,
and it can't guarantee the absence of some userspace package which
proactively pokes around block devices of a certain type, for example.
Such things break the assumption that the test suite has full control.

Test runners which do full bringup / setup / package installation have
more control and can provide a more deterministic setup. fstests itself,
however, will have most control. But since this is about flaky tests
failing for stupid reasons, this is all about *improving* consciousness
of this problem from a test perspective, and about letting the tester
make more appropriate calls based on what it thinks it can have control
over.

  Luis
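
For illustration only, the kind of "is it removable at all?" pre-check
discussed above could be sketched roughly as follows. This is not the
fstests implementation. It assumes a kernel built with
CONFIG_MODULE_UNLOAD, so that /sys/module/<name>/refcnt exists for
loadable modules. The function names and the sysfs_root parameter are
hypothetical and exist only so the sketch can be exercised against a
fake directory tree.

```python
from pathlib import Path


def module_usage(name, sysfs_root="/sys/module"):
    """Return (refcnt, holders) for a module, or None if no refcnt is exposed.

    A missing refcnt file usually means the module is not loaded, is
    built in, or the kernel lacks CONFIG_MODULE_UNLOAD.
    """
    mod_dir = Path(sysfs_root) / name
    refcnt_file = mod_dir / "refcnt"
    if not refcnt_file.is_file():
        return None
    refcnt = int(refcnt_file.read_text().strip())
    holders_dir = mod_dir / "holders"
    # holders/ lists other modules which depend on this one.
    if holders_dir.is_dir():
        holders = sorted(p.name for p in holders_dir.iterdir())
    else:
        holders = []
    return refcnt, holders


def looks_removable(name, sysfs_root="/sys/module"):
    """Best-effort hint: True only if refcnt is 0 and nothing holds the module.

    This is a snapshot. Anything may bump the refcnt right after the check.
    """
    usage = module_usage(name, sysfs_root)
    if usage is None:
        return False
    refcnt, holders = usage
    return refcnt == 0 and not holders
```

A test could call looks_removable("xfs") before deciding between the
patient and the impatient removal path. The result is only a hint taken
at one point in time: it cannot tell who holds a reference, and it is
racy against anything on the system that touches the module after the
check, which is why failure still has to be an acceptable outcome.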