On Fri, Mar 07, 2025 at 11:47:22PM +0800, Zorro Lang wrote: > On Fri, Mar 07, 2025 at 07:56:28AM +1100, Dave Chinner wrote: > > I'm talking about removing all the setsid and namspace bits that we > > added to try to solve the need for pkill --parent. i.e. this series > > of patches: > > > > 949bdf8ea check: deprecate using process sessions to isolate test instances > > 247ab01fa check: run tests in a private pid/mount namespace > > 88d60f434 common: fix pkill by running test program in a separate session > > c71349150 common/rc: hoist pkill to a helper function > > 77548e606 common/rc: create a wrapper for the su command > > > > Using private PID namespaces in check-parallel has solved the > > original problem that required the pkill --parent usage and the > > fsstress binary rename/copy workaround by wrapping process isolation > > outside of each check instance. Hence check and the internal > > check infrastructure doesn't need to ever care about process > > isolation to solve this check-parallel problem anymore. > > If "check" doesn't need these namespace isolation things, that will be good > to me. That'll help to isolate the issues on check or on check-parallel. > And we won't bring in regressions to check when try to fix check-parallel. > > Now let me merge this patch at first, to avoid the regressions on btrfs. > Then please feel free to send patches to remove/reduce the coupling of > two testing ways (check and check-parallel:) Sure. FWIW, it's not necessary for you to test whether the merged version of check-parallel runs correctly right now. If some other change breaks it, I will fix it and send patches to fix it. I'm much more concerned that changes made to support check-parallel don't break check... > > i.e. the isolation fix should have been made to check-parallel, > > not to check. > > > > It is not ideal to have to walk all this stuff back, but this is > > what happens when an RFC-scope feature prototype is merged without > > allowing adequate time for review or give the author a chance to > > address at least one round of review comments and bug fixes before > > it gets merged.... > > Sorry, I've learned from this. I thought I've given it lots of test, > and fixed all bug reports at that time. And I asked several times > if there's any objection about merging it in next release. As there's > not objection from anyone in 2 weeks, so I decided to give it a chance, > to save the super complicated rebasing workload... Sure, not having anyone review a prototype after a couple of weeks is pretty common - it does not mean it is ready for merge. It typically means that the author will then post a newer, more complete version when they are ready, and then maybe people will look at it. Hence my comment on where I wanted people to focus their review - not on the check-parallel stuff, but on the generic infrastructure changes (i.e. "get the process underway") because they were much more important to get right than the check-parallel script. IOWs, I was not asking anyone to bypass the normal cyclic review-repost process - I was simply doing what ever experienced developer does when posting a large patchset: guiding reviewers to the important areas of the patch set to focus on first. > Anyway, next time, > for each complex patchset on infrastructure, I'll always give it 3+ > release cycles time. And never merge it if without 3 acks from xfs, > ext4 and btrfs, And that, IMO, is swinging too far the other way. If the code is complete and ready to go and nobody will review it, then the above rule makes it impossible for the author to get their code merged. now we have the opposite problem: people have to carry large infrastructure changes for a long time because they cannot get anyone to review it. The normal process is to let review-and-repost cycles occur naturally (e.g. every couple of weeks for a few versions), with the author indicating what the important things to focus review on in the patchset are. When that cycle reaches it's end, a request for merge will be made (e.g. a pull request or a repost with all patches containing RVBs). This shouldn't take months, and if the reviewers focus on the things that the author asks them to look at first then the important things end up being ready for merge earlier rather than later. IOWs, there is no need for placing arbitrary bars to jump over if the process is allowed to follow it's natural path. Problems typically only arise when the process is not followed, so there needs to be a very good reason for stepping outside normal processes or having arbitrary rules that nobody really knows when/if they might trigger... -Dave. -- Dave Chinner david@xxxxxxxxxxxxx