On Thu, Nov 17, 2022 at 07:06:37AM -0800, Paul E. McKenney wrote:
> On Thu, Nov 17, 2022 at 07:30:32AM +0100, Sven Schnelle wrote:
> > Hi Paul,
> >
> > "Paul E. McKenney" <paulmck@xxxxxxxxxx> writes:
> >
> > >> > Yes, rcutorture has lower-level checks for CPUs being hotplugged
> > >> > behind its back.  Which might be sufficient.  But this patch is in
> > >> > response to something bad happening if the CPU is also not present in
> > >> > the cpu_present_mask.  Would that same bad thing happen if rcutorture saw
> > >> > the CPU in cpu_online_mask, but by the time it attempted to CPU-hotplug
> > >> > it, that CPU was gone not just from cpu_online_mask, but also from
> > >> > cpu_present_mask?
> > >> >
> > >> > Or are CPUs never removed from cpu_present_mask?
> > >>
> > >> In the current implementation CPUs can only be added to the
> > >> cpu_present_mask, but never removed. This might change in the future
> > >> when we get support from firmware for that, but the current s390 code
> > >> doesn't do that.
> > >
> > > Very good!
> > >
> > > Then could the patch please check that bits are never removed?
> > > That way the code will complain should firmware support be added.
> > >
> > > 							Thanx, Paul
> >
> > I'm not sure whether I fully understand that. If the CPU could
> > be removed from the system and the cpu_present_mask, that could
> > happen at any time. So I don't see how we should check for that?
>
> Well, that is my question to you.  ;-)
>
> Suppose we have the following sequence of events:
>
> o	rcutorture sees that CPU 5 is in cpu_present_mask, but offline.
>
> o	rcutorture therefore decides to online CPU 5.
>
> o	s390 firmware removes CPU 5, and s390 architecture code then
> 	clears it from the cpu_present_mask.
>
> o	rcutorture proceeds with onlining CPU 5.
>
> Don't we then get the same problem that prompted you to change from
> cpu_possible_mask to cpu_present_mask?  If not, why can't the rcutorture
> code continue to use cpu_possible_mask?
>
> If it really is bad to try to online or offline a CPU that is in
> cpu_possible_mask but not in cpu_present_mask, and if CPUs can be removed
> from cpu_present_mask, then we need some way to synchronize the removal
> of CPUs from cpu_present_mask.  There are of course a lot of possible
> ways to do that synchronization, for example, protecting cpu_present_mask
> with a mutex or similar.
>
> Alternatively, s390 could restrict things.  One way to do that would
> be to turn off rcutorture's use of CPU hotplug when running on s390,
> for example, by using the module parameters provided for that purpose.
> Another way to do that would be to refrain from removing CPUs from
> cpu_present_mask while rcutorture is running.
>
> Are there other approaches?

For the near term, why not have rcutorture keep a snapshot of
cpu_present_mask, and splat if a CPU is ever removed from that mask?
That would catch any issues, and defer any synchronization decisions to
a time at which we actually have some chance of knowing what is going on.

							Thanx, Paul
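
To make the snapshot idea concrete, here is a rough userspace sketch of
the check, not actual rcutorture code.  The names cpu_mask_t,
present_snapshot, and check_present_mask() are made up for illustration,
and a 64-bit word stands in for a real cpumask.  An in-kernel version
would presumably use something like cpumask_copy() and cpumask_subset()
together with WARN_ON_ONCE(), but that is an assumption rather than a
tested patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for a cpumask: one bit per CPU, up to 64 CPUs. */
typedef uint64_t cpu_mask_t;

/* Every CPU that has ever been observed in the present mask. */
static cpu_mask_t present_snapshot;

/*
 * Compare the current present mask against the snapshot.  Complain
 * ("splat") if a previously seen CPU has vanished, then fold any newly
 * added CPUs into the snapshot.  Returns true if a removal was detected.
 */
static bool check_present_mask(cpu_mask_t present_now)
{
	cpu_mask_t removed = present_snapshot & ~present_now;

	if (removed)
		fprintf(stderr, "splat: CPUs removed from present mask: %#llx\n",
			(unsigned long long)removed);

	/* Additions are legal, so the snapshot only ever grows. */
	present_snapshot |= present_now;
	return removed != 0;
}
```

Calling check_present_mask() before each hotplug attempt would stay
silent while CPUs are only added, and would complain the first time a
previously seen CPU goes missing.  It detects the removal after the fact
and does nothing to close the race described above, which is the point:
the synchronization question is deferred until the splat shows that it
matters.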