Frederic Weisbecker's on May 8, 2019 10:35 am:
> On Tue, May 07, 2019 at 09:50:24AM +1000, Nicholas Piggin wrote:
>> Frederic Weisbecker's on May 7, 2019 1:16 am:
>> > On Sat, May 04, 2019 at 04:59:12PM +1000, Nicholas Piggin wrote:
>> >> Frederic Weisbecker's on May 4, 2019 10:27 am:
>> >> > On Fri, May 03, 2019 at 10:47:37AM -0700, tip-bot for Nicholas Piggin wrote:
>> >> >> Commit-ID:  9219565aa89033a9cfdae788c1940473a1253d6c
>> >> >> Gitweb:     https://git.kernel.org/tip/9219565aa89033a9cfdae788c1940473a1253d6c
>> >> >> Author:     Nicholas Piggin <npiggin@xxxxxxxxx>
>> >> >> AuthorDate: Thu, 11 Apr 2019 13:34:47 +1000
>> >> >> Committer:  Ingo Molnar <mingo@xxxxxxxxxx>
>> >> >> CommitDate: Fri, 3 May 2019 19:42:58 +0200
>> >> >>
>> >> >> sched/isolation: Require a present CPU in housekeeping mask
>> >> >>
>> >> >> During housekeeping mask setup, currently a possible CPU is required.
>> >> >> That does not guarantee the CPU would be available at boot time, so
>> >> >> check to ensure that at least one present CPU is in the mask.
>> >> >
>> >> > I have a doubt about the requirements and semantics of cpu_present_mask.
>> >> > IIUC a present CPU means that it is physically plugged in (from ACPI
>> >> > perspective) but might not be logically plugged in (set on cpu_online_mask).
>> >>
>> >> Right, a subset of cpu_possible_mask, superset of cpu_online_mask. It
>> >> means that CPU can be brought online at any time.
>> >>
>> >> > But do we have the guarantee that a present CPU _will_ be online at least once
>> >> > right after the boot? After all, kernel parameters such as "maxcpus=" can prevent
>> >> > some CPUs from being turned on. I guess there are even more creative ways to
>> >> > achieve that.
>> >> >
>> >> > In any case we really require the housekeeper to be forced online. Perhaps
>> >> > I missed that enforcement somewhere in the patchset?
>> >>
>> >> No, I think you're right; the system may be able to boot without anything
>> >> in the housekeeping mask. Maybe we can just cpu_up() a CPU in the
>> >> housekeeping mask with a warning that it has overridden their SMP
>> >> command line option. I'll take a look at it.
>> >
>> > But then what if cpu_up() fails? In this case I can think of only two
>> > answers:
>> >
>> > * Force the boot CPU as the housekeeper.
>> > * Roll back the whole thing: nohz and all isolation.
>>
>> If cpu_up fails despite being in the present map and we explicitly
>> selected it as the housekeeper? I think it would be okay to print
>> a message telling the admin to correct the config, and panic.
>>
>> We make a best effort to make the system boot and limp along, but if
>> you misconfigure it, crashing is not unreasonable. There are lots of
>> command line option misconfigurations that will cause the same thing.
>>
>> The primary problem with my patch that needs to be addressed is that
>> the error is not explicitly caught and printed if the housekeeper
>> does not come up, so the system might die in non-obvious ways.
>
> I usually reserve panic and BUG_ON() as a last resort when data integrity is
> directly threatened. But indeed I guess that's all we have for now.

Right, specifying a housekeeping CPU that is excluded from coming up at boot
with maxcpus= or whatever is not such a big deal, so panicking is fine I think.
We just need a clear error message.

> If we take that path, I'd rather not call that cpu_up() and simply panic if
> the given CPU happens not to be online after SMP bootup.

Sure, that's fine by me too.

Thanks,
Nick