On Wed, 2023-02-01 at 14:40 +0000, Usama Arif wrote:
> On 01/02/2022 20:53, David Woodhouse wrote:
> > Doing the INIT/SIPI/SIPI in parallel for all APs and *then* waiting for
> > them shaves about 80% off the AP bringup time on a 96-thread 2-socket
> > Skylake box (EC2 c5.metal), from about 500ms to 100ms.
> >
> > There are more wins to be had with further parallelisation, but this is
> > the simple part.
> >
>
> Hi,
>
> We are interested in reducing the boot time of servers (with kexec), and
> smpboot takes up a significant amount of time while booting. When
> testing the patch series (rebased to v6.1) on a server with 128 CPUs
> split across 2 NUMA nodes, it brought down the smpboot time from ~700ms
> to 100ms. Adding another cpuhp state for do_wait_cpu_initialized to make
> sure cpu_init is reached (as done in v1 of the series + using the
> cpu_finishup_mask) brought it down further to ~30ms.
>
> I just wanted to check what was needed to progress the patch series
> further for review? There weren't any comments on v4 of the patch so I
> couldn't figure out what more is needed. I think it's quite useful to
> have this working so would be really glad to help in anything needed to
> restart the review.

I believe the only thing holding it back was the fact that it broke on
some AMD CPUs.

We don't *think* there are any remaining software issues; we think it's
hardware. Either an actual hardware race in CPU or chipset, or perhaps
even something as simple as a voltage regulator which can't cope with
an increase in power draw from *all* the CPUs at the same time.

We have prodded AMD a few times to investigate, but so far to no avail.

Last time I actually spoke to Thomas in person, I think he agreed that
we should just merge it and disable the parallel mode for the affected
AMD CPUs.

If you've already rebased to a newer kernel and tested it, perhaps now
is the time to do just that.