On Wed, Feb 01, 2012 at 05:30:15PM +0000, Colin Cross wrote:
> On Wed, Feb 1, 2012 at 6:59 AM, Lorenzo Pieralisi
> <lorenzo.pieralisi@xxxxxxx> wrote:
> > On Wed, Feb 01, 2012 at 12:13:26PM +0000, Vincent Guittot wrote:
> >
> > [...]
> >
> >> >> In your patch, you put in safe state (WFI for most of platform) the
> >> >> cpus that become idle and these cpus are woken up each time a new cpu
> >> >> of the cluster becomes idle. Then, the cluster state is chosen and the
> >> >> cpus enter the selected C-state. On ux500, we are using another
> >> >> behavior for synchronizing the cpus. The cpus are prepared to enter
> >> >> the c-state that has been chosen by the governor and the last cpu,
> >> >> that enters idle, chooses the final cluster state (according to cpus'
> >> >> C-state). The main advantage of this solution is that you don't need
> >> >> to wake other cpus to enter the C-state of a cluster. This can be
> >> >> quite worthwhile when tasks mainly run on one cpu. Have you also thought
> >> >> about such behavior when developing the coupled cpuidle driver ? It
> >> >> could be interesting to add such behavior.
> >> >
> >> > Waking up the cpus that are in the safe state is not done just to
> >> > choose the target state, it's done to allow the cpus to take
> >> > themselves to the target low power state. On ux500, are you saying
> >> > you take the cpus directly from the safe state to a lower power state
> >> > without ever going back to the active state? I once implemented Tegra
> >>
> >> yes it is
> >
> > But if there is a single power rail for the entire cluster, when a CPU
> > is "prepared" for shutdown this means that you have to save the context and
> > clean L1, maybe for nothing since if other CPUs are up and running the
> > CPU going idle can just enter a simple standby wfi (clock-gated but power on).
> >
> > With Colin's approach, context is saved and L1 cleaned only when it is
> > almost certain the cluster is powered off (so the CPUs).
> >
> > It is a trade-off, I am not saying one approach is better than the
> > other; we just have to make sure that preparing the CPU for "possible" shutdown
> > is better than sending IPIs to take CPUs out of wfi and synchronize
> > them (this happens if and only if CPUs enter coupled C-states).
> >
> > As usual this will depend on use cases (and silicon implementations :) )
> >
> > It is definitely worth benchmarking them.
> >
>
> I'm less worried about performance, and more worried about race
> conditions. How do you deal with the following situation:
> CPU0 goes to WFI, and saves its state
> CPU1 goes idle, and selects a deep idle state that powers down CPU0
> CPU1 saves its state, and is about to trigger the power down
> CPU0 gets an interrupt, restores its state, and modifies state (maybe
> takes a spinlock during boot)
> CPU1 cuts the power to CPU0
>
> On OMAP4, the race is handled in hardware. When CPU1 tries to cut the
> power to the blocks shared by CPU0 the hardware will ignore the
> request if CPU0 is not in WFI. On Tegra2, there is no hardware
> support and I had to handle it with a spinlock implemented in scratch
> registers because CPU0 is out of coherency when it starts booting and
> ldrex/strex don't work. I'm not convinced my implementation is
> correct, and I'd be curious to see any other implementations.

That's a problem you solved with coupled C-states (ie your example in
the cover letter), where the primary waits for other CPUs to be reset
before issuing the power down command, right ?
At that point in time secondaries cannot wake up (?) and if wfi (ie power
down) aborts you just take the secondaries out of reset and restart
executing simultaneously, correct ? It mirrors the suspend behaviour, which
is easier to deal with than completely random idle paths.

It is true that this should be managed by the PM HW; if HW is not capable
of managing these situations things get nasty as you highlighted.
And it is also true ldrex/strex on cacheable memory might not be available
in those early warm-boot stages. I came up with a locking algorithm on
strongly ordered memory to deal with that, but I am still not sure it is
something we really really need.

I will test coupled C-state code ASAP, and come back with feedback.

Thanks,
Lorenzo
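
For reference, a minimal sketch of a lock that needs only plain loads and
stores, so it could in principle live in scratch registers or strongly
ordered memory where ldrex/strex are unavailable. It uses Peterson's two-CPU
algorithm, which is an assumption chosen for illustration: it is not
necessarily what the Tegra2 code does, nor the strongly-ordered-memory
algorithm mentioned in this thread. All names (boot_lock, cluster_model,
cpu0_wake, cpu1_try_power_down, race_walkthrough) are hypothetical, and C11
sequentially consistent atomics stand in for strongly ordered accesses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical two-CPU lock (Peterson's algorithm). On real hardware these
 * words would sit in scratch registers or strongly ordered memory. */
struct boot_lock {
    atomic_int flag[2];   /* flag[n] != 0: CPU n wants the lock */
    atomic_int turn;      /* CPU that has priority when both want it */
};

/* Hypothetical model of the state the two CPUs race on. */
struct cluster_model {
    struct boot_lock lock;
    bool cpu0_awake;      /* set by CPU0 when an interrupt takes it out of WFI */
    bool cpu0_powered;    /* cleared by CPU1 when it cuts the power */
};

void boot_lock_acquire(struct boot_lock *l, int cpu)
{
    int other = 1 - cpu;

    atomic_store(&l->flag[cpu], 1);   /* announce interest */
    atomic_store(&l->turn, other);    /* give priority to the other CPU */
    /* Spin only while the other CPU is interested and has priority. */
    while (atomic_load(&l->flag[other]) && atomic_load(&l->turn) == other)
        ;
}

void boot_lock_release(struct boot_lock *l, int cpu)
{
    atomic_store(&l->flag[cpu], 0);
}

void cluster_init(struct cluster_model *c)
{
    atomic_init(&c->lock.flag[0], 0);
    atomic_init(&c->lock.flag[1], 0);
    atomic_init(&c->lock.turn, 0);
    c->cpu0_awake = false;            /* CPU0 starts out in WFI */
    c->cpu0_powered = true;
}

/* CPU0 wake-up path: publish "awake" under the lock before touching
 * any other shared state. */
void cpu0_wake(struct cluster_model *c)
{
    boot_lock_acquire(&c->lock, 0);
    c->cpu0_awake = true;
    boot_lock_release(&c->lock, 0);
}

/* CPU1 power-down path: re-check CPU0 under the lock and refuse to cut
 * the power if CPU0 has already woken up. Returns true if power was cut. */
bool cpu1_try_power_down(struct cluster_model *c)
{
    bool done = false;

    boot_lock_acquire(&c->lock, 1);
    if (!c->cpu0_awake) {
        c->cpu0_powered = false;
        done = true;
    }
    boot_lock_release(&c->lock, 1);
    return done;
}

/* Single-threaded walk through both orderings of the race. */
void race_walkthrough(void)
{
    struct cluster_model c;
    bool cut;

    cluster_init(&c);
    cut = cpu1_try_power_down(&c);    /* CPU0 still in WFI */
    assert(cut && !c.cpu0_powered);

    cluster_init(&c);
    cpu0_wake(&c);                    /* interrupt arrives first */
    cut = cpu1_try_power_down(&c);
    assert(!cut && c.cpu0_powered);
}
```

The walk-through only exercises the check-under-lock logic from one thread;
it does not prove anything about real contention. On actual silicon the
memory type, barriers and the exact point where CPU0 rejoins coherency would
all need to be checked against the platform.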