On Thu, Jan 14, 2016 at 08:45:04AM +0530, Mahesh J Salgaonkar wrote: > From: Mahesh Salgaonkar <mahesh@xxxxxxxxxxxxxxxxxx> > > When a guest is assigned to a core it converts the host Timebase (TB) > into guest TB by adding guest timebase offset before entering into > guest. During guest exit it restores the guest TB to host TB. This means > under certain conditions (Guest migration) host TB and guest TB can differ. > > When we get an HMI for TB related issues the opal HMI handler would > try fixing errors and restore the correct host TB value. With no guest > running, we don't have any issues. But with guest running on the core > we run into TB corruption issues. > > If we get an HMI while in the guest, the current HMI handler invokes opal > hmi handler before forcing guest to exit. The guest exit path subtracts > the guest TB offset from the current TB value which may have already > been restored with host value by opal hmi handler. This leads to incorrect > host and guest TB values. > > With split-core, things become more complex. With split-core, TB also gets > split and each subcore gets its own TB register. When a hmi handler fixes > a TB error and restores the TB value, it affects all the TB values of > sibling subcores on the same core. On TB errors all the thread in the core > gets HMI. With existing code, the individual threads call opal hmi handle > independently which can easily throw TB out of sync if we have guest > running on subcores. Hence we will need to co-ordinate with all the > threads before making opal hmi handler call followed by TB resync. > > This patch introduces a sibling subcore state structure (shared by all > threads in the core) in paca which holds information about whether sibling > subcores are in Guest mode or host mode. An array in_guest[] of size > MAX_SUBCORE_PER_CORE=4 is used to maintain the state of each subcore. > The subcore id is used as index into in_guest[] array. Only primary > thread entering/exiting the guest is responsible to set/unset its > designated array element. > > On TB error, we get HMI interrupt on every thread on the core. Upon HMI, > this patch will now force guest to vacate the core/subcore. Primary > thread from each subcore will then turn off its respective bit > from the above bitmap during the guest exit path just after the > guest->host partition switch is complete. > > All other threads that have just exited the guest OR were already in host > will wait until all other subcores clears their respective bit. > Once all the subcores turn off their respective bit, all threads will > will make call to opal hmi handler. > > It is not necessary that opal hmi handler would resync the TB value for > every HMI interrupts. It would do so only for the HMI caused due to > TB errors. For rest, it would not touch TB value. Hence to make things > simpler, primary thread would call TB resync explicitly once for each > core immediately after opal hmi handler instead of subtracting guest > offset from TB. TB resync call will restore the TB with host value. > Thus we can be sure about the TB state. > > One of the primary threads exiting the guest will take up the > responsibility of calling TB resync. It will use one of the top bits > (bit 63) from subcore state flags bitmap to make the decision. The first > primary thread (among the subcores) that is able to set the bit will > have to call the TB resync. Rest all other threads will wait until TB > resync is complete. Once TB resync is complete all threads will then > proceed. This patch looks pretty good; the one nit that I can see is that you have a vcpu parameter to kvmppc_realmode_hmi_handler which is completely unused; which is just as well, because you get it from r9 at the point where it is called, but at that point r9 may or may not contain a vcpu pointer, depending on the path we take to get there. Why not just remove the vcpu argument? Also, patch 3/3 seems like it is just something that was missed from this patch (2/3). Why not fold it in? Paul. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html