On Tue, Sep 06, 2011 at 11:07:26AM -0700, Jeremy Fitzhardinge wrote:
> >> But, erm, does that even make sense? I'm assuming the NMI reason port
> >> tells the CPU why it got an NMI. If multiple CPUs can get NMIs and
> >> there's only a single reason port, then doesn't that mean that either 1)
> >> they all got the NMI for the same reason, or 2) having a single port is
> >> inherently racy? How does the locking actually work there?
> > The reason port is for an external/system NMI. All the IPI-NMI don't need
> > to access this register to process their handlers, ie perf. I think in
> > general the IOAPIC is configured to deliver the external NMI to one cpu,
> > usually the bsp cpu. However, there has been a slow movement to free the
> > bsp cpu from exceptions like this to allow one to eventually hot-swap the
> > bsp cpu. The spin locks in that code were an attempt to be more abstract
> > about who really gets the external NMI. Of course SGI's box is setup to
> > deliver an external NMI to all cpus to dump the stack when the system
> > isn't behaving.
> >
> > This is a very low usage NMI (in fact almost all cases lead to loud
> > console messages).
> >
> > Hope that clears up some of the confusion.
>
> Hm, not really.
>
> What does it mean if two CPUs go down that path? Should one do some NMI
> processing while the other waits around for it to finish, and then do
> some NMI processing on its own?

Well, by the time the second one gets to the external NMI, it should have
been cleared by the first cpu, which would of course lead to the second cpu
causing a 'Dazed and confused' message. But on most x86 machines the
external NMI should be routed to only one cpu, though there is an SGI box
that is designed to send an external NMI to all of its cpus.
>
> It sounds like that could only happen if you reroute NMI from one CPU to
> another while the first CPU is actually in the middle of processing an
> NMI - in which case, shouldn't the code doing the re-routing be taking
> the spinlock?

Perhaps, but like I said, it is a slow transition because most people don't
have the hardware to test this (nor does it work, due to other limitations).

>
> Or perhaps a spinlock isn't the right primitive to use at all? Couldn't
> the second CPU just set a flag/counter (using something like an atomic
> add/cmpxchg/etc) to make the first CPU process the second NMI?

Might be a smarter approach. Like I said, it is hard to test without
functioning hardware. :-(

>
> But on the other hand, I don't really care if you can say that this path
> will never be called in a virtual machine.

Do virtual machines support hot remove of cpus? Probably not, considering
bare-metal barely supports it.

Cheers,
Don
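
For reference, the flag/counter idea quoted above could look roughly like
the sketch below. It is a userspace illustration in C11, not the kernel's
code and not tested on real hardware. The names nmi_pending, nmi_claim,
nmi_drain, external_nmi and process_reason are made up for this example,
and process_reason() only counts calls where a real handler would read the
reason port and act on it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state: number of external NMIs that have arrived and
 * have not yet been fully handled. */
static atomic_int nmi_pending;

/* Stand-in for the real work (reading the reason port, logging, ...).
 * Here it only counts how many times it ran. */
static int reasons_processed;

static void process_reason(void)
{
    reasons_processed++;
}

/* Record one NMI arrival. Returns true if the caller is the first one
 * in and therefore becomes the processor; false if some other CPU is
 * already processing and will pick up this arrival. */
static bool nmi_claim(void)
{
    /* atomic_fetch_add returns the value before the increment. */
    return atomic_fetch_add(&nmi_pending, 1) == 0;
}

/* Run by the processing CPU: handle one reason per recorded arrival,
 * and keep going while other CPUs recorded arrivals in the meantime. */
static void nmi_drain(void)
{
    do {
        process_reason();
        /* atomic_fetch_sub returns the value before the decrement;
         * 1 means this was the last outstanding arrival. */
    } while (atomic_fetch_sub(&nmi_pending, 1) != 1);
}

/* Entry point each CPU would call on an external NMI. No CPU spins
 * waiting for another: late arrivals return immediately. */
static bool external_nmi(void)
{
    if (!nmi_claim())
        return false;
    nmi_drain();
    return true;
}
```

With this shape, a second CPU that arrives while the first is still
processing only bumps the counter and returns, and the first CPU makes one
extra pass on its behalf. Whether an extra pass is actually useful depends
on whether the reason port still reports anything by then, which is the
open question in this thread.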