On Wed, Jan 30, 2013 at 03:39:34PM -0600, Anthony Liguori wrote:
> Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> writes:
>
> > On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
> >> An x86 CPU has a MMIO capability that's essentially 65 bits. Whether
> >> the top bit is set determines whether it's a "PIO" transaction or an
> >> "MMIO" transaction. A large chunk of that address space is invalid of
> >> course.
> >>
> >> PCI has a 65 bit address space too. The 65th bit determines whether
> >> it's an IO transaction or an MMIO transaction.
> >
> > This is somewhat of an oversimplification since IO and MMIO differ in
> > other ways, such as ordering rules :-) But for the sake of memory
> > regions decoding I suppose it will do.
> >
> >> For architectures that only have a 64-bit address space, what the PCI
> >> controller typically does is pick a 16-bit window within that address
> >> space to map to a PCI address with the 65th bit set.
> >
> > Sort-of yes. The window doesn't have to be 16-bit (we commonly have
> > larger IO space windows on powerpc) and there's a window per host
> > bridge, so there's effectively more than one IO space (as there is more
> > than one PCI MMIO space, with only a window off the CPU space routed to
> > each bridge).
>
> Ack.
>
> > Making a hard wired assumption that the PCI (MMIO and IO) space relates
> > directly to the CPU bus space is wrong on pretty much all !x86
> > architectures.
>
> Ack.
>
> >
> > .../...
> >
> > You make it sound like subtractive decode is a chipset hack. It's not,
> > it's specified in the PCI spec.
>
> It's a hack :-) It's a well specified hack, but it's still a hack.
>
> >> 1) A chipset will route any non-positively decoded IO transaction (65th
> >> bit set) to a single end point (usually the ISA-bridge). Which one it
> >> chooses is up to the chipset.
> >> This is called subtractive decoding
> >> because the PCI bus will wait multiple cycles for that device to
> >> claim the transaction before bouncing it.
> >
> > This is not a chipset matter. It's the ISA bridge itself that does
> > subtractive decoding.
>
> The PCI bus can have one end point that can be the target for
> subtractive decoding (not hard decoding, subtractive decoding). IOW,
> you can only have a single ISA Bridge within a single PCI domain.
>
> You are right--chipset is the wrong word. I'm used to thinking in terms
> of only a single domain :-)
>
> > There also exist P2P bridges doing such subtractive
> > decoding, this used to be fairly common with transparent bridges used for
> > laptop docking.
>
> I'm not sure I understand how this would work. How can two devices on
> the same PCI domain both do subtractive decoding? Indeed, the PCI spec
> even says:
>
> "Subtractive decoding can be implemented by only one device on the bus
> since it accepts all accesses not positively decoded by some other
> agent."
>
> >> 2) There are special hacks in most PCI chipsets to route very specific
> >> address ranges to certain devices. Namely, legacy VGA IO transactions
> >> go to the first VGA device. Legacy IDE IO transactions go to the first
> >> IDE device. This doesn't need to be programmed in the BARs. It will
> >> just happen.
> >
> > This is also mostly not a hack in the chipset. It's a well defined behaviour
> > for legacy devices, sometimes called hard decoding. Of course often those devices
> > are built into the chipset but they don't have to be. Plug-in VGA devices will
> > hard decode legacy VGA regions for both IO and MMIO by default (this can be
> > disabled on most of them nowadays) for example. This has nothing to do with
> > the chipset.
>
> So I understand what you're saying re: PCI because the devices actually
> assert DEVSEL to indicate that they handle the transaction.
>
> But for PCI-E, doesn't the controller have to expressly identify what
> the target is? Is this done with the device class?

Well, you can have a PCI bridge and a legacy device behind that.
I think real PCI Express devices cannot be mapped onto legacy
address ranges.

> > There's a specific bit in P2P bridge to control the forwarding of legacy
> > transaction downstream (and VGA palette snoops), this is also fully specified
> > in the PCI spec.
>
> Ack.
>
> >
> >> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
> >> sent to the ISA-bridge (because it's faster this way).
> >
> > Chipsets don't "send to a bridge". It's the bridge itself that
> > decodes.
>
> With PCI...
>
> >> Notice the lack of the word "ISA" in all of this other than describing
> >> the PCI class of an end point.
> >
> > ISA is only relevant to the extent that the "legacy" regions of IO space
> > originate from the original ISA addresses of devices (VGA, IDE, etc...)
> > and to the extent that an ISA bus might still be present which will get
> > the transactions that nothing else has decoded in that space.
>
> Ack.
>
> >
> >> So how should this be modeled?
> >>
> >> On x86, the CPU has a pio address space. That can propagate down
> >> through the PCI bus which is what we do today.
> >>
> >> On !x86, the PCI controller ought to setup a MemoryRegion for downstream
> >> PIO that devices can use to register on.
> >>
> >> We probably need to do something like change the PCI VGA devices to
> >> export a MemoryRegion and allow the PCI controller to decide how to
> >> register that as a subregion.
> >
> > The VGA device should just register fixed address port IOs the same way
> > it would register an IO BAR. Essentially, hard coded IO addresses (or
> > memory, VGA does memory too, don't forget that) are equivalent to having
> > an invisible BAR with a fixed value in it.
>
> Ack.
>
> >
> > There should be no "global port IO" because that concept is broken on
> > real multi-domain setups. Those "legacy" address ranges are just
> > hard-wired sub regions of the normal PCI space on which the device sits
> > on (unless you start doing real non-PCI ISA x86).
>
> So, I think what you're suggesting (and I agree with), is that each PCI
> device should export one or more MemoryRegions and indicate what the
> MemoryRegions are for.
>
> Potential options are:
>
>  - MMIO BAR
>  - PIO BAR
>  - IDE hard decode
>  - VGA hard decode
>  - subtractive decode
>
> I'm very much in agreement if that's what you're suggesting.
>
> Regards,
>
> Anthony Liguori
>
> >
> > Cheers,
> > Ben.
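
For illustration only, here is a toy Python sketch of the decode order
discussed in this thread: each device exports regions tagged with one of
the kinds listed above, positively decoded regions (BARs and hard-decoded
legacy ranges) are tried first, and at most one subtractive-decode
endpoint per bus picks up whatever is left. This is not QEMU's
MemoryRegion API. RegionKind, Region, Bus and decode are hypothetical
names, and the VGA port range is just an example value.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class RegionKind(Enum):
    # The kinds of regions proposed in the thread.
    MMIO_BAR = "MMIO BAR"
    PIO_BAR = "PIO BAR"
    IDE_HARD_DECODE = "IDE hard decode"
    VGA_HARD_DECODE = "VGA hard decode"
    SUBTRACTIVE = "subtractive decode"


@dataclass
class Region:
    device: str
    kind: RegionKind
    is_io: bool      # the "65th bit": True for IO space, False for MMIO
    base: int = 0    # ignored for subtractive regions
    size: int = 0    # ignored for subtractive regions

    def claims(self, is_io: bool, addr: int) -> bool:
        if self.is_io != is_io:
            return False
        if self.kind is RegionKind.SUBTRACTIVE:
            return True  # accepts anything nobody else claimed
        return self.base <= addr < self.base + self.size


@dataclass
class Bus:
    # One Bus per PCI domain, so there is no "global port IO".
    regions: List[Region] = field(default_factory=list)

    def add(self, region: Region) -> None:
        # Assumption taken from the spec quote: one subtractive agent per bus.
        if region.kind is RegionKind.SUBTRACTIVE and any(
            r.kind is RegionKind.SUBTRACTIVE for r in self.regions
        ):
            raise ValueError("only one subtractive-decode device per bus")
        self.regions.append(region)

    def decode(self, is_io: bool, addr: int) -> Optional[str]:
        # Pass 1: positive decode (BARs and hard-decoded legacy ranges).
        for r in self.regions:
            if r.kind is not RegionKind.SUBTRACTIVE and r.claims(is_io, addr):
                return r.device
        # Pass 2: the single subtractive endpoint, if any.
        for r in self.regions:
            if r.kind is RegionKind.SUBTRACTIVE and r.claims(is_io, addr):
                return r.device
        return None  # nobody claimed the transaction


bus = Bus()
# A hard-decoded range behaves like an invisible BAR with a fixed value.
bus.add(Region("vga", RegionKind.VGA_HARD_DECODE, is_io=True, base=0x3C0, size=0x20))
bus.add(Region("isa-bridge", RegionKind.SUBTRACTIVE, is_io=True))
print(bus.decode(True, 0x3C0))  # claimed by the VGA hard decode
print(bus.decode(True, 0x60))   # falls through to the ISA bridge
```

Keeping one Bus object per host bridge is what makes the legacy ranges
sub regions of that domain's own PCI space rather than a global IO space.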