Re: [Qemu-devel] KVM call minutes 2013-01-29 - Port I/O

Benjamin Herrenschmidt <benh@xxxxxxxxxxxxxxxxxxx> · Thu, 31 Jan 2013 08:05:10 +1100

On Wed, 2013-01-30 at 07:59 -0600, Anthony Liguori wrote:
> An x86 CPU has a MMIO capability that's essentially 65 bits.  Whether
> the top bit is set determines whether it's a "PIO" transaction or an
> "MMIO" transaction.  A large chunk of that address space is invalid of
> course.
> 
> PCI has a 65 bit address space too.  The 65th bit determines whether
> it's an IO transaction or an MMIO transaction.

This is somewhat of an oversimplification, since IO and MMIO differ in
other ways, such as ordering rules :-) But for the sake of memory
region decoding I suppose it will do.

> For architectures that only have a 64-bit address space, what the PCI
> controller typically does is pick a 16-bit window within that address
> space to map to a PCI address with the 65th bit set.

Sort-of yes. The window doesn't have to be 16-bit (we commonly have
larger IO space windows on powerpc) and there's a window per host
bridge, so there's effectively more than one IO space (just as there is
more than one PCI MMIO space, with only a window of the CPU space routed
to each bridge).

Making a hard wired assumption that the PCI (MMIO and IO) space relates
directly to the CPU bus space is wrong on pretty much all !x86
architectures.
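
A minimal sketch of the per-host-bridge window idea, in Python for
brevity. The bridge names, CPU addresses and window sizes are made up
for illustration and do not describe any real platform or QEMU API.
Each window translates a CPU address into an address in that bridge's
own PCI IO space:

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class IoWindow:
    """A CPU-address window that one host bridge forwards as PCI IO cycles."""
    bridge: str        # hypothetical host bridge name
    cpu_base: int      # start of the window in CPU address space
    size: int          # window size in bytes; need not be 64 KiB
    pci_base: int = 0  # PCI IO address that cpu_base maps to


def cpu_to_pci_io(windows: Iterable[IoWindow],
                  cpu_addr: int) -> Optional[Tuple[str, int]]:
    """Return (bridge, PCI IO address) for cpu_addr, or None if unmapped."""
    for w in windows:
        if w.cpu_base <= cpu_addr < w.cpu_base + w.size:
            return w.bridge, w.pci_base + (cpu_addr - w.cpu_base)
    return None


# Two host bridges, each with its own 8 MiB IO window (assumed values).
WINDOWS = [
    IoWindow("phb0", 0x8000_0000, 0x80_0000),
    IoWindow("phb1", 0x9000_0000, 0x80_0000),
]
```

With these assumed windows, CPU addresses 0x800003c0 and 0x900003c0
both become PCI IO address 0x3c0, but on different bridges, so the same
IO port number names two distinct locations.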

 .../...

You make it sound like subtractive decode is a chipset hack. It's not,
it's specified in the PCI spec.

> 1) A chipset will route any non-positively decoded IO transaction (65th
>    bit set) to a single end point (usually the ISA-bridge).  Which one it
>    chooses is up to the chipset.  This is called subtractive decoding
>    because the PCI bus will wait multiple cycles for that device to
>    claim the transaction before bouncing it.

This is not a chipset matter. It's the ISA bridge itself that does
subtractive decoding. There also exist P2P bridges doing such subtractive
decoding; this used to be fairly common with transparent bridges used for
laptop docking.
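
A rough sketch of positive versus subtractive decoding on a single bus
segment. The Agent class, the device names and the example bus are
illustrative assumptions, not anything taken from the PCI spec or from
QEMU. The point is only that each agent decides for itself whether to
claim a cycle, and the subtractive agent takes what nobody else claimed:

```python
class Agent:
    """A device or bridge on one bus segment (illustrative model only)."""

    def __init__(self, name, io_ranges=(), subtractive=False):
        self.name = name
        self.io_ranges = tuple(io_ranges)  # inclusive (first, last) pairs
        self.subtractive = subtractive     # claims cycles nobody else took

    def positively_decodes(self, addr):
        return any(first <= addr <= last for first, last in self.io_ranges)


def claim_io(agents, addr):
    """Return the name of the agent that claims an IO cycle, or None."""
    # Positive decoders get the first chance to claim the cycle.
    for agent in agents:
        if agent.positively_decodes(addr):
            return agent.name
    # Otherwise the subtractive agent, if any, claims it.
    for agent in agents:
        if agent.subtractive:
            return agent.name
    return None  # nobody claimed the cycle (master abort)


# Hypothetical bus: an IDE function plus a subtractive ISA bridge.
BUS = [
    Agent("ide", [(0x1F0, 0x1F7)]),
    Agent("isa-bridge", subtractive=True),
]
```

Here port 0x1F0 is claimed by "ide", while an undecoded port such as
0x60 falls through to "isa-bridge".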

> 2) There are special hacks in most PCI chipsets to route very specific
>    addresses ranges to certain devices.  Namely, legacy VGA IO transactions
>    go to the first VGA device.  Legacy IDE IO transactions go to the first
>    IDE device.  This doesn't need to be programmed in the BARs.  It will
>    just happen.

This is also mostly not a hack in the chipset. It's a well-defined behaviour
for legacy devices, sometimes called hard decoding. Of course those devices
are often built into the chipset, but they don't have to be. Plug-in VGA
devices, for example, will hard decode the legacy VGA regions for both IO and
MMIO by default (this can be disabled on most of them nowadays). This has
nothing to do with the chipset.

There's a specific bit in P2P bridges to control the forwarding of legacy
transactions downstream (and VGA palette snoops); this is also fully specified
in the PCI spec.
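
As a simplified sketch of that forwarding decision, assuming the VGA
Enable bit is bit 3 of the bridge's Bridge Control register and that
the legacy VGA IO ranges are 0x3B0-0x3BB and 0x3C0-0x3DF. The function
names are hypothetical, and the model deliberately ignores the ISA
Enable bit, 10-bit address aliasing and the 4K granularity of the real
IO base/limit registers:

```python
BRIDGE_CTL_VGA_ENABLE = 1 << 3  # assumed: Bridge Control register, bit 3

# Legacy VGA IO ranges, inclusive (first, last).
VGA_IO_RANGES = ((0x3B0, 0x3BB), (0x3C0, 0x3DF))


def in_vga_io(addr):
    """True if addr falls in one of the legacy VGA IO ranges."""
    return any(first <= addr <= last for first, last in VGA_IO_RANGES)


def forwards_io_downstream(bridge_ctl, io_base, io_limit, addr):
    """Decide whether a P2P bridge passes an IO cycle to its secondary bus."""
    # Normal case: the address is inside the bridge's IO window.
    if io_base <= addr <= io_limit:
        return True
    # Legacy case: VGA IO is forwarded only when VGA Enable is set.
    return bool(bridge_ctl & BRIDGE_CTL_VGA_ENABLE) and in_vga_io(addr)
```

With an IO window of 0x1000-0x1FFF, port 0x3C0 is forwarded only when
the VGA Enable bit is set, whatever the window says.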

> 3) As it turns out, all legacy PIIX3 devices are positively decoded and
>    sent to the ISA-bridge (because it's faster this way).

Chipsets don't "send to a bridge". It's the bridge itself that decodes.

> Notice the lack of the word "ISA" in all of this other than describing
> the PCI class of an end point.

ISA is only relevant to the extent that the "legacy" regions of IO space
originate from the original ISA addresses of devices (VGA, IDE, etc...)
and to the extent that an ISA bus might still be present, which will get
the transactions that nothing else has decoded in that space.

> So how should this be modeled?
> 
> On x86, the CPU has a pio address space.  That can propagate down
> through the PCI bus which is what we do today.
> 
> On !x86, the PCI controller ought to setup a MemoryRegion for downstream
> PIO that devices can use to register on.
> 
> We probably need to do something like change the PCI VGA devices to
> export a MemoryRegion and allow the PCI controller to decide how to
> register that as a subregion.

The VGA device should just register fixed address port IOs the same way
it would register an IO BAR. Essentially, hard coded IO addresses (or
memory, VGA does memory too, don't forget that) are equivalent to having
an invisible BAR with a fixed value in it.

There should be no "global port IO" because that concept is broken on
real multi-domain setups. Those "legacy" address ranges are just
hard-wired subregions of the normal PCI space on which the device sits
(unless you start doing real non-PCI ISA x86).
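
To illustrate the "invisible BAR" view, here is a small sketch in which
a fixed legacy range is mapped into a per-domain PCI IO space through
exactly the same path as a programmable IO BAR. PciIoSpace,
map_io_bar and map_vga_legacy are hypothetical names for this example
and are not QEMU's MemoryRegion API:

```python
class PciIoSpace:
    """The IO address space of one PCI domain (illustrative model)."""

    def __init__(self, domain):
        self.domain = domain
        self.regions = []  # list of (base, size, owner)

    def map(self, base, size, owner):
        """Add a region, refusing overlaps with existing ones."""
        for b, s, _ in self.regions:
            if base < b + s and b < base + size:
                raise ValueError("overlapping IO region")
        self.regions.append((base, size, owner))

    def lookup(self, addr):
        """Return the owner of addr in this domain, or None."""
        for b, s, owner in self.regions:
            if b <= addr < b + s:
                return owner
        return None


def map_io_bar(space, bar_value, size, owner):
    """A normal IO BAR: the base comes from a programmable register."""
    space.map(bar_value, size, owner)


# Legacy VGA IO ranges as (base, size): 0x3B0-0x3BB and 0x3C0-0x3DF.
VGA_LEGACY_IO = ((0x3B0, 0x0C), (0x3C0, 0x20))


def map_vga_legacy(space, owner):
    """Hard-decoded ranges: same as BARs, but with fixed base values."""
    for base, size in VGA_LEGACY_IO:
        map_io_bar(space, base, size, owner)
```

Two domains can then each hold their own VGA device at port 0x3C0
without conflict, because the mapping is made in the device's own
PciIoSpace instance and not in a global port space.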

Cheers,
Ben.
