On Mon, 2011-07-25 at 15:37 -0400, Don Dutile wrote: > On 07/24/2011 06:58 AM, Michael S. Tsirkin wrote: > > On Sun, Jul 24, 2011 at 11:41:10AM +0300, Michael S. Tsirkin wrote: > >> On Sun, Jul 24, 2011 at 11:12:44AM +0300, Michael S. Tsirkin wrote: > >>> On Fri, Jul 22, 2011 at 02:35:47PM -0700, Chris Wright wrote: > >>>> * Alex Williamson (alex.williamson@xxxxxxxxxx) wrote: > >>>>> On Fri, 2011-07-22 at 14:24 -0700, Chris Wright wrote: > >>>>>> * Donald Dutile (ddutile@xxxxxxxxxx) wrote: > >>>>>>> diff --git a/hw/device-assignment.c b/hw/device-assignment.c > >>>>>>> index 36ad6b0..34db52e 100644 > >>>>>>> --- a/hw/device-assignment.c > >>>>>>> +++ b/hw/device-assignment.c > >>>>>>> @@ -1419,16 +1419,18 @@ static int assigned_device_pci_cap_init(PCIDevice *pci_dev) > >>>>>>> } > >>>>>>> > >>>>>>> if ((pos = pci_find_cap_offset(pci_dev, PCI_CAP_ID_EXP, 0))) { > >>>>>>> - uint8_t version; > >>>>>>> + uint8_t version, size; > >>>>>>> uint16_t type, devctl, lnkcap, lnksta; > >>>>>>> uint32_t devcap; > >>>>>>> - int size = 0x3c; /* version 2 size */ > >>>>>>> > >>>>>>> version = pci_get_byte(pci_dev->config + pos + PCI_EXP_FLAGS); > >>>>>>> version&= PCI_EXP_FLAGS_VERS; > >>>>>>> if (version == 1) { > >>>>>>> size = 0x14; > >>>>>>> - } else if (version> 2) { > >>>>>>> + } else if (version == 2) { > >>>>>>> + /* don't include slot cap/stat/ctrl 2 regs; only support endpoints */ > >>>>>>> + size = 0x34; > >>>>>> > >>>>>> That doesn't look correct to me. The size is fixed, just that some > >>>>>> registers are Reserved Zero when they do not apply (e.g. endpoint only). > >>>>> > >>>>> Apparently it can be interpreted differently. In this case, we've seen > >>>>> a tg3 device expose a v2 PCI express capability at offset 0xcc. Using > >>>>> 0x3c bytes, we extend 8 bytes past the legacy config space area :( > >>>> > >>>> Wow, that device sounds broken to me. The spec is pretty clear. > >>> > >>> Yes, I agree it's broken. Looks like something that > >>> happens when a device is designed in parallel with the spec. > >>> > >>> What bothers me is this patch seems to make devices that do behave > >>> correctly out of spec (registers will be writeable by default) - > >>> correct? > >>> > >>> How about we check for overflow and only do the hacks > >>> if it happens? > >>> > >>> Also, the code to initialize slot and root control registers is still > >>> there: it would seem that running it will corrupt memmory beyond the > >>> config array? > >> > >> I take this last bit back: registers we touch are at offset< 0x34. > >> Sorry about the noise. But the question about read-only registers > >> still stands. > > > > Also, where does the magic 0x34 come from? I'm guessing this is > > simply what's left till the end of the config space. > > So let's be conservative specific as possible with > > this hack: > > > > I believe the spec leaves room for interpretation, and thus the > resulting 'broken' device. As I read the spec, the size of the struct can be: > > -- 0x2c for all devices, min., that are cap version 2 or higher. > -- 0x34 for devices with links, i.e., not a root-port-based device, > , a device not integrated into the root port > , or if it is, it uses a serial link anyhow > (doesn't strip out 8b/10b serial link btwn device > & internal root port) > -- 0x3c for devices with slots, i.e., a bridge with downstream slots, > i.e., not an endpoint, i.e., a PCI(e) bridge. > > Thus, 0x34 was chosen, since we don't support device assigning PCI bridges, > (not until MRIOV shows up, at least), and 0x34 fits the bug at hand, and > device cap/stat/control may be used/modified. > > So, a 'hack' is not needed. In fact, the 0x34 size is a bit of a hack, > since the case to use 0x2c could be ascertained by checking if the device > is a root port device, _and_ it's not using a serial link, but a perusal of > root port devices on a number of systems I looked at always had this structure > greater than 0x2c, so I figured the simple heuristic of 0x34 was sufficient. > > > /* A version 2 device was observed to only have a partial > > * implementation for the capability structure. Apparently, it doesn't > > * implement the registers from slot capability 2 and on (offset 0x34), > > * with the capability at offset 0xCC = 256 - 0x34. This is out of spec, > > * but let's try to support this. */ > > if (version == 2&& pos == 0xCC) { > > size = 0x34; > > } The more I look at it, the more I think that maybe this is an especially broken device and we shouldn't change the default for it. BTW, the programming reference for this device is here: http://www.broadcom.com/collateral/pg/5761-PG100-R.pdf They've burned up most of the capability area for vendor specific registers, so there's not enough room for the full pci-e capability structure. I'd be fine with adding a test specifically for this. As you suggested on IRC, print some warning and cut pci-e back to 0x34 if it extends past the legacy config space, and reject the device if it's still too small. Leave the default 0x3c since this is the only device we've found with this problem. Thanks, Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html