On Wed, Nov 19, 2014 at 10:23:50AM -0700, Bjorn Helgaas wrote:
>On Wed, Nov 19, 2014 at 05:27:40PM +0800, Wei Yang wrote:
>> On Tue, Nov 18, 2014 at 09:26:01PM -0700, Bjorn Helgaas wrote:
>> >On Wed, Nov 19, 2014 at 11:21:00AM +0800, Wei Yang wrote:
>> >> On Wed, Nov 19, 2014 at 01:15:32PM +1100, Benjamin Herrenschmidt wrote:
>> >> >On Tue, 2014-11-18 at 18:12 -0700, Bjorn Helgaas wrote:
>
>> >> But the HW
>> >> must map 256 segments with the same size. This will lead to a situation like
>> >> this:
>> >>
>> >>   +------+------+     +------+------+------+------+
>> >>   |VF#0  |VF#1  | ... |      |VF#N-1|PF#A  |PF#B  |
>> >>   +------+------+     +------+------+------+------+
>> >>
>> >> Suppose N = 254 and the HW maps these 256 segments to their corresponding PE#s.
>> >
>> >I guess these 256 segments are regions of CPU physical address space, and
>> >they are being mapped to bus address space? Is there some relationship
>> >between a PE and part of the bus address space?
>> >
>>
>> A PE is an entity for EEH, which may include a whole bus or one PCI device.
>
>Yes, I've read that many times. What's missing is the connection between a
>PE and the things in the PCI specs (buses, devices, functions, MMIO address
>space, DMA, MSI, etc.) Presumably the PE structure imposes constraints on
>how the core uses the standard PCI elements, but we don't really have a
>clear description of those constraints yet.
>
>> When a device gets an error, we need to identify which PE it belongs to.
>> So we have HW to map between a PE# and MMIO/DMA/MSI addresses.
>>
>> The HW mentioned in the previous letter is the one that maps an MMIO address
>> to a PE#, and this HW must map the range as 256 equal segments. And yes, this
>> is mapped to bus address space.
>> ...
>
>> >> The difference after our expansion is that the IOV BAR size is 256*4KB instead
>> >> of 16KB. So it will look like this:
>> >>
>> >> PF  pci_dev->resource[7] = [mem 0x00000000-0x000fffff] (1024KB)
>> >
>> >Is the idea that you want this resource to be big enough to cover all 256
>> >segments? I think I'm OK with increasing the size of the PF resources to
>> >prevent overlap. That part shouldn't be too ugly.
>> >
>>
>> Yes, big enough to cover all 256 segments.
>>
>> Sorry for making it ugly :-(
>
>I didn't mean that what you did was ugly. I meant that increasing the size
>of the PF resource can be done cleanly.
>
>By the way, when you do this, it would be nice if the dmesg showed the
>standard PF IOV BAR sizing, and then a separate line showing the resource
>expansion to deal with the PE constraints. I don't think even the standard
>output is very clear -- I think we currently get something like this:
>
>  pci 0000:00:00.0 reg 0x174: [mem 0x00000000-0x00000fff]
>
>But that is only the size of a single VF BAR aperture. Then sriov_init()
>multiplies that by the number of possible VFs, but I don't think we print
>the overall size of that PF resource. I think we should, because it's
>misleading to print only the smaller piece. Maybe something like this:
>
>  pci 0000:00:00.0 VF BAR0: [mem 0x00000000-0x00003fff] (for 4 VFs)
>
>And then you could do something like:
>
>  pci 0000:00:00.0 VF BAR0: [mem 0x00000000-0x000fffff] (expanded for PE alignment)
>

Got it, will add a message to reflect it.

>> >> VF1 pci_dev->resource[0] = [mem 0x00000000-0x00000fff]
>> >> VF2 pci_dev->resource[0] = [mem 0x00001000-0x00001fff]
>> >> VF3 pci_dev->resource[0] = [mem 0x00002000-0x00002fff]
>> >> VF4 pci_dev->resource[0] = [mem 0x00003000-0x00003fff]
>> >> ...
>> >> and the remaining 252 4KB segments are left unused.
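
To spell out the arithmetic behind the 16KB vs 1024KB numbers, here is a
small standalone sketch using only the figures from this example (4KB per-VF
BAR0, 4 VFs, 256 equal segments in the MMIO-to-PE# mapping); it is not code
from the patch:

  #include <stdio.h>

  int main(void)
  {
          unsigned long vf_bar = 4 * 1024UL;  /* one VF BAR0 aperture: 4KB        */
          unsigned int total_vfs = 4;         /* TotalVFs in the SR-IOV capability */
          unsigned int segments = 256;        /* equal segments in the PE mapping  */

          /* what sriov_init() sizes today: per-VF aperture * TotalVFs */
          unsigned long std = vf_bar * total_vfs;

          /* expanded so the PF IOV BAR covers every segment */
          unsigned long expanded = vf_bar * segments;

          printf("standard: %luKB (for %u VFs)\n", std / 1024, total_vfs);
          printf("expanded: %luKB (%u of %u segments unused)\n",
                 expanded / 1024, segments - total_vfs, segments);
          return 0;
  }

So only 4 of the 256 segments actually carry VFs; the other 252 are consumed
purely to satisfy the equal-segment constraint.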
>> >>
>> >> So the start address and the size of VF will not change, but the PF's IOV BAR
>> >> will be expanded.
>> >
>> >I'm really dubious about this change to use pci_iov_resource_size(). I
>> >think you might be doing that because if you increase the PF resource size,
>> >dividing that increased size by total_VFs will give you garbage. E.g., in
>> >the example above, you would compute "size = 1024KB / 4", which would make
>> >the VF BARs appear to be 256KB instead of 4KB as they should be.
>>
>> Yes, your understanding is correct.
>>
>> >I think it would be better to solve that problem by decoupling the PF
>> >resource size and the VF BAR size. For example, we could keep track of the
>> >VF BAR size explicitly in struct pci_sriov, instead of computing it from
>> >the PF resource size and total_VFs. This would keep the VF BAR size
>> >completely platform-independent.
>>
>> Hmm... this is another solution.
>>
>> If you prefer this one, I will make a change accordingly.
>
>Yes, I definitely prefer to track the VF BAR size explicitly. I think that
>will make the code much clearer.

Got it.

>
>Bjorn

-- 
Richard Yang
Help you, Help me

--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
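
For what it's worth, a rough sketch of the decoupling discussed above:
record the per-VF BAR size once when the SR-IOV capability is parsed,
instead of dividing the (possibly expanded) PF resource by total_VFs.
The field name vf_barsz and its exact placement are only my illustration,
not an actual patch:

  /* illustrative only -- record the per-VF BAR size in struct pci_sriov,
   * e.g. a new "resource_size_t vf_barsz[PCI_SRIOV_NUM_BARS]" member */

  /* in sriov_init(), before the PF IOV resource is sized (and before any
   * platform-specific expansion): */
          iov->vf_barsz[i] = resource_size(res);
          res->end = res->start + resource_size(res) * total - 1;

  /* the VF BAR size then no longer depends on the PF resource size: */
  resource_size_t pci_iov_resource_size(struct pci_dev *dev, int resno)
  {
          if (!dev->is_physfn)
                  return 0;

          return dev->sriov->vf_barsz[resno - PCI_IOV_RESOURCES];
  }

With something like this, the PF resource[7] can be grown to cover all 256
segments without the computed VF BAR size turning into 256KB.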