On Sep 1, 2014 12:00 AM, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
>
> On Sun, Aug 31, 2014 at 06:42:38PM -0700, Andy Lutomirski wrote:
> > On Sun, Aug 31, 2014 at 5:58 PM, Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:
> > > Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> > >> The only unusual thing about virtio's use of scatterlists is that
> > >> two of the APIs accept scatterlists that might not be terminated.
> > >> Using function pointers to handle this case is overkill; for_each_sg
> > >> can do it.
> > >>
> > >> There's a small subtlety here: for_each_sg assumes that the provided
> > >> count is correct, but, because of the way that virtio_ring handles
> > >> multiple scatterlists at once, the count is only an upper bound if
> > >> there is more than one scatterlist. This is easily solved by
> > >> checking the sg pointer for NULL on each iteration.
> > >
> > > Hi Andy,
> > >
> > > (Sorry for the delayed response; I was still catching up from
> > > my week at KS.)
> >
> > No problem. In the grand scheme of maintainer response time, I think
> > you're near the top. :)
> >
> > > Unfortunately the reason we dance through so many hoops here is that
> > > it has a measurable performance impact :( Those indirect calls get inlined.
> >
> > gcc inlines that? That must nearly double the size of the object file. :-/
> >
> > > There's only one place which actually uses a weirdly-formed sg now,
> > > and that's virtio_net. It's pretty trivial to fix.
>
> This path in virtio_net is also unused on modern hypervisors, so we probably
> don't care how well it performs: why not apply it anyway?
> It's the virtio_ring changes that we need to worry about.
>
> > > However, vring_bench drops 15% when we do this. There's a larger
> > > question as to how much difference that makes in Real Life, of course.
> > > I'll measure that today.
> >
> > Weird. sg_next shouldn't be nearly that slow. Weird.
>
> I think that's down to the fact that it's out of line,
> so it prevents inlining of the caller.
>
> > > Here are my two patches, back-to-back (it came out of an earlier
> > > concern about reducing stack usage, hence the stack measurements).
> >
> > I like your version better than mine, except that I suspect that your
> > version will blow up for the same reason that my v2 patches blow up:
> > you probably need the skb_to_sgvec_nomark fix, too.
> >
> > IOW, what happens if you apply patches 1-4 from my v3 series and then
> > apply your patches on top of that?
> >
> > There'll be a hit on some virtio_pci machines due to use of the DMA
> > API. I would argue that, if this is measurable, the right fix is to
> > prod the DMA API maintainers, whoever they are, to fix it. The DMA
> > API really ought to be very fast on identity-mapped devices, but I don't
> > know whether it is in practice.
> >
> > --Andy
>
> Right but we'd have to fix that before applying the patches
> to avoid performance regressions.

My pathetic attempt to benchmark it came up with no difference. What's
the right benchmark?

I'm having trouble compiling anything in tools/virtio -- my patch
breaks that code, but it seems like it's already broken. The next
version of my patches will at least not make the problem worse.

I think that we can eliminate any possibility of a meaningful
performance hit due to the streaming part of the mappings by checking
PCI_DMA_BUS_IS_PHYS. I'll test and send out a version of the patches
that does that.
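Roughly, the check would look something like this (a minimal sketch, not
the actual patch; the helper name is made up for illustration):

#include <linux/pci.h>
#include <linux/types.h>

/*
 * Sketch only: decide whether the streaming DMA API buys us anything.
 * PCI_DMA_BUS_IS_PHYS is nonzero when DMA (bus) addresses are the same
 * as physical addresses, i.e. no IOMMU/swiotlb translation is involved.
 */
static bool vring_needs_dma_mapping(void)
{
	if (PCI_DMA_BUS_IS_PHYS)
		return false;	/* identity-mapped: dma_map_*() adds nothing */

	return true;		/* translation present: go through the DMA API */
}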
For what it's worth, there's another possible performance hit: on some
architectures, coherent memory could be uncached. This might not matter
for anyone, though. I think that x86 uses normal memory for coherent
mappings. ARM is unlikely to make serious use of virtio_pci, since
AFAIK virtio_mmio is strongly preferred on ARM. s390 should be
unaffected for much the same reason. Are there other architectures on
which this could be a problem?

It's possible that the weak barrier flag could be used to disable the
use of coherent memory, but it might require some DMA API trickery. In
particular, the only API I see is DMA_BIDIRECTIONAL, and we'll end up
with ugly code in virtio_pci to get it working.

Ultimately, I think that hypervisors need a way to tell their guests
that certain PCI devices are emulated and therefore fully coherent
regardless of our normal architectural considerations.

--Andy

>
> --
> MST
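For illustration, here is a rough sketch of the two mapping options
weighed above (helper names, structure, and error handling are
assumptions, not from any posted patch):

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/* Option A: coherent allocation.  Simple, but the returned memory may
 * be uncached on architectures without hardware DMA coherency. */
static void *ring_alloc_coherent(struct device *dma_dev, size_t size,
				 dma_addr_t *dma_handle)
{
	return dma_alloc_coherent(dma_dev, size, dma_handle, GFP_KERNEL);
}

/* Option B: ordinary cached memory plus a streaming mapping.  Both the
 * driver and the device read and write the ring, so the only direction
 * that fits is DMA_BIDIRECTIONAL, and correctness would additionally
 * need dma_sync_single_for_cpu()/_for_device() calls around every
 * access -- the "ugly code" referred to above. */
static void *ring_alloc_streaming(struct device *dma_dev, size_t size,
				  dma_addr_t *dma_handle)
{
	void *ring = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
					      get_order(size));

	if (!ring)
		return NULL;

	*dma_handle = dma_map_single(dma_dev, ring, size,
				     DMA_BIDIRECTIONAL);
	if (dma_mapping_error(dma_dev, *dma_handle)) {
		free_pages((unsigned long)ring, get_order(size));
		return NULL;
	}
	return ring;
}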