Re: [PATCH 1/3] virtio_ring: Remove sg_next indirection

On Sun, Aug 31, 2014 at 06:42:38PM -0700, Andy Lutomirski wrote:
> On Sun, Aug 31, 2014 at 5:58 PM, Rusty Russell <rusty@xxxxxxxxxxxxxxx> wrote:
> > Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> >> The only unusual thing about virtio's use of scatterlists is that
> >> two of the APIs accept scatterlists that might not be terminated.
> >> Using function pointers to handle this case is overkill; for_each_sg
> >> can do it.
> >>
> >> There's a small subtlety here: for_each_sg assumes that the provided
> >> count is correct, but, because of the way that virtio_ring handles
> >> multiple scatterlists at once, the count is only an upper bound if
> >> there is more than one scatterlist.  This is easily solved by
> >> checking the sg pointer for NULL on each iteration.
> >
> > Hi Andy,
> >
> >         (Sorry for the delayed response; I was still catching up from
> > my week at KS.)
> 
> No problem.  In the grand scheme of maintainer response time, I think
> you're near the top. :)
> 
> >
> > Unfortunately the reason we dance through so many hoops here is that
> > it has a measurable performance impact :(  Those indirect calls get inlined.
> 
> gcc inlines that?  That must nearly double the size of the object file. :-/
> 
> >
> > There's only one place which actually uses a weirdly-formed sg now,
> > and that's virtio_net.  It's pretty trivial to fix.

This path in virtio_net is also unused on modern hypervisors, so we probably
don't care how well it performs: why not apply it anyway?
It's the virtio_ring changes that we need to worry about.
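
For reference, the pattern described in the patch looks roughly like
this (a minimal sketch with a made-up helper name, not the actual
virtio_ring code):

#include <linux/scatterlist.h>

/* Count the entries actually present when @max is only an upper
 * bound: sg_next() returns NULL past the terminating entry, so a
 * NULL check on each iteration handles shorter chains safely. */
static unsigned int sg_count_bounded(struct scatterlist *sgl,
				     unsigned int max)
{
	struct scatterlist *sg;
	unsigned int i, n = 0;

	for_each_sg(sgl, sg, max, i) {
		if (!sg)	/* chain ended before @max entries */
			break;
		n++;
	}
	return n;
}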

> > However, vring_bench drops 15% when we do this.  There's a larger
> > question as to how much difference that makes in Real Life, of course.
> > I'll measure that today.
> 
> Weird.  sg_next shouldn't be nearly that slow.  Weird.

I think that's down to the fact that sg_next() is out of line,
so the call can't be inlined into the caller.
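
As a point of reference, sg_next() lives out of line in
lib/scatterlist.c; its logic is roughly the following (sketched from
memory, not a verbatim copy), so every iteration pays a real function
call that the compiler cannot fold into the surrounding loop:

struct scatterlist *sg_next(struct scatterlist *sg)
{
	if (sg_is_last(sg))
		return NULL;

	sg++;
	if (unlikely(sg_is_chain(sg)))
		sg = sg_chain_ptr(sg);

	return sg;
}

Were a helper like this static inline in the header, the whole walk
could collapse into straight-line code in the caller.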

> >
> > Here are my two patches, back-to-back (it came out of an earlier
> > concern about reducing stack usage, hence the stack measurements).
> >
> 
> I like your version better than mine, except that I suspect that your
> version will blow up for the same reason that my v2 patches blow up:
> you probably need the skb_to_sgvec_nomark fix, too.
> 
> IOW, what happens if you apply patches 1-4 from my v3 series and then
> apply your patches on top of that?
> 
> There'll be a hit on some virtio_pci machines due to use of the DMA
> API.  I would argue that, if this is measurable, the right fix is to
> prod the DMA API maintainers, whoever they are, to fix it.  The DMA
> API really ought to be very fast on identity-mapped devices, but I don't
> know whether it is in practice.
> 
> --Andy

Right, but we'd have to fix that before applying the patches
to avoid performance regressions.
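
For what it's worth, the per-buffer cost in question is roughly this
(a hypothetical sketch; the helper and its arguments are made up):

#include <linux/dma-mapping.h>

/* Historically virtio handed virt_to_phys(buf) straight to the host.
 * The DMA API path adds a mapping call and an error check per buffer,
 * which ought to be nearly free on identity-mapped devices. */
static int map_one_buf(struct device *dev, void *buf, size_t len,
		       dma_addr_t *addr)
{
	*addr = dma_map_single(dev, buf, len, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, *addr))
		return -ENOMEM;
	return 0;
}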

-- 
MST