Re: [RFC 7/11] virtio_pci: new, capability-aware driver.

On Thu, Jan 12, 2012 at 12:31:09PM +1030, Rusty Russell wrote:
> On Wed, 11 Jan 2012 12:21:30 +0200, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > On Wed, Jan 11, 2012 at 10:55:52AM +1030, Rusty Russell wrote:
> > > On Tue, 10 Jan 2012 19:03:36 +0200, "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote:
> > > > On Wed, Dec 21, 2011 at 11:03:25AM +1030, Rusty Russell wrote:
> > > > > Yes.  The idea that we can alter fields in the device-specific config
> > > > > area is flawed.  There may be cases where it doesn't matter, but as an
> > > > > idea it was holed to begin with.
> > > > > 
> > > > > We can reduce the probability by doing a double read to check, but
> > > > > there are still cases where it will fail.
> > > > 
> > > > Okay - want me to propose an interface for that?
> > > 
> > > Had a brief chat with BenH (CC'd).
> > > 
> > > I think we should deprecate writing to the config space.  Only balloon
> > > does it AFAICT, and I can't quite figure out *why* it has an 'active'
> > > field.
> > 
> > Are you sure? I think net writes a MAC address.
> 
> True.  We'll need to disable that, and come up with another mechanism if
> we want it back (a new feature and a VIRTIO_NET_HDR_F_SET_MAC flag in
> the virtio_net header?  Or would that mess up vhost_net?).

vhost-net is only the datapath, so no problem there.  Copying in the host is
tricky for vhost_net.
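
Just to make the header-flag idea concrete, something like the below is what
I imagine; the feature bit, the flag value and the extra field are all
invented for illustration, nothing here is a proposed ABI:

/* Hypothetical only: illustrates "set the MAC via the net header".
 * Neither the feature bit nor the flag below exist anywhere. */
#define VIRTIO_NET_F_GUEST_SET_MAC	40	/* made-up feature bit */
#define VIRTIO_NET_HDR_F_SET_MAC	4	/* made-up header flag */

struct virtio_net_hdr_set_mac {
	struct virtio_net_hdr_mrg_rxbuf hdr;	/* existing header; flags would carry F_SET_MAC */
	__u8 mac[6];				/* new MAC, valid only when F_SET_MAC is set */
};

(The latch on the device side is the host copy I mean above.)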

> > > This solves half the problem, that of syncing guest writes.  For the
> > > other half, I suggest a generation counter; odd means inconsistent.  The
> > > guest can poll.
> > 
> > So we read the counter until it's even, read the config, and repeat if
> > the counter has changed? Yes, it works. However, I would like to have a
> > way to detect a config change just by looking at memory. ATM we need to
> > read the ISR to know.  If we used a VQ, the advantage would be that the
> > device could work with a single MSI-X vector shared by all VQs.
> 
> If we use a 32-bit counter, we also get this though, right?
> 
> If the counter has changed, it's a config interrupt...

But we need an exit to read the counter. We can put the counter
in memory, but this looks suspiciously like a simplified VQ -
so why not use a VQ then?
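
To make the polling concrete, a minimal sketch of the guest read loop,
assuming a 32-bit generation counter where odd means an update is in
progress (cfg_generation() and cfg_read() are stand-ins for whatever the
transport would actually expose):

/* Sketch only: cfg_generation()/cfg_read() are hypothetical accessors. */
extern u32 cfg_generation(void);
extern void cfg_read(void *buf, unsigned int len);

static void read_config_stable(void *buf, unsigned int len)
{
	u32 before, after;

	do {
		/* Wait for an even value: odd means the device is mid-update. */
		do {
			before = cfg_generation();
		} while (before & 1);

		cfg_read(buf, len);	/* copy out the config fields */

		after = cfg_generation();
		/* If the counter moved, the copy may be torn: retry. */
	} while (after != before);
}

And yes, if the counter lives in memory, "has it moved since last time"
doubles as the config-change check - which is why it starts to look like a
stripped-down VQ to me.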

> > If we do require a config VQ anyway, why not use it to notify the
> > guest of config changes? The guest could pre-post an in-buffer
> > and the host uses that.
> 
> We could, but it's an additional burden on each device.  vqs are cheap,
> but not free.  And the config area is so damn convenient...

Not if you start playing with counters, checking them twice, and
reinventing all kinds of barriers...
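
If we did use a VQ for this, the guest side stays trivial: keep one
in-buffer posted and repost it from the callback.  A rough sketch along the
lines of the existing virtqueue API (the buffer size, the callback and
handle_config_change() are made up):

/* Sketch: guest keeps one buffer posted on a config VQ; the device
 * fills it in whenever the configuration changes. */
static u8 cfg_event[64];			/* made-up event buffer */

static void cfgvq_repost(struct virtqueue *vq)
{
	struct scatterlist sg;

	sg_init_one(&sg, cfg_event, sizeof(cfg_event));
	virtqueue_add_inbuf(vq, &sg, 1, cfg_event, GFP_ATOMIC);
	virtqueue_kick(vq);
}

static void cfgvq_cb(struct virtqueue *vq)	/* config VQ callback */
{
	unsigned int len;

	if (virtqueue_get_buf(vq, &len))
		handle_config_change(cfg_event, len);	/* device-specific, made up */
	cfgvq_repost(vq);
}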

> > > BenH also convinced me we should finally make the config space LE if
> > > we're going to change things.  Since PCI is the most common transport,
> > > guest-endian confuses people.  And it sucks for really weird machines.
> > 
> > Are we going to keep guest endian for e.g. virtio net header?
> > If yes the benefit of switching config space is not that big.
> > And changes in devices would affect non-PCI transports.
> 
> Yep.  It would only make sense if we do it for everything.  And yes,
> it'll mess up everyone who is BE, so it needs to be a feature bit for
> them.
> 
> > > We should also change the ring (to a single ring, I think), with
> > > descriptors 24 bytes long (8-byte cookie, 8-byte addr, 4-byte len,
> > > 4-byte flags).  We might be able to squeeze it into 20 bytes, but that
> > > means packing.  We should support inline, chained or indirect.  Let the
> > > other side ack by setting the flags, cookie and len (if written).
> > 
> > Quite possibly all or some of these things help performance,
> > but do we have to change the spec before we have experimental
> > proof?
> 
> We change the spec last, once we know what we're doing, ideally.
> 
> > I did experiment with a single ring using tools/virtio and
> > I didn't see a measurable performance gain.
> 
> Interesting.  It is simpler and more standard than our current design,
> but that's not sufficient unless there are other reasons.  Needs further
> discussion and testing.
> 
> > Two rings do have the advantage of not requiring a host-side copy, a
> > copy which would surely add to cache pressure.
> 
> Well, a simple host could process in-order and leave stuff in the ring I
> guess.  A smarter host would copy and queue, maybe leave one queue entry
> in so it doesn't get flooded?
> 
> >  Since
> > the host doesn't change descriptors, we could also
> > preformat some descriptors in the current design.
> >
> > There is a fragmentation problem in theory but it can be alleviated with
> > a smart allocator.
> 
> Yeah, the complexity scares me...
> 
> > About inline - it can only help very small buffers.
> > Which workloads do you have in mind exactly?
> 
> It was suggested by others, but I think TCP ACKs are the classic one.
> 12 + 14 + 20 + 40 = 86 bytes with virtio_net_hdr_mrg_rxbuf at the front.

That's only the simplest IPv4 case, right?
Anyway, this spans multiple descriptors, so it complicates allocation
significantly.

> > BTW this seems to be the reverse from what you have in Mar 2001,
> > see 87mxkjls61.fsf@xxxxxxxxxxxxxxx :)
> 
> (s/2001/2011).  Indeed.  No one shared my optimism that having an open
> process for a virtio2 would bring more players on board (my original
> motivation).  But technical requirements are mounting up, which means
> we're going to get there anyway.
> 
> > I am much less concerned with what we do for configuration,
> > but I do not believe we have learned all performance lessons
> > from virtio ring1. Is there any reason why we shouldn't be
> > able to experiment with inline within virtio1 and see
> > whether that gets us anything?
> 
> Inline in the used ring is possible, but those descriptors are 8 bytes, vs
> 24/32.

Hmm, 86 > 32 anyway, right?
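
For reference, the 24-byte descriptor you describe would be something like
this; the field names, types and flag semantics are my assumptions, not spec
text:

/* Sketch of the proposed 24-byte descriptor (8+8+4+4 bytes). */
struct virtio2_desc {
	__le64 cookie;	/* opaque driver token, echoed back on ack */
	__le64 addr;	/* buffer address (or inline data / indirect table) */
	__le32 len;	/* buffer length; device writes bytes used when acking */
	__le32 flags;	/* inline/chained/indirect, write direction, ack bits */
};

So even the smallest interesting inline payload needs several of these
chained together, which is the allocation complication I mention above.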

> > If we do a bunch of changes to the ring at once, we can't
> > figure out what's right, what's wrong, or back out of
> > mistakes later.
> > 
> > Since there are non PCI transports that use the ring,
> > we really shouldn't make both the configuration and
> > the ring changes depend on the same feature bit.
> 
> Yes, I'm thinking #define VIRTIO_F_VIRTIO2 (-1).  For PCI, this gets
> mapped into a "are we using the new config layout?".  For others, it
> gets mapped into a transport-specific feature.
> 
> (I'm sure you get it, but for the others) This is because I want to
> draw a clear line between all the legacy stuff at the same time, not
> have to support part of it later because someone might not flip the
> feature bit.
> 
> Cheers,
> Rusty.

So my point is that the config stuff and the ring are completely separate;
they are different layers.
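
FWIW, the per-transport mapping you describe could look roughly like this on
the PCI side; the -1 placeholder is yours, the helper and the new_layout
field are invented:

/* Sketch: a pseudo feature bit that each transport translates for itself. */
#define VIRTIO_F_VIRTIO2	(-1)

static bool vp_feature_virtio2(struct virtio_pci_device *vp_dev)
{
	/* For PCI, "virtio2" would simply mean the new capability-based
	 * config layout was discovered during probe (made-up field). */
	return vp_dev->new_layout;
}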

-- 
MST

