Re: Elvis upstreaming plan

On Wed, Nov 27, 2013 at 01:02:37PM +0200, Abel Gordon wrote:
> 
> 
> "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote on 27/11/2013 12:59:38 PM:
> 
> 
> > On Wed, Nov 27, 2013 at 12:41:31PM +0200, Abel Gordon wrote:
> > >
> > >
> > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote on 27/11/2013 12:27:19 PM:
> > >
> > > >
> > > > On Wed, Nov 27, 2013 at 09:43:33AM +0200, Joel Nider wrote:
> > > > > Hi,
> > > > >
> > > > > Razya is out for a few days, so I will try to answer the questions
> > > > > as well as I can:
> > > > >
> > > > > "Michael S. Tsirkin" <mst@xxxxxxxxxx> wrote on 26/11/2013 11:11:57
> PM:
> > > > >
> > > > > > From: "Michael S. Tsirkin" <mst@xxxxxxxxxx>
> > > > > > To: Abel Gordon/Haifa/IBM@IBMIL,
> > > > > > Cc: Anthony Liguori <anthony@xxxxxxxxxxxxx>, abel.gordon@xxxxxxxxx,
> > > > > > asias@xxxxxxxxxx, digitaleric@xxxxxxxxxx, Eran Raichstein/Haifa/IBM@IBMIL,
> > > > > > gleb@xxxxxxxxxx, jasowang@xxxxxxxxxx, Joel Nider/Haifa/IBM@IBMIL,
> > > > > > kvm@xxxxxxxxxxxxxxx, pbonzini@xxxxxxxxxx, Razya Ladelsky/Haifa/IBM@IBMIL
> > > > > > Date: 27/11/2013 01:08 AM
> > > > > > Subject: Re: Elvis upstreaming plan
> > > > > >
> > > > > > On Tue, Nov 26, 2013 at 08:53:47PM +0200, Abel Gordon wrote:
> > > > > > >
> > > > > > >
> > > > > > > Anthony Liguori <anthony@xxxxxxxxxxxxx> wrote on 26/11/2013 08:05:00 PM:
> > > > > > >
> > > > > > > >
> > > > > > > > Razya Ladelsky <RAZYA@xxxxxxxxxx> writes:
> > > > > > > >
> > > > > <edit>
> > > > > > >
> > > > > > > That's why we are proposing to implement a mechanism that will enable
> > > > > > > the management stack to configure 1 thread per I/O device (as it is today)
> > > > > > > or 1 thread for many I/O devices (belonging to the same VM).
> > > > > > >
> > > > > > > > Once you are scheduling multiple guests in a single vhost device, you
> > > > > > > > now create a whole new class of DoS attacks in the best case scenario.
> > > > > > >
> > > > > > > Again, we are NOT proposing to schedule multiple guests in a single
> > > > > > > vhost thread. We are proposing to schedule multiple devices belonging
> > > > > > > to the same guest in a single (or multiple) vhost thread/s.
> > > > > > >
> > > > > >
> > > > > > I guess a question then becomes why have multiple devices?
> > > > >
> > > > > If you mean "why serve multiple devices from a single thread", the
> > > > > answer is that we cannot rely on the Linux scheduler, which has no
> > > > > knowledge of I/O queues, to do a decent job of scheduling I/O.  The
> > > > > idea is to take over the I/O scheduling responsibilities from the
> > > > > kernel's thread scheduler with a more efficient I/O scheduler inside
> > > > > each vhost thread.  By combining all of the I/O devices of the same
> > > > > guest (disks, network cards, etc.) in a single I/O thread, we can
> > > > > provide better scheduling because we have more knowledge of the
> > > > > nature of the work.  So instead of relying on the Linux scheduler to
> > > > > perform context switches between multiple vhost threads, we have a
> > > > > single thread context in which we can do the I/O scheduling more
> > > > > efficiently.  We can closely monitor the performance needs of each
> > > > > queue of each device inside the vhost thread, which gives us much
> > > > > more information than relying on the kernel's thread scheduler.
> > > > > This does not expose any additional opportunities for attacks (DoS
> > > > > or otherwise) beyond what is already available, since all of the I/O
> > > > > traffic belongs to a single guest.
> > > > > You can argue that with low I/O loads this mechanism may not make
> > > > > much difference.  However, when you try to maximize the utilization
> > > > > of your hardware (such as in a commercial scenario), this technique
> > > > > can gain you a large benefit.
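
To make this concrete: below is a minimal C sketch of the kind of in-thread
scheduling loop described above. It is an illustration only; the struct and
function names are invented and are not the ones used in the Elvis patches.

/* Illustration only: one worker thread serving every queue of a single
 * guest and choosing by itself which queue to service next, instead of
 * relying on the kernel's thread scheduler to switch between per-device
 * threads.  All names here are invented for the example. */
#include <stdbool.h>
#include <stddef.h>

struct vq_state {
	bool (*has_work)(struct vq_state *vq);	/* pending descriptors? */
	void (*service)(struct vq_state *vq);	/* process some of them */
};

struct guest_io_ctx {
	struct vq_state *vqs;	/* every queue of ONE guest: net, blk, ... */
	size_t nvqs;
};

void io_worker(struct guest_io_ctx *ctx)
{
	size_t i = 0;

	for (;;) {
		struct vq_state *vq = &ctx->vqs[i];

		if (vq->has_work(vq))
			vq->service(vq);

		/* Plain round-robin here; the queue-level knowledge the
		 * worker gathers (pending work, latency) is what lets it
		 * make a smarter choice than the thread scheduler could. */
		i = (i + 1) % ctx->nvqs;
	}
}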
> > > > >
> > > > > Regards,
> > > > >
> > > > > Joel Nider
> > > > > Virtualization Research
> > > > > IBM Research and Development
> > > > > Haifa Research Lab
> > > >
> > > > So all this would sound more convincing if we had sharing between VMs.
> > > > When it's only a single VM it's somehow less convincing, isn't it?
> > > > Of course, if we bypass the scheduler like this, it becomes harder to
> > > > enforce cgroup limits.
> > >
> > > True, but here the issue becomes isolation/cgroups. We can start to show
> > > the value for VMs that have multiple devices/queues and then reconsider
> > > extending the mechanism to multiple VMs (at least as an experimental
> > > feature).
> >
> > Sorry, if it's unsafe we can't merge it even if it's experimental.
> >
> > > > But it might be easier to give the scheduler the info it needs to do what we
> > > > need.  Would an API that basically says "run this kthread right now"
> > > > do the trick?
> > >
> > > ...do you really believe it would be possible to push this kind of change
> > > to the Linux scheduler? In addition, we need more than
> > > "run this kthread right now", because you need to monitor the virtio
> > > ring activity to specify "when" you would like to run a "specific kthread"
> > > and for "how long".
> >
> > "How long" is easy - just call schedule(). "When" sounds like specifying a
> > deadline, which sounds like a reasonable fit for how the scheduler works now.
> 
> ... but "when" you should call schedule actually depends on the I/O
> activity of the queues. The patches we shared constantly monitor the
> virtio rings (pending items and for how long they are pending there)
> to decide if we should continue processing the same queue or switch to
> other queue.
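
As a rough illustration of that stay-or-switch decision (the field names
and the threshold below are invented for the example, not taken from the
patches):

/* Illustration only: decide whether to keep draining the current queue
 * or move to another one, based on what is pending and for how long. */
#include <stdbool.h>

struct vq_snapshot {
	unsigned int pending;		/* descriptors currently queued */
	unsigned long oldest_pending;	/* timestamp of the oldest one  */
};

#define STARVATION_LIMIT 1000	/* invented threshold for the example */

bool stay_on_current_queue(const struct vq_snapshot *cur,
			   const struct vq_snapshot *next,
			   unsigned long now)
{
	/* Nothing left here: switch. */
	if (cur->pending == 0)
		return false;

	/* Work has been sitting too long on the other queue: switch,
	 * even though the current queue is still busy. */
	if (next->pending && now - next->oldest_pending > STARVATION_LIMIT)
		return false;

	/* Otherwise keep draining the current queue. */
	return true;
}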

Confused. I thought you wanted to give up the CPU to other tasks like VCPUs
and run vhost at some later time.

If it's just between vhost threads, why isn't "run this right now" what
we want?  We just process one queue for as long as we want to stay there;
when we want to switch to another one, we do "run that other thread right
now".
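
A hypothetical helper of that shape could look like the sketch below. To be
clear, this is not an existing kernel API: wake_up_process() and schedule()
are real, but the "prefer this task next" behaviour is exactly the part that
would have to be added.

/* Hypothetical, NOT an existing API: hand the CPU from the current
 * vhost worker directly to another vhost worker. */
#include <linux/sched.h>

static void vhost_run_now(struct task_struct *other_worker)
{
	/* Make sure the target worker is runnable... */
	wake_up_process(other_worker);

	/* ...then give up the CPU.  With a directed-yield style API the
	 * scheduler would be told to pick other_worker here; plain
	 * schedule() gives no such guarantee, which is the gap. */
	schedule();
}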


> > Certainly adding an in-kernel API sounds like a better approach than
> > a bunch of user-visible ones.
> > So I'm not at all saying we need to change the scheduler - it's more about
> > adding APIs to existing functionality.
> 
> Yep, but this may also be difficult to push...

If it's a reasonable thing and actually helps customers,
it's not difficult to push, in my experience.




> >
> > > >
> > > > >
> > > > >  Phone: 972-4-829-6326 | Mobile: 972-54-3155635
> > > > >  E-mail: JOELN@xxxxxxxxxx
> > > > >
> > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I am Razya Ladelsky. I work in the IBM Haifa virtualization team,
> > > > > > > > > which developed Elvis, presented by Abel Gordon at the last KVM forum:
> > > > > > > > > ELVIS video:  https://www.youtube.com/watch?v=9EyweibHfEs
> > > > > > > > > ELVIS slides: https://drive.google.com/file/d/0BzyAwvVlQckeQmpnOHM5SnB5UVE
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > According to the discussions that took place at the forum, upstreaming
> > > > > > > > > some of the Elvis approaches seems to be a good idea, which we would
> > > > > > > > > like to pursue.
> > > > > > > > >
> > > > > > > > > Our plan for the first patches is the following:
> > > > > > > > >
> > > > > > > > > 1. Shared vhost thread between multiple devices
> > > > > > > > > This patch creates a worker thread and worker queue shared across
> > > > > > > > > multiple virtio devices.
> > > > > > > > > We would like to modify the patch posted in
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/3dc6a3ce7bcbe87363c2df8a6b6fee0c14615766
> > > > > > > > > to limit a vhost thread to serving multiple devices only if they belong
> > > > > > > > > to the same VM, as Paolo suggested, to avoid isolation or cgroups
> > > > > > > > > concerns.
> > > > > > > > >
> > > > > > > > > Another modification is related to the creation and removal of vhost
> > > > > > > > > threads, which will be discussed next.
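
As a rough sketch of the "same VM only" restriction (the names below are
invented for illustration and are not the ones used in the patch above):

/* Illustration only: a vhost worker may be shared only by devices that
 * belong to the same VM, identified here by the owning mm. */
#include <stdbool.h>

struct shared_worker {
	const void *owner_mm;	/* VM this worker was first attached to */
	int ndevs;		/* devices currently served             */
	int max_devs;		/* static or sysfs-configured limit     */
};

bool worker_can_take_device(struct shared_worker *w, const void *dev_mm)
{
	if (w->ndevs >= w->max_devs)
		return false;		/* worker already full */
	if (w->ndevs && w->owner_mm != dev_mm)
		return false;		/* different VM: keep isolation */
	w->owner_mm = dev_mm;
	w->ndevs++;
	return true;
}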
> > > > > > > >
> > > > > > > > I think this is an exceptionally bad idea.
> > > > > > > >
> > > > > > > > We shouldn't throw away isolation without exhausting every other
> > > > > > > > possibility.
> > > > > > >
> > > > > > > It seems you have missed some important details here.
> > > > > > > Anthony, we are aware that you are concerned about isolation
> > > > > > > and you believe we should not share a single vhost thread across
> > > > > > > multiple VMs.  That's why Razya proposed to change the patch
> > > > > > > so we will serve multiple virtio devices using a single vhost thread
> > > > > > > "only if the devices belong to the same VM". This series of patches
> > > > > > > will not allow two different VMs to share the same vhost thread.
> > > > > > > So I don't see why this would be throwing away isolation or why
> > > > > > > it could be an "exceptionally bad idea".
> > > > > > >
> > > > > > > By the way, I remember that during the KVM forum a similar
> > > > > > > approach of having a single data plane thread for many devices
> > > > > > > was discussed....
> > > > > > > > We've seen very positive results from adding threads.  We should also
> > > > > > > > look at scheduling.
> > > > > > >
> > > > > > > ...and we have also seen exceptionally negative results from
> > > > > > > adding threads, both for vhost and data-plane. If you have a lot of
> > > > > > > idle time/cores, then it makes sense to run multiple threads. But IMHO,
> > > > > > > in many scenarios you don't have a lot of idle time/cores... and if you
> > > > > > > have them, you would probably prefer to run more VMs/VCPUs. Hosting a
> > > > > > > single SMP VM when you have enough physical cores to run all the VCPU
> > > > > > > threads and the I/O threads is not a realistic scenario.
> > > > >
> > > > > >
> > > > > > > >
> > > > > > > > > 2. Sysfs mechanism to add and remove vhost threads
> > > > > > > > > This patch allows us to add and remove vhost threads dynamically.
> > > > > > > > >
> > > > > > > > > A simpler way to control the creation of vhost threads is to statically
> > > > > > > > > determine the maximum number of virtio devices per worker via a kernel
> > > > > > > > > module parameter (which is the way the previously mentioned patch is
> > > > > > > > > currently implemented).
> > > > > > > > >
> > > > > > > > > I'd like to ask for advice here about which way is preferable:
> > > > > > > > > although the sysfs mechanism provides more flexibility, it may be a
> > > > > > > > > good idea to start with a simple static parameter and keep the first
> > > > > > > > > patches as simple as possible. What do you think?
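
For the static variant, a module parameter along these lines would be
enough (the parameter name is invented for this example):

/* Minimal sketch of the static alternative: cap how many virtio devices
 * one vhost worker may serve.  'devs_per_worker' is an invented name. */
#include <linux/module.h>
#include <linux/moduleparam.h>

static unsigned int devs_per_worker = 1;	/* 1 == today's behaviour */
module_param(devs_per_worker, uint, 0444);
MODULE_PARM_DESC(devs_per_worker,
		 "Maximum number of virtio devices served by one vhost thread");

That keeps the first patches small; the sysfs mechanism could then be added
later if changing the limit at runtime turns out to be useful.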
> > > > > > > > >
> > > > > > > > > 3. Add virtqueue polling mode to vhost
> > > > > > > > > Have the vhost thread poll the virtqueues with a high I/O rate for new
> > > > > > > > > buffers, and avoid asking the guest to kick us.
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/26616133fafb7855cc80fac070b0572fd1aaf5d0
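
Conceptually, polling replaces "sleep until the guest kicks us" with a loop
along the lines of this simplified sketch (not the actual patch code; the
callbacks and the idle threshold are invented for illustration):

/* Illustration only: poll a busy virtqueue instead of waiting for guest
 * notifications, and fall back to notifications when it goes idle. */
#include <stdbool.h>

struct polled_vq {
	bool (*work_pending)(struct polled_vq *vq);	/* new avail entries?     */
	void (*process)(struct polled_vq *vq);
	void (*enable_kick)(struct polled_vq *vq);	/* guest may notify us    */
	void (*disable_kick)(struct polled_vq *vq);	/* tell guest not to kick */
	unsigned int idle_rounds;
};

#define POLL_IDLE_LIMIT 256	/* invented threshold for the example */

void poll_vq_once(struct polled_vq *vq)
{
	if (vq->work_pending(vq)) {
		vq->disable_kick(vq);	/* busy: keep polling, avoid exits */
		vq->process(vq);
		vq->idle_rounds = 0;
	} else if (++vq->idle_rounds > POLL_IDLE_LIMIT) {
		vq->enable_kick(vq);	/* quiet: go back to kick-driven mode */
		vq->idle_rounds = 0;
	}
}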
> > > > > > > >
> > > > > > > > Ack on this.
> > > > > > >
> > > > > > > :)
> > > > > > >
> > > > > > > Regards,
> > > > > > > Abel.
> > > > > > >
> > > > > > > >
> > > > > > > > Regards,
> > > > > > > >
> > > > > > > > Anthony Liguori
> > > > > > > >
> > > > > > > > > 4. vhost statistics
> > > > > > > > > This patch introduces a set of statistics to monitor different
> > > > > > > > > performance metrics of vhost and our polling and I/O scheduling
> > > > > > > > > mechanisms. The statistics are exposed using debugfs and can be easily
> > > > > > > > > displayed with a Python script (vhost_stat, based on the old kvm_stats):
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/ac14206ea56939ecc3608dc5f978b86fa322e7b0
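
Exposing such counters through debugfs usually looks roughly like the sketch
below (the directory and counter names are invented for this example, not
the ones used by the patch):

/* Illustration only: two per-worker counters exported via debugfs. */
#include <linux/debugfs.h>
#include <linux/init.h>
#include <linux/module.h>
#include <linux/types.h>

struct vhost_example_stats {
	u64 polled_empty;	/* polls that found the ring empty      */
	u64 queue_switches;	/* times the worker moved to another vq */
};

static struct vhost_example_stats stats;
static struct dentry *stats_dir;

static int __init vhost_example_stats_init(void)
{
	stats_dir = debugfs_create_dir("vhost-example", NULL);
	debugfs_create_u64("polled_empty", 0444, stats_dir,
			   &stats.polled_empty);
	debugfs_create_u64("queue_switches", 0444, stats_dir,
			   &stats.queue_switches);
	return 0;
}

static void __exit vhost_example_stats_exit(void)
{
	debugfs_remove_recursive(stats_dir);
}

module_init(vhost_example_stats_init);
module_exit(vhost_example_stats_exit);
MODULE_LICENSE("GPL");

A script like vhost_stat would then just read those files, much as kvm_stat
reads the kvm entries under debugfs.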
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > 5. Add heuristics to improve I/O scheduling
> > > > > > > > > This patch enhances the round-robin mechanism with a set of heuristics
> > > > > > > > > to decide when to leave a virtqueue and proceed to the next.
> > > > > > > > > https://github.com/abelg/virtual_io_acceleration/commit/f6a4f1a5d6b82dc754e8af8af327b8d0f043dc4d
> > > > > > > > >
> > > > > > > > > This patch improves the handling of requests by the vhost thread, but
> > > > > > > > > could perhaps be delayed to a later time and not submitted as one of
> > > > > > > > > the first Elvis patches.
> > > > > > > > > I'd love to hear some comments about whether this patch needs to be
> > > > > > > > > part of the first submission.
> > > > > > > > >
> > > > > > > > > Any other feedback on this plan will be appreciated,
> > > > > > > > > Thank you,
> > > > > > > > > Razya
> > > > > > > >
> > > > > >
> > > >
> > > >
> >



