Re: [PATCH] KVM virtio balloon driver

On Tue, Jan 15, 2008 at 10:32:08AM +1100, Rusty Russell wrote:
> On Tuesday 15 January 2008 07:03:57 Marcelo Tosatti wrote:
> > Hi Rusty,
> >
> > It was agreed that the balloon driver should be merged through the
> > virtio tree, so here it goes. It depends on the config_changed patch
> > posted earlier.
> 
> Hi Marcelo,
> 
>     Excellent!  Although the main user will be kvm, it'd be nice to have a 
> demonstration in-tree using lguest; any chance of you conjuring up an 
> appropriate patch?
> 
> If not, I can whip something up.
> 
> > +config KVM_BALLOON
> > +	tristate "KVM balloon driver (EXPERIMENTAL)"
> > +	depends on VIRTIO_PCI
> > +	---help---
> > +	 This driver provides support for ballooning memory in/out of a
> > +	 KVM paravirt guest.
> > +
> > +	 If unsure, say M.
> 
> Please don't define "balloon" in terms of "ballooning".  How about "This 
> driver supports increasing and decreasing the amount of memory within a KVM 
> guest." ?
> 
> > +	uint32_t target_nrpages;
> 
> I prefer u32 within the kernel, but no big deal.
> 
> > +static int send_balloon_buf(struct virtballoon *v, uint8_t cmd,
> > +			    struct balloon_buf *buf)
> > +{
> > +	struct scatterlist sg[VIRTIO_MAX_SG];
> > +	int err = 0;
> > +
> > +	buf->hdr.cmd = cmd;
> > +
> > +	sg_init_table(sg, VIRTIO_MAX_SG);
> > +	sg_set_buf(&sg[0], &buf->hdr, sizeof(buf->hdr));
> > +	sg_set_buf(&sg[1], &buf->data, sizeof(buf->data));
> 
> Since these are adjacent, can't you just combine them into one sg element?  Or 
> does the kvm code rely on the sg as a boundary between header and data?
> 
> > +static int kvm_balloon_inflate(struct virtballoon *v, int32_t npages)
> > +{
> > +	LIST_HEAD(tmp_list);
> > +	struct page *page, *tmp;
> > +	struct balloon_buf *buf;
> > +	u32 *pfn;
> > +	int allocated = 0;
> > +	int i, r = -ENOMEM;
> > +
> > +	buf = alloc_balloon_buf(v->vdev, GFP_KERNEL);
> > +	if (!buf)
> > +		return r;
> > +
> > +	pfn = (u32 *)&buf->data;
> > +	*pfn++ = (u32)npages;
> 
> OK, this seems strange.  You always use the data portion as an array of u32s, 
> yet you declare it as char[200].  You have a perfectly good header, but you 
> put the number of pages in the first element of the data array: and you hand 
> the entire data array even though you normally only use part of it.
> 
> You can intuit the number from the sg len, I suggest you do that instead.
> 
> Looking at this driver, I just don't think this needs to be so complicated: 
> it's not a high speed device, after all.  Perhaps allocate one buffer up 
> front, and just reuse that.  One simple thread loop, less locking needed.
> 
> > +	for (i = 0; i < npages; i++) {
> > +		page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY);
> > +		if (!page)
> > +			goto out_free;
> > +		list_add(&page->lru, &tmp_list);
> > +		allocated++;
> > +		*pfn = page_to_pfn(page);
> > +		pfn++;
> > +	}
> 
> I think it might be simpler to defer adding to any list until after the 
> callback is done?  Helpers to add an array to the balloon array (on 
> successful inflate, or unsuccessful deflate) and to free them (unsuccessful 
> inflate, or successful deflate) would also make the code more readable.
> 
> > +	gfns_per_buf = MAX_BALLOON_PAGES_PER_OP;
> > +
> > +	/*
> > +	 * Call the balloon in PAGE_SIZE*pfns-per-buf
> > +	 * iterations
> > +	 */
> > +	iterations = DIV_ROUND_UP(abspages, gfns_per_buf);
> > +	dprintk(&v->vdev->dev, "%s: iterations=%d\n", __func__, iterations);
> > +
> > +	for (i = 0; i < iterations; i++) {
> 
> This logic seems overly complex.  How about in the main thread:
> 
> 	/* We're woken when target changes, in config_changed() */
> 	if (wait_event_interruptible(&v->wq,
> 		(diff = atomic_read(&v->target_pages) - total_pages)) == 0) {
> 
> 		/* If we submit inflate/deflate request, wait for it to finish. */
> 		if (xflate(v, diff) == 0)
> 			wait_for_completion(&v->complete);
> 	}
> 
> xflate just does as much as it can (up to "diff"), and then the interrupt 
> handler adds/removes from the linked list, frees pages, etc and fires the 
> completion.
> 
> No need for locking, since there's only one pending request at any time.

OK, that's simpler. How about this:

[PATCH] Virtio balloon driver

Add a balloon driver for KVM; host<->guest communication is performed
via virtio.

Signed-off-by: Marcelo Tosatti <mtosatti@xxxxxxxxxx>


Index: linux-2.6-nv/drivers/virtio/Kconfig
===================================================================
--- linux-2.6-nv.orig/drivers/virtio/Kconfig
+++ linux-2.6-nv/drivers/virtio/Kconfig
@@ -23,3 +23,13 @@ config VIRTIO_PCI
 
 	  If unsure, say M.
 
+config VIRTIO_BALLOON
+	tristate "Virtio balloon driver (EXPERIMENTAL)"
+	select VIRTIO
+	select VIRTIO_RING
+	---help---
+	 This driver supports increasing and decreasing the amount
+	 of memory within a KVM guest.
+
+	 If unsure, say M.
+
Index: linux-2.6-nv/drivers/virtio/Makefile
===================================================================
--- linux-2.6-nv.orig/drivers/virtio/Makefile
+++ linux-2.6-nv/drivers/virtio/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_VIRTIO) += virtio.o
 obj-$(CONFIG_VIRTIO_RING) += virtio_ring.o
 obj-$(CONFIG_VIRTIO_PCI) += virtio_pci.o
+obj-$(CONFIG_VIRTIO_BALLOON) += kvm_balloon.o
Index: linux-2.6-nv/drivers/virtio/kvm_balloon.c
===================================================================
--- /dev/null
+++ linux-2.6-nv/drivers/virtio/kvm_balloon.c
@@ -0,0 +1,404 @@
+/*
+ * KVM guest balloon driver
+ *
+ * Copyright (C) 2007, Qumranet, Inc., Dor Laor <dor.laor@xxxxxxxxxxxx>
+ * Copyright (C) 2007, Red Hat, Inc., Marcelo Tosatti <mtosatti@xxxxxxxxxx>
+ * 
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ */
+
+#define DEBUG
+#include <asm/uaccess.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/percpu.h>
+#include <linux/init.h>
+#include <linux/interrupt.h>
+#include <linux/mm.h>
+#include <linux/swap.h>
+#include <linux/wait.h>
+#include <linux/kthread.h>
+#include <linux/freezer.h>
+#include <linux/version.h>
+#include <linux/virtio.h>
+#include <linux/virtio_config.h>
+#include <linux/virtio_balloon.h>
+#include <linux/preempt.h>
+
+MODULE_AUTHOR("Dor Laor");
+MODULE_DESCRIPTION("Implements guest ballooning support");
+MODULE_LICENSE("GPL");
+MODULE_VERSION("1");
+
+static int kvm_balloon_debug;
+
+#define dprintk(dev, str...) do { if (kvm_balloon_debug) dev_dbg(dev, str); } while (0)
+
+#define BALLOON_DATA_SIZE 200
+
+struct balloon_buf {
+	struct virtio_balloon_hdr hdr;
+	u8 data[BALLOON_DATA_SIZE];
+};
+
+#define VIRTIO_MAX_SG 2
+
+struct virtballoon {
+	struct virtio_device *vdev;
+	struct virtqueue *vq;
+	struct task_struct *balloon_thread;
+	struct balloon_buf buf;
+	wait_queue_head_t balloon_wait;
+	wait_queue_head_t rmmod_wait;
+	struct completion complete;
+	atomic_t target_nrpages;
+	int balloon_size;
+	struct list_head balloon_plist;
+	struct list_head list;
+};
+
+
+static int send_balloon_buf(struct virtballoon *v, u8 cmd,
+			    struct balloon_buf *buf, unsigned int data_len)
+{
+	struct scatterlist sg[VIRTIO_MAX_SG];
+	int err = 0;
+
+	buf->hdr.cmd = cmd;
+
+	sg_init_table(sg, VIRTIO_MAX_SG);
+	sg_set_buf(&sg[0], &buf->hdr, sizeof(buf->hdr));
+	sg_set_buf(&sg[1], &buf->data, data_len);
+
+	err = v->vq->vq_ops->add_buf(v->vq, sg, 0, 2, buf);
+	if (err) {
+		dev_printk(KERN_ERR, &v->vq->vdev->dev, "%s: add_buf err\n",
+			   __func__);
+		goto out;
+	}
+
+	v->vq->vq_ops->kick(v->vq);
+out:
+	return err;
+}
+
+static int kvm_balloon_inflate(struct virtballoon *v, int32_t npages)
+{
+	struct page *page;
+	struct balloon_buf *buf = &v->buf;
+	u32 *pfn;
+	int allocated = 0;
+	int i, r = -ENOMEM;
+
+	pfn = (u32 *)&buf->data;
+	for (i = 0; i < npages; i++) {
+		page = alloc_page(GFP_HIGHUSER | __GFP_NORETRY);
+		if (!page)
+			goto out_free;
+		allocated++;
+		*pfn = page_to_pfn(page);
+		pfn++;
+	}
+
+	r = send_balloon_buf(v, CMD_BALLOON_INFLATE, buf,
+			     allocated * sizeof(u32));
+	if (r)
+		goto out_free;
+
+	dprintk(&v->vdev->dev, "%s: current balloon size=%d\n", __func__,
+		v->balloon_size);
+	return r;
+
+out_free:
+	pfn = (u32 *)&buf->data;
+	for (i = 0; i < allocated; i++)
+		__free_page(pfn_to_page(*pfn++));
+	return r;
+}
+
+static int kvm_balloon_deflate(struct virtballoon *v, int32_t npages)
+{
+	struct page *page;
+	struct balloon_buf *buf = &v->buf;
+	u32 *pfn;
+	int deallocated = 0;
+	int r = 0;
+
+	if (v->balloon_size < npages) {
+		dev_printk(KERN_INFO, &v->vdev->dev,
+			   "%s: balloon=%d with deflate rq=%d\n",
+			   __func__, v->balloon_size, npages);
+		npages = v->balloon_size;
+		if (!npages) {
+			r = -EINVAL;
+			goto out;
+		}
+	}
+
+	dprintk(&v->vdev->dev, "%s: current balloon size=%d\n", __func__,
+		v->balloon_size);
+
+	pfn = (u32 *)&buf->data;
+
+	list_for_each_entry(page, &v->balloon_plist, lru) {
+		*pfn++ = page_to_pfn(page);
+		if (++deallocated == npages)
+			break;
+	}
+
+	r = send_balloon_buf(v, CMD_BALLOON_DEFLATE, buf,
+			     deallocated * sizeof(u32));
+out:
+	return r;
+}
+
+#define MAX_BALLOON_PAGES_PER_OP (BALLOON_DATA_SIZE/sizeof(u32))
+#define MAX_BALLOON_XFLATE_OP 1000000
+
+static int kvm_balloon_xflate(struct virtballoon *v, int32_t npages)
+{
+	int r = -EINVAL;
+	int pages_in_iteration;
+	int abspages;
+	int maxpages = MAX_BALLOON_PAGES_PER_OP;
+
+	abspages = abs(npages);
+
+	if (abspages > MAX_BALLOON_XFLATE_OP) {
+		dev_printk(KERN_ERR, &v->vdev->dev,
+			   "%s: bad npages=%d\n", __func__, npages);
+		goto out;
+	}
+
+	dprintk(&v->vdev->dev, "%s: got %s, npages=%d\n", __func__,
+	       (npages > 0) ? "inflate" : "deflate", npages);
+
+	pages_in_iteration = min(maxpages, abspages);
+
+	if (npages > 0)
+		r = kvm_balloon_inflate(v, pages_in_iteration);
+	else
+		r = kvm_balloon_deflate(v, pages_in_iteration);
+
+out:
+	/* stop the thread in case of an error */
+	if (r)
+		atomic_set(&v->target_nrpages, totalram_pages);
+	return r;
+}
+
+static void free_page_array(struct balloon_buf *buf, unsigned int npages)
+{
+	struct page *page;
+	u32 *pfn = (u32 *)&buf->data;
+	int i;
+
+	for (i = 0; i < npages; i++) {
+		page = pfn_to_page(*pfn);
+		list_del_init(&page->lru);
+		__free_page(page);
+		pfn++;
+	}
+}
+
+static void add_page_array(struct virtballoon *v, struct balloon_buf *buf,
+			   unsigned int npages)
+{
+	struct page *page;
+	u32 *pfn = (u32 *)&buf->data;
+	int i;
+
+	for (i = 0; i < npages; i++) {
+		page = pfn_to_page(*pfn);
+		v->balloon_size++;
+		totalram_pages--;
+		list_add(&page->lru, &v->balloon_plist);
+		pfn++;
+	}
+}
+
+static void inflate_done(struct virtballoon *v, struct balloon_buf *buf,
+			 unsigned int npages)
+{
+	u8 status = buf->hdr.status;
+
+	/* inflate OK */
+	if (!status)
+		add_page_array(v, buf, npages);
+	else 
+		free_page_array(buf, npages);
+}
+
+static void deflate_done(struct virtballoon *v, struct balloon_buf *buf,
+			 unsigned int npages)
+{
+	u8 status = buf->hdr.status;
+
+	/* deflate OK, return pages to the system */
+	if (!status) {
+		free_page_array(buf, npages);
+		totalram_pages += npages;
+		v->balloon_size -= npages;
+	}
+	return;
+}
+
+static int balloon_thread(void *p)
+{
+	struct virtballoon *v = p;
+	DEFINE_WAIT(wait);
+
+	set_freezable();
+	while (!kthread_should_stop()) {
+		int diff, r;
+
+		try_to_freeze();
+
+		r = wait_event_interruptible(v->balloon_wait,
+					     (diff = totalram_pages -
+					      atomic_read(&v->target_nrpages))
+					     || kthread_should_stop());
+		if (r == 0)
+			if (kvm_balloon_xflate(v, diff) == 0)
+				wait_for_completion(&v->complete);
+
+		if (waitqueue_active(&v->rmmod_wait)) {
+			diff = totalram_pages - atomic_read(&v->target_nrpages);
+			if (!diff)
+				wake_up(&v->rmmod_wait);
+		}
+	}
+	return 0;
+}
+
+static bool balloon_tx_done(struct virtqueue *vq)
+{
+	struct balloon_buf *buf;
+	struct virtballoon *v = vq->vdev->priv;
+	unsigned int len;
+
+	buf = vq->vq_ops->get_buf(vq, &len);
+	if (buf) {
+		switch (buf->hdr.cmd) {
+		case CMD_BALLOON_DEFLATE:
+			deflate_done(v, buf, len/sizeof(u32));
+			break;
+		case CMD_BALLOON_INFLATE:
+			inflate_done(v, buf, len/sizeof(u32));
+			break;
+		default:
+			printk(KERN_ERR "%s: unknown cmd 0x%x\n", __func__,
+				buf->hdr.cmd);
+		}
+		complete(&v->complete);
+	}
+	return true;
+}
+
+static struct virtio_device_id id_table[] = {
+	{ VIRTIO_ID_BALLOON, VIRTIO_DEV_ANY_ID},
+	{ 0 },
+};
+
+static LIST_HEAD(balloon_devices);
+
+static int balloon_probe(struct virtio_device *vdev)
+{
+	int err = -EINVAL;
+	struct virtballoon *v;
+
+	v = kzalloc(sizeof(struct virtballoon), GFP_KERNEL);
+	if (!v)
+		return -ENOMEM;
+
+	v->vq = vdev->config->find_vq(vdev, 0, balloon_tx_done);
+	if (IS_ERR(v->vq))
+		goto out_free;
+	
+	v->vdev = vdev;
+
+	init_waitqueue_head(&v->balloon_wait);
+	init_waitqueue_head(&v->rmmod_wait);
+	INIT_LIST_HEAD(&v->balloon_plist);
+	INIT_LIST_HEAD(&v->list);
+	init_completion(&v->complete);
+	atomic_set(&v->target_nrpages, totalram_pages);
+
+	vdev->priv = v;
+
+	v->balloon_thread = kthread_run(balloon_thread, v, "kvm_balloond");
+	if (IS_ERR(v->balloon_thread)) 
+		goto out_free_vq;
+
+	list_add(&v->list, &balloon_devices);
+	dev_printk(KERN_INFO, &v->vdev->dev, "registered\n");
+
+	return 0;
+
+out_free_vq:
+	vdev->config->del_vq(v->vq);
+out_free:
+	kfree(v);
+	return err;
+}
+
+static void balloon_remove(struct virtio_device *vdev)
+{
+	struct virtballoon *v = vdev->priv;
+
+	kthread_stop(v->balloon_thread);
+	vdev->config->del_vq(v->vq);
+	list_del(&v->list);
+	kfree(v);
+}
+
+static void balloon_config_changed(struct virtio_device *vdev)
+{
+	struct virtballoon *v = vdev->priv;
+	u32 target_nrpages;
+
+	__virtio_config_val(v->vdev, 0, &target_nrpages);
+	atomic_set(&v->target_nrpages, target_nrpages);
+	wake_up(&v->balloon_wait);
+	dprintk(&vdev->dev, "%s\n", __func__);
+}
+
+static struct virtio_driver virtio_balloon = {
+	.driver.name =  KBUILD_MODNAME,
+	.driver.owner = THIS_MODULE,
+	.id_table =     id_table,
+	.probe =        balloon_probe,
+	.remove =	__devexit_p(balloon_remove),
+	.config_changed	= balloon_config_changed,
+};
+
+module_param(kvm_balloon_debug, int, 0);
+
+static int __init kvm_balloon_init(void)
+{
+	return register_virtio_driver(&virtio_balloon);
+}
+
+static void __exit kvm_balloon_exit(void)
+{
+	struct virtballoon *v;
+	
+	list_for_each_entry(v, &balloon_devices, list) { 
+		while (v->balloon_size) {
+			DEFINE_WAIT(wait);
+
+			atomic_add(v->balloon_size, &v->target_nrpages);
+			wake_up(&v->balloon_wait);
+			prepare_to_wait(&v->rmmod_wait, &wait,
+					TASK_INTERRUPTIBLE);
+			schedule_timeout(HZ*10);
+			finish_wait(&v->rmmod_wait, &wait);
+		}
+	}
+
+	unregister_virtio_driver(&virtio_balloon);
+}
+
+module_init(kvm_balloon_init);
+module_exit(kvm_balloon_exit);
Index: linux-2.6-nv/drivers/virtio/virtio_pci.c
===================================================================
--- linux-2.6-nv.orig/drivers/virtio/virtio_pci.c
+++ linux-2.6-nv/drivers/virtio/virtio_pci.c
@@ -67,6 +67,7 @@ static struct pci_device_id virtio_pci_i
 	{ 0x1AF4, 0x1000, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Dummy entry */
 	{ 0x1AF4, 0x1001, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Dummy entry */
 	{ 0x1AF4, 0x1002, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Dummy entry */
+	{ 0x1AF4, 0x1003, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0 }, /* Balloon */
 	{ 0 },
 };
 
Index: linux-2.6-nv/include/linux/virtio_balloon.h
===================================================================
--- /dev/null
+++ linux-2.6-nv/include/linux/virtio_balloon.h
@@ -0,0 +1,20 @@
+#ifndef _LINUX_VIRTIO_BALLOON_H
+#define _LINUX_VIRTIO_BALLOON_H
+#include <linux/virtio_config.h>
+
+#define VIRTIO_ID_BALLOON 3
+
+#define CMD_BALLOON_INFLATE 0x1
+#define CMD_BALLOON_DEFLATE 0x2
+
+struct virtio_balloon_hdr {
+	__u8 cmd;
+	__u8 status;
+};
+
+struct virtio_balloon_config
+{
+	__u32 target_nrpages;
+};
+
+#endif /* _LINUX_VIRTIO_BALLOON_H */
_______________________________________________
Virtualization mailing list
Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linux-foundation.org/mailman/listinfo/virtualization
