On 04/10/14 19:35, Michael S. Tsirkin wrote:
> On Sat, Oct 04, 2014 at 12:38:24AM +0100, Eddie Chapman wrote:
>> Hi,
>>
>> I've been regularly seeing the same problem on the 3.10 stable kernels
>> as reported by Romain Francoise here:
>> https://lkml.org/lkml/2013/1/23/492
>>
>> An example from my setup is at the bottom of this mail. It's a problem
>> because qemu fails to run when it hits this; the only workaround is to
>> do all qemu launches with vhost=off after it happens. It starts
>> happening after the machine has been running for a while and after a
>> few VMs have been started. I guess that is the fragmentation issue, as
>> the machine is never under any serious memory pressure when it happens.
>>
>> I see this set of changes for 3.16 has a couple of fixes which appear
>> to address the problem:
>> https://lkml.org/lkml/2014/6/11/302
>>
>> I was just wondering if there are any plans to backport these to 3.10,
>> or even if it is actually possible (I'm not a kernel dev so wouldn't
>> know)? If not, are there any workarounds other than vhost=off?
>>
>> thanks,
>> Eddie
>
> Yes, these patches aren't hard to backport.
> Go ahead and post the backport, I'll review and ack.

Thanks Michael,

Actually I just discovered that Dmitry Petuhov backported
23cc5a991c7a9fb7e6d6550e65cee4f4173111c5 ("vhost-net: extend device
allocation to vmalloc") last month to the Proxmox 3.10 kernel:
https://www.mail-archive.com/pve-devel@xxxxxxxxxxxxxxx/msg08873.html

He appears to have tested it quite thoroughly himself under a heavy
workload, with no problems, though it hasn't gone into a Proxmox release
yet.

His patch applies to vanilla kernel.org 3.10.55 with only slight fuzz,
so I've done some minor whitespace cleanup so that it applies cleanly.
Vanilla 3.10.55 with the patch compiles fine on my machine without any
errors or warnings. Is it OK (below)? I'm not sure whether it meets the
stable submission rules.

Dmitry also says that d04257b07f2362d4eb550952d5bf5f4241a8046d
("vhost-net: don't open-code kvfree") is not applicable to 3.10 because
3.10 has no kvfree() function (it appears in v3.15-rc5).

I have added Dmitry to CC.

thanks,
Eddie

--- a/drivers/vhost/net.c 2014-10-05 15:34:12.282126999 +0100
+++ b/drivers/vhost/net.c 2014-10-05 15:34:15.862140883 +0100
@@ -18,6 +18,7 @@
 #include <linux/rcupdate.h>
 #include <linux/file.h>
 #include <linux/slab.h>
+#include <linux/vmalloc.h>
 #include <linux/net.h>
 #include <linux/if_packet.h>
@@ -707,18 +708,30 @@
 	handle_rx(net);
 }
+static void vhost_net_free(void *addr)
+{
+	if (is_vmalloc_addr(addr))
+		vfree(addr);
+	else
+		kfree(addr);
+}
+
 static int vhost_net_open(struct inode *inode, struct file *f)
 {
-	struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
+	struct vhost_net *n;
 	struct vhost_dev *dev;
 	struct vhost_virtqueue **vqs;
 	int r, i;
-	if (!n)
-		return -ENOMEM;
+	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
+	if (!n) {
+		n = vmalloc(sizeof *n);
+		if (!n)
+			return -ENOMEM;
+	}
 	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
 	if (!vqs) {
-		kfree(n);
+		vhost_net_free(n);
 		return -ENOMEM;
 	}
@@ -737,7 +750,7 @@
 	}
 	r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
 	if (r < 0) {
-		kfree(n);
+		vhost_net_free(n);
 		kfree(vqs);
 		return r;
 	}
@@ -840,7 +853,7 @@
 	 * since jobs can re-queue themselves. */
 	vhost_net_flush(n);
 	kfree(n->dev.vqs);
-	kfree(n);
+	vhost_net_free(n);
 	return 0;
 }
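
The patch above tries kmalloc() first, falls back to vmalloc() if that
fails, and then frees through whichever allocator owns the address. For
readers who want to play with the pattern outside the kernel, here is a
rough userspace analogy. It is only a sketch and not kernel code: every
name in it (primary_alloc, fallback_alloc, is_fallback_addr, obj_alloc,
obj_free, PRIMARY_LIMIT, REGION_SIZE) is made up for illustration, the
"primary" allocator simply refuses large requests to imitate a failed
high-order allocation, and the "fallback" is a trivial bump allocator
over a static buffer so that its address range can be tested the way
is_vmalloc_addr() tests the vmalloc range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* All names below are hypothetical; none of this is kernel API. */

#define PRIMARY_LIMIT 64	/* larger requests "fail", imitating fragmentation */
#define REGION_SIZE   4096	/* backing store for the fallback allocator */

static _Alignas(max_align_t) unsigned char region[REGION_SIZE];
static size_t region_used;
static int primary_frees, fallback_frees;

/* Primary allocator: stands in for kmalloc(); refuses large requests. */
static void *primary_alloc(size_t size)
{
	return size <= PRIMARY_LIMIT ? malloc(size) : NULL;
}

/* Fallback allocator: stands in for vmalloc(); carves chunks out of region[]. */
static void *fallback_alloc(size_t size)
{
	size_t align = _Alignof(max_align_t);
	size_t need;
	void *p;

	if (size > REGION_SIZE)
		return NULL;
	need = (size + align - 1) / align * align;	/* keep chunks aligned */
	if (need > REGION_SIZE - region_used)
		return NULL;
	p = region + region_used;
	region_used += need;
	return p;
}

/* Plays the role of is_vmalloc_addr(): is addr inside the fallback range? */
static int is_fallback_addr(const void *addr)
{
	uintptr_t a = (uintptr_t)addr;
	uintptr_t lo = (uintptr_t)region;

	return a >= lo && a < lo + REGION_SIZE;
}

/* Try the primary allocator first, then fall back. */
static void *obj_alloc(size_t size)
{
	void *p = primary_alloc(size);

	return p ? p : fallback_alloc(size);
}

/* Free through whichever allocator owns the address. */
static void obj_free(void *addr)
{
	if (!addr)
		return;
	if (is_fallback_addr(addr)) {
		fallback_frees++;	/* bump allocator: nothing to release */
	} else {
		primary_frees++;
		free(addr);
	}
}
```

With these definitions, obj_alloc(16) is served by the primary allocator
and obj_alloc(1024) by the fallback, and obj_free() picks the matching
release path for each without the caller having to remember which one
was used. That is the same property the patch relies on at its three
free sites.
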
[1948751.794040] qemu-system-x86: page allocation failure: order:4, mode:0x1040d0
[1948751.810341] CPU: 4 PID: 41198 Comm: qemu-system-x86 Not tainted 3.10.53-rc1 #3
[1948751.826846] Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS S1200BT.86B.02.00.0041.120520121743 12/05/2012
[1948751.847285] 0000000000000004 ffff8802eaf3b9d8 ffffffff8162ff4d ffff8802eaf3ba68
[1948751.864257] ffffffff810ab771 0000000000000001 ffff8802eaf3bb48 ffff8802eaf3ba68
[1948751.881209] ffffffff810abe68 ffffffff81ca2f40 ffffffff00000000 0000000200000040
[1948751.898276] Call Trace:
[1948751.909628] [<ffffffff8162ff4d>] dump_stack+0x19/0x1c
[1948751.924284] [<ffffffff810ab771>] warn_alloc_failed+0x111/0x126
[1948751.939774] [<ffffffff810abe68>] ? __alloc_pages_direct_compact+0x181/0x198
[1948751.956650] [<ffffffff810ac5ae>] __alloc_pages_nodemask+0x72f/0x77c
[1948751.972853] [<ffffffff810ac676>] __get_free_pages+0x12/0x41
[1948751.988297] [<ffffffffa04ac71b>] vhost_net_open+0x23/0x171 [vhost_net]
[1948752.004938] [<ffffffff8130d6c3>] misc_open+0x119/0x17d
[1948752.020111] [<ffffffff810e99b4>] chrdev_open+0x134/0x155
[1948752.035604] [<ffffffff81053193>] ? lg_local_unlock+0x1e/0x31
[1948752.051436] [<ffffffff810e9880>] ? cdev_put+0x24/0x24
[1948752.066540] [<ffffffff810e46b8>] do_dentry_open+0x15c/0x20f
[1948752.082214] [<ffffffff810e484b>] finish_open+0x34/0x3f
[1948752.097234] [<ffffffff810f2737>] do_last+0x996/0xbcb
[1948752.111983] [<ffffffff810ef98e>] ? link_path_walk+0x5e/0x791
[1948752.127447] [<ffffffff810f0296>] ? path_init+0x11d/0x403
[1948752.142517] [<ffffffff810f2a32>] path_openat+0xc6/0x43b
[1948752.157207] [<ffffffff81070f08>] ? __lock_acquire+0x9ae/0xa4a
[1948752.172369] [<ffffffff815ac2ef>] ? rtnl_unlock+0x9/0xb
[1948752.186893] [<ffffffff810f2eac>] do_filp_open+0x38/0x84
[1948752.201503] [<ffffffff81633673>] ? _raw_spin_unlock+0x26/0x2a
[1948752.216719] [<ffffffff810fdfef>] ? __alloc_fd+0xf6/0x10a
[1948752.231521] [<ffffffff810e437c>] do_sys_open+0x114/0x1a6
[1948752.246396] [<ffffffff810e4438>] SyS_open+0x19/0x1b
[1948752.260709] [<ffffffff816341d2>] system_call_fastpath+0x16/0x1b
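
For reference, the order:4 in the first line of the trace means the page
allocator was asked for 2^4 physically contiguous pages. The arithmetic
can be sketched as follows, assuming 4 KiB pages (the usual x86-64 page
size). PAGE_SHIFT and PAGE_SIZE here are local constants and the helper
names order_to_bytes and size_to_order are made up for illustration;
they are not the kernel's own functions and are only meant for small
sizes.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12	/* assumption: 4 KiB pages, as on x86-64 */
#define PAGE_SIZE  ((size_t)1 << PAGE_SHIFT)

/* Bytes covered by one allocation of the given order (2^order pages). */
static size_t order_to_bytes(unsigned int order)
{
	return PAGE_SIZE << order;
}

/* Smallest order whose block can hold size bytes (small sizes only). */
static unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;

	while (order_to_bytes(order) < size)
		order++;
	return order;
}
```

Under that assumption an order-4 request is 16 contiguous pages, i.e.
64 KiB, which is the kind of allocation that can fail on a fragmented
machine even when plenty of memory is free overall.
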