On Sun, Oct 05, 2014 at 04:34:07PM +0100, Eddie Chapman wrote:
>
> On 04/10/14 19:35, Michael S. Tsirkin wrote:
> >On Sat, Oct 04, 2014 at 12:38:24AM +0100, Eddie Chapman wrote:
> >>Hi,
> >>
> >>I've been regularly seeing on the 3.10 stable kernels the same problem as
> >>reported by Romain Francoise here:
> >>https://lkml.org/lkml/2013/1/23/492
> >>
> >>An example from my setup is at the bottom of this mail. It's a problem as
> >>qemu fails to run when it hits this, only solution is to do all qemu
> >>launches with vhost=off after it happens. It starts happening after the
> >>machine has been running for a while and after a few VMs have been started.
> >>I guess that is the fragmentation issue as the machine is never under any
> >>serious memory pressure when it happens.
> >>
> >>I see this set of changes for 3.16 has a couple of fixes which appear to
> >>address the problem:
> >>https://lkml.org/lkml/2014/6/11/302
> >>
> >>I was just wondering if there are any plans to backport these to 3.10, or
> >>even if it is actually possible (I'm not a kernel dev so wouldn't know)?
> >>
> >>If not, are there any other workarounds other than vhost=off?
> >>
> >>thanks,
> >>Eddie
> >
> >Yes, these patches aren't hard to backport.
> >Go ahead and post the backport, I'll review and ack.
>
> Thanks Michael,
>
> Actually I just discovered that Dmitry Petuhov backported
> 23cc5a991c7a9fb7e6d6550e65cee4f4173111c5 ("vhost-net: extend device
> allocation to vmalloc") last month to the Proxmox 3.10 kernel
> https://www.mail-archive.com/pve-devel@xxxxxxxxxxxxxxx/msg08873.html
>
> He appears to have tested it quite thoroughly himself with a heavy workload,
> with no problems, though it hasn't gone into a Proxmox release yet.
>
> His patch applies to vanilla kernel.org 3.10.55 with only slight fuzziness,
> so I've done some slight white space cleanup so it applies cleanly. vanilla
> 3.10.55 compiles fine on my machine without any errors or warnings with it.
> Is it OK (below)?
> Not sure it will meet stable submission rules?

OK but pls cleanup indentation, it's all scrambled.
You'll also need to add proper attribution
(using From: header), your signature etc.

>
> Dmitry also says that d04257b07f2362d4eb550952d5bf5f4241a8046d ("vhost-net:
> don't open-code kvfree") is not applicable in 3.10 because there's no
> open-coded kvfree() function (this appears in v3.15-rc5).

Yes that's just a cleanup, we don't do these in stable.

> Have added Dmitry to CC.
>
> thanks,
> Eddie
>
> --- a/drivers/vhost/net.c	2014-10-05 15:34:12.282126999 +0100
> +++ b/drivers/vhost/net.c	2014-10-05 15:34:15.862140883 +0100
> @@ -18,6 +18,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/file.h>
>  #include <linux/slab.h>
> +#include <linux/vmalloc.h>
>
>  #include <linux/net.h>
>  #include <linux/if_packet.h>
> @@ -707,18 +708,30 @@
>  	handle_rx(net);
>  }
>
> +static void vhost_net_free(void *addr)
> +{
> +	if (is_vmalloc_addr(addr))
> +		vfree(addr);
> +	else
> +		kfree(addr);
> +}
> +
>  static int vhost_net_open(struct inode *inode, struct file *f)
>  {
> -	struct vhost_net *n = kmalloc(sizeof *n, GFP_KERNEL);
> +	struct vhost_net *n;
>  	struct vhost_dev *dev;
>  	struct vhost_virtqueue **vqs;
>  	int r, i;
>
> -	if (!n)
> -		return -ENOMEM;
> +	n = kmalloc(sizeof *n, GFP_KERNEL | __GFP_NOWARN | __GFP_REPEAT);
> +	if (!n) {
> +		n = vmalloc(sizeof *n);
> +		if (!n)
> +			return -ENOMEM;
> +	}
>  	vqs = kmalloc(VHOST_NET_VQ_MAX * sizeof(*vqs), GFP_KERNEL);
>  	if (!vqs) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		return -ENOMEM;
>  	}
>
> @@ -737,7 +750,7 @@
>  	}
>  	r = vhost_dev_init(dev, vqs, VHOST_NET_VQ_MAX);
>  	if (r < 0) {
> -		kfree(n);
> +		vhost_net_free(n);
>  		kfree(vqs);
>  		return r;
>  	}
> @@ -840,7 +853,7 @@
>  	 * since jobs can re-queue themselves.
>  	 */
>  	vhost_net_flush(n);
>  	kfree(n->dev.vqs);
> -	kfree(n);
> +	vhost_net_free(n);
>  	return 0;
>  }
>
>
> >
> >
> >>[1948751.794040] qemu-system-x86: page allocation failure: order:4,
> >>mode:0x1040d0
> >>[1948751.810341] CPU: 4 PID: 41198 Comm: qemu-system-x86 Not tainted
> >>3.10.53-rc1 #3
> >>[1948751.826846] Hardware name: Intel Corporation S1200BTL/S1200BTL, BIOS
> >>S1200BT.86B.02.00.0041.120520121743 12/05/2012
> >>[1948751.847285] 0000000000000004 ffff8802eaf3b9d8 ffffffff8162ff4d
> >>ffff8802eaf3ba68
> >>[1948751.864257] ffffffff810ab771 0000000000000001 ffff8802eaf3bb48
> >>ffff8802eaf3ba68
> >>[1948751.881209] ffffffff810abe68 ffffffff81ca2f40 ffffffff00000000
> >>0000000200000040
> >>[1948751.898276] Call Trace:
> >>[1948751.909628] [<ffffffff8162ff4d>] dump_stack+0x19/0x1c
> >>[1948751.924284] [<ffffffff810ab771>] warn_alloc_failed+0x111/0x126
> >>[1948751.939774] [<ffffffff810abe68>] ?
> >>__alloc_pages_direct_compact+0x181/0x198
> >>[1948751.956650] [<ffffffff810ac5ae>] __alloc_pages_nodemask+0x72f/0x77c
> >>[1948751.972853] [<ffffffff810ac676>] __get_free_pages+0x12/0x41
> >>[1948751.988297] [<ffffffffa04ac71b>] vhost_net_open+0x23/0x171 [vhost_net]
> >>[1948752.004938] [<ffffffff8130d6c3>] misc_open+0x119/0x17d
> >>[1948752.020111] [<ffffffff810e99b4>] chrdev_open+0x134/0x155
> >>[1948752.035604] [<ffffffff81053193>] ? lg_local_unlock+0x1e/0x31
> >>[1948752.051436] [<ffffffff810e9880>] ? cdev_put+0x24/0x24
> >>[1948752.066540] [<ffffffff810e46b8>] do_dentry_open+0x15c/0x20f
> >>[1948752.082214] [<ffffffff810e484b>] finish_open+0x34/0x3f
> >>[1948752.097234] [<ffffffff810f2737>] do_last+0x996/0xbcb
> >>[1948752.111983] [<ffffffff810ef98e>] ? link_path_walk+0x5e/0x791
> >>[1948752.127447] [<ffffffff810f0296>] ? path_init+0x11d/0x403
> >>[1948752.142517] [<ffffffff810f2a32>] path_openat+0xc6/0x43b
> >>[1948752.157207] [<ffffffff81070f08>] ? __lock_acquire+0x9ae/0xa4a
> >>[1948752.172369] [<ffffffff815ac2ef>] ? rtnl_unlock+0x9/0xb
> >>[1948752.186893] [<ffffffff810f2eac>] do_filp_open+0x38/0x84
> >>[1948752.201503] [<ffffffff81633673>] ? _raw_spin_unlock+0x26/0x2a
> >>[1948752.216719] [<ffffffff810fdfef>] ? __alloc_fd+0xf6/0x10a
> >>[1948752.231521] [<ffffffff810e437c>] do_sys_open+0x114/0x1a6
> >>[1948752.246396] [<ffffffff810e4438>] SyS_open+0x19/0x1b
> >>[1948752.260709] [<ffffffff816341d2>] system_call_fastpath+0x16/0x1b
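
For anyone following the patch outside a kernel tree, the allocate-with-fallback idea it backports can be sketched in userspace C. This is only an illustration, not the kernel code: contig_alloc() and fallback_alloc() are hypothetical stand-ins for kmalloc() and vmalloc(), CONTIG_LIMIT is an invented threshold that mimics a failed high-order allocation, and a header tag replaces is_vmalloc_addr(), which in the kernel inspects the address range instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Which allocator produced a block; stands in for is_vmalloc_addr(). */
enum alloc_kind { ALLOC_CONTIG, ALLOC_FALLBACK };

/* Header stored in front of each payload; the union keeps the payload aligned. */
union alloc_hdr {
	enum alloc_kind kind;
	max_align_t align;
};

/* Hypothetical limit: "contiguous" requests above this size fail. */
#define CONTIG_LIMIT 4096

/* Counters so a caller can observe which free path was taken. */
static unsigned int contig_frees, fallback_frees;

static void *tagged_alloc(size_t size, enum alloc_kind kind)
{
	union alloc_hdr *h = malloc(sizeof(*h) + size);

	if (!h)
		return NULL;
	h->kind = kind;
	return h + 1;
}

/* Stand-in for kmalloc(): refuses large requests to mimic fragmentation. */
static void *contig_alloc(size_t size)
{
	if (size > CONTIG_LIMIT)
		return NULL;
	return tagged_alloc(size, ALLOC_CONTIG);
}

/* Stand-in for vmalloc(): no size limit in this sketch. */
static void *fallback_alloc(size_t size)
{
	return tagged_alloc(size, ALLOC_FALLBACK);
}

/* Try the contiguous allocator first, then fall back, as the patch does. */
static void *dev_alloc(size_t size)
{
	void *p = contig_alloc(size);

	if (!p)
		p = fallback_alloc(size);
	return p;
}

/* Report which allocator a block returned by dev_alloc() came from. */
static enum alloc_kind dev_kind(const void *p)
{
	return ((const union alloc_hdr *)p - 1)->kind;
}

/* Counterpart of vhost_net_free(): pick the free path matching the allocator. */
static void dev_free(void *p)
{
	union alloc_hdr *h;

	if (!p)
		return;
	h = (union alloc_hdr *)p - 1;
	assert(h->kind == ALLOC_CONTIG || h->kind == ALLOC_FALLBACK);
	if (h->kind == ALLOC_FALLBACK)
		fallback_frees++;	/* kernel code would call vfree() here */
	else
		contig_frees++;		/* kernel code would call kfree() here */
	free(h);
}
```

The point the sketch tries to make is that every free site has to go through the dispatching helper once the allocation can come from either allocator, which is why the patch replaces each kfree(n) with vhost_net_free(n).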