On Wed, Oct 18, 2017 at 07:59:23PM +0900, Tetsuo Handa wrote: > Tetsuo Handa wrote: > > 20171016-deflate.log.xz continued printing "puff" messages without any OOM > > killer messages, for fill_balloon() always inflates faster than leak_balloon() > > deflates. > > > > Since the OOM killer cannot be invoked unless leak_balloon() completely > > deflates faster than fill_balloon() inflates, the guest remained unusable > > (e.g. unable to login via ssh) other than printing "puff" messages. > > This result was worse than 20171016-default.log.xz , for the system was > > not able to make any forward progress (i.e. complete OOM lockup). > > I tested further and found that it is not complete OOM lockup. > > It turned out that the reason of being unable to login via ssh was that fork() > was failing because __vm_enough_memory() was failing because > /proc/sys/vm/overcommit_memory was set to 0. Although virtio_balloon driver > was ready to release pages if asked via virtballoon_oom_notify() from > out_of_memory(), __vm_enough_memory() was not able to take such pages into > account. As a result, operations which need to use fork() were failing without > calling out_of_memory(). > ( http://lkml.kernel.org/r/201710181954.FHH51594.MtFOFLOQFSOHVJ@xxxxxxxxxxxxxxxxxxx ) > > Do you see anything wrong with the patch I used for emulating > VIRTIO_BALLOON_F_DEFLATE_ON_OOM path (shown below) ? > > ---------------------------------------- > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c > index f0b3a0b..a679ac2 100644 > --- a/drivers/virtio/virtio_balloon.c > +++ b/drivers/virtio/virtio_balloon.c > @@ -164,7 +164,7 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > } > set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > vb->num_pages += VIRTIO_BALLOON_PAGES_PER_PAGE; > - if (!virtio_has_feature(vb->vdev, > + if (virtio_has_feature(vb->vdev, > VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) > adjust_managed_page_count(page, -1); > } > @@ -184,7 +184,7 @@ static void release_pages_balloon(struct virtio_balloon *vb, > struct page *page, *next; > > list_for_each_entry_safe(page, next, pages, lru) { > - if (!virtio_has_feature(vb->vdev, > + if (virtio_has_feature(vb->vdev, > VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) > adjust_managed_page_count(page, 1); > list_del(&page->lru); > @@ -363,7 +363,7 @@ static int virtballoon_oom_notify(struct notifier_block *self, > unsigned num_freed_pages; > > vb = container_of(self, struct virtio_balloon, nb); > - if (!virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) > + if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) > return NOTIFY_OK; > > freed = parm; > ---------------------------------------- Looks right but it's probably easier to configure qemu to set that feature bit. Basically you just add deflate-on-oom=on to the balloon device. > > As I demonstrated above, VIRTIO_BALLOON_F_DEFLATE_ON_OOM can lead to complete > > OOM lockup because out_of_memory() => fill_balloon() => out_of_memory() => > > fill_balloon() sequence can effectively disable the OOM killer when the host > > assumed that it's safe to inflate the balloon to a large portion of guest > > memory and this won't cause an OOM situation. > > The other problem is that, although it is not complete OOM lockup, it is too > slow to wait if we hit out_of_memory() => fill_balloon() => out_of_memory() => > fill_balloon() sequence. > > > If leak_balloon() from out_of_memory() should be stronger than > > fill_balloon() from update_balloon_size_func(), we need to make > > sure that update_balloon_size_func() stops calling fill_balloon() > > when leak_balloon() was called from out_of_memory(). > > I tried below patch to reduce the possibility of hitting out_of_memory() => > fill_balloon() => out_of_memory() => fill_balloon() sequence. > > ---------------------------------------- > diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c > index a679ac2..9037fee 100644 > --- a/drivers/virtio/virtio_balloon.c > +++ b/drivers/virtio/virtio_balloon.c > @@ -57,7 +57,7 @@ struct virtio_balloon { > > /* The balloon servicing is delegated to a freezable workqueue. */ > struct work_struct update_balloon_stats_work; > - struct work_struct update_balloon_size_work; > + struct delayed_work update_balloon_size_work; > > /* Prevent updating balloon when it is being canceled. */ > spinlock_t stop_update_lock; > @@ -88,6 +88,7 @@ struct virtio_balloon { > > /* To register callback in oom notifier call chain */ > struct notifier_block nb; > + struct timer_list deflate_on_oom_timer; > }; > > static struct virtio_device_id id_table[] = { > @@ -141,7 +142,8 @@ static void set_page_pfns(struct virtio_balloon *vb, > page_to_balloon_pfn(page) + i); > } > > -static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > +static unsigned fill_balloon(struct virtio_balloon *vb, size_t num, > + unsigned long *delay) > { > struct balloon_dev_info *vb_dev_info = &vb->vb_dev_info; > unsigned num_allocated_pages; > @@ -152,14 +154,21 @@ static unsigned fill_balloon(struct virtio_balloon *vb, size_t num) > mutex_lock(&vb->balloon_lock); > for (vb->num_pfns = 0; vb->num_pfns < num; > vb->num_pfns += VIRTIO_BALLOON_PAGES_PER_PAGE) { > - struct page *page = balloon_page_enqueue(vb_dev_info); > + struct page *page; > + > + if (timer_pending(&vb->deflate_on_oom_timer)) { > + /* Wait for hold off timer expiracy. */ > + *delay = HZ; > + break; > + } > + page = balloon_page_enqueue(vb_dev_info); > > if (!page) { > dev_info_ratelimited(&vb->vdev->dev, > "Out of puff! Can't get %u pages\n", > VIRTIO_BALLOON_PAGES_PER_PAGE); > /* Sleep for at least 1/5 of a second before retry. */ > - msleep(200); > + *delay = HZ / 5; > break; > } > set_page_pfns(vb, vb->pfns + vb->num_pfns, page); > @@ -310,7 +319,8 @@ static void virtballoon_changed(struct virtio_device *vdev) > > spin_lock_irqsave(&vb->stop_update_lock, flags); > if (!vb->stop_update) > - queue_work(system_freezable_wq, &vb->update_balloon_size_work); > + queue_delayed_work(system_freezable_wq, > + &vb->update_balloon_size_work, 0); > spin_unlock_irqrestore(&vb->stop_update_lock, flags); > } > > @@ -366,9 +376,13 @@ static int virtballoon_oom_notify(struct notifier_block *self, > if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_DEFLATE_ON_OOM)) > return NOTIFY_OK; > > + /* Hold off fill_balloon() for 60 seconds. */ > + mod_timer(&vb->deflate_on_oom_timer, jiffies + 60 * HZ); > freed = parm; > num_freed_pages = leak_balloon(vb, oom_pages); > update_balloon_size(vb); > + dev_info_ratelimited(&vb->vdev->dev, "Released %u pages. Remains %u pages.\n", > + num_freed_pages, vb->num_pages); > *freed += num_freed_pages; > > return NOTIFY_OK; > @@ -387,19 +401,21 @@ static void update_balloon_size_func(struct work_struct *work) > { > struct virtio_balloon *vb; > s64 diff; > + unsigned long delay = 0; > > - vb = container_of(work, struct virtio_balloon, > + vb = container_of(to_delayed_work(work), struct virtio_balloon, > update_balloon_size_work); > diff = towards_target(vb); > > if (diff > 0) > - diff -= fill_balloon(vb, diff); > + diff -= fill_balloon(vb, diff, &delay); > else if (diff < 0) > diff += leak_balloon(vb, -diff); > update_balloon_size(vb); > > if (diff) > - queue_work(system_freezable_wq, work); > + queue_delayed_work(system_freezable_wq, to_delayed_work(work), > + delay); > } > > static int init_vqs(struct virtio_balloon *vb) > @@ -521,6 +537,10 @@ static struct dentry *balloon_mount(struct file_system_type *fs_type, > > #endif /* CONFIG_BALLOON_COMPACTION */ > > +static void timer_expired(unsigned long unused) > +{ > +} > + > static int virtballoon_probe(struct virtio_device *vdev) > { > struct virtio_balloon *vb; > @@ -539,7 +559,8 @@ static int virtballoon_probe(struct virtio_device *vdev) > } > > INIT_WORK(&vb->update_balloon_stats_work, update_balloon_stats_func); > - INIT_WORK(&vb->update_balloon_size_work, update_balloon_size_func); > + INIT_DELAYED_WORK(&vb->update_balloon_size_work, > + update_balloon_size_func); > spin_lock_init(&vb->stop_update_lock); > vb->stop_update = false; > vb->num_pages = 0; > @@ -553,6 +574,7 @@ static int virtballoon_probe(struct virtio_device *vdev) > if (err) > goto out_free_vb; > > + setup_timer(&vb->deflate_on_oom_timer, timer_expired, 0); > vb->nb.notifier_call = virtballoon_oom_notify; > vb->nb.priority = VIRTBALLOON_OOM_NOTIFY_PRIORITY; > err = register_oom_notifier(&vb->nb); > @@ -564,6 +586,7 @@ static int virtballoon_probe(struct virtio_device *vdev) > if (IS_ERR(balloon_mnt)) { > err = PTR_ERR(balloon_mnt); > unregister_oom_notifier(&vb->nb); > + del_timer_sync(&vb->deflate_on_oom_timer); > goto out_del_vqs; > } > > @@ -573,6 +596,7 @@ static int virtballoon_probe(struct virtio_device *vdev) > err = PTR_ERR(vb->vb_dev_info.inode); > kern_unmount(balloon_mnt); > unregister_oom_notifier(&vb->nb); > + del_timer_sync(&vb->deflate_on_oom_timer); > vb->vb_dev_info.inode = NULL; > goto out_del_vqs; > } > @@ -611,11 +635,12 @@ static void virtballoon_remove(struct virtio_device *vdev) > struct virtio_balloon *vb = vdev->priv; > > unregister_oom_notifier(&vb->nb); > + del_timer_sync(&vb->deflate_on_oom_timer); > > spin_lock_irq(&vb->stop_update_lock); > vb->stop_update = true; > spin_unlock_irq(&vb->stop_update_lock); > - cancel_work_sync(&vb->update_balloon_size_work); > + cancel_delayed_work_sync(&vb->update_balloon_size_work); > cancel_work_sync(&vb->update_balloon_stats_work); > > remove_common(vb); > ---------------------------------------- OK. Or if you use my patch, you can just set a flag and go if (vb->oom) msleep(1000); at beginning of fill_balloon. > While response was better than now, inflating again spoiled the effort. > Retrying to inflate until allocation fails is already too painful. > > Complete log is at http://I-love.SAKURA.ne.jp/tmp/20171018-deflate.log.xz . > ---------------------------------------- > [ 19.529096] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null) > [ 19.530721] kworker/0:2 cpuset=/ mems_allowed=0 > [ 19.531581] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302 > [ 19.532397] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 > [ 19.533285] Workqueue: events_freezable update_balloon_size_func [virtio_balloon] > [ 19.534143] Call Trace: > [ 19.535015] dump_stack+0x63/0x87 > [ 19.535844] warn_alloc+0x114/0x1c0 > [ 19.536667] __alloc_pages_slowpath+0x9a6/0xba7 > [ 19.537491] ? sched_clock_cpu+0x11/0xb0 > [ 19.538311] __alloc_pages_nodemask+0x26a/0x290 > [ 19.539188] alloc_pages_current+0x6a/0xb0 > [ 19.540004] balloon_page_enqueue+0x25/0xf0 > [ 19.540818] update_balloon_size_func+0xe1/0x260 [virtio_balloon] > [ 19.541626] process_one_work+0x149/0x360 > [ 19.542417] worker_thread+0x4d/0x3c0 > [ 19.543186] kthread+0x109/0x140 > [ 19.543930] ? rescuer_thread+0x380/0x380 > [ 19.544716] ? kthread_park+0x60/0x60 > [ 19.545426] ret_from_fork+0x25/0x30 > [ 19.546141] virtio_balloon virtio3: Out of puff! Can't get 1 pages > [ 19.547903] virtio_balloon virtio3: Released 256 pages. Remains 1984834 pages. > [ 19.659660] virtio_balloon virtio3: Released 256 pages. Remains 1984578 pages. > [ 21.891392] virtio_balloon virtio3: Released 256 pages. Remains 1984322 pages. > [ 21.894719] virtio_balloon virtio3: Released 256 pages. Remains 1984066 pages. > [ 22.490131] virtio_balloon virtio3: Released 256 pages. Remains 1983810 pages. > [ 31.939666] virtio_balloon virtio3: Released 256 pages. Remains 1983554 pages. > [ 95.524753] kworker/0:2: page allocation failure: order:0, mode:0x14310ca(GFP_HIGHUSER_MOVABLE|__GFP_NORETRY|__GFP_NOMEMALLOC), nodemask=(null) > [ 95.525641] kworker/0:2 cpuset=/ mems_allowed=0 > [ 95.526110] CPU: 0 PID: 111 Comm: kworker/0:2 Not tainted 4.14.0-rc5+ #302 > [ 95.526552] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011 > [ 95.527018] Workqueue: events_freezable update_balloon_size_func [virtio_balloon] > [ 95.527492] Call Trace: > [ 95.527969] dump_stack+0x63/0x87 > [ 95.528469] warn_alloc+0x114/0x1c0 > [ 95.528922] __alloc_pages_slowpath+0x9a6/0xba7 > [ 95.529388] ? qxl_image_free_objects+0x56/0x60 [qxl] > [ 95.529849] ? qxl_draw_opaque_fb+0x102/0x3a0 [qxl] > [ 95.530315] __alloc_pages_nodemask+0x26a/0x290 > [ 95.530777] alloc_pages_current+0x6a/0xb0 > [ 95.531243] balloon_page_enqueue+0x25/0xf0 > [ 95.531703] update_balloon_size_func+0xe1/0x260 [virtio_balloon] > [ 95.532180] process_one_work+0x149/0x360 > [ 95.532645] worker_thread+0x4d/0x3c0 > [ 95.533143] kthread+0x109/0x140 > [ 95.533622] ? rescuer_thread+0x380/0x380 > [ 95.534100] ? kthread_park+0x60/0x60 > [ 95.534568] ret_from_fork+0x25/0x30 > [ 95.535093] warn_alloc_show_mem: 1 callbacks suppressed > [ 95.535093] Mem-Info: > [ 95.536072] active_anon:11171 inactive_anon:2084 isolated_anon:0 > [ 95.536072] active_file:8 inactive_file:70 isolated_file:0 > [ 95.536072] unevictable:0 dirty:0 writeback:0 unstable:0 > [ 95.536072] slab_reclaimable:3554 slab_unreclaimable:6848 > [ 95.536072] mapped:588 shmem:2144 pagetables:749 bounce:0 > [ 95.536072] free:25859 free_pcp:72 free_cma:0 > [ 95.538922] Node 0 active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:280kB unevictable:0kB isolated(anon):0kB isolated(file):0kB mapped:2352kB dirty:0kB writeback:0kB shmem:8576kB shmem_thp: 0kB shmem_pmdmapped: 0kB anon_thp: 10240kB writeback_tmp:0kB unstable:0kB all_unreclaimable? no > [ 95.540516] Node 0 DMA free:15900kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:15992kB managed:15908kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [ 95.542325] lowmem_reserve[]: 0 2954 7925 7925 7925 > [ 95.543020] Node 0 DMA32 free:44748kB min:25144kB low:31428kB high:37712kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB writepending:0kB present:3129308kB managed:3063740kB mlocked:0kB kernel_stack:0kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > [ 95.544949] lowmem_reserve[]: 0 0 4970 4970 4970 > [ 95.545624] Node 0 Normal free:42788kB min:42304kB low:52880kB high:63456kB active_anon:44684kB inactive_anon:8336kB active_file:32kB inactive_file:108kB unevictable:0kB writepending:0kB present:5242880kB managed:5093540kB mlocked:0kB kernel_stack:1984kB pagetables:2996kB bounce:0kB free_pcp:372kB local_pcp:220kB free_cma:0kB > [ 95.547739] lowmem_reserve[]: 0 0 0 0 0 > [ 95.548464] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15900kB > [ 95.549988] Node 0 DMA32: 3*4kB (UM) 4*8kB (UM) 2*16kB (U) 2*32kB (U) 1*64kB (U) 0*128kB 2*256kB (UM) 2*512kB (UM) 2*1024kB (UM) 2*2048kB (UM) 9*4096kB (M) = 44748kB > [ 95.551551] Node 0 Normal: 925*4kB (UME) 455*8kB (UME) 349*16kB (UME) 137*32kB (UME) 37*64kB (UME) 23*128kB (UME) 8*256kB (UME) 5*512kB (UM) 3*1024kB (UM) 6*2048kB (U) 0*4096kB = 42588kB > [ 95.553147] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB > [ 95.554030] 2224 total pagecache pages > [ 95.554874] 0 pages in swap cache > [ 95.555667] Swap cache stats: add 0, delete 0, find 0/0 > [ 95.556469] Free swap = 0kB > [ 95.557280] Total swap = 0kB > [ 95.558079] 2097045 pages RAM > [ 95.558856] 0 pages HighMem/MovableOnly > [ 95.559652] 53748 pages reserved > [ 95.560444] 0 pages cma reserved > [ 95.561262] 0 pages hwpoisoned > [ 95.562086] virtio_balloon virtio3: Out of puff! Can't get 1 pages > [ 95.565779] virtio_balloon virtio3: Released 256 pages. Remains 1984947 pages. > [ 96.265255] virtio_balloon virtio3: Released 256 pages. Remains 1984691 pages. > [ 105.498910] virtio_balloon virtio3: Released 256 pages. Remains 1984435 pages. > [ 105.500518] virtio_balloon virtio3: Released 256 pages. Remains 1984179 pages. > [ 105.520034] virtio_balloon virtio3: Released 256 pages. Remains 1983923 pages. > ---------------------------------------- > > Michael S. Tsirkin wrote: > > I think that's the case. Question is, when can we inflate again? > > I think that it is when the host explicitly asked again, for > VIRTIO_BALLOON_F_DEFLATE_ON_OOM path does not schedule for later inflation. Problem is host has no idea when it's safe. If we expect host to ask again after X seconds we might just as well do it in the guest. _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization