On 10/11/2017 10:26 AM, Tetsuo Handa wrote:
> Wei Wang wrote:
>> On 10/10/2017 09:09 PM, Tetsuo Handa wrote:
>>> Wei Wang wrote:
>>>>> And even if we could remove balloon_lock, you still cannot use
>>>>> __GFP_DIRECT_RECLAIM at xb_set_page(). I think you will need to use
>>>>> "whether it is safe to wait" flag from
>>>>> "[PATCH] virtio: avoid possible OOM lockup at virtballoon_oom_notify()" .
>>>> Without the lock being held, why couldn't we use __GFP_DIRECT_RECLAIM at
>>>> xb_set_page()?
>>> Because of dependency shown below.
>>>
>>> leak_balloon()
>>>   xb_set_page()
>>>     xb_preload(GFP_KERNEL)
>>>       kmalloc(GFP_KERNEL)
>>>         __alloc_pages_may_oom()
>>>           Takes oom_lock
>>>           out_of_memory()
>>>             blocking_notifier_call_chain()
>>>               leak_balloon()
>>>                 xb_set_page()
>>>                   xb_preload(GFP_KERNEL)
>>>                     kmalloc(GFP_KERNEL)
>>>                       __alloc_pages_may_oom()
>>>                         Fails to take oom_lock and loop forever
>> __alloc_pages_may_oom() uses mutex_trylock(&oom_lock).
> Yes. But this mutex_trylock(&oom_lock) is semantically mutex_lock(&oom_lock)
> because __alloc_pages_slowpath() will continue looping until
> mutex_trylock(&oom_lock) succeeds (or somebody releases memory).
>
>> I think the second __alloc_pages_may_oom() will not continue since the
>> first one is in progress.
> The second __alloc_pages_may_oom() will be called repeatedly because
> __alloc_pages_slowpath() will continue looping (unless somebody releases
> memory).
>

OK, I see, thanks. So, the point is that the OOM code path should not have
memory allocation, and the old leak_balloon() (without the F_SG feature)
doesn't need xb_preload(). I think one solution would be to let the OOM
path use the old leak_balloon() code path, and we can add one more
parameter to leak_balloon() to control that (sketched below):

leak_balloon(struct virtio_balloon *vb, size_t num, bool oom)

>>> By the way, is xb_set_page() safe?
>>> Sleeping in the kernel with preemption disabled is a bug, isn't it?
>>> __radix_tree_preload() returns 0 with preemption disabled upon success.
>>> xb_preload() disables preemption if __radix_tree_preload() fails.
>>> Then, kmalloc() is called with preemption disabled, isn't it?
>>> But xb_set_page() calls xb_preload(GFP_KERNEL) which might sleep with
>>> preemption disabled.
>> Yes, I think that should not be expected, thanks.
>>
>> I plan to change it like this:
>>
>> bool xb_preload(gfp_t gfp)
>> {
>>         if (!this_cpu_read(ida_bitmap)) {
>>                 struct ida_bitmap *bitmap = kmalloc(sizeof(*bitmap), gfp);
>>
>>                 if (!bitmap)
>>                         return false;
>>                 bitmap = this_cpu_cmpxchg(ida_bitmap, NULL, bitmap);
>>                 kfree(bitmap);
>>         }
> Excuse me, but you are allocating per-CPU memory when running CPU might
> change at this line? What happens if running CPU has changed at this line?
> Will it work even with new CPU's ida_bitmap == NULL ?
>

Yes, it will be detected in xb_set_bit(): if ida_bitmap is NULL on the new
CPU, xb_set_bit() will return -EAGAIN to the caller, and the caller should
restart from xb_preload() (see the retry sketch below).

Best,
Wei
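
A rough sketch of what the proposed two-path leak_balloon() could look like.
leak_balloon_sg() and leak_balloon_pfns() are hypothetical helpers standing
in for the F_SG/xbitmap path and the old pfn-array path discussed above; the
OOM notifier shape follows the existing driver, but this is illustrative
only, not the actual patch:

/*
 * Illustrative sketch, not the real virtio_balloon code.
 * leak_balloon_sg() / leak_balloon_pfns() are hypothetical helpers.
 */
static unsigned leak_balloon(struct virtio_balloon *vb, size_t num, bool oom)
{
	/*
	 * The xbitmap path calls xb_set_page() -> xb_preload(GFP_KERNEL),
	 * which may allocate memory and therefore must not be used from
	 * the OOM notifier; fall back to the old path in that case.
	 */
	if (virtio_has_feature(vb->vdev, VIRTIO_BALLOON_F_SG) && !oom)
		return leak_balloon_sg(vb, num);	/* may allocate */

	return leak_balloon_pfns(vb, num);		/* no allocation */
}

static int virtballoon_oom_notify(struct notifier_block *self,
				  unsigned long dummy, void *parm)
{
	struct virtio_balloon *vb = container_of(self, struct virtio_balloon,
						 nb);
	unsigned long *freed = parm;

	/* Always take the non-allocating path from the OOM notifier. */
	*freed += leak_balloon(vb, oom_pages, true);
	update_balloon_size(vb);

	return NOTIFY_OK;
}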
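
And a caller-side sketch of the restart-from-xb_preload() pattern described
above. xb_preload_end(), the page_xb field name, and the exact xb_set_bit()
signature are assumptions modeled on the radix tree preload API, not
necessarily what the posted patch does:

/*
 * Illustrative retry pattern.  If the task migrated to a CPU whose
 * ida_bitmap is still NULL after xb_preload(), xb_set_bit() is expected
 * to return -EAGAIN, and the caller simply preloads again and retries.
 */
static int xb_set_page(struct virtio_balloon *vb, struct page *page)
{
	unsigned long pfn = page_to_pfn(page);
	int err;

	do {
		if (!xb_preload(GFP_KERNEL))
			return -ENOMEM;	/* kmalloc() inside preload failed */

		err = xb_set_bit(&vb->page_xb, pfn);
		xb_preload_end();
	} while (err == -EAGAIN);	/* raced with a CPU migration */

	return err;
}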