On Sat, 13 Jul 2019, Yang Shi wrote:

> When running ltp's oom test with kmemleak enabled, the below warning was
> triggered since kernel detects __GFP_NOFAIL & ~__GFP_DIRECT_RECLAIM is
> passed in:
>
> WARNING: CPU: 105 PID: 2138 at mm/page_alloc.c:4608 __alloc_pages_nodemask+0x1c31/0x1d50
> Modules linked in: loop dax_pmem dax_pmem_core
> ip_tables x_tables xfs virtio_net net_failover virtio_blk failover
> ata_generic virtio_pci virtio_ring virtio libata
> CPU: 105 PID: 2138 Comm: oom01 Not tainted 5.2.0-next-20190710+ #7
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.2-0-g5f4c7b1-prebuilt.qemu-project.org 04/01/2014
> RIP: 0010:__alloc_pages_nodemask+0x1c31/0x1d50
> ...
> kmemleak_alloc+0x4e/0xb0
> kmem_cache_alloc+0x2a7/0x3e0
> ? __kmalloc+0x1d6/0x470
> ? ___might_sleep+0x9c/0x170
> ? mempool_alloc+0x2b0/0x2b0
> mempool_alloc_slab+0x2d/0x40
> mempool_alloc+0x118/0x2b0
> ? __kasan_check_read+0x11/0x20
> ? mempool_resize+0x390/0x390
> ? lock_downgrade+0x3c0/0x3c0
> bio_alloc_bioset+0x19d/0x350
> ? __swap_duplicate+0x161/0x240
> ? bvec_alloc+0x1b0/0x1b0
> ? do_raw_spin_unlock+0xa8/0x140
> ? _raw_spin_unlock+0x27/0x40
> get_swap_bio+0x80/0x230
> ? __x64_sys_madvise+0x50/0x50
> ? end_swap_bio_read+0x310/0x310
> ? __kasan_check_read+0x11/0x20
> ? check_chain_key+0x24e/0x300
> ? bdev_write_page+0x55/0x130
> __swap_writepage+0x5ff/0xb20
>
> The mempool_alloc_slab() clears __GFP_DIRECT_RECLAIM, kmemleak has
> __GFP_NOFAIL set all the time due to commit
> d9570ee3bd1d4f20ce63485f5ef05663866fe6c0 ("kmemleak: allow to coexist
> with fault injection").
>

It only clears __GFP_DIRECT_RECLAIM provisionally to see if the allocation
would immediately succeed before falling back to the elements in the
mempool.  If that fails, and the mempool is empty, mempool_alloc() attempts
the allocation with __GFP_DIRECT_RECLAIM.

So for the problem described here, I think what we really want is this:

diff --git a/mm/mempool.c b/mm/mempool.c
--- a/mm/mempool.c
+++ b/mm/mempool.c
@@ -386,7 +386,7 @@ void *mempool_alloc(mempool_t *pool, gfp_t gfp_mask)
 	gfp_mask |= __GFP_NORETRY;	/* don't loop in __alloc_pages */
 	gfp_mask |= __GFP_NOWARN;	/* failures are OK */

-	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO);
+	gfp_temp = gfp_mask & ~(__GFP_DIRECT_RECLAIM|__GFP_IO|__GFP_NOFAIL);

 repeat_alloc:

But bio_alloc_bioset() plays with gfp_mask itself: are we sure that it
isn't the one clearing __GFP_DIRECT_RECLAIM itself before falling back to
saved_gfp?  In other words do we also want this?

diff --git a/block/bio.c b/block/bio.c
--- a/block/bio.c
+++ b/block/bio.c
@@ -462,16 +462,16 @@ struct bio *bio_alloc_bioset(gfp_t gfp_mask, unsigned int nr_iovecs,
 		 * We solve this, and guarantee forward progress, with a rescuer
 		 * workqueue per bio_set. If we go to allocate and there are
 		 * bios on current->bio_list, we first try the allocation
-		 * without __GFP_DIRECT_RECLAIM; if that fails, we punt those
-		 * bios we would be blocking to the rescuer workqueue before
-		 * we retry with the original gfp_flags.
+		 * without __GFP_DIRECT_RECLAIM or __GFP_NOFAIL; if that fails,
+		 * we punt those bios we would be blocking to the rescuer
+		 * workqueue before we retry with the original gfp_flags.
 		 */
-
 		if (current->bio_list &&
 		    (!bio_list_empty(&current->bio_list[0]) ||
 		     !bio_list_empty(&current->bio_list[1])) &&
 		    bs->rescue_workqueue)
-			gfp_mask &= ~__GFP_DIRECT_RECLAIM;
+			gfp_mask &= ~(__GFP_DIRECT_RECLAIM |
+				      __GFP_NOFAIL);

 		p = mempool_alloc(&bs->bio_pool, gfp_mask);
 		if (!p && gfp_mask != saved_gfp) {
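
To make the mask arithmetic concrete, here is a minimal user-space sketch
of what the two first-attempt masks look like.  The SK_GFP_* bit values
and all function names are hypothetical and chosen only for illustration;
the real flag definitions live in the kernel headers and differ.  The
sketch only models the bit manipulation in the mempool hunk, not where in
the call chain __GFP_NOFAIL actually gets set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only (not the kernel's). */
typedef unsigned int gfp_t;
#define SK_GFP_IO             0x01u
#define SK_GFP_DIRECT_RECLAIM 0x02u
#define SK_GFP_NOFAIL         0x04u

/*
 * The combination the warning in the report is about: the mask asks
 * for an allocation that must not fail while direct reclaim is off.
 */
static bool nofail_without_reclaim(gfp_t mask)
{
	return (mask & SK_GFP_NOFAIL) && !(mask & SK_GFP_DIRECT_RECLAIM);
}

/* First-attempt mask as the unpatched mempool hunk computes it. */
static gfp_t first_try_current(gfp_t mask)
{
	return mask & ~(SK_GFP_DIRECT_RECLAIM | SK_GFP_IO);
}

/* First-attempt mask with the proposed change: NOFAIL is dropped too. */
static gfp_t first_try_proposed(gfp_t mask)
{
	return mask & ~(SK_GFP_DIRECT_RECLAIM | SK_GFP_IO | SK_GFP_NOFAIL);
}

/* Self-check for one example mask that carries all three flags. */
static void sketch_self_check(void)
{
	gfp_t mask = SK_GFP_IO | SK_GFP_DIRECT_RECLAIM | SK_GFP_NOFAIL;

	/* Unpatched: NOFAIL survives while DIRECT_RECLAIM is cleared. */
	assert(nofail_without_reclaim(first_try_current(mask)));
	/* Proposed: both are cleared, so the combination cannot occur. */
	assert(!nofail_without_reclaim(first_try_proposed(mask)));
}
```

The caller's original mask is left untouched in both cases, so a later
retry with the full flags (as the fallback path does) still sees
__GFP_DIRECT_RECLAIM and __GFP_NOFAIL if they were set to begin with.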